4.4 Comments
Comments are to your program as the Three Bears' furniture was to Goldilocks; too few aren't good enough, and more are not better. Ideally, code should not need commenting; for example, instead of saying:
# If employee was hired more than a week ago:
# get today's week number, convert the employee's
# hire date using the same format, take difference.
use POSIX qw(strftime);
use Date::Parse;
chomp(my $t1 = `date +%U`);
my @t2 = strptime($employee->hired);
if (strftime("%U", @t2) - $t1 > 1)
say:
if (time - $employee->hired > $ONE_WEEK)
by storing times in UNIX seconds-since-epoch format and defining $ONE_WEEK earlier as 60 * 60 * 24 * 7. This is not only easier to read but lacks the bug in the previous code. That tortured tangle of pendulous programming may look absurd, but I have often seen worse. A developer is used to getting a date from the date command in shell scripts, does the same in a Perl program, stores the result in the same format, and it goes downhill from there.
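Here is a minimal sketch of that arrangement (the printed message is invented, and it assumes the hired() accessor returns epoch seconds):

my $ONE_WEEK = 60 * 60 * 24 * 7;    # Seconds in a week

# Later, wherever the employee object is in scope:
if (time - $employee->hired > $ONE_WEEK)
{
  print "Past probation\n";         # Placeholder action
}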
Sometimes you can't make the code clear enough. When you add a comment the word to keep in mind is, Why? Don't just recapitulate what the code does, but say why you're doing it. This doesn't need to be a verbose justification, but can be as simple as this:
if (time - $employee->hired > $ONE_WEEK) # Past probation?
Although then you have to ask why it wasn't written as:
if (time - $employee->hired > $PROBATION_PERIOD)
So a better example would be something like:
if (my ($name) = />(.*?)</)   # Tag contents never span lines in our input
which indicates some hard-won piece of knowledge.
4.4.1 Sentinel Comments
Sometimes you need to put a reminder in your code to change something later. Perhaps you're in the middle of fixing one bug when you discover another. By putting a distinctive marker in a comment, you have something you can search for later. I use the string "XXX" because gvim highlights it especially for this purpose.[5] (I suppose it also makes the program fail to pass through naive adult content filters.)
[5] Even though I use Emacs most of the time; it helps to hedge your bets in the perennial "Favorite Editor War."
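For example, a sentinel comment might look like this (the surrounding code is hypothetical):

# XXX Fails for employees hired before 1970; revisit after the release
return probation_status($employee);

A later search for "XXX" (for instance, grep -n XXX *.pm) turns up every such reminder.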
4.4.2 Block Comments
People often ask how they can comment out a large piece of code. Wrapping it in an if (0) {...} block prevents it from being run, but still requires that it be syntactically correct.
Instead, wrap the code in Plain Old Documentation (POD) directives so it's treated as documentation.[6] There are a number of ways of doing this; here I've used =begin:
[6] See Section 10.3.1.
=begin comment

# Here goes the code we want
# to be ignored by perl

=end comment

=cut
This will still be interpreted as POD, though. However, in this case, POD formatters should ignore the block because none of them know what a block labeled "comment" is. I needed the =end directive to keep the POD valid (=begin directives should be matched), and I needed the =cut directive to end POD processing so perl could resume compiling the following code. Remember that POD directives need blank lines following them because they are paragraph-based. I'll go into what POD can do for your program in more detail in Chapter 10. For a more creative solution to the multiline comment problem, see http://www.perl.com/pub/a/2002/08/13/comment.html.
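To make the mechanics concrete, here is a small self-contained sketch; the disabled lines are deliberately not even valid Perl, which is exactly what this technique tolerates (all names are invented):

#!/usr/bin/perl
use strict;
use warnings;

my $total = 42;

=begin comment

compute_something($total;    # Unbalanced parenthesis; perl never parses this
this line is not Perl at all, and that is fine too

=end comment

=cut

print "Still running: total is $total\n";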
4.5 Restyling
What should you do when you're presented with an existing program whose author didn't follow your favorite style guideline and whose layout looks like a Jackson Pollock painting?[7]
[7] Make that a Pollock imitation. Uri Guttman referred me to an article showing how genuine Pollocks exhibit fractal ordering: [TAYLOR02].
Several attempts have been made at writing beautifiers, but the clear winner as of publication time is perltidy,[8] by Steve Hancock. You can also download perltidy from CPAN. Use Perl::Tidy as the module name to install.
[8] http://perltidy.sourceforge.net/
perltidy writes beautified versions of its input files in files of the same name with .tdy appended. It has too many options to document exhaustively here, but they are well covered in its manual page. Its default options format according to the perlstyle recommendations. My personal style is approximately equivalent to:
perltidy -gnu -bbt=1 -i=2 -nsfs -vt=1 -vtc=1 -nbbc
As good as it is, perltidy can't make the kinds of optimizations that turn the layout of a program into a work of art. Use it once to get a program written by someone else into a "house style," but don't let your style be a straightjacket. For instance, the preceding settings generated this formatting:
return 1 if $case_sensitive[length $word]{$word};
return 1 if $case_insensitive[length $word]{lc $word};
But I could further beautify that code into:
return 1 if $case_sensitive  [length $word]{   $word};
return 1 if $case_insensitive[length $word]{lc $word};
Note that you can separate subscripting brackets from their identifiers by white space if you want. In fact, Perl is incredibly permissive about white space, allowing you to write such daring constructs as:
$ puppy   = ($dog     ->offspring)[0];
$platypup = ($platypus->eggs)     [0];
although whether you really want to mix left and right justification without any, er, justification, is questionable.
4.5.1 Just Helping Out?
If you are modifying code that belongs to someone else, but not assuming responsibility for it, follow its existing style. You'll only annoy the author by "helpfully" reformatting their code. That's equivalent to going into the bathrooms at a house you've been invited to and changing the way all the toilet paper rolls are hung.
4.6 Variable Renaming
Easily vying with misindentation for unintentional obfuscation is cryptic variable naming. If you look at a piece of code and can't tell what a variable is for, it has the wrong name. A brief mnemonic is usually sufficient: $func is as good as $function, and unless there are other function references nearby, as good as $function_for_button_press_callback. On the other hand, $f is not good enough.
Names like $i and $n are fine if their variables enjoy the commonly accepted usages of an index and a count, respectively, but only over a short scope. Single-letter or otherwise cryptic names can otherwise be acceptable if they are recapitulating a formula that is well-known or included in official documentation for the program, for example:
$e = $m * $c ** 2;
# Weighted average contribution, see formula (4):
$v_avg = sum(map $v[$_] * $v_w[$_] => 0..$#v) / sum(@v_w);
However, single-character variable names should be confined to small scopes because they are hard to find with an editor's search function if you need to do that.
Don't use $i, $j, and so on, for indices in hierarchical data structures if the structures are not truly multidimensional; use names representative of what's stored at each level. For instance:
$level = $voxels[$x][$y][$z];
Okay; these really are x, y, z coordinates.
$val = $spreadsheet[$i][$j];
Okay; accepted usage.
$val = $spreadsheet[$row][$col];
Better; more common terms.
$total += $matrix[$i][$j][$k];
Fine, if we don't know any more about @matrix (perhaps this is in a multidimensional array utility method).
$e = $o{$i}{$j}{$k};
Bad; you have to look up what those variables mean.
$employee = $orgchart{$division}{$section}{$position};
Much clearer.
$emp = $org{$div}{$sec}{$pos};
Equally useful; the variables are sufficiently mnemonic to suggest their meaning immediately.
perlstyle has some excellent recommendations for formatting of variable names:
Use underscores for separating words (e.g., $email_address), not capitalization ($EmailAddress).
Use all capital letters for constants; for example, $PI.
Use mixed case for package variables or file-scoped lexicals; for example, $Log_File.
Use lowercase for everything else; for example, $full_name. This includes subroutine names (e.g., raid_fridge) even though, yes, they live in the current package and are not lexical.
Subroutines that are internal and not intended for use by your users should have names beginning with an underscore. (This won't stop the users from calling the subroutines, but they will at least know by the naming convention that they are doing something naughty.)
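Here is a short sketch applying these conventions; every name in it is invented for illustration:

package Fridge::Raider;
use strict;
use warnings;

my $MAX_SNACKS = 3;             # Constant: all capitals
my $Log_File   = 'raid.log';    # File-scoped lexical: mixed case

sub raid_fridge                 # Public subroutine: lowercase
{
  my $full_name = shift;        # Ordinary lexical: lowercase
  _log_raid($full_name);
  return $MAX_SNACKS;
}

sub _log_raid                   # Internal helper: leading underscore
{
  my $full_name = shift;
  open my $fh, '>>', $Log_File or return;
  print $fh "$full_name raided the fridge\n";
  close $fh;
}

1;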
4.6.1 Symbolic Constants
I seldom use the constant pragma, which allows you to define symbols at compile time; for instance:
use constant MAX_CONNECTIONS => 5;
  ...
if ($conns++ < MAX_CONNECTIONS) ...
It has some advantages:
You cannot overwrite the value assigned to the symbol. (Well, not easily: The constant is actually a subroutine prototyped to accept no arguments. It could be redefined, but you wouldn't do that by accident.)
The use of a constant is (usually) optimized by the Perl compiler so that its value is directly inserted without the overhead of a subroutine call.
The symbols look constant-ish; identifiers beginning with a sigil ($, @, %) tend to make us think they are mutable.
It also has some disadvantages:
Constants are visually indistinguishable from filehandles.
A constant can't be used on the left-hand side of a fat arrow (=>) because that turns it into a string. You have to put empty parentheses after the constant to ensure that Perl parses it as a subroutine call.
The same is true of using a constant as a hash key.
Constants do not interpolate into double-quoted strings, unless you use the hair-raising @{[CONSTANT_GOES_HERE]} construction.
Being subroutines, they are package-based and cannot be confined to a lexical scope; but they go out of scope when you have a new package statement (unless you fully qualify their names).
Being subroutines, they are visible from other packages, and are even inheritable. You may consider this an advantage if you wish but it is not intuitively obvious.
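A brief sketch of the fat arrow, hash key, and interpolation gotchas (the hash and its values are invented):

use strict;
use warnings;
use constant MAX_CONNECTIONS => 5;

my %limit = (MAX_CONNECTIONS => 'oops');     # Key is the string "MAX_CONNECTIONS", not 5
my %fixed = (MAX_CONNECTIONS() => 'fine');   # Empty parentheses force the subroutine call
my $max   = $fixed{MAX_CONNECTIONS()};       # The same trick is needed for a hash key
print "Allowing @{[ MAX_CONNECTIONS ]} connections\n";   # The interpolation workaround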
So I tend to define my constants as just ordinary scalars or whatever they might be:
my $PI = 3.1415926535897932384;
my @BEATLES = qw(John Paul George Ringo);
Purists argue that constants should be unmodifiable, but Perl's central philosophy is to let you assume the responsibility that goes along with freedom. Anyone who changes the value of π deserves the consequences that go along with altering the fundamental structure of the universe. On the other hand, you might have good reason at various times to add 'Stuart' or 'Pete' to @BEATLES.
4.1 Strategizing
Perhaps the most important question you will have to answer to begin with is this: To what extent should you rewrite the code? Even if at first it looks far too long to replace, rewriting from scratch is an option you should pursue if at all possible for a host of reasons:
It allows you to psychologically "own" the code: It's no longer someone else's code; it's yours, because you typed it.
You'll be far more familiar with the code, because people remember things much better if they've written them instead of merely having read them.
The code may not need to be that long; the original code may have been accidentally or deliberately overelaborated and maybe there's a slim program in there just waiting to be liberated from the adiposity encasing it.
You'll be able to create tests incrementally as you write each new function.
Perhaps the program started as either a prototype or an application designed to handle a much smaller problem, but it became a victim of its own success and accreted new functionality as a hermit crab accumulates barnacles (the difference being that the crab can still get around). You can recode to fit the new, more complete requirements.
Similarly, the program may contain vestigial code that hasn't been used in years. If you rewrite, you'll automatically exclude it without having to hunt it down.
You can incorporate the latest technology in the new program. In particular, advanced modules are being created now at a dizzying pace; quite possibly large parts of the original code duplicate functionality that can now be left to the module maintainers.
Remember, it'll almost certainly take longer to figure out the code than you think. Consider whether increasing your estimate of the size of the task changes your mind about whether to rewrite or not.
4.2 Why Are You Doing This?
The goals of rewriting are similar to the goals of developing a program from scratch. You must consider these questions: Who are you writing for? How do the demands on your rewrite differ from the demands placed on the original code?
4.2.1 Who's the Audience?
Who is going to see your code? In some environments, code will be inspected by people who will not be involved in maintaining it, may not work in the same department, or may not even be programmers. But their reactions could nevertheless be influential. Determine before you start whether you are in such an environment and find out what you can about the reviewers' tastes. You may be better off coding to a low degree of sophistication so as not to antagonize reviewers who are unable to understand more advanced code constructions.
Even if you're fortunate enough not to be incarcerated in such a politically distorted workplace, you must consider who is going to be maintaining this program. Assuming you're not attempting to ensure job security through creating code so obfuscated that no one else can decipher it, if someone else is going to assume responsibility for your program you will want to minimize the extent to which they have to bug you for help. Therefore, if you know who that person is going to be, you should keep the sophistication of your code to a level that they can assimilate. (Of course, this strategy has its limits; if you're creating an air traffic control system to be maintained by a recent high school graduate, accommodation may not be possible.[1])
[1] If you're creating an air traffic control system in Perl, please drop me a line. If your system is supposed to be maintained by teenagers, please also include a map of the areas it serves; rest assured I will put that map to good use. (To forestall the letters, the system described in [HOGABOOM03] is not used in operations.)
In the absence of more specific information, the most likely maintenance programmer for your program is, of course, yourself. Strange as it might seem at this point, many—if not most—programmers, in some subconscious fit of masochism, still code in a way that causes needless pain for themselves later on. You know who you are.
If you don't know who's going to take over your code, the question arises: What level of sophistication should you code to? There is much debate on this topic, and more than a few rancorous opinions. I believe you should code to your own level, and not try to satisfy the preferences of a purely hypothetical maintenance programmer. An absurd notion abounds that there is an objective "best standard" to code to for maintenance, and people fritter away endless hours arguing whether a construction such as:
(my $new = $old) =~ s/old/new/;
fits this standard,[2] or whether it is clearer to the hypothetical maintainer to rewrite it as:
[2] Tom Christiansen dubbed this the en passant method of setting a variable to a transformation of an expression.
my $new = $old;
$new =~ s/old/new/;
If you're going to be the maintenance programmer, the only concern that need affect you is whether you will be able to understand the code. That is best determined by how long it took you to write it. If you write something quickly, it is more likely to be readable than something that takes hours of sweating and testing to get right.
If those hours are spent constructing a fiendishly compact brand-new idiom, the odds are you will spend even longer trying to figure it out when you come across it six months later. Leave it out for now and use more obvious code instead; if in some future programming effort the new idiom occurs to you without hours of toil, then you've hit on something useful; but more likely, you will be glad you resisted the temptation to use it.
(Writing something quickly doesn't guarantee that it'll make sense in a larger context later on, though, only that you'll understand the syntax. In other words, you may not be able to see the forest for the trees, but at least each tree will be somewhat familiar.)
If, on the other hand, you spend hours developing a small piece of code that doesn't have any apparently clearer representation, you have code that needs commenting. You understand it right now because of all the auxiliary notes that you took, the tests that you ran (which should be rolled into your testing scripts), the articles or texts that you read, and some complex analysis that you performed, none of which appear in the code. If you needed that supporting material to create the code, you'll need it to understand it, so put it in as comments. Later on I'll show how you can insert lengthier information that can be extracted into nicely formatted documents.
4.2.2 The Ten Levels of Perl Programmers
Several years ago, Tom Christiansen created a somewhat tongue-in-cheek list of different levels of Perl programming sophistication ([CHRISTIANSEN98]). With an eye toward classifying maintenance programmers, here's mine. Later on in this book, I'll occasionally refer to one of these levels when I want to indicate the degree of sophistication of some code or technique. It can be helpful to assess the level of the author of a program you are tasked with maintaining.
Level 1:
Hasn't read any book (or anything else for that matter) on Perl; knows that it's a programming language but knows nothing else about it. Can run a Perl program written by someone else and realizes that they can change parts of the behavior—such as the content of printed strings—by editing parts of the program. Doesn't understand why changes to other parts of the program don't work; has no mental model to fit the language in, so can't differentiate the syntax from languages as diverse as COBOL and C++.
Level 2:
Understands the basic block structure syntax, although only to the extent of recognizing that it is similar to a language like JavaScript. Has a notion that blocks create some kind of scoping effect but does not know about lexical variables and has not encountered use strict or use warnings. Can change the sense of a conditional and use basic arithmetic and logical operators. Thinks that everything they need to do can be achieved by small modifications to programs written by others.
Level 3:
Wants to create programs from scratch and realizes that some kind of education is called for; asks people for book recommendations for learning Perl. Someone at this level may acquire the Camel book [WALL00] thinking that it is a tutorial, and attempt to read the whole thing, suffering severe neurological trauma in the process.
Level 4:
Learns for the first time about use strict and use warnings, but thinks they're more trouble than they're worth. Wonders what my means. Discovers that there are modules for solving just about any problem but doesn't know how to acquire and use them. Subscribes to a Perl newsgroup and is overwhelmed by the amount of discussion on topics that make no sense to them.
Level 5:
Has basic comprehension of regular expressions, operators, I/O, and scoping. Uses my, use strict, and use warnings because everyone says so. A large number of people never advance beyond this level because they can do pretty much anything (aside from creating reusable components), albeit often inefficiently. Learns about references either at this level or the next one.
Level 6:
May take a Perl class. Knows how to use objects and realizes that this knowledge and the Comprehensive Perl Archive Network (CPAN) enable them to create powerful programs very rapidly. Wants to advance to the next level to see how far this empowerment trend extends.
Level 7:
Learns how to create their own object-oriented modules and experiences the bliss of code reuse for the first time. Knows about packages and the difference between lexical and dynamic variables. Discovers that Perl's regular expressions aren't regular and are capable of far more than simple text manipulation.
Level 8:
Starts giving back in some fashion: either submitting bug reports or patches, modules to CPAN, documentation suggestions, or help for novices. Discovers advanced features like AUTOLOAD and starts using developer-oriented modules like Class::MethodMaker. Uses complex application modules like DBI or Tk where appropriate; comfortable using CGI.pm to create web-based applications.
Level 9:
Attends a Perl conference, participates in the Perl community in other ways; may frequent www.perlmonks.org or #perl (see Chapter 12). Comfortable with creating code on the fly with eval and manipulating the symbol table. Thinks often of performance considerations when coding, probably more than necessary. Publishes modules subclassing major modules to add significant functionality.
Level 10:
Takes part in Perl obfuscation and "golf" contests,[3] comfortable writing a single regular expression using embedded code to implement a function that most other people would have needed to write an entire program for; may submit patches to the Perl core, contribute major new modules, or otherwise become a well-known name within the Perl community.
[3] Perl golf is a contest of seeing who can solve a problem in the fewest characters. The winner is unlikely to be intelligible, but then again, how many other languages allow you to write an entire munitions-grade encryption algorithm (RSA) in three lines that fit on a t-shirt?
Continuing this progression for a few more levels until reaching one where the only inhabitant is Larry Wall is left as an exercise to the reader.
4.2.3 What Are the Requirements?
Before rewriting, you must find out how the requirements for the code have changed since the original program was developed. Don't assume that all you have to do is reproduce the functionality unless you've verified that. You may need to optimize the code for different goals; perhaps you're expected to improve its readability, performance, or portability, or extend its functionality.
An important question to ask the original developer is what, if anything, the code was optimized for. For instance, there may be complicated constructions in it that you would simplify if you didn't know that they're required for acceptable performance.
4.3 Style
In [SCOTT01], Ed Wright and I said, "The only believable comment to make about style is that you should have one." I'll go further than that here. Just as in clothing, having a consistent style doesn't necessarily mean other people won't laugh at you. So, to prevent you from accidentally picking a coding style that's the equivalent of a 1970s-era pool hustler's outfit, here is a suggested style that's rather more conservative.
Of course, some people intentionally dress like pool sharks and may take umbrage at this disparagement of their sartorial tastes. By all means stick with an idiosyncratic style if you've arrived at it through conscious decision and you can live with the consequences. This section is for everyone else. Like many parts of this book, it incorporates a certain amount of personal taste that is one of a number of ways of getting a job done well; I'm not holding it up as the "One True Way."
You can find comments on style in the perlstyle documentation page that comes with Perl, parts of which are reflected here.
4.3.1 Layout
The best thing you can do for the layout of your program is to indent it properly. I continue to be amazed by how many people in my classes write programs with no indentation whatsoever, as though there were a price on the head of anyone using white space. My own tendencies are so ingrained in the other direction that I have to fight the urge to indent my one-liners! No matter how brief or short-lived the program, you're doing everyone a favor by indenting it from the beginning. Many editors make this process painless.
perlstyle recommends a four-column indent; I use two columns because it's enough for my eyes to recognize the block structure and I don't want deeply nested blocks crowding the right margin. Some people favor an eight-column indent by virtue of using the tab character for each indentation level, and I personally find this far too large. They sometimes claim that they merely need to set the tab stops in their editors to four columns to get good-looking indentation, but this forces maintenance programmers to fiddle with their editor configuration, something many programmers are so possessive of that you might as well ask them to stop breathing.
When it comes to brace styles for code blocks, programmers break out into confrontations that make Jonathan Swift's big-endian/little-endian battles ([SWIFT47]) look like Amish barn-raisings. I prefer a style called the BSD/Allman style:
sub read_dicts
{
  my $dictref = shift;
  my @path = split /:/, $ENV{DICTPATH} || "$ENV{HOME}:.";
  local @ARGV = grep -e, ($DEFDICT, map "$_/.dict" => @path);
  while (<>)
  {
    # Assume the line contains a single word
    chomp;
    $dictref->{$_}++;
  }
}
whereas most of the Perl distribution uses the so-called K&R (Kernighan and Ritchie) style:
sub read_dicts {
  my $dictref = shift;
  my @path = split /:/, $ENV{DICTPATH} || "$ENV{HOME}:.";
  local @ARGV = grep -e, ($DEFDICT, map "$_/.dict" => @path);
  while (<>) {
    # Assume the line contains a single word
    chomp;
    $dictref->{$_}++;
  }
}
which saves a line per block but makes it harder—to this author—to see where the block begins, particularly because a continuation line may be indented. For example:
update($controller->connection, $record[$cur_count]->ticket,
       $form_inputs->{params});
might force you to check the end of the first line to make sure it wasn't ending with a brace. Another example:
if (($best_count == 1 && $self->{single_match})
    || ($best_count > 0 && ! $self->{single_match})) {
    $found_match = 1;
}
which—even with a four-column indentation—looks more confusing to me than
if (($best_count == 1 && $self->{single_match})
    || ($best_count > 0 && ! $self->{single_match}))
{
  $found_match = 1;
}
with only a two-column indentation. (Although to be fair, some K&R adherents suggest this case is worth an exception.) See also how the continuation line was given a nonstandard indentation so the parallel clauses could line up nicely. Don't underestimate the value of using white space to prettify code in this way.[4]
[4] After some conversation with K&R adherents, I am starting to suspect that the difference between them and the BSD/Allman camp is that they tend to look at the ends of lines for semantic clues, whereas we look at the beginnings. Experiment with both styles and see which one you find more appealing.
Although there are at least two ways of conditionally executing a single statement without putting it in an if block, if that statement is long enough to overflow the line, using a block for it may be clearer than the other approaches (because it looks like multiple statements, even if it isn't):
if (@problems)
{
  error("We found the following problems:\n"
        . (join "\n" => @problems)
       );
}
The non-block ways of conditional statement execution are:
condition and statement
For example:
/(banana|apple|pear)/ and push @fruit, $1;
statement if condition
For example:
warn "Reached line $." if $verbose;
Code should be as pleasant to look at as you can possibly make it without ornateness that's fragile with respect to modifications—you don't want to shy away from making necessary changes just because they would destroy a beautiful piece of formatting.
3.3 An Example Using Test:: Modules
Let's put what we've learned to use in developing an actual application. Say that we want to create a module that can limit the possible indices of an array, a bounds checker if you will. Perl's arrays won't normally do that,[6] so we need a mechanism that intercepts the day-to-day activities of an array and checks the indices being used, throwing an exception if they're outside a specified range. Fortunately, such a mechanism exists in Perl; it's called tieing, and pretty powerful it is too.
[6] If you're smart enough to bring up $[, then you're also smart enough to know that you shouldn't be using it.
Because our module will work by letting us tie an array to it, we'll call it Tie::Array::Bounded. We start by letting h2xs do the rote work of creating a new module:
% h2xs -AXn Tie::Array::Bounded
Writing Tie/Array/Bounded/Bounded.pm
Writing Tie/Array/Bounded/Makefile.PL
Writing Tie/Array/Bounded/README
Writing Tie/Array/Bounded/test.pl
Writing Tie/Array/Bounded/Changes
Writing Tie/Array/Bounded/MANIFEST
That saved a lot of time! h2xs comes with perl, so you already have it. Don't be put off by the name: h2xs was originally intended for creating perl extensions from C header files, a more or less obsolete purpose now, but by dint of copious interface extension, h2xs now enjoys a new lease on life for creating modules. (In Section 8.2.4, I'll look at a more modern alternative to h2xs.)
Don't be confused by the fact that the file Bounded.pm is in the directory Tie/Array/Bounded. It may look like there's an extra directory in there but the hierarchy that h2xs created is really just to help keep your sources straight. Everything you create will be in the bottom directory, so we could cd there. For instant gratification we can create a Makefile the way we would with any CPAN module:
% cd Tie/Array/Bounded
% perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Tie::Array::Bounded
and now we can even run a test:
% make test
cp Bounded.pm blib/lib/Tie/Array/Bounded.pm
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 test.pl
1..1
ok 1
It even passes! This is courtesy of the file test.pl that h2xs created for us, which contains a basic test that the module skeleton created by h2xs passes. This is very good for building our confidence. Unfortunately, test.pl is not the best way to create tests. We'll see why when we improve on it by moving test.pl into a subdirectory called "t" and rebuilding the Makefile before rerunning "make test":
% mkdir t
% mv test.pl t/01load.t
% perl Makefile.PL
Writing Makefile for Tie::Array::Bounded
% make test
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 -e 'use
Test::Harness qw(&runtests $verbose); $verbose=0; runtests
@ARGV;' t/*.t
t/01load....ok
All tests successful.
Files=1, Tests=1, 0 wallclock secs ( 0.30 cusr + 0.05 csys =
0.35 CPU)
The big difference: "make test" knows that it should run Test::Harness over the .t files in the t subdirectory, thereby giving us a summary of the results. There's only one file in there at the moment, but we can create more if we want instead of having to pack every test into test.pl.
At this point you might want to update the MANIFEST file to remove the line for test.pl now that we have removed that file.
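While you're at it, you'll probably also want to add the new t/01load.t. A sketch of the resulting MANIFEST:

Bounded.pm
Changes
Makefile.PL
MANIFEST
README
t/01load.t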
If you're using Perl 5.8.0 or later, then your h2xs has been modernized to create the test in t/1.t; furthermore, it will use Test::More.[7] But if you have a prior version of Perl, you'll find the test.pl file we just moved uses the deprecated Test module, so let's start from scratch and replace the contents of t/01load.t as follows:
[7] I name my tests with two leading digits so that they will sort properly; I want to run them in a predictable order, and if I have more than nine tests, test 10.t would be run before test 2.t, because of the lexicographic sorting used by the glob() function called by "make test". Having done that, I can then add text after the digits so that I can also see what the tests are meant for, winding up with test names such as 01load.t.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
The use blib statement causes Perl to search in parent directories for a blib directory that contains the Tie/Array/Bounded.pm module created by make. Although we'll usually run our tests by typing "make test" in the parent directory, this structure for a .t file allows us to run tests individually, which will be helpful when isolating failures.
Running this test either stand-alone ("./01load.t") or with "make test" produces the same output as before (plus a note from use blib about where it found the blib directory), so let's move on and add some code to Bounded.pm. First, delete some code that h2xs put there; we're not going to export anything, and our code will work on earlier versions of Perl 5, so remove code until the executable part of Bounded.pm looks like this:
package Tie::Array::Bounded;
use strict;
use warnings;
our $VERSION = '0.01';
1;
Now it's time to add subroutines to implement tieing. tie is how to make possessed variables with Perl: Literally anything can happen behind the scenes when the user does the most innocuous thing. A simple expression like $world_peace++ could end up launching a wave of nuclear missiles, if $world_peace happens to be tied to Mutually::Assured::Destruction. (See, you can even use Perl to make covert political statements.)
We need a TIEARRAY subroutine; perltie tells us so. So let's add an empty one to Bounded.pm:
sub TIEARRAY
{
}
and add a test to look for it in 01load.t:
use Test::More tests => 2;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
can_ok("Tie::Array::Bounded", "TIEARRAY");
Running "make test" copies the new Bounded.pm into blib and produces:
% make test
cp Bounded.pm blib/lib/Tie/Array/Bounded.pm
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 -e 'use
Test::Harness qw(&runtests $verbose); $verbose=0; runtests
@ARGV;' t/*.t
t/01load....Using /home/peter/perl_Medic/Tie/Array/Bounded/blib
t/01load....ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs ( 0.29 cusr + 0.02 csys =
0.31 CPU)
We have just doubled our number of regression tests!
It may seem as though we're taking ridiculously small steps here. A subroutine that doesn't do anything? What's the point in testing for that? Actually, the first time I ran that test, it failed: I had inadvertently gone into overwrite mode in the editor and made a typo in the routine name. The point in testing every little thing is to build your confidence in the code and catch even the dumbest errors right away.
So let's continue. We should decide on an interface for this module; let's say that when we tie an array we must specify an upper bound for the array indices, and optionally a lower bound. If the user employs an index out of this range, the program will die. For the sake of having small test files, we'll create a new one for this test and call it 02tie.t:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use blib;
use Tie::Array::Bounded;
my $obj = tie my @array, "Tie::Array::Bounded";
isa_ok($obj, "Tie::Array::Bounded");
So far, this just tests that the underlying object from the tie is or inherits from Tie::Array::Bounded. Run this test before you even add any code to TIEARRAY to make sure that it does indeed fail:
% ./02tie.t
1..1
Using /home/peter/perl_Medic/Tie/Array/Bounded/t/../blib
not ok 1 - The object isa Tie::Array::Bounded
# Failed test (./02tie.t at line 10)
# The object isn't defined
# Looks like you failed 1 tests of 1.
We're not checking that the module can be used or that it has a TIEARRAY method; we already did those things in 01load.t. Now we know that the test routine is working properly. Let's make a near-minimal version of TIEARRAY that will satisfy this test:
sub TIEARRAY
{
  my $class = shift;
  my ($upper, $lower);
  return bless { upper => $upper,
                 lower => $lower,
                 array => []
               }, $class;
}
Now the test passes. Should we test that the object is a hashref with keys upper, lower, and so on? No—that's part of the private implementation of the object and users, including tests, have no right peeking in there.
Well, it doesn't really do to have a bounded array type if the user doesn't specify any bounds. A default lower bound of 0 is obvious because most bounded arrays will start from there anyway and be limited in how many elements they can contain. It doesn't make sense to have a default upper bound because no guess could be better than any other. We want this module to die if the user doesn't specify an upper bound (the new code is the %arg handling, the default for $lower, and the croak):
sub TIEARRAY
{
  my ($class, %arg) = @_;
  my ($upper, $lower) = @arg{qw(upper lower)};
  $lower ||= 0;
  croak "No upper bound for array" unless $upper;
  return bless { upper => $upper,
                 lower => $lower,
                 array => []
               }, $class;
}
Note that when we want to die in a module, the proper routine to use is croak(). This results in an error message that identifies the calling line of the code, and not the current line, as the source of the error. This allows the user to locate the place in their program where they made a mistake. croak() comes from the Carp module, so we added a use Carp statement to Bounded.pm.
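For concreteness, the top of Bounded.pm now looks something like this (only the use Carp line is new):

package Tie::Array::Bounded;
use strict;
use warnings;
use Carp;           # Supplies croak()

our $VERSION = '0.01';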
Note also that we set the lower bound to a default of 0. True, if the user didn't specify a lower bound, $lower would be undefined and hence evaluate to 0 in a numeric context. But it's wise to expose our defaults explicitly, and this also avoids warnings about using an uninitialized value. Modify 02tie.t to say:
use Test::More tests => 1;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
dies_ok { tie my @array, "Tie::Array::Bounded" }
        "Croak with no bound specified";
If you're running 02tie.t as a stand-alone test, remember to run make in the parent directory after modifying Bounded.pm so that Bounded.pm gets copied into the blib tree.
Great! Now let's add back in the test that we can create a real object when we tie with the proper calling sequence:
my $obj;
lives_ok { $obj = tie my @array, "Tie::Array::Bounded",
                      upper => 42
         } "Tied array okay";
isa_ok($obj, "Tie::Array::Bounded");
and increase the number of tests to 3. (Notice that there is no comma after the block of code that's the first argument to dies_ok and lives_ok.)
All this testing has gotten us in a pedantic frame of mind. The user shouldn't be allowed to specify an array bound that is negative or not an integer. Let's add a statement to TIEARRAY (the new /\D/ check below):
sub TIEARRAY
{
  my ($class, %arg) = @_;
  my ($upper, $lower) = @arg{qw(upper lower)};
  $lower ||= 0;
  croak "No upper bound for array" unless $upper;
  /\D/ and croak "Array bound must be integer"
    for ($upper, $lower);
  return bless { upper => $upper,
                 lower => $lower,
                 array => []
               }, $class;
}
and, of course, test it:
throws_ok { tie my @array, "Tie::Array::Bounded", upper => -1 }
          qr/must be integer/, "Non-integral bound fails";
Now we're not only checking that the code dies, but that it dies with a message matching a particular pattern.
We're really on a roll here! Why don't we batten down the hatches on this interface and let the user know if they gave us an argument we're not expecting:
sub TIEARRAY
{
  my ($class, %arg) = @_;
  my ($upper, $lower) = delete @arg{qw(upper lower)};
  croak "Illegal arguments in tie" if %arg;
  croak "No upper bound for array" unless $upper;
  $lower ||= 0;
  /\D/ and croak "Array bound must be integer"
    for ($upper, $lower);
  return bless { upper => $upper,
                 lower => $lower,
                 array => []
               }, $class;
}
and the test:
throws_ok { tie my @array, "Tie::Array::Bounded", frogs => 10 }
          qr/Illegal arguments/, "Illegal argument fails";
The succinctness of our approach depends on the underappreciated hash slice and the delete() function. Hash slices [GUTTMAN98] are a way to get multiple elements from a hash with a single expression, and the delete() function removes those elements while returning their values. Therefore, anything left in the hash must be illegal.
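Here is a small illustration of that idiom (the argument values are invented):

my %arg = (upper => 10, lower => 2, frogs => 7);
my ($upper, $lower) = delete @arg{qw(upper lower)};
# Now $upper == 10 and $lower == 2, while %arg retains only (frogs => 7),
# so the "croak ... if %arg" test catches the bogus 'frogs' argument.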
We're nearly done with the pickiness. There's one final test we should apply. Have you guessed what it is? We should make sure that the user doesn't enter a lower bound that's higher than the upper one. Can you imagine what the implementation of bounded arrays would do if we didn't check for this? I can't, because I haven't written it yet, but it might be ugly. Let's head that off at the pass right now:
sub TIEARRAY
{
my ($class, %arg) = @_;
my ($upper, $lower) = delete @arg{qw(upper lower)};
croak "Illegal arguments in tie" if %arg;
$lower ||= 0;
croak "No upper bound for array" unless $upper;
/\D/ and croak "Array bound must be integer"
for ($upper, $lower);
croak "Upper bound < lower bound" if $upper < $lower;
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
and the new test goes at the end of 02tie.t (italicized):
Example 3.2. Final Version of 02tie.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 6;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
dies_ok { tie my @array, "Tie::Array::Bounded" }
"Croak with no bound specified";
my $obj;
lives_ok { $obj = tie my @array, "Tie::Array::Bounded",
upper => 42 }
"Tied array okay";
isa_ok($obj, "Tie::Array::Bounded");
throws_ok { tie my @array, "Tie::Array::Bounded", upper => -1 }
qr/must be integer/, "Non-integral bound fails";
throws_ok { tie my @array, "Tie::Array::Bounded", frogs => 10 }
qr/Illegal arguments/, "Illegal argument fails";
throws_ok { tie my @array, "Tie::Array::Bounded",
lower => 2, upper => 1 }
qr/Upper bound < lower/, "Wrong bound order fails";
Whoopee! We're nearly there. Now we need to make the tied array behave properly, so let's start a new test file for that, called 03use.t:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
my @array;
tie @array, "Tie::Array::Bounded", upper => 5;
lives_ok { $array[0] = 42 } "Store works";
As before, let's ensure that the test fails before we add the code to implement it:
% t/03use.t
1..1
Using /home/peter/perl_Medic/Tie/Array/Bounded/blib
not ok 1 - Store works
# Failed test (t/03use.t at line 13)
# died: Can't locate object method "STORE" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 13.
# Looks like you failed 1 tests of 1.
How about that. The test even told us what routine we need to write. perltie tells us what it should do. So let's add to Bounded.pm:
sub STORE
{
my ($self, $index, $value) = @_;
$self->_bound_check($index);
$self->{array}[$index] = $value;
}
sub _bound_check
{
my ($self, $index) = @_;
my ($upper, $lower) = @{$self}{qw(upper lower)};
croak "Index $index out of range [$lower, $upper]"
if $index < $lower || $index > $upper;
}
We've abstracted the bounds checking into a method of its own in anticipation of needing it again. Now 03use.t passes, and we can add another test to make sure that the value we stored in the array can be retrieved:
is($array[0], 42, "Fetch works");
You might think this would fail for want of the FETCH method, but in fact:
ok 1 - Store works
Can't locate object method "FETCHSIZE" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 14.
# Looks like you planned 2 tests but only ran 1.
# Looks like your test died just after 1.
Back to perltie to find out what FETCHSIZE is supposed to do: return the size of the array. Easy enough:
sub FETCHSIZE
{
my $self = shift;
scalar @{$self->{array}};
}
Now the test does indeed fail for want of FETCH, so we'll add that:
sub FETCH
{
my ($self, $index) = @_;
$self->_bound_check($index);
$self->{array}[$index];
}
Finally we are back in the anodyne land of complete test success. Time to add more tests:
throws_ok { $array[6] = "dog" } qr/out of range/,
"Bounds exception";
is_deeply(\@array, [ 42 ], "Array contents correct");
These work immediately. But an ugly truth emerges when we try another simple array operation:
lives_ok { push @array, 17 } "Push works";
This results in:
not ok 5 - Push works
# Failed test (t/03use.t at line 19)
# died: Can't locate object method "PUSH" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 19.
# Looks like you failed 1 tests of 5.
Inspecting perltie reveals that PUSH is one of several methods it looks like we're going to have to write. Do we really have to write them all? Can't we be lazier than that?
Yes, we can.[8] The Tie::Array core module defines PUSH and friends in terms of a handful of methods we have to write: FETCH, STORE, FETCHSIZE, and STORESIZE. The only one we haven't done yet is STORESIZE:
[8] Remember, if you find yourself doing something too rote or boring, look for a way to get the computer to make it easier for you. Top of the list of those ways would be finding code someone else already wrote to solve the problem.
sub STORESIZE
{
my ($self, $size) = @_;
$self->_bound_check($size-1);
$#{$self->{array}} = $size - 1;
}
We need to add near the top of Bounded.pm:
use base qw(Tie::Array);
to inherit all that array method goodness.
This is a big step to take, and if we didn't have canned tests, we might wonder what sort of unknown havoc could be wrought upon our module by a new base class if we misused it. However, our test suite allows us to determine that, in fact, nothing has broken.
Now we can add to 01load.t the methods FETCH, STORE, FETCHSIZE, and STORESIZE in the can_ok test:
Example 3.3. Final Version of 01load.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 2;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
can_ok("Tie::Array::Bounded", qw(TIEARRAY STORE FETCH STORESIZE
FETCHSIZE));
Because our tests pass, let's add as many more as we can to test all the boundary conditions we can think of, leaving us with a final 03use.t file of:
Example 3.4. Final Version of 03use.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 15;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
my $RANGE_EXCEP = qr/out of range/;
my @array;
tie @array, "Tie::Array::Bounded", upper => 5;
lives_ok { $array[0] = 42 } "Store works";
is($array[0], 42, "Fetch works");
throws_ok { $array[6] = "dog" } $RANGE_EXCEP,
"Bounds exception";
is_deeply(\@array, [ 42 ], "Array contents correct");
lives_ok { push @array, 17 } "Push works";
is($array[1], 17, "Second array element correct");
lives_ok { push @array, 2, 3 } "Push multiple elements works";
is_deeply(\@array, [ 42, 17, 2, 3 ], "Array contents correct");
lives_ok { splice(@array, 4, 0, qw(apple banana)) }
"Splice works";
is_deeply(\@array, [ 42, 17, 2, 3, 'apple', 'banana' ],
"Array contents correct");
throws_ok { push @array, "excessive" } $RANGE_EXCEP,
"Push bounds exception";
is(scalar @array, 6, "Size of array correct");
tie @array, "Tie::Array::Bounded", lower => 3, upper => 6;
throws_ok { $array[1] = "too small" } $RANGE_EXCEP,
"Lower bound check failure";
lives_ok { @array[3..6] = 3..6 } "Slice assignment works";
throws_ok { push @array, "too big" } $RANGE_EXCEP,
"Push bounds exception";
Tests are real programs, too. Because we test for the same exception repeatedly, we put its recognition pattern in a variable to be lazy.
Bounded.pm, although not exactly a model of efficiency (our internal array contains unnecessary space allocated to the first $lower elements that will never be used), is now due for documenting, and h2xs filled out some POD stubs already. We'll flesh it out to the final version you can see in the Appendix. I'll go into documentation more in Chapter 10.
Now we create 04pod.t to test that the POD is formatted correctly:
Example 3.5. Final Version of 04pod.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::Pod tests => 1;
use blib;
use Tie::Array::Bounded;
pod_ok($INC{"Tie/Array/Bounded.pm"});
There's just a little trick there to allow us to run this test from any directory, since all the others can be run with the current working directory set to either the parent directory or the t directory. We load the module itself and then get Perl to tell us where it found the file by looking it up in the %INC hash, which tracks such things (see its entry in perlvar).
With a final "make test", we're done:
Files=4, Tests=24, 2 wallclock secs ( 1.58 cusr + 0.24 csys = 1.82 CPU)
We have a whole 24 tests at our fingertips ready to be repeated any time we want.
You can get more help on how to use these modules from the module Test::Tutorial, which despite the module appellation contains no code, only documentation.
With only a bit more work, this module could have been submitted to CPAN. See [TREGAR02] for full instructions.
< Day Day Up >
< Day Day Up >
3.4 Testing Legacy Code
"This is all well and good," I can hear you say, "but I just inherited a swamp of 27 programs and 14 modules and they have no tests. What do I do?"
By now you've learned that it is far more appealing to write tests as you write the code they test, so if you can possibly rewrite this application, do so. But if you're stuck with having to tweak an existing application, then adopt a top-down approach. Start by testing that the application meets its requirements . . . assuming you were given requirements or can figure out what they were. See what a successful run of the program outputs and how it may have changed its environment, then write tests that look for those effects.
3.4.1 A Simple Example
You have an inventory control program for an aquarium, and it produces output files called cetaceans.txt, crustaceans.txt, molluscs.txt, pinnipeds.txt, and so on. Capture the output files from a successful run and put them in a subdirectory called success. Then run this test:
Example 3.6. Demonstration of Testing Program Output
1 my @Success_files;
2 BEGIN {
3 @Success_files = glob "success/*.txt";
4 }
5
6 use Test::More tests => 1 + 2 * @Success_files;
7
8 is(system("aquarium"), 0, "Program succeeded");
9
10 for my $success (@Success_files)
11 {
12 (my $output = $success) =~ s#.*/##;
13
14 ok(-e $output, "$output present");
15
16 is(system("cmp $output $success > /dev/null 2>&1"),
17 0, "$output is valid");
18 }
First, we capture the names of the output files in the success subdirectory. We do that in a BEGIN block so that the number of names is available in line 6. In line 8 we run the program and check that it has a successful return code. Then for each of the required output files, in line 14 we test that it is present, and in line 16 we use the UNIX cmp utility to check that it matches the saved version. If you don't have a cmp program, you can write a Perl subroutine to perform the same test: Just read each file and compare chunks of input until finding a mismatch or hitting the ends of file.
3.4.2 Testing Web Applications
A Common Gateway Interface (CGI) program that hasn't been developed with a view toward automated testing may be a solid block of congealed code with pieces of web interface functionality sprinkled throughout it like raisins in a fruit cake. But you don't need to rip it apart to write a test for it; you can verify that it meets its requirements with an end-to-end test. All you need is a program that pretends to be a user at a web browser and checks that the response to input is correct. It doesn't matter how the CGI program is written because all the testing takes place on a different machine from the one the CGI program is stored on.
The WWW::Mechanize module by Andy Lester comes to your rescue here. It allows you to automate web site interaction by pretending to be a web browser, a function ably pulled off by Gisle Aas' LWP::UserAgent module. WWW::Mechanize goes several steps farther, however (in fact, it is a subclass of LWP::UserAgent), enabling cookie handling by default and providing methods for following hyperlinks and submitting forms easily, including transparent handling of hidden fields.[9]
[9] If you're thinking, "Hey! I could use this to write an agent that will stuff the ballot box on surveys I want to fix," forget it; it's been done before. Chris Nandor used Perl to cast thousands of votes for his choice for American League All-Star shortstop [GLOBE99]. And this was before WWW::Mechanize was even invented.
Suppose we have an application that provides a login screen. For the usual obscure reasons, the login form, login.html, contains one or more hidden fields in addition to the user-visible input fields, like this:
On successful login, the response page greets the user with "Welcome, " followed by the user's first name. We can write this test for this login function:
Example 3.7. Using WWW::Mechanize to Test a Web Application
1 #!/usr/bin/perl
2 use strict;
3 use warnings;
4
5 use WWW::Mechanize;
6 use Test::More tests => 3;
7
8 my $URL = 'http://localhost/login.html';
9 my $USERNAME = 'peter';
10 my $PASSWORD = 'secret';
11
12 my $ua = WWW::Mechanize->new;
13 ok($ua->get($URL)->is_success, "Got first page")
14 or die $ua->res->message;
15
16 $ua->set_fields(username => $USERNAME,
17 password => $PASSWORD);
18 ok($ua->submit->is_success, "Submitted form")
19 or die $ua->res->message;
20
21 like($ua->content, qr/Welcome, Peter/, "Logged in okay");
In line 12 we create a new WWW::Mechanize user agent to act as a pretend browser, and in line 13 we test to see if it was able to get the login page; the get() method returns a HTTP::Response object that has an is_success() method. If something went wrong with fetching the page the false value will be passed through the ok() function; there's no point in going further so we might as well die() (line 14). We can get at the HTTP::Response object again via the res() method of the user agent to call its message() method, which returns the text of the reason for failure.
In lines 16 and 17 we provide the form inputs by name, and in line 18 the submit() method of the user agent submits the form and reads the response, again returning an HTTP::Response object allowing us to verify success as before. Once we have a response page we check to see whether it looks like what we wanted.
Note that WWW::Mechanize can be used to test interaction with any web application, regardless of where that application is running or what it is written in.
3.4.3 What Next?
The kind of end-to-end testing we have been doing is useful and necessary; it is also a lot easier than the next step. To construct comprehensive tests for a large package, we must include unit tests; that means testing each function and method. However, unless we have descriptions of what each subroutine does, we won't know how to test them without investigative work to find out what they are supposed to do. I'll go into those kinds of techniques later.
< Day Day Up >
3.3 An Example Using Test:: Modules
Let's put what we've learned to use in developing an actual application. Say that we want to create a module that can limit the possible indices of an array, a bounds checker if you will. Perl's arrays won't normally do that,[6] so we need a mechanism that intercepts the day-to-day activities of an array and checks the indices being used, throwing an exception if they're outside a specified range. Fortunately, such a mechanism exists in Perl; it's called tieing, and pretty powerful it is too.
[6] If you're smart enough to bring up $[, then you're also smart enough to know that you shouldn't be using it.
Because our module will work by letting us tie an array to it, we'll call it Tie::Array::Bounded. We start by letting h2xs do the rote work of creating a new module:
% h2xs -AXn Tie::Array::Bounded
Writing Tie/Array/Bounded/Bounded.pm
Writing Tie/Array/Bounded/Makefile.PL
Writing Tie/Array/Bounded/README
Writing Tie/Array/Bounded/test.pl
Writing Tie/Array/Bounded/Changes
Writing Tie/Array/Bounded/MANIFEST
That saved a lot of time! h2xs comes with perl, so you already have it. Don't be put off by the name: h2xs was originally intended for creating perl extensions from C header files, a more or less obsolete purpose now, but by dint of copious interface extension, h2xs now enjoys a new lease on life for creating modules. (In Section 8.2.4, I'll look at a more modern alternative to h2xs.)
Don't be confused by the fact that the file Bounded.pm is in the directory Tie/Array/Bounded. It may look like there's an extra directory in there but the hierarchy that h2xs created is really just to help keep your sources straight. Everything you create will be in the bottom directory, so we could cd there. For instant gratification we can create a Makefile the way we would with any CPAN module:
% cd Tie/Array/Bounded
% perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Tie::Array::Bounded
and now we can even run a test:
% make test
cp Bounded.pm blib/lib/Tie/Array/Bounded.pm
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 test.pl
1..1
ok 1
It even passes! This is courtesy of the file test.pl that h2xs created for us, which contains a basic test that the module skeleton created by h2xs passes. This is very good for building our confidence. Unfortunately, test.pl is not the best way to create tests. We'll see why when we improve on it by moving test.pl into a subdirectory called "t" and rebuilding the Makefile before rerunning "make test":
% mkdir t
% mv test.pl t/01load.t
% perl Makefile.PL
Writing Makefile for Tie::Array::Bounded
% make test
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 -e 'use
Test::Harness qw(&runtests $verbose); $verbose=0; runtests
@ARGV;' t/*.t
t/01load....ok
All tests successful.
Files=1, Tests=1, 0 wallclock secs ( 0.30 cusr + 0.05 csys =
0.35 CPU)
The big difference: "make test" knows that it should run Test::Harness over the .t files in the t subdirectory, thereby giving us a summary of the results. There's only one file in there at the moment, but we can create more if we want instead of having to pack every test into test.pl.
At this point you might want to update the MANIFEST file to remove the line for test.pl now that we have removed that file.
If you're using Perl 5.8.0 or later, then your h2xs has been modernized to create the test in t/1.t; furthermore, it will use Test::More.[7] But if you have a prior version of Perl, you'll find the test.pl file we just moved uses the deprecated Test module, so let's start from scratch and replace the contents of t/01load.t as follows:
[7] I name my tests with two leading digits so that they will sort properly; I want to run them in a predictable order, and if I have more than nine tests, test 10.t would be run before test 2.t, because of the lexicographic sorting used by the glob() function called by "make test". Having done that, I can then add text after the digits so that I can also see what the tests are meant for, winding up with test names such as 01load.t.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
The use blib statement causes Perl to search in parent directories for a blib directory that contains the Tie/Array/Bounded.pm module created by make. Although we'll usually run our tests by typing "make test" in the parent directory, this structure for a .t file allows us to run tests individually, which will be helpful when isolating failures.
Running this test either stand-alone ("./01load.t") or with "make test" produces the same output as before (plus a note from use blib about where it found the blib directory), so let's move on and add some code to Bounded.pm. First, delete some code that h2xs put there; we're not going to export anything, and our code will work on earlier versions of Perl 5, so remove code until the executable part of Bounded.pm looks like this:
package Tie::Array::Bounded;
use strict;
use warnings;
our $VERSION = '0.01';
1;
Now it's time to add subroutines to implement tieing. tie is how to make possessed variables with Perl: Literally anything can happen behind the scenes when the user does the most innocuous thing. A simple expression like $world_peace++ could end up launching a wave of nuclear missiles, if $world_peace happens to be tied to Mutually::Assured::Destruction. (See, you can even use Perl to make covert political statements.)
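To make the mechanism concrete before we dive in, here is a minimal sketch of a tied scalar (the package name is invented purely for illustration): perl silently calls TIESCALAR, FETCH, and STORE whenever the variable is tied, read, or written.
package Tie::Scalar::Noisy;   # invented name, for illustration only
use strict;
use warnings;
sub TIESCALAR { my ($class, $value) = @_; bless \$value, $class }
sub FETCH     { my $self = shift; print "reading\n"; $$self }
sub STORE     { my ($self, $value) = @_; print "writing $value\n"; $$self = $value }
package main;
tie my $counter, "Tie::Scalar::Noisy", 0;
$counter = 42;       # prints "writing 42"
print "$counter\n";  # prints "reading", then 42
Tie::Array::Bounded will do the same kind of interception for array operations, using the TIEARRAY family of methods instead.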
We need a TIEARRAY subroutine; perltie tells us so. So let's add an empty one to Bounded.pm:
sub TIEARRAY
{
}
and add a test to look for it in 01load.t:
use Test::More tests => 2;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
can_ok("Tie::Array::Bounded", "TIEARRAY");
Running "make test" copies the new Bounded.pm into blib and produces:
% make test
cp Bounded.pm blib/lib/Tie/Array/Bounded.pm
PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib -
I/usr/lib/perl5/5.6.1/i386-linux -I/usr/lib/perl5/5.6.1 -e 'use
Test::Harness qw(&runtests $verbose); $verbose=0; runtests
@ARGV;' t/*.t
t/01load....Using /home/peter/perl_Medic/Tie/Array/Bounded/blib
t/01load....ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs ( 0.29 cusr + 0.02 csys =
0.31 CPU)
We have just doubled our number of regression tests!
It may seem as though we're taking ridiculously small steps here. A subroutine that doesn't do anything? What's the point in testing for that? Actually, the first time I ran that test, it failed: I had inadvertently gone into overwrite mode in the editor and made a typo in the routine name. The point in testing every little thing is to build your confidence in the code and catch even the dumbest errors right away.
So let's continue. We should decide on an interface for this module; let's say that when we tie an array we must specify an upper bound for the array indices, and optionally a lower bound. If the user employs an index out of this range, the program will die. For the sake of having small test files, we'll create a new one for this test and call it 02tie.t:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use blib;
use Tie::Array::Bounded;
my $obj = tie my @array, "Tie::Array::Bounded";
isa_ok($obj, "Tie::Array::Bounded");
So far, this just tests that the underlying object from the tie is or inherits from Tie::Array::Bounded. Run this test before you even add any code to TIEARRAY to make sure that it does indeed fail:
% ./02tie.t
1..1
Using /home/peter/perl_Medic/Tie/Array/Bounded/t/../blib
not ok 1 - The object isa Tie::Array::Bounded
# Failed test (./02tie.t at line 10)
# The object isn't defined
# Looks like you failed 1 tests of 1.
We're not checking that the module can be used or that it has a TIEARRAY method; we already did those things in 01load.t. Now we know that the test routine is working properly. Let's make a near-minimal version of TIEARRAY that will satisfy this test:
sub TIEARRAY
{
my $class = shift;
my ($upper, $lower);
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
Now the test passes. Should we test that the object is a hashref with keys upper, lower, and so on? No—that's part of the private implementation of the object, and users, including tests, have no right to peek in there.
Well, it doesn't really do to have a bounded array type if the user doesn't specify any bounds. A default lower bound of 0 is obvious because most bounded arrays will start from there anyway and be limited in how many elements they can contain. It doesn't make sense to have a default upper bound because no guess could be better than any other. We want this module to die if the user doesn't specify an upper bound (italicized code):
sub TIEARRAY
{
my ($class, %arg) = @_;
my ($upper, $lower) = @arg{qw(upper lower)};
$lower ||= 0;
croak "No upper bound for array" unless $upper;
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
Note that when we want to die in a module, the proper routine to use is croak(). This results in an error message that identifies the calling line of the code, and not the current line, as the source of the error. This allows the user to locate the place in their program where they made a mistake. croak() comes from the Carp module, so we added a use Carp statement to Bounded.pm (not shown).
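To see the difference, consider this sketch with a made-up module (the Demo package and set_age() are invented for illustration):
package Demo;   # hypothetical module, for illustration only
use strict;
use warnings;
use Carp;
sub set_age
{
    my $age = shift;
    # die() here would blame this line inside Demo.pm;
    # croak() blames the line in the program that called set_age()
    croak "Age must be numeric" unless defined $age && $age =~ /^\d+$/;
    return $age;
}
1;
A script that calls Demo::set_age("old") then sees the error reported at its own calling line, which is the line its author can actually fix.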
Note also that we set the lower bound to a default of 0. True, if the user didn't specify a lower bound, $lower would be undefined and hence evaluate to 0 in a numeric context. But it's wise to expose our defaults explicitly, and this also avoids warnings about using an uninitialized value. Modify 02tie.t to say:
use Test::More tests => 1;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
dies_ok { tie my @array, "Tie::Array::Bounded" }
"Croak with no bound specified";
If you're running 02tie.t as a stand-alone test, remember to run make in the parent directory after modifying Bounded.pm so that Bounded.pm gets copied into the blib tree.
Great! Now let's add back in the test that we can create a real object when we tie with the proper calling sequence:
my $obj;
lives_ok { $obj = tie my @array, "Tie::Array::Bounded",
upper => 42
} "Tied array okay";
isa_ok($obj, "Tie::Array::Bounded");
and increase the number of tests to 3. (Notice that there is no comma after the block of code that's the first argument to dies_ok and lives_ok.)
All this testing has gotten us in a pedantic frame of mind. The user shouldn't be allowed to specify an array bound that is negative or not an integer. Let's add a statement to TIEARRAY (in italics):
sub TIEARRAY
{
my ($class, %arg) = @_;
my ($upper, $lower) = @arg{qw(upper lower)};
$lower ||= 0;
croak "No upper bound for array" unless $upper;
/\D/ and croak "Array bound must be integer"
for ($upper, $lower);
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
and, of course, test it:
throws_ok { tie my @array, "Tie::Array::Bounded", upper => -1 }
qr/must be integer/, "Non-integral bound fails";
Now we're not only checking that the code dies, but that it dies with a message matching a particular pattern.
We're really on a roll here! Why don't we batten down the hatches on this interface and let the user know if they gave us an argument we're not expecting:
sub TIEARRAY
{
my ($class, %arg) = @_;
my ($upper, $lower) = delete @arg{qw(upper lower)};
croak "Illegal arguments in tie" if %arg;
croak "No upper bound for array" unless $upper;
$lower ||= 0;
/\D/ and croak "Array bound must be integer"
for ($upper, $lower);
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
and the test:
throws_ok { tie my @array, "Tie::Array::Bounded", frogs => 10 }
qr/Illegal arguments/, "Illegal argument fails";
The succinctness of our approach depends on the underappreciated hash slice and the delete() function. Hash slices [GUTTMAN98] are a way to get multiple elements from a hash with a single expression, and the delete() function removes those elements while returning their values. Therefore, anything left in the hash must be illegal.
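Here is the idiom on its own, outside the module:
#!/usr/bin/perl
use strict;
use warnings;
my %arg = (upper => 10, lower => 2, frogs => 99);
# delete() on a hash slice removes the named keys and returns their values
my ($upper, $lower) = delete @arg{qw(upper lower)};
print "$upper $lower\n";            # prints "10 2"
print join(",", keys %arg), "\n";   # prints "frogs" -- the illegal leftover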
We're nearly done with the pickiness. There's one final test we should apply. Have you guessed what it is? We should make sure that the user doesn't enter a lower bound that's higher than the upper one. Can you imagine what the implementation of bounded arrays would do if we didn't check for this? I can't, because I haven't written it yet, but it might be ugly. Let's head that off at the pass right now:
sub TIEARRAY
{
my ($class, %arg) = @_;
my ($upper, $lower) = delete @arg{qw(upper lower)};
croak "Illegal arguments in tie" if %arg;
$lower ||= 0;
croak "No upper bound for array" unless $upper;
/\D/ and croak "Array bound must be integer"
for ($upper, $lower);
croak "Upper bound < lower bound" if $upper < $lower;
return bless { upper => $upper,
lower => $lower,
array => []
}, $class;
}
and the new test goes at the end of 02tie.t (italicized):
Example 3.2. Final Version of 02tie.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 6;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
dies_ok { tie my @array, "Tie::Array::Bounded" }
"Croak with no bound specified";
my $obj;
lives_ok { $obj = tie my @array, "Tie::Array::Bounded",
upper => 42 }
"Tied array okay";
isa_ok($obj, "Tie::Array::Bounded");
throws_ok { tie my @array, "Tie::Array::Bounded", upper => -1 }
qr/must be integer/, "Non-integral bound fails";
throws_ok { tie my @array, "Tie::Array::Bounded", frogs => 10 }
qr/Illegal arguments/, "Illegal argument fails";
throws_ok { tie my @array, "Tie::Array::Bounded",
lower => 2, upper => 1 }
qr/Upper bound < lower/, "Wrong bound order fails";
Whoopee! We're nearly there. Now we need to make the tied array behave properly, so let's start a new test file for that, called 03use.t:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
my @array;
tie @array, "Tie::Array::Bounded", upper => 5;
lives_ok { $array[0] = 42 } "Store works";
As before, let's ensure that the test fails before we add the code to implement it:
% t/03use.t
1..1
Using /home/peter/perl_Medic/Tie/Array/Bounded/blib
not ok 1 - Store works
# Failed test (t/03use.t at line 13)
# died: Can't locate object method "STORE" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 13.
# Looks like you failed 1 tests of 1.
How about that. The test even told us what routine we need to write. perltie tells us what it should do. So let's add to Bounded.pm:
sub STORE
{
my ($self, $index, $value) = @_;
$self->_bound_check($index);
$self->{array}[$index] = $value;
}
sub _bound_check
{
my ($self, $index) = @_;
my ($upper, $lower) = @{$self}{qw(upper lower)};
croak "Index $index out of range [$lower, $upper]"
if $index < $lower || $index > $upper;
}
We've abstracted the bounds checking into a method of its own in anticipation of needing it again. Now 03use.t passes, and we can add another test to make sure that the value we stored in the array can be retrieved:
is($array[0], 42, "Fetch works");
You might think this would fail for want of the FETCH method, but in fact:
ok 1 - Store works
Can't locate object method "FETCHSIZE" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 14.
# Looks like you planned 2 tests but only ran 1.
# Looks like your test died just after 1.
Back to perltie to find out what FETCHSIZE is supposed to do: return the size of the array. Easy enough:
sub FETCHSIZE
{
my $self = shift;
scalar @{$self->{array}};
}
Now the test does indeed fail for want of FETCH, so we'll add that:
sub FETCH
{
my ($self, $index) = @_;
$self->_bound_check($index);
$self->{array}[$index];
}
Finally we are back in the anodyne land of complete test success. Time to add more tests:
throws_ok { $array[6] = "dog" } qr/out of range/,
"Bounds exception";
is_deeply(\@array, [ 42 ], "Array contents correct");
These work immediately. But an ugly truth emerges when we try another simple array operation:
lives_ok { push @array, 17 } "Push works";
This results in:
not ok 5 - Push works
# Failed test (t/03use.t at line 19)
# died: Can't locate object method "PUSH" via package
"Tie::Array::Bounded" (perhaps you forgot to load
"Tie::Array::Bounded"?) at t/03use.t line 19.
# Looks like you failed 1 tests of 5.
Inspecting perltie reveals that PUSH is one of several methods it looks like we're going to have to write. Do we really have to write them all? Can't we be lazier than that?
Yes, we can.[8] The Tie::Array core module defines PUSH and friends in terms of a handful of methods we have to write: FETCH, STORE, FETCHSIZE, and STORESIZE. The only one we haven't done yet is STORESIZE:
[8] Remember, if you find yourself doing something too rote or boring, look for a way to get the computer to make it easier for you. Top of the list of those ways would be finding code someone else already wrote to solve the problem.
sub STORESIZE
{
my ($self, $size) = @_;
$self->_bound_check($size-1);
$#{$self->{array}} = $size - 1;
}
We need to add near the top of Bounded.pm:
use base qw(Tie::Array);
to inherit all that array method goodness.
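To get a feel for what we are inheriting, here is a sketch of how a PUSH could be built from the primitives we supplied (this shows the idea only; Tie::Array's real implementation is more general):
sub PUSH
{
    my ($self, @values) = @_;
    my $next = $self->FETCHSIZE;               # first free index
    $self->STORE($next++, $_) for @values;     # STORE still enforces our bounds
    return $self->FETCHSIZE;
}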
This is a big step to take, and if we didn't have canned tests, we might wonder what sort of unknown havoc could be wrought upon our module by a new base class if we misused it. However, our test suite allows us to determine that, in fact, nothing has broken.
Now we can add to 01load.t the methods FETCH, STORE, FETCHSIZE, and STORESIZE in the can_ok test:
Example 3.3. Final Version of 01load.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 2;
use blib;
BEGIN { use_ok("Tie::Array::Bounded") }
can_ok("Tie::Array::Bounded", qw(TIEARRAY STORE FETCH STORESIZE
FETCHSIZE));
Because our tests pass, let's add as many more as we can to test all the boundary conditions we can think of, leaving us with a final 03use.t file of:
Example 3.4. Final Version of 03use.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 15;
use Test::Exception;
use blib;
use Tie::Array::Bounded;
my $RANGE_EXCEP = qr/out of range/;
my @array;
tie @array, "Tie::Array::Bounded", upper => 5;
lives_ok { $array[0] = 42 } "Store works";
is($array[0], 42, "Fetch works");
throws_ok { $array[6] = "dog" } $RANGE_EXCEP,
"Bounds exception";
is_deeply(\@array, [ 42 ], "Array contents correct");
lives_ok { push @array, 17 } "Push works";
is($array[1], 17, "Second array element correct");
lives_ok { push @array, 2, 3 } "Push multiple elements works";
is_deeply(\@array, [ 42, 17, 2, 3 ], "Array contents correct");
lives_ok { splice(@array, 4, 0, qw(apple banana)) }
"Splice works";
is_deeply(\@array, [ 42, 17, 2, 3, 'apple', 'banana' ],
"Array contents correct");
throws_ok { push @array, "excessive" } $RANGE_EXCEP,
"Push bounds exception";
is(scalar @array, 6, "Size of array correct");
tie @array, "Tie::Array::Bounded", lower => 3, upper => 6;
throws_ok { $array[1] = "too small" } $RANGE_EXCEP,
"Lower bound check failure";
lives_ok { @array[3..6] = 3..6 } "Slice assignment works";
throws_ok { push @array, "too big" } $RANGE_EXCEP,
"Push bounds exception";
Tests are real programs, too. Because we test for the same exception repeatedly, we put its recognition pattern in a variable to be lazy.
Bounded.pm, although not exactly a model of efficiency (our internal array contains unnecessary space allocated to the first $lower elements that will never be used), is now due for documenting, and h2xs filled out some POD stubs already. We'll flesh it out to the final version you can see in the Appendix. I'll go into documentation more in Chapter 10.
Now we create 04pod.t to test that the POD is formatted correctly:
Example 3.5. Final Version of 04pod.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::Pod tests => 1;
use blib;
use Tie::Array::Bounded;
pod_ok($INC{"Tie/Array/Bounded.pm"});
There's just a little trick there to allow us to run this test from any directory, since all the others can be run with the current working directory set to either the parent directory or the t directory. We load the module itself and then get Perl to tell us where it found the file by looking it up in the %INC hash, which tracks such things (see its entry in perlvar).
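For example, using a core module (the path printed will vary with your installation), %INC maps the require-style name of each loaded module to the file it was loaded from:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
print $INC{"File/Spec.pm"}, "\n";   # e.g. /usr/lib/perl5/5.8.0/File/Spec.pm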
With a final "make test", we're done:
Files=4, Tests=24, 2 wallclock secs ( 1.58 cusr + 0.24 csys = 1.82 CPU)
We have a whole 24 tests at our fingertips ready to be repeated any time we want.
You can get more help on how to use these modules from the module Test::Tutorial, which despite the module appellation contains no code, only documentation.
With only a bit more work, this module could have been submitted to CPAN. See [TREGAR02] for full instructions.
3.4 Testing Legacy Code
"This is all well and good," I can hear you say, "but I just inherited a swamp of 27 programs and 14 modules and they have no tests. What do I do?"
By now you've learned that it is far more appealing to write tests as you write the code they test, so if you can possibly rewrite this application, do so. But if you're stuck with having to tweak an existing application, then adopt a top-down approach. Start by testing that the application meets its requirements . . . assuming you were given requirements or can figure out what they were. See what a successful run of the program outputs and how it may have changed its environment, then write tests that look for those effects.
3.4.1 A Simple Example
You have an inventory control program for an aquarium, and it produces output files called cetaceans.txt, crustaceans.txt, molluscs.txt, pinnipeds.txt, and so on. Capture the output files from a successful run and put them in a subdirectory called success. Then run this test:
Example 3.6. Demonstration of Testing Program Output
1 my @Success_files;
2 BEGIN {
3 @Success_files = glob "success/*.txt";
4 }
5
6 use Test::More tests => 1 + 2 * @Success_files;
7
8 is(system("aquarium"), 0, "Program succeeded");
9
10 for my $success (@Success_files)
11 {
12 (my $output = $success) =~ s#.*/##;
13
14 ok(-e $output, "$output present");
15
16 is(system("cmp $output $success > /dev/null 2>&1"),
17 0, "$output is valid");
18 }
First, we capture the names of the output files in the success subdirectory. We do that in a BEGIN block so that the number of names is available in line 6. In line 8 we run the program and check that it has a successful return code. Then for each of the required output files, in line 14 we test that it is present, and in line 16 we use the UNIX cmp utility to check that it matches the saved version. If you don't have a cmp program, you can write a Perl subroutine to perform the same test: just read each file and compare chunks of input until you find a mismatch or reach the end of both files.
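A minimal sketch of such a routine (files_identical() is a name made up here, not a standard function) could read both files in fixed-size chunks:
sub files_identical
{
    my ($file1, $file2) = @_;
    open my $fh1, '<', $file1 or return 0;
    open my $fh2, '<', $file2 or return 0;
    binmode $_ for $fh1, $fh2;
    local $/ = \65536;               # read in 64K records
    while (1)
    {
        my $chunk1 = <$fh1>;
        my $chunk2 = <$fh2>;
        return 1 if !defined $chunk1 && !defined $chunk2;   # both ended together
        return 0 if !defined $chunk1 || !defined $chunk2;   # one ended early
        return 0 if $chunk1 ne $chunk2;                     # contents differ
    }
}
Line 16 of the example could then become is(files_identical($output, $success), 1, "$output is valid"); with no shelling out.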
3.4.2 Testing Web Applications
A Common Gateway Interface (CGI) program that hasn't been developed with a view toward automated testing may be a solid block of congealed code with pieces of web interface functionality sprinkled throughout it like raisins in a fruit cake. But you don't need to rip it apart to write a test for it; you can verify that it meets its requirements with an end-to-end test. All you need is a program that pretends to be a user at a web browser and checks that the response to input is correct. It doesn't matter how the CGI program is written because all the testing takes place on a different machine from the one the CGI program is stored on.
The WWW::Mechanize module by Andy Lester comes to your rescue here. It allows you to automate web site interaction by pretending to be a web browser, a function ably pulled off by Gisle Aas' LWP::UserAgent module. WWW::Mechanize goes several steps farther, however (in fact, it is a subclass of LWP::UserAgent), enabling cookie handling by default and providing methods for following hyperlinks and submitting forms easily, including transparent handling of hidden fields.[9]
[9] If you're thinking, "Hey! I could use this to write an agent that will stuff the ballot box on surveys I want to fix," forget it; it's been done before. Chris Nandor used Perl to cast thousands of votes for his choice for American League All-Star shortstop [GLOBE99]. And this was before WWW::Mechanize was even invented.
Suppose we have an application that provides a login screen. For the usual obscure reasons, the login form, login.html, contains one or more hidden fields in addition to the user-visible username and password inputs.
On successful login, the response page greets the user with "Welcome, " followed by the user's first name. We can write this test for the login function:
Example 3.7. Using WWW::Mechanize to Test a Web Application
1 #!/usr/bin/perl
2 use strict;
3 use warnings;
4
5 use WWW::Mechanize;
6 use Test::More tests => 3;
7
8 my $URL = 'http://localhost/login.html';
9 my $USERNAME = 'peter';
10 my $PASSWORD = 'secret';
11
12 my $ua = WWW::Mechanize->new;
13 ok($ua->get($URL)->is_success, "Got first page")
14 or die $ua->res->message;
15
16 $ua->set_fields(username => $USERNAME,
17 password => $PASSWORD);
18 ok($ua->submit->is_success, "Submitted form")
19 or die $ua->res->message;
20
21 like($ua->content, qr/Welcome, Peter/, "Logged in okay");
In line 12 we create a new WWW::Mechanize user agent to act as a pretend browser, and in line 13 we test whether it was able to get the login page; the get() method returns an HTTP::Response object that has an is_success() method. If something goes wrong fetching the page, the false value is passed through the ok() function; there's no point in going further, so we might as well die() (line 14). We can get at the HTTP::Response object again via the res() method of the user agent to call its message() method, which returns the text of the reason for failure.
In lines 16 and 17 we provide the form inputs by name, and in line 18 the submit() method of the user agent submits the form and reads the response, again returning an HTTP::Response object allowing us to verify success as before. Once we have a response page we check to see whether it looks like what we wanted.
Note that WWW::Mechanize can be used to test interaction with any web application, regardless of where that application is running or what it is written in.
3.4.3 What Next?
The kind of end-to-end testing we have been doing is useful and necessary; it is also a lot easier than the next step. To construct comprehensive tests for a large package, we must include unit tests; that means testing each function and method. However, unless we have descriptions of what each subroutine does, we won't know how to test them without investigative work to find out what they are supposed to do. I'll go into those kinds of techniques later.
Test Now, Test Forever (Diagnosis)
Testing Your Patience
Here's the hard part. Creating tests while you're writing the code for the first time is far, far easier than adding them later on. I know it looks like it should be exactly the same amount of work, but the issue is motivation. When robots are invented that can create code, they won't have this problem, and the rest of us can mull over this injustice while we're collecting unemployment pay (except for the guy who invented the robot, who'll be sipping margaritas on a beach somewhere, counting his royalties and hoping that none of the other programmers recognize him).
But we humans don't like creating tests because it's not in our nature; we became programmers to exercise our creativity, but "testing" conjures up images of slack-jawed drones looking for defects in bolts passing by them on a conveyor belt.
The good news is that the Test:: modules make it easy enough to overcome this natural aversion to writing tests at the time you're developing code. The point at which you've just finished a new function is when your antitesting hormones are at their lowest ebb because you want to know whether or not it works. Instead of running a test that gets thrown away, or just staring at the code long enough to convince yourself that it must work, you can instead write a real test for it, because it may not require much more effort than typing:
like(some_func("some", "inputs"), qr/some outputs/,
"some_func works");
The bad news is that retrofitting tests onto an already complete application requires much more discipline. And if anything could be worse than that, it would be retrofitting tests onto an already complete application that you didn't write.
There's no magic bullet that'll make this problem disappear. The only course of action that'll take more time in the long run than writing tests for your inherited code is not writing them. If you've already discovered the benefits of creating automated tests while writing an application from scratch then at least you're aware of how much they can benefit you. I'll explore one way to make the test writing more palatable in the next chapter.
3.2 Extreme Testing
This testing philosophy is best articulated by the Extreme Programming (XP) methodology, wherein it is fundamental (see [BECK00]). On the subject of testing, XP says:
Development of tests should precede development of code.
All requirements should be turned into tests.
All tests should be automated.
The software should pass all its tests at the end of every day.
All bugs should get turned into tests.
If you've not yet applied these principles to the development of a new project, you're in for a life-altering experience when you first give them an honest try. Because your development speed will take off like a termite in a lumberyard.
Perl wholeheartedly embraces this philosophy, thanks largely to the efforts in recent years of a group of people including Michael Schwern, chromatic, and others. Because of their enthusiasm and commitment to the testing process, the number of tests that are run when you build Perl from the source and type "make test" has increased from 5,000 in Perl 5.004_04 (1997) to 70,000 in the current development version of Perl 5.9.0 (2004). That's right, a 14-fold increase.
True to the Perl philosophy, these developers exercised extreme laziness in adding those thousands of tests (see sidebar). To make it easier to create tests for Perl, they created a number of modules that can in fact be used to test anything. We'll take a look at them shortly.
What is it about this technology that brings such joy to the developer's heart? It provides a safety net, that's what. Instead of perennially wondering whether you've accidentally broken some code while working on an unrelated piece, you can make certain at any time. If you want to make a radical change to some interface, you can be sure that you've fixed all the dependencies because every scenario that you care about will have been captured in a test case, and running all the tests is as simple as typing "make test". One month into creating a new system that comprised more than a dozen modules and as many programs, I had built up a test suite that ran nearly 600 tests with that one command, all by adding the tests as I created the code they tested. When I made a radical change to convert one interface from functional to object-oriented, it took only a couple of hours because the tests told me when I was done.
This technique has been around for many years, but under the label regression testing, which sounds boring to anyone who can even figure out what it means.[1] However, using that label can be your entrance ticket to respectability when trying to convince managers of large projects that you know what you're talking about.
[1] It's called regression testing because its purpose is to ensure that no change has caused any part of the program to regress back to an earlier, buggier stage of development.
What's this about laziness? Isn't that a pejorative way to describe luminaries of the Perl universe?
Actually, no; they'd take it as a compliment. Larry Wall enumerated three principal virtues of Perl programmers:
Laziness: "Hard work" sounds, well, hard. If you're faced with a mindless, repetitive task—such as running for public office—then laziness will make you balk at doing the same thing over and over again. Instead of stifling your creative spirit, you'll cultivate it by inventing a process that automates the repetitive task. If the Karate Kid had been a Perl programmer, he'd have abstracted the common factor from "wax on" and "wax off" shortly before fetching an orbital buffer. (Only to get, er, waxed, in the tournament from being out of shape. But I digress.)
Impatience: There's more than enough work to do in this business. By being impatient to get to the next thing quickly, you'll not spend unnecessary time on the task you're doing; you'll find ways to make it as efficient as possible.
Hubris: It's not good enough to be lazy and impatient if you're going to take them as an excuse to do lousy work. You need an unreasonable amount of pride in your abilities to carry you past the many causes for discouragement. If you didn't, and you thought about all the things that could go wrong with your code, you'd either never get out of bed in the morning, or just quit and take up potato farming.
So what are these magic modules that facilitate testing?
3.2.1 The Test Module
Test.pm was added in version 5.004 of Perl. By the time Perl 5.6.1 was released it was superseded by the Test::Simple module, which was published to CPAN and included in the Perl 5.8.0 core. Use Test::Simple instead.
If you inherit regression tests written to use Test.pm, they will keep working; Test.pm is still included in the Perl core for backward compatibility, and you should be able to replace it with Test::Simple when you want to start modernizing the tests.
3.2.2 The Test::Simple Module
When I say "simple," I mean simple. Test::Simple exports precisely one function, ok(). It takes one mandatory argument, and one optional argument. If its first argument evaluates to true, it prints "ok"; otherwise it prints "not ok". In each case it adds a number that starts at one and increases by one for each call to ok(). If a second argument is given, ok() then prints a dash and that argument, which is just a way of annotating a test.
Doesn't exactly sound like rocket science, does it? But on such a humble foundation is the entire Perl regression test suite built. The only other requirement is that we know how many tests we expected to run so we can tell if something caused them to terminate prematurely. That is done by an argument to the use statement:
use Test::Simple tests => 5;
The output from a test run therefore looks like:
1..5
ok 1 - Can make a frobnitz
ok 2 - Can fliggle the frobnitz
not ok 3 - Can grikkle the frobnitz
ok 4 - Can delete the frobnitz
ok 5 - Can't use a deleted frobnitz
Note that the first line says how many tests are expected to follow. That makes life easier for code like Test::Harness (see Section 3.2.9) that reads this output in order to summarize it.
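A script producing output of that shape could be as plain as the following sketch (make_frobnitz() and its friends are imaginary functions, invented for the example):
#!/usr/bin/perl
use strict;
use warnings;
use Test::Simple tests => 5;
my $frob = make_frobnitz();
ok($frob,                  "Can make a frobnitz");
ok(fliggle($frob),         "Can fliggle the frobnitz");
ok(grikkle($frob),         "Can grikkle the frobnitz");
ok(delete_frobnitz($frob), "Can delete the frobnitz");
ok(!eval { fliggle($frob); 1 }, "Can't use a deleted frobnitz");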
3.2.3 The Test::More Module
You knew there couldn't be a module called Test::Simple unless there was something more complicated, right? Here it is. This is the module you'll use for virtually all your testing. It exports many useful functions aside from the same ok() as Test::Simple. Some of the most useful ones are:
is($expression, $value, $description)
Same as ok($expression eq $value, $description). So why bother? Because is() can give you better diagnostics when it fails.
like($attribute, qr/regex/, $description)
Tests whether $attribute matches the given regular expression.
is_deeply($struct1, $struct2, $description)
Tests whether data structures match. Follows references in each and prints out the first discrepancy it finds, if any. Note that it does not compare the packages that any components may be blessed into.
isa_ok($object, $class)
Tests whether an object is a member of, or inherits from, a particular class.
can_ok($object_or_class, @methods)
Tests whether an object or a class can perform each of the methods listed.
use_ok($module, @imports)
Tests whether a module can be loaded (if it contains a syntax error, for instance, this will fail). Wrap this test in a BEGIN block to ensure it is run at compile time, viz: BEGIN {use_ok("My::Module")}
There's much more. See the Test::More documentation. I won't be using any other functions in this chapter, though.
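Here is a short sketch exercising several of the functions just listed (the values and class name are invented):
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 4;
is(6 * 7, 42, "Arithmetic still works");
like("Welcome, Peter", qr/^Welcome,/, "Greeting looks right");
my $got      = { name => "Peter", pets => [ "cat", "dog" ] };
my $expected = { name => "Peter", pets => [ "cat", "dog" ] };
is_deeply($got, $expected, "Structures match");
my $obj = bless {}, "My::Class";
isa_ok($obj, "My::Class");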
Caveat: I don't know why you might do this, but if you fork() inside the test script, don't run tests from child processes. They won't be recognized by the parent process where the test analyzer is running.
3.2.4 The Test::Exception Module
No, there's no module called Test::EvenMore.[2] But there is a module you'll have to get from CPAN that can test for whether code lives or dies: Test::Exception. It exports these handy functions:
[2] Yet. I once promised Mike Schwern a beer if he could come up with an excuse to combine the UNIVERSAL class and an export functionality into UNIVERSAL::exports as a covert tribute to James Bond. He did it. Schwern, I still owe you that beer . . . .
lives_ok()
Passes if code does not die. The first argument is the block of code, the second is an optional tag string. Note there is no comma between those arguments (this is a feature of Perl's prototyping mechanism when a code block is the first argument to a subroutine; there's a sketch of how that works after this list). For example:
lives_ok { risky_function() } "risky_function lives!";
dies_ok()
Passes if the code does die. Use this to check that error-checking code is operating properly. For example:
dies_ok { $] / 0 } "division by zero dies!";
throws_ok()
For when you want to check the actual text of the exception. For example:
throws_ok { some_web_function() } qr/URL not found/,
"Nonexistent page get fails";
The second argument is a regular expression that the exception thrown by the code block in the first argument is tested against. If the match succeeds, so does the test. The optional third argument is the comment tag for the test. Note that there is a comma between the second and third arguments.
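The block-without-a-comma calling style noted under lives_ok() works because these functions declare a code block as their first argument using a subroutine prototype. Here is a sketch of the mechanism (not Test::Exception's actual source; lives_ok_sketch() is invented):
sub lives_ok_sketch (&;$)                # & means "takes a code block first"
{
    my ($code, $name) = @_;
    my $lived = eval { $code->(); 1 };   # run the block, trapping any die()
    print(($lived ? "ok" : "not ok"), defined $name ? " - $name" : "", "\n");
}
lives_ok_sketch { sqrt(4) } "no comma needed after the block";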
3.2.5 The Test::Builder Module
Did you spot that all these modules have a lot in common? Did you wonder how you'd add a Test:: module of your own, if you wanted to write one?
Then you're already thinking lazily, and the testing guys are ahead of you. That common functionality lives in a superclass module called Test::Builder, seldom seen, but used to take the drudgery out of creating new test modules.
Suppose we want to write a module that checks whether mail messages conform to RFC 822 syntax.[3] We'll call it Test::MailMessage, and it will export a basic function, msg_ok(), that determines whether a message consists of an optional set of header lines, optionally followed by a blank line and any number of lines of text. (Yes, an empty message is legal according to this syntax. Unfortunately, too few people who have nothing to say avail themselves of this option.) Here's the module:
[3] http://www.faqs.org/rfcs/rfc822.html
Example 3.1. Using Test::Builder to Create Test::MailMessage
1 package Test::MailMessage;
2 use strict;
3 use warnings;
4 use Carp;
5 use Test::Builder;
6 use base qw(Exporter);
7 our @EXPORT = qw(msg_ok);
8
9 my $test = Test::Builder->new;
10
11 sub import
12 {
13 my $self = shift;
14 my $pack = caller;
15
16 $test->exported_to($pack);
17 $test->plan(@_);
18
19 $self->export_to_level(1, $self, 'msg_ok');
20 }
21
22 sub msg_ok
23 {
24 my $arg = shift;
25 my $tester = _new();
26 eval
27 {
28 if (defined(fileno($arg)))
29 {
30 while (<$arg>)
31 {
32 $tester->_validate($_);
33 }
34 }
35 elsif (ref $arg)
36 {
37 $tester->_validate($_) for @$arg;
38 }
39 else
40 {
41 for ($arg =~ /(.*\n)/g)
42 {
43 $tester->_validate($_);
44 }
45 }
46 };
47 $test->ok(!$@, shift);
48 $test->diag($@) if $@;
49 }
50
51 sub _new
52 {
53 return bless { expect => "header" };
54 }
55
56 sub _validate
57 {
58 my ($self, $line) = @_;
59 return if $self->{expect} eq "body";
60 if ($self->{expect} eq "header/continuation")
61 {
62 /^\s+\S/ and return;
63 }
64 $self->{expect} = "body", return if /^$/;
65 /^\S+:/ or croak "Invalid header";
66 $self->{expect} = "header/continuation";
67 }
68
69 1;
In line 1 we put this module into its own package, and in lines 2 and 3 we set warnings and strictness to help development go smoothly. In lines 4 and 5 we load the Carp module so we can call croak(), and the Test::Builder module so we can create an instance of it. In lines 6 and 7 we declare this to be a subclass of the Exporter module, exporting to the caller the subroutine msg_ok(). (Note that this is not a subclass of Test::Builder.)
In line 9 we create a Test::Builder object that will do the boring part of testing for us. Lines 11 through 20 are copied right out of the Test::Builder documentation; the import() routine is what allows us to say how many tests we're going to run when we use the module.
Lines 22 through 49 define the msg_ok() function itself. Its single argument specifies the mail message, either via a scalar containing the message, a reference to an array of lines in the message, or a filehandle from which the message can be read. Rather than read all of the lines from that filehandle into memory, we're going to operate on them one at a time because it's not necessary to have the whole message in memory. That's why we create the object $tester in line 25 to handle each line: it will contain a memory of its current state.
Then we call the _validate() method of $tester with each line of the message. Because that method will croak() if the message is in error, we wrap those loops in an eval block. This allows us easily to skip superfluous scanning of a message after detecting an error.
Finally, we see whether an error occurred; if an exception was thrown by croak() inside the eval block, $@ will contain its text; otherwise $@ will be empty. The ok() method of the Test::Builder object we created is the same function we're used to using in Test::Simple; it takes a true or false value, and an optional tag string, which we pass from our caller. If we had an exception, we pass its text to Test::Builder's diag() method, which causes it to be output as a comment during testing.
The _new() method in lines 50–53 is not called new() because it's not really a proper constructor; it's really just creating a state object, which is why we didn't bother to make it inheritable. It starts out in life expecting to see a mail header.
Lines 56–70 validate a line of a message. Because anything goes in a message body, if that's what we're expecting we have nothing to do. Otherwise, if we're expecting a header or header continuation line, then first we check for a continuation line (which starts with white space; this is how a long message header "overflows"). If we have a blank line (line 67), that separates the header from the body, so we switch to expecting body text.
Finally, we must at this point be expecting a header line, and one of those starts with non-white-space characters followed by a colon. If we don't have that, the message is bogus; but if we do, the next line could be either a header line or a continuation of the current header (or the blank line separating headers from the body).
Here's a simple test of the Test::MailMessage module:
1 #!/usr/bin/perl
2 use strict;
3 use warnings;
4
5 use lib qw(..);
6 use Test::MailMessage tests => 2;
7
8 msg_ok(<
9 from: ok
10 subject: whatever
11
12 body
13 EOM
14 msg_ok(\*DATA, "bogus");
15
16 __END__
17 bogus mail
18 message
The result of running this is:
1..2
ok 1 - okay
not ok 2 - bogus
# Failed test (./test at line 14)
# Invalid header at ./test line 14
# Looks like you failed 1 tests of 2.
Although we only used one Test:: module, we could have used others, for example:
use Test::MailMessage tests z=> 2;
use Test::Exception;
use Test::More;
Only one of the use statements for Test::Modules should give the number of tests to be run. Do not think that each use statement is supposed to number the tests run by functions of that module; instead, one use statement gives the total number of tests to be run.
brian d foy[4] used Test::Builder to create Test::Pod,[5] which is also worth covering.
[4] That's not a typo; he likes his name to be spelled, er, rendered that way, thus going one step farther than bell hooks.
[5] Now maintained by Andy Lester.
3.2.6 The Test::Pod Module
Documentation in Perl need not be entirely unstructured. The Plain Old Documentation (POD) format for storing documentation in the Perl source code (see the perlpod manual page) is a markup language and therefore it is possible to commit syntax errors. So rather than wait until your users try to look at your documentation (okay, play along with me here—imagine that you have users who want to read your documentation), and get errors from their POD viewer, you can make sure in advance that the POD is good.
Test::Pod exports a single function, pod_ok(), which checks the POD in the file named by its argument. I'll show an example of its use later in this chapter.
3.2.7 Test::Inline
If you're thinking that tests deserve to be inside the code they're testing just as much as documentation does, then you want Test::Inline. This module by Michael Schwern enables you to embed tests in code just like POD, because, in fact, it uses POD for that embedding.
3.2.8 Test::NoWarnings
Fergal Daly's Test::NoWarnings (formerly Test::Warn::None) lets you verify that your code is free of warnings. In its simplest usage, you just use the module, and increment the number of tests you're running, because Test::NoWarnings adds one more. So if your test starts:
use Test::More tests => 17;
then change it to:
use Test::NoWarnings;
use Test::More tests => 18;
and the final test will be that no warnings were generated in the running of the other tests.
3.2.9 The Test::Harness Module
Test::Harness is how you combine multiple tests. It predates every other Test:: module, and you'll find it in every version of Perl 5. Test::Harness exports a function, runtests(), which runs all the test files whose names are passed to it as arguments and summarizes their results. You won't see one line printed per test; runtests() intercepts those lines of output. Rather you'll see one line printed per test file, followed by a summary of the results of the tests in that file. Then it prints a global summary line. Here's an example of the output:
t/01load....ok
t/02tie.....ok
t/03use.....ok
t/04pod.....ok
All tests successful.
Files=4, Tests=24, 2 wallclock secs ( 1.51 cusr + 0.31 csys = 1.82 CPU)
As it runs, before printing "ok" on each line, you'll see a count of the tests being run updating in place, finally to be overwritten by "ok". If any fail, you'll see something appropriate instead of "ok".
You can use Test::Harness quite easily, for instance:
% perl -MTest::Harness -e 'runtests(glob "*.t")'
but it's seldom necessary even to do that, because a standard Perl module makefile will do it for you. I'll show you how shortly.
Test::Harness turns your regression tests into a full-fledged deliverable. Managers just love to watch the numbers whizzing around.
< Day Day Up >
Here's the hard part. Creating tests while you're writing the code for the first time is far, far easier than adding them later on. I know it looks like it should be exactly the same amount of work, but the issue is motivation. When robots are invented that can create code, they won't have this problem, and the rest of us can mull over this injustice while we're collecting unemployment pay (except for the guy who invented the robot, who'll be sipping margaritas on a beach somewhere, counting his royalties and hoping that none of the other programmers recognize him).
But we humans don't like creating tests because it's not in our nature; we became programmers to exercise our creativity, but "testing" conjures up images of slack-jawed drones looking for defects in bolts passing by them on a conveyor belt.
The good news is that the Test:: modules make it easy enough to overcome this natural aversion to writing tests at the time you're developing code. The point at which you've just finished a new function is when your antitesting hormones are at their lowest ebb because you want to know whether or not it works. Instead of running a test that gets thrown away, or just staring at the code long enough to convince yourself that it must work, you can instead write a real test for it, because it may not require much more effort than typing:
like(some_func("some", "inputs"), qr/some outputs/,
     "some_func works");
The bad news is that retrofitting tests onto an already complete application requires much more discipline. And if anything could be worse than that, it would be retrofitting tests onto an already complete application that you didn't write.
There's no magic bullet that'll make this problem disappear. The only course of action that'll take more time in the long run than writing tests for your inherited code is not writing them. If you've already discovered the benefits of creating automated tests while writing an application from scratch then at least you're aware of how much they can benefit you. I'll explore one way to make the test writing more palatable in the next chapter.
3.2 Extreme Testing
This testing philosophy is best articulated by the Extreme Programming (XP) methodology, wherein it is fundamental (see [BECK00]). On the subject of testing, XP says:
Development of tests should precede development of code.
All requirements should be turned into tests.
All tests should be automated.
The software should pass all its tests at the end of every day.
All bugs should get turned into tests.
If you've not yet applied these principles to the development of a new project, you're in for a life-altering experience when you first give them an honest try. Because your development speed will take off like a termite in a lumberyard.
Perl wholeheartedly embraces this philosophy, thanks largely to the efforts in recent years of a group of people including Michael Schwern, chromatic, and others. Because of their enthusiasm and commitment to the testing process, the number of tests that are run when you build Perl from the source and type "make test" has increased from 5,000 in Perl 5.004_04 (1997) to 70,000 in the current development version of Perl 5.9.0 (2004). That's right, a 14-fold increase.
True to the Perl philosophy, these developers exercised extreme laziness in adding those thousands of tests (see sidebar). To make it easier to create tests for Perl, they created a number of modules that can in fact be used to test anything. We'll take a look at them shortly.
What is it about this technology that brings such joy to the developer's heart? It provides a safety net, that's what. Instead of perennially wondering whether you've accidentally broken some code while working on an unrelated piece, you can make certain at any time. If you want to make a radical change to some interface, you can be sure that you've fixed all the dependencies because every scenario that you care about will have been captured in a test case, and running all the tests is as simple as typing "make test". One month into creating a new system that comprised more than a dozen modules and as many programs, I had built up a test suite that ran nearly 600 tests with that one command, all by adding the tests as I created the code they tested. When I made a radical change to convert one interface from functional to object-oriented, it took only a couple of hours because the tests told me when I was done.
This technique has been around for many years, but under the label regression testing, which sounds boring to anyone who can even figure out what it means.[1] However, using that label can be your entrance ticket to respectability when trying to convince managers of large projects that you know what you're talking about.
[1] It's called regression testing because its purpose is to ensure that no change has caused any part of the program to regress back to an earlier, buggier stage of development.
What's this about laziness? Isn't that a pejorative way to describe luminaries of the Perl universe?
Actually, no; they'd take it as a compliment. Larry Wall enumerated three principal virtues of Perl programmers:
Laziness: "Hard work" sounds, well, hard. If you're faced with a mindless, repetitive task—such as running for public office—then laziness will make you balk at doing the same thing over and over again. Instead of stifling your creative spirit, you'll cultivate it by inventing a process that automates the repetitive task. If the Karate Kid had been a Perl programmer, he'd have abstracted the common factor from "wax on" and "wax off" shortly before fetching an orbital buffer. (Only to get, er, waxed, in the tournament from being out of shape. But I digress.)
Impatience: There's more than enough work to do in this business. By being impatient to get to the next thing quickly, you'll not spend unnecessary time on the task you're doing; you'll find ways to make it as efficient as possible.
Hubris: It's not good enough to be lazy and impatient if you're going to take them as an excuse to do lousy work. You need an unreasonable amount of pride in your abilities to carry you past the many causes for discouragement. If you didn't, and you thought about all the things that could go wrong with your code, you'd either never get out of bed in the morning, or just quit and take up potato farming.
So what are these magic modules that facilitate testing?
3.2.1 The Test Module
Test.pm was added in version 5.004 of Perl. By the time Perl 5.6.1 was released it had been superseded by the Test::Simple module, which was published to CPAN and later included in the Perl 5.8.0 core. Use Test::Simple instead.
Test.pm remains in the Perl core for backward compatibility, so any regression tests you inherit that use it will still run; you should be able to replace it with Test::Simple if you want to start modernizing those tests.
3.2.2 The Test::Simple Module
When I say "simple," I mean simple. Test::Simple exports precisely one function, ok(). It takes one mandatory argument, and one optional argument. If its first argument evaluates to true, it prints "ok"; otherwise it prints "not ok". In each case it adds a number that starts at one and increases by one for each call to ok(). If a second argument is given, ok() then prints a dash and that argument, which is just a way of annotating a test.
Doesn't exactly sound like rocket science, does it? But on such a humble foundation is the entire Perl regression test suite built. The only other requirement is that we know how many tests we expected to run so we can tell if something caused them to terminate prematurely. That is done by an argument to the use statement:
use Test::Simple tests => 5;
The output from a test run therefore looks like:
1..5
ok 1 - Can make a frobnitz
ok 2 - Can fliggle the frobnitz
not ok 3 - Can grikkle the frobnitz
ok 4 - Can delete the frobnitz
ok 5 - Can't use a deleted frobnitz
Note that the first line says how many tests are expected to follow. That makes life easier for code like Test::Harness (see Section 3.2.9) that reads this output in order to summarize it.
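For reference, here is a minimal sketch of a test script that would produce output of that shape; Frobnitz and its methods are invented for illustration and stand in for whatever your own code does:
#!/usr/bin/perl
# A sketch only: Frobnitz is a hypothetical module.
use strict;
use warnings;
use Test::Simple tests => 5;
use Frobnitz;

my $frob = Frobnitz->new;
ok( defined $frob,            "Can make a frobnitz" );
ok( $frob->fliggle,           "Can fliggle the frobnitz" );
ok( $frob->grikkle,           "Can grikkle the frobnitz" );
ok( $frob->delete,            "Can delete the frobnitz" );
ok( !eval { $frob->fliggle }, "Can't use a deleted frobnitz" );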
3.2.3 The Test::More Module
You knew there couldn't be a module called Test::Simple unless there was something more complicated, right? Here it is. This is the module you'll use for virtually all your testing. It exports many useful functions aside from the same ok() as Test::Simple. Some of the most useful ones are:
is($expression, $value, $description)
Same as ok($expression eq $value, $description). So why bother? Because is() can give you better diagnostics when it fails.
like($attribute, qr/regex/, $description)
Tests whether $attribute matches the given regular expression.
is_deeply($struct1, $struct2, $description)
Tests whether data structures match. Follows references in each and prints out the first discrepancy it finds, if any. Note that it does not compare the packages that any components may be blessed into.
isa_ok($object, $class)
Tests whether an object is a member of, or inherits from, a particular class.
can_ok($object_or_class, @methods)
Tests whether an object or a class can perform each of the methods listed.
use_ok($module, @imports)
Tests whether a module can be loaded (if it contains a syntax error, for instance, this will fail). Wrap this test in a BEGIN block to ensure it is run at compile time, viz: BEGIN {use_ok("My::Module")}
There's much more. See the Test::More documentation. I won't be using any other functions in this chapter, though.
Caveat: I don't know why you might do this, but if you fork() inside the test script, don't run tests from child processes. They won't be recognized by the parent process where the test analyzer is running.
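To give a feel for how these functions fit together, here is a short sketch of a test file that uses several of them; My::Module, its constructor, and its methods are hypothetical:
use strict;
use warnings;
use Test::More tests => 5;

# Fails at compile time if the module won't even load.
BEGIN { use_ok("My::Module") }

my $obj = My::Module->new(name => "fred");
isa_ok($obj, "My::Module");
can_ok($obj, qw(name as_string));
is($obj->name, "fred", "name round-trips");
like($obj->as_string, qr/fred/, "stringification mentions the name");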
3.2.4 The Test::Exception Module
No, there's no module called Test::EvenMore.[2] But there is a module you'll have to get from CPAN that can test for whether code lives or dies: Test::Exception. It exports these handy functions:
[2] Yet. I once promised Mike Schwern a beer if he could come up with an excuse to combine the UNIVERSAL class and an export functionality into UNIVERSAL::exports as a covert tribute to James Bond. He did it. Schwern, I still owe you that beer . . . .
lives_ok()
Passes if code does not die. The first argument is the block of code, the second is an optional tag string. Note there is no comma between those arguments (this is a feature of Perl's prototyping mechanism when a code block is the first argument to a subroutine). For example:
lives_ok { risky_function() } "risky_function lives!";
dies_ok()
Passes if the code does die. Use this to check that error-checking code is operating properly. For example:
dies_ok { $] / 0 } "division by zero dies!";
throws_ok()
For when you want to check the actual text of the exception. For example:
throws_ok { some_web_function() } qr/URL not found/,
"Nonexistent page get fails";
The second argument is a regular expression that the exception thrown by the code block in the first argument is tested against. If the match succeeds, so does the test. The optional third argument is the comment tag for the test. Note that there is a comma between the second and third arguments.
3.2.5 The Test::Builder Module
Did you spot that all these modules have a lot in common? Did you wonder how you'd add a Test:: module of your own, if you wanted to write one?
Then you're already thinking lazily, and the testing guys are ahead of you. That common functionality lives in a module called Test::Builder, seldom seen, but used behind the scenes to take the drudgery out of creating new test modules. (The Test:: modules don't inherit from it; each one uses a shared Test::Builder object.)
Suppose we want to write a module that checks whether mail messages conform to RFC 822 syntax.[3] We'll call it Test::MailMessage, and it will export a basic function, msg_ok(), that determines whether a message consists of an optional set of header lines, optionally followed by a blank line and any number of lines of text. (Yes, an empty message is legal according to this syntax. Unfortunately, too few people who have nothing to say avail themselves of this option.) Here's the module:
[3] http://www.faqs.org/rfcs/rfc822.html
Example 3.1. Using Test::Builder to Create Test::MailMessage
1 package Test::MailMessage;
2 use strict;
3 use warnings;
4 use Carp;
5 use Test::Builder;
6 use base qw(Exporter);
7 our @EXPORT = qw(msg_ok);
8
9 my $test = Test::Builder->new;
10
11 sub import
12 {
13 my $self = shift;
14 my $pack = caller;
15
16 $test->exported_to($pack);
17 $test->plan(@_);
18
19 $self->export_to_level(1, $self, 'msg_ok');
20 }
21
22 sub msg_ok
23 {
24 my $arg = shift;
25 my $tester = _new();
26 eval
27 {
28 if (defined(fileno($arg)))
29 {
30 while (<$arg>)
31 {
32 $tester->_validate($_);
33 }
34 }
35 elsif (ref $arg)
36 {
37 $tester->_validate($_) for @$arg;
38 }
39 else
40 {
41 for ($arg =~ /(.*\n)/g)
42 {
43 $tester->_validate($_);
44 }
45 }
46 };
47 $test->ok(!$@, shift);
48 $test->diag($@) if $@;
49 }
50
51 sub _new
52 {
53 return bless { expect => "header" };
54 }
55
56 sub _validate
57 {
58 my ($self, $line) = @_;
59 return if $self->{expect} eq "body";
60 if ($self->{expect} eq "header/continuation")
61 {
62 $line =~ /^\s+\S/ and return;
63 }
64 $self->{expect} = "body", return if $line =~ /^$/;
65 $line =~ /^\S+:/ or croak "Invalid header";
66 $self->{expect} = "header/continuation";
67 }
68
69 1;
In line 1 we put this module into its own package, and in lines 2 and 3 we set warnings and strictness to help development go smoothly. In lines 4 and 5 we load the Carp module so we can call croak(), and the Test::Builder module so we can create an instance of it. In lines 6 and 7 we declare this to be a subclass of the Exporter module, exporting to the caller the subroutine msg_ok(). (Note that this is not a subclass of Test::Builder.)
In line 9 we create a Test::Builder object that will do the boring part of testing for us. Lines 11 through 20 are copied right out of the Test::Builder documentation; the import() routine is what allows us to say how many tests we're going to run when we use the module.
Lines 22 through 49 define the msg_ok() function itself. Its single argument specifies the mail message, either via a scalar containing the message, a reference to an array of lines in the message, or a filehandle from which the message can be read. Rather than read all of the lines from that filehandle into memory, we're going to operate on them one at a time because it's not necessary to have the whole message in memory. That's why we create the object $tester in line 25 to handle each line: it will contain a memory of its current state.
Then we call the _validate() method of $tester with each line of the message. Because that method will croak() if the message is in error, we wrap those loops in an eval block. This allows us easily to skip superfluous scanning of a message after detecting an error.
Finally, we see whether an error occurred; if an exception was thrown by croak() inside the eval block, $@ will contain its text; otherwise $@ will be empty. The ok() method of the Test::Builder object we created is the same function we're used to using in Test::Simple; it takes a true or false value, and an optional tag string, which we pass from our caller. If we had an exception, we pass its text to Test::Builder's diag() method, which causes it to be output as a comment during testing.
The _new() method in lines 51–54 is not called new() because it's not really a proper constructor; it's really just creating a state object, which is why we didn't bother to make it inheritable. It starts out in life expecting to see a mail header.
Lines 56–67 validate a line of a message. Because anything goes in a message body, if that's what we're expecting we have nothing to do. Otherwise, if we're expecting a header or header continuation line, then first we check for a continuation line (which starts with white space; this is how a long message header "overflows"). If we have a blank line (line 64), that separates the header from the body, so we switch to expecting body text.
Finally, we must at this point be expecting a header line, and one of those starts with non-white-space characters followed by a colon. If we don't have that, the message is bogus; but if we do, the next line could be either a header line or a continuation of the current header (or the blank line separating headers from the body).
Here's a simple test of the Test::MailMessage module:
1 #!/usr/bin/perl
2 use strict;
3 use warnings;
4
5 use lib qw(..);
6 use Test::MailMessage tests => 2;
7
8 msg_ok(<<EOM, "okay");
9 from: ok
10 subject: whatever
11
12 body
13 EOM
14 msg_ok(\*DATA, "bogus");
15
16 __END__
17 bogus mail
18 message
The result of running this is:
1..2
ok 1 - okay
not ok 2 - bogus
# Failed test (./test at line 14)
# Invalid header at ./test line 14
# Looks like you failed 1 tests of 2.
Although we only used one Test:: module, we could have used others, for example:
use Test::MailMessage tests => 2;
use Test::Exception;
use Test::More;
Only one of the use statements for the Test:: modules should give the number of tests to be run. Don't think of each use statement as declaring how many tests that particular module's functions will run; a single use statement gives the total number of tests for the whole file.
brian d foy[4] used Test::Builder to create Test::Pod,[5] which is also worth covering.
[4] That's not a typo; he likes his name to be spelled, er, rendered that way, thus going one step farther than bell hooks.
[5] Now maintained by Andy Lester.
3.2.6 The Test::Pod Module
Documentation in Perl need not be entirely unstructured. The Plain Old Documentation (POD) format for storing documentation in the Perl source code (see the perlpod manual page) is a markup language and therefore it is possible to commit syntax errors. So rather than wait until your users try to look at your documentation (okay, play along with me here—imagine that you have users who want to read your documentation), and get errors from their POD viewer, you can make sure in advance that the POD is good.
Test::Pod exports a single function, pod_ok(), which checks the POD in the file named by its argument. I'll show an example of its use later in this chapter.
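As a sketch, a complete POD-checking test file can be as short as the following; the file path is just an example, and note that newer releases of Test::Pod rename the function pod_file_ok():
use strict;
use warnings;
use Test::Pod tests => 1;

pod_ok("lib/My/Module.pm", "POD in My::Module parses cleanly");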
3.2.7 Test::Inline
If you're thinking that tests deserve to be inside the code they're testing just as much as documentation does, then you want Test::Inline. This module by Michael Schwern enables you to embed tests in code just like POD, because, in fact, it uses POD for that embedding.
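Here is a sketch of what that embedding looks like; the module, function, and tests are invented, and the =begin testing block follows my recollection of Test::Inline's POD conventions, so check the exact directives against the version you install. The block is extracted by Test::Inline's tools and run as ordinary Test::More code:
package My::Temperature;     # hypothetical module
use strict;
use warnings;

sub celsius_to_fahrenheit {
    my $c = shift;
    return $c * 9 / 5 + 32;
}

=begin testing

is( My::Temperature::celsius_to_fahrenheit(0),   32,  "freezing point" );
is( My::Temperature::celsius_to_fahrenheit(100), 212, "boiling point" );

=end testing

=cut

1;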
3.2.8 Test::NoWarnings
Fergal Daly's Test::NoWarnings (formerly Test::Warn::None) lets you verify that your code is free of warnings. In its simplest usage, you just use the module, and increment the number of tests you're running, because Test::NoWarnings adds one more. So if your test starts:
use Test::More tests => 17;
then change it to:
use Test::NoWarnings;
use Test::More tests => 18;
and the final test will be that no warnings were generated in the running of the other tests.
3.2.9 The Test::Harness Module
Test::Harness is how you combine multiple tests. It predates every other Test:: module, and you'll find it in every version of Perl 5. Test::Harness exports a function, runtests(), which runs all the test files whose names are passed to it as arguments and summarizes their results. You won't see one line printed per test; runtests() intercepts those lines of output. Rather you'll see one line printed per test file, followed by a summary of the results of the tests in that file. Then it prints a global summary line. Here's an example of the output:
t/01load....ok
t/02tie.....ok
t/03use.....ok
t/04pod.....ok
All tests successful.
Files=4, Tests=24, 2 wallclock secs ( 1.51 cusr + 0.31 csys = 1.82 CPU)
As it runs, before printing "ok" on each line, you'll see a count of the tests being run updating in place, finally to be overwritten by "ok". If any fail, you'll see something appropriate instead of "ok".
You can use Test::Harness quite easily, for instance:
% perl -MTest::Harness -e 'runtests(glob "*.t")'
but it's seldom necessary even to do that, because a standard Perl module makefile will do it for you. I'll show you how shortly.
Test::Harness turns your regression tests into a full-fledged deliverable. Managers just love to watch the numbers whizzing around.
Surveying the Scene
2.1 Versions
It's important to know as soon as possible what version of Perl the program was developed for. This isn't necessarily the same as the version of Perl it may currently be running against, but find that out anyway so you have an upper bound. Again, get this information from the gold source: the original running environment. Type the complete path to perl that appears in the main program's shebang line (see below) followed by the -v argument to find out the version; for example:
% /opt/bin/perl -v
This is perl, v5.6.1 built for i386-linux
Copyright 1987-2003, Larry Wall
If this output indicates that they're running on a newer perl than the one you have (run the same command on your perl), do whatever you can to upgrade. Although upgrading may be unnecessary, if you have any difficulties getting the code to work, your energy for debugging will be sapped by the nagging fear that the problem is due to a version incompatibility.
One reason upgrading may be unnecessary is that the operating group upgraded their perl after the original program was written, and the program did not trigger any of the forward incompatibilities. A program written for Perl 4 could easily work identically under Perl 5.8.3 and probably would; the Perl developers went to fanatical lengths to preserve backward compatibility across upgrades.
Look at the last-modification dates of the source files. You may need to visit the original operational system to be able to determine them. Although the dates may be more recent than any significant code changes (due to commenting, or insignificant changes to constants), the earlier the dates are, the tighter the bound they place on the most recent version of Perl the program could have been developed for. See the version history in Chapter 7 to find out how to determine that version.
2.2 Part or Whole?
Are you in fact taking over a complete program or a module used by other programs (or both)? Let's see how we can find out.
2.2.1 Shebang-a-Lang-a-Ding-Dong
You can recognize a Perl program file by the fact that the first two characters are:
#!
and somewhere on the rest of that line the word "perl" appears.[1] Developers call this the shebang line. It is possible to create a Perl program without this line by requiring it to be run explicitly with a command something like
[1] A Perl program on Windows could get away without this line, because the .pl suffix is sufficient to identify it as a Perl program, but it is good practice to leave the line in on Windows anyway. (The path to Perl isn't important in that case.)
% perl program_file_name
although it would be strange to receive a main program in this state. Don't depend on the filename ending with an extension like .pl or .plx; this is not necessary on many systems. A .pl extension is commonplace on Windows systems, where the file extension is required to tell the operating system the type of the file; aside from that, .pl extensions were often used as a convention for "Perl Library": files containing specialized subroutines. These mostly predate the introduction in Perl 5 of objects, which provide a better paradigm for code reuse.
One time when the extension of a file is guaranteed, however, is for a Perl module; if the filename ends in .pm, then it's a module, intended to be used by another file that's the actual program.
Caveat: Sometimes a .pm file may begin with a shebang; this almost certainly means that someone created a module that contains its own tests so that executing the module as a program also works. If you see a .pm like this, try running it to see what happens. A file that's not a .pm can't be a module, but could still be a dependency rather than the main program. If it begins with a shebang, it could be a library of subroutine or constant definitions that's been endowed with self-testing capabilities. If the developer did not comment the file clearly, it may not be possible to tell this kind of library apart from the main program without careful inspection or actual execution.
It is quite possible that you will have to change the shebang line to refer to a different perl. The previous owners of the program may have located their perl somewhere other than where the one you plan to use is. If the code consists of a lot of files containing that path, here's how you can change them all at once, assuming that /their/path/to/perl is on the original shebang line and /your/path/to/perl is the location of your perl:
% perl -pi.bak -e \
's#/their/path/to/perl#/your/path/to/perl#g' *
This command puts the original version of each file—before any changes were made—in a file of the same name but with .bak appended to it. If you've been using a revision control system to store the files in, you don't need to make copies like that. (I told you that would turn out to be a good decision.) Leaving out the .bak:
% perl -pi -e 's#/their/path/to/perl#/your/path/to/perl#g' *
results in in-place editing; that is, the original files are overwritten with the new contents.
This command assumes that all the files to be changed are in the current directory. If they are contained in multiple subdirectories, you can combine this with the find command like this:
% find . -type f -print | xargs perl -pi -e \
's#/their/path/to/perl#/your/path/to/perl#g'
Of course, you can use this command for globally changing other strings besides the path to perl, and you might have frequent occasion to do so. Put the command in an alias, like so:
% alias gchange "find . -type f -print | xargs \
perl -pi.bak -e '\!1'"
the syntax of which may vary depending on which shell you are using. Then you can invoke it thusly:
% gchange s,/their/path/to/perl,/your/path/to/perl,g
Note that I changed the substitution delimiter from a # to a ,: A shell might take the # as a comment-introducing character. Because you might be using this alias to change pieces of code containing characters like $ and ! that can have special meaning to your shell, learn about how your shell does quoting and character escaping so you'll be prepared to handle those situations.
Note also that I put the .bak back: one day you'll forget to check the files into your version control system first, and the alias isn't called something like gchange_with_no_backups.
If you want to develop this concept further, consider turning the alias into a script that checks each file in before altering it.
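Here is one possible sketch of such a script; the cvs commit call is an assumption standing in for whatever check-in command your version control system uses:
#!/usr/bin/perl
# gchange: check each file in to version control, then apply a global edit.
use strict;
use warnings;
use File::Find;

my $change = shift or die "usage: gchange 'perl-edit-code' [directory]\n";
my $root   = shift || ".";

find(sub {
    return unless -f $_;
    my $file = $_;
    # Assumption: a CVS-style check-in; substitute your own VCS command.
    if (system("cvs", "commit", "-m", "checkpoint before gchange", $file) != 0) {
        warn "could not check in $File::Find::name; skipping\n";
        return;
    }
    system($^X, "-pi.bak", "-e", $change, $file) == 0
        or warn "edit of $File::Find::name failed\n";
}, $root);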
2.2.2 .ph Files
You may encounter a .ph file. This is a Perl version of a C header (.h) file, generated by the h2ph program that comes with perl. The odds are that you can eliminate the need for this file in rewriting the program. These .ph files have not been commonly used since Perl 4 because the dynamic module loading capability introduced in Perl 5 made it possible, and desirable, for modules to incorporate any header knowledge they required. A private .ph file is probably either a copy of what h2ph would have produced from a system header (but the author lacked the permission to install it in perl's library), or a modified version of the same. Read the .ph file to see what capability it is providing and then research modules that perform the same function.
2.3 Find the Dependencies
Look for documentation that describes everything that needs to exist for this program to work. Complex systems could have dependencies on code written in other languages, on data files produced by other systems, or on network connections with external services. If you can find interface agreements or other documents that describe these dependencies they will make the job of code analysis much easier. Otherwise you will be reduced to a trial-and-error process of copying over the main program and repeatedly running it and identifying missing dependencies until it appears to work.
A common type of dependency is a custom Perl module. Quite possibly the program uses some modules that should have been delivered to you but weren't. Get a list of modules that the program uses in operation and compare it with what you were given and what is in the Perl core. Again, this is easier to do with the currently operating version of the program. First try the simple approach of searching for lines beginning with "use " or "require ". On UNIX, you can use egrep:
% egrep '^(use|require) ' files...
Remember to search also all the modules that are part of the code you were given. Let's say that I did that and the output was:
use strict;
use warnings;
use lib qw(/opt/lib/perl);
use WWW::Mechanize;
Can I be certain I've found all the modules the program loads? No. For one thing, there's no law that use and require have to be at the beginning of a line; in fact I commonly have require statements embedded in do blocks in conditionals, for instance.
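For example, a run-time dependency like the following would slip right past that egrep; %opt and Report::Fancy are invented for illustration:
my $reporter;
if ($opt{fancy_reports}) {
    # Indented and loaded only on demand, so a '^(use|require) ' grep
    # never sees it.
    $reporter = do {
        require Report::Fancy;
        Report::Fancy->new;
    };
}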
The other reason this search can't be foolproof is that Perl programs are capable of loading modules dynamically based on conditions that are unknown until run time. Although there is no completely foolproof way of finding out all the modules the program might use, a pretty close way is to add this code to the program:
END {
print "Directories searched:\n\t",
join ("\n\t" => @INC),
"\nModules loaded:\n\t",
join ("\n\t" => sort values %INC),
"\n";
}
Then run the program. You'll get output looking something like this:
Directories searched:
/opt/lib/perl
/usr/lib/perl5/5.6.1/i386-linux
/usr/lib/perl5/5.6.1
/usr/lib/perl5/site_perl/5.6.1/i386-linux
/usr/lib/perl5/site_perl/5.6.1
/usr/lib/perl5/site_perl/5.6.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.6.1/i386-linux
/usr/lib/perl5/vendor_perl/5.6.1
/usr/lib/perl5/vendor_perl
.
Modules loaded:
/usr/lib/perl5/5.6.1/AutoLoader.pm
/usr/lib/perl5/5.6.1/Carp.pm
/usr/lib/perl5/5.6.1/Exporter.pm
/usr/lib/perl5/5.6.1/Exporter/Heavy.pm
/usr/lib/perl5/5.6.1/Time/Local.pm
/usr/lib/perl5/5.6.1/i386-linux/Config.pm
/usr/lib/perl5/5.6.1/i386-linux/DynaLoader.pm
/usr/lib/perl5/5.6.1/lib.pm
/usr/lib/perl5/5.6.1/overload.pm
/usr/lib/perl5/5.6.1/strict.pm
/usr/lib/perl5/5.6.1/vars.pm
/usr/lib/perl5/5.6.1/warnings.pm
/usr/lib/perl5/5.6.1/warnings/register.pm
/usr/lib/perl5/site_perl/5.6.1/HTML/Form.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Date.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Headers.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Message.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Request.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Response.pm
/usr/lib/perl5/site_perl/5.6.1/HTTP/Status.pm
/usr/lib/perl5/site_perl/5.6.1/LWP.pm
/usr/lib/perl5/site_perl/5.6.1/LWP/Debug.pm
/usr/lib/perl5/site_perl/5.6.1/LWP/MemberMixin.pm
/usr/lib/perl5/site_perl/5.6.1/LWP/Protocol.pm
/usr/lib/perl5/site_perl/5.6.1/LWP/UserAgent.pm
/usr/lib/perl5/site_perl/5.6.1/URI.pm
/usr/lib/perl5/site_perl/5.6.1/URI/Escape.pm
/usr/lib/perl5/site_perl/5.6.1/URI/URL.pm
/usr/lib/perl5/site_perl/5.6.1/URI/WithBase.pm
/opt/lib/perl/WWW/Mechanize.pm
/usr/lib/perl5/site_perl/5.6.1/i386-linux/Clone.pm
/usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/Entities.pm
/usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/Parser.pm
/usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/PullParser.pm
/usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/TokeParser.pm
That doesn't mean that the user code loaded 34 modules; in fact, it loaded 3, one of which (WWW::Mechanize) loaded the rest, mostly via other modules that in turn loaded other modules that—well, you get the picture. Now you want to verify that the program isn't somehow loading modules that your egrep command didn't find; so create a program containing just the results of the egrep command and add the END block, like so:
use strict;
use warnings;
use lib qw(/opt/lib/perl);
use WWW::Mechanize;
END {
print "Directories searched:\n\t",
join ("\n\t" => @INC),
"\nModules loaded:\n\t",
join ("\n\t" => sort values %INC),
"\n";
}
Run it. If the output is identical to what you got when you added the END block to the entire program, then egrep almost certainly found all the dependencies. If it isn't, you'll have to dig deeper.
Even if the outputs match, it's conceivable, although unlikely, that you haven't found all the dependencies. Why? Just because one set of modules was loaded by the program the time you ran it with your reporting code doesn't mean it couldn't load another set some other time. You can't be certain the code isn't doing that until you've inspected every eval and require statement in it. For instance, DBI (the DataBase Independent module) decides which DBD driver module it needs depending on part of a string passed to its connect() method. Fortunately, code that complicated is rare.
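The following sketch shows the shape of the problem (the connection details are invented): nothing in the source names DBD::mysql, yet DBI loads it at run time because of the "dbi:mysql" portion of the connection string.
use strict;
use warnings;
use DBI;

my ($user, $password) = ("app_user", "secret");    # invented credentials
# DBI picks the driver module from the DSN at run time, so no
# 'use DBD::mysql' line appears anywhere in this program.
my $dbh = DBI->connect("dbi:mysql:database=orders;host=dbhost",
                       $user, $password, { RaiseError => 1 });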
Now check that the system you need to port the program to contains all the required modules. Take the list output by egrep and prefix each module with -M in a one-liner like so:
% perl -Mstrict -Mwarnings -MWWW::Mechanize -e 0
This runs a trivial program (0) after loading the required modules. If the modules loaded okay, you won't see any errors. If one or more modules don't exist on this system, you'll see a message starting, "Can't locate module.pm in @INC . . . "
That's quite likely what will happen with the preceding one-liner, and the reason is the use lib statement in the source. Like warnings and strict, lib is a pragma, meaning that it's a module that affects the behavior of the Perl compiler. In this case it was used to add the directory /opt/lib/perl to @INC, the list of directories perl searches for modules in. Seeing that in a program you need to port indicates that it uses modules that are not part of the Perl core. It could mean, as it did here, that it is pointing perl toward a non-core Perl module (WWW::Mechanize) that is nevertheless maintained by someone else and downloaded from CPAN. Or it could indicate the location of private modules that were written by the developers of the program you are porting. Find out which case applies: Look on CPAN for any missing modules. The easiest way to do this is to go to http://search.cpan.org/ and enter the name of each missing module, telling the search engine to search in "modules".[2]
[2] Unless you're faced with a huge list to check, in which case you can script searches using the CPAN.pm module's expand method.
So if we want to write a one-liner that searches the same module directories as the original code, we would have to use Perl's -I flag:
% perl -Mstrict -Mwarnings -I/opt/lib/perl -MWWW::Mechanize \
-e 0
However, in the new environment you're porting the program to, there may not be a /opt/lib/perl; there may be another location you should install third-party modules to. If possible, install CPAN modules where CPAN.pm wants to put them; that is, in the @INC site-specific directory. (Local business policies might prevent this, in which case you put them where local policy specifies and insert use lib statements pointing to that location in your programs.)
If you find a missing module on CPAN, see if you can download the same version that is used by the currently operational program—not necessarily the latest version. Remember, you want first of all to re-create the original environment as closely as possible to minimize the number of places you'll have to look for bugs if it doesn't work. Again, if you're dealing with a relatively small, unprepossessing program, this level of caution may not be worth the trouble and you will usually spend less time overall if you just run it against the latest version of everything it needs.
To find out what version of a module (Foo::Bar, say) the original program uses, run this command on the operational system:
% perl -MFoo::Bar -le 'print $Foo::Bar::VERSION'
0.33
Old or poorly written modules may not define a $VERSION package variable, leaving you to decide just how much effort you want to put into finding exactly the same historical version, because you'll have to compare the actual source code texts (unless you have the source your module was installed from and the version number is embedded in the directory name). Don't try getting multiple versions of the same module to coexist in the same perl installation unless you're desperate; this takes considerable expertise.
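While you still have access to the operational system, you can capture the version of every module the program loads with a sketch like this END block, a companion to the one shown earlier; modules that define no $VERSION are flagged:
END {
    for my $path (sort keys %INC) {
        (my $module = $path) =~ s/\.pm$//;
        $module =~ s{/}{::}g;
        no strict 'refs';
        my $version = ${ $module . "::VERSION" };
        printf "%-40s %s\n", $module,
               defined $version ? $version : "(no \$VERSION)";
    }
}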
You can find tools for reporting dependencies in programs and modules in Tom Christiansen's pmtools distribution (http://language.perl.com/misc/pmtools-1.00.tar.gz).
2.3.1 Gobbledygook
What if you look at a program and it really makes no sense at all? No indentation, meaningless variable names, line breaks in bizarre places, little or no white space? You're looking at a deliberately obfuscated program, likely one that was created by running a more intelligible program through an obfuscator.[3]
[3] Granted, some programs written by humans can appear obfuscated even when there was no intention that they appear that way. See Section 1.5.
Clearly, you'd prefer to have the more intelligible version. That's the one the developer used; what you've got is something they delivered in an attempt to provide functionality while making it difficult for the customer to make modifications or understand the code. You're now the developer, so you're entitled to the original source code; find it. If it's been lost, don't despair; much of the work of reconstructing a usable version of the program can be done by a beautifier, discussed in Section 4.5. A tool specifically designed for helping you in this situation is Joshua ben Jore's module B::Deobfuscate (http://search.cpan.org/dist/B-Deobfuscate/).