Monday, December 10, 2007

Comments

4.4 Comments
Comments are to your program as the Three Bears' furniture was to Goldilocks; too few aren't good enough, and more are not better. Ideally, code should not need commenting; for example, instead of saying:


# If employee was hired more than a week ago:
# get today's week number, convert the employee's
# hire date using the same format, take difference.
use POSIX qw(strftime);
use Date::Parse;
chomp(my $t1 = `date +%U`);
my @t2 = strptime($employee->hired);
if (strftime("%U", @t2) - $t1 > 1)


say:


if (time - $employee->hired > $ONE_WEEK)


by storing times in UNIX seconds-since-epoch format and defining $ONE_WEEK earlier as 60 * 60 * 24 * 7. This is not only easier to read but lacks the bug in the previous code. That tortured tangle of pendulous programming may look absurd, but I have often seen worse. A developer is used to getting a date from the date command in shell scripts, does the same in a Perl program, stores the result in the same format, and it goes downhill from there.
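To see the shorter test run on its own, here is a minimal sketch. The Employee package is a hypothetical stand-in (the class itself is not shown in the text), and it assumes that hired returns the hire time in epoch seconds, as described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the employee class, which the text does not show.
# Assumption: hired() returns the hire time as UNIX epoch seconds.
{
    package Employee;
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub hired { my ($self) = @_; return $self->{hired} }
}

my $ONE_DAY  = 60 * 60 * 24;
my $ONE_WEEK = $ONE_DAY * 7;

my $now     = time;
my $veteran = Employee->new(name => 'veteran', hired => $now - 10 * $ONE_DAY);
my $recruit = Employee->new(name => 'recruit', hired => $now - 2 * $ONE_DAY);

for my $employee ($veteran, $recruit) {
    my $status = $now - $employee->hired > $ONE_WEEK ? 'more than' : 'less than';
    print "$employee->{name}: hired $status a week ago\n";
}
```

Because both sides of the comparison are plain numbers of seconds, there is no parsing step and nothing that depends on where a week or year boundary falls.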

Sometimes you can't make the code clear enough. When you add a comment, the word to keep in mind is "Why?" Don't just recapitulate what the code does, but say why you're doing it. This doesn't need to be a verbose justification, but can be as simple as this:


if (time - $employee->hired > $ONE_WEEK) # Past probation?


Although then you have to ask why it wasn't written as:


if (time - $employee->hired > $PROBATION_PERIOD)


So a better example would be something like:


if (my ($name) = />(.*?)

which indicates some hard-won piece of knowledge.

4.4.1 Sentinel Comments
Sometimes you need to put a reminder in your code to change something later. Perhaps you're in the middle of fixing one bug when you discover another. By putting a distinctive marker in a comment, you have something you can search for later. I use the string "XXX" because gvim highlights it especially for this purpose.[5] (I suppose it also makes the program fail to pass through naive adult content filters.)

[5] Even though I use Emacs most of the time; it helps to hedge your bets in the perennial "Favorite Editor War."
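Finding the markers again later can be as simple as a grep, but a small Perl sketch shows the idea. The find_sentinels subroutine and the sample source lines are hypothetical, made up for illustration; the only thing taken from the text is the "XXX" marker.

```perl
use strict;
use warnings;

# Return "line_number: text" for every line whose comment contains the marker.
sub find_sentinels {
    my ($marker, @lines) = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        push @hits, sprintf('%d: %s', $i + 1, $lines[$i])
            if $lines[$i] =~ /#.*\Q$marker\E/;
    }
    return @hits;
}

# Hypothetical source lines standing in for a real program file.
my @source = (
    'my $limit = 10;    # XXX should come from the config file',
    'print "limit is $limit\n";',
    'sleep 1;           # XXX remove once the race is fixed',
);

print "$_\n" for find_sentinels('XXX', @source);
```

The sketch only looks for the marker after a # character, so a string that happens to contain "XXX" outside a comment is not reported.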

4.4.2 Block Comments
People often ask how they can comment out a large piece of code. Wrapping it in an if (0) {...} block prevents it from being run, but still requires that it be syntactically correct.

Instead, wrap the code in Plain Old Documentation (POD) directives so it's treated as documentation.[6] There are a number of ways of doing this; here I've used =begin:

[6] See Section 10.3.1.


=begin comment

# Here goes the code we want
# to be ignored by perl

=end comment

=cut


This will still be interpreted as POD, though. However, in this case, POD formatters should ignore the block because none of them know what a block labeled "comment" is. I needed the =end directive to keep the POD valid (=begin directives should be matched), and I needed the =cut directive to end POD processing so perl could resume compiling the following code. Remember that POD directives need blank lines following them because they are paragraph-based. I'll go into what POD can do for your program in more detail in Chapter 10. For a more creative solution to the multiline comment problem, see http://www.perl.com/pub/a/2002/08/13/comment.html.
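As a minimal sketch of the technique in a complete script (the @steps array is a hypothetical name used only to show which statements ran), note that the commented-out region does not even need to be valid Perl, which is the advantage over if (0) {...}:

```perl
use strict;
use warnings;

my @steps = ('before');

=begin comment

push @steps, 'skipped';
this line is not valid Perl, yet the script still compiles

=end comment

=cut

push @steps, 'after';
print join(', ', @steps), "\n";
```

Only the statements outside the POD block run, so the script reports the two steps on either side of it.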




4.5 Restyling
What should you do when you're presented with an existing program whose author didn't follow your favorite style guideline and whose layout looks like a Jackson Pollock painting?[7]

[7] Make that a Pollock imitation. Uri Guttman referred me to an article showing how genuine Pollocks exhibit fractal ordering: [TAYLOR02].

Several attempts have been made at writing beautifiers, but the clear winner as of publication time is perltidy,[8] by Steve Hancock. You can also download perltidy from CPAN. Use Perl::Tidy as the module name to install.

[8] http://perltidy.sourceforge.net/

perltidy writes beautified versions of its input files in files of the same name with .tdy appended. It has too many options to document exhaustively here, but they are well covered in its manual page. Its default options format according to the perlstyle recommendations. My personal style is approximately equivalent to:


perltidy -gnu -bbt=1 -i=2 -nsfs -vt=1 -vtc=1 -nbbc


As good as it is, perltidy can't make the kinds of optimizations that turn the layout of a program into a work of art. Use it once to get a program written by someone else into a "house style," but don't let your style be a straitjacket. For instance, the preceding settings generated this formatting:


return 1 if $case_sensitive[length $word]{$word};
return 1 if $case_insensitive[length $word]{lc $word};


But I could further beautify that code into:


return 1 if $case_sensitive  [length $word]{   $word};
return 1 if $case_insensitive[length $word]{lc $word};


Note that you can separate subscripting brackets from their identifiers by white space if you want. In fact, Perl is incredibly permissive about white space, allowing you to write such daring constructs as:


$   puppy = ($dog     ->offspring)[0];
$platypup = ($platypus->eggs)     [0];


although whether you really want to mix left and right justification without any, er, justification, is questionable.
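The aligned subscripts can be checked with a small sketch. The known subroutine and the sample data are hypothetical, chosen only so the two lookups from the text have something to find; the list slice at the end shows the same tolerance for white space before [0].

```perl
use strict;
use warnings;

# Hypothetical data: word lists indexed by word length, as in the text.
my (@case_sensitive, @case_insensitive);
$case_sensitive  [4]{Perl} = 1;
$case_insensitive[4]{cpan} = 1;

sub known {
    my ($word) = @_;
    return 1 if $case_sensitive  [length $word]{   $word};
    return 1 if $case_insensitive[length $word]{lc $word};
    return 0;
}

# A list slice tolerates white space before its subscript too.
my $first = (sort qw(pear apple fig)) [0];

print "$_: ", known($_), "\n" for qw(Perl CPAN perl);
print "first: $first\n";
```

The spacing changes nothing about how perl parses the statements; it only lines up the parts that differ between the two return lines.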

4.5.1 Just Helping Out?
If you are modifying code that belongs to someone else, but not assuming responsibility for it, follow its existing style. You'll only annoy the author by "helpfully" reformatting their code. That's equivalent to going into the bathrooms at a house you've been invited to and changing the way all the toilet paper rolls are hung.



4.6 Variable Renaming
Easily vying with misindentation for unintentional obfuscation is cryptic variable naming. If you look at a piece of code and can't tell what a variable is for, it has the wrong name. A brief mnemonic is usually sufficient: $func is as good as $function, and unless there are other function references nearby, as good as $function_for_button_press_callback. On the other hand, $f is not good enough.

Names like $i and $n are fine if their variables enjoy the commonly accepted usages of an index and a count, respectively, but only over a short scope. Single-letter or otherwise cryptic names can also be acceptable if they recapitulate a formula that is well-known or included in official documentation for the program, for example:


$e = $m * $c ** 2;

# Weighted average contribution, see formula (4):
$v_avg = sum(map $v[$_] * $v_w[$_] => 0..$#v) / sum(@v_w);


However, single-character variable names should be confined to small scopes because they are hard to find with an editor's search function if you need to do that.
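For reference, a runnable sketch of the weighted average line. It assumes sum comes from the core List::Util module (the fragment above does not show where it is imported from), and the sample values and weights are made up.

```perl
use strict;
use warnings;
use List::Util qw(sum);    # assumption: this is where sum() comes from

my @v   = (10, 20, 30);    # sample values
my @v_w = (1, 1, 2);       # sample weights, one per value

# Weighted average: sum of value * weight, divided by the sum of the weights.
my $v_avg = sum(map $v[$_] * $v_w[$_] => 0..$#v) / sum(@v_w);

print "$v_avg\n";    # (10 + 20 + 60) / 4 = 22.5
```

The short names are tolerable here only because the whole calculation fits in a few adjacent lines next to the comment that explains it.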

Don't use $i, $j, and so on, for indices in hierarchical data structures if the structures are not truly multidimensional; use names representative of what's stored at each level. For instance:



$level = $voxels[$x][$y][$z];
Okay; these really are x, y, z coordinates.

$val = $spreadsheet[$i][$j];
Okay; accepted usage.

$val = $spreadsheet[$row][$col];
Better; more common terms.

$total += $matrix[$i][$j][$k];
Fine, if we don't know any more about @matrix (perhaps this is in a multidimensional array utility method).

$e = $o{$i}{$j}{$k};
Bad; you have to look up what those variables mean.

$employee = $orgchart{$division}{$section}{$position};
Much clearer.

$emp = $org{$div}{$sec}{$pos};
Equally useful; the variables are sufficiently mnemonic to suggest their meaning immediately.


perlstyle has some excellent recommendations for formatting of variable names:

Use underscores for separating words (e.g., $email_address), not capitalization ($EmailAddress).

Use all capital letters for constants; for example, $PI.

Use mixed case for package variables or file-scoped lexicals; for example, $Log_File.

Use lowercase for everything else; for example, $full_name. This includes subroutine names (e.g., raid_fridge) even though, yes, they live in the current package and are not lexical.

Subroutines that are internal and not intended for use by your users should have names beginning with an underscore. (This won't stop the users from calling the subroutines, but they will at least know by the naming convention that they are doing something naughty.)
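Put together, the conventions look something like the following sketch. Every name in it apart from raid_fridge and $full_name is hypothetical and chosen only to illustrate one rule each.

```perl
use strict;
use warnings;

my  $MAX_SNACKS = 2;                        # constant: all capitals
our $Log_File   = 'fridge.log';             # package variable: mixed case
my  @Fridge     = qw(cheese pickle cake);   # file-scoped lexical: mixed case

# Internal helper: the leading underscore marks it as not for outside callers.
sub _take_from {
    my ($shelf, $count) = @_;
    return splice @$shelf, 0, $count;
}

# Public subroutine: lowercase words separated by underscores.
sub raid_fridge {
    my ($full_name) = @_;                   # ordinary lexical: lowercase
    my @haul = _take_from(\@Fridge, $MAX_SNACKS);
    return "$full_name took " . join(' and ', @haul)
         . " (logged to $Log_File)";
}

print raid_fridge('Goldilocks'), "\n";
```

Nothing enforces any of this; the point is that a reader can tell from the name alone how widely a variable is visible and whether a subroutine is meant to be called from outside.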

4.6.1 Symbolic Constants
I seldom use the constant pragma, which allows you to define symbols at compile time; for instance:


use constant MAX_CONNECTIONS => 5;
.
.
.
if ($conns++ < MAX_CONNECTIONS) ...


It has some advantages:

You cannot overwrite the value assigned to the symbol. (Well, not easily: The constant is actually a subroutine prototyped to accept no arguments. It could be redefined, but you wouldn't do that by accident.)

The use of a constant is (usually) optimized by the Perl compiler so that its value is directly inserted without the overhead of a subroutine call.

The symbols look constant-ish; identifiers beginning with a sigil ($, @, %) tend to make us think they are mutable.

It also has some disadvantages:

Constants are visually indistinguishable from filehandles.

A constant can't be used on the left-hand side of a fat arrow (=>) because that turns it into a string. You have to put empty parentheses after the constant to ensure that Perl parses it as a subroutine call.

The same is true of using a constant as a hash key.

Constants do not interpolate into double-quoted strings, unless you use the hair-raising @{[CONSTANT_GOES_HERE]} construction.

Being subroutines, they are package-based and cannot be confined to a lexical scope; but they go out of scope when you have a new package statement (unless you fully qualify their names).

Being subroutines, they are visible from other packages, and are even inheritable. You may consider this an advantage if you wish but it is not intuitively obvious.
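The quoting and interpolation pitfalls in that list can be seen in a short sketch. The hash names and the strings stored in them are hypothetical; only MAX_CONNECTIONS comes from the example above.

```perl
use strict;
use warnings;
use constant MAX_CONNECTIONS => 5;

# Left of a fat arrow the bareword is quoted: the key is the constant's name.
my %by_name  = (MAX_CONNECTIONS   => 'limit');
# Empty parentheses force the subroutine call: the key is the value, 5.
my %by_value = (MAX_CONNECTIONS() => 'limit');

# Hash subscripts quote a bare identifier in the same way.
my %pool   = (5 => 'by value', MAX_CONNECTIONS => 'by name');
my $quoted = $pool{MAX_CONNECTIONS};      # looks up the string key
my $called = $pool{MAX_CONNECTIONS()};    # looks up the key 5

# Double quotes do not interpolate the constant without the array-ref trick.
my $literal      = "Limit: MAX_CONNECTIONS";
my $interpolated = "Limit: @{[MAX_CONNECTIONS]}";

print join(', ', keys %by_name), " vs ", join(', ', keys %by_value), "\n";
print "$quoted / $called\n";
print "$literal / $interpolated\n";
```

None of these cases produces a warning, which is what makes the mistake easy to overlook.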

So I tend to define my constants as just ordinary scalars or whatever they might be:


my $PI = 3.1415926535897932384;
my @BEATLES = qw(John Paul George Ringo);


Purists argue that constants should be unmodifiable, but Perl's central philosophy is to let you assume the responsibility that goes along with freedom. Anyone who changes the value of π deserves the consequences that go along with altering the fundamental structure of the universe. On the other hand, you might have good reason at various times to add 'Stuart' or 'Pete' to @BEATLES.
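A brief sketch of how such ordinary variables behave in practice; the $area and $lineup names are hypothetical. They interpolate into strings directly, and the list can grow when history calls for it.

```perl
use strict;
use warnings;

my $PI      = 3.1415926535897932384;
my @BEATLES = qw(John Paul George Ringo);

# Plain variables interpolate into double-quoted strings with no tricks.
my $area   = sprintf('%.2f', $PI * 2 ** 2);    # area of a circle of radius 2
my $lineup = "Lineup: @BEATLES";

# Nothing stops a deliberate modification.
push @BEATLES, 'Pete';

print "Area: $area\n";
print "$lineup\n";
print "Now with ", scalar(@BEATLES), " names\n";
```

Being lexicals, both variables are also confined to the enclosing file or block, unlike symbols created with the constant pragma.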
