Monday, December 10, 2007

Rewriting (Transplants)


4.1 Strategizing
Perhaps the most important question you will have to answer to begin with is this: To what extent should you rewrite the code? Even if at first it looks far too long to replace, rewriting from scratch is an option you should pursue if at all possible for a host of reasons:

It allows you to psychologically "own" the code: It's no longer someone else's code; it's yours, because you typed it.

You'll be far more familiar with the code, because people remember things much better if they've written them instead of merely having read them.

The code may not need to be that long; the original code may have been accidentally or deliberately overelaborated and maybe there's a slim program in there just waiting to be liberated from the adiposity encasing it.

You'll be able to create tests incrementally as you write each new function.

Perhaps the program started as either a prototype or an application designed to handle a much smaller problem, but it became a victim of its own success and accreted new functionality as a hermit crab accumulates barnacles (the difference being that the crab can still get around). You can recode to fit the new, more complete requirements.

Similarly, the program may contain vestigial code that hasn't been used in years. If you rewrite, you'll automatically exclude it without having to hunt it down.

You can incorporate the latest technology in the new program. In particular, advanced modules are being created now at a dizzying pace; quite possibly large parts of the original code duplicate functionality that can now be left to the module maintainers.

Remember, it'll almost certainly take longer to figure out the code than you think. Consider whether increasing your estimate of the size of the task changes your mind about whether to rewrite or not.
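Creating tests incrementally can be as light as a few assertions written alongside each new function. The following is a minimal sketch using the Test::More module that ships with Perl; the trim function and its test cases are hypothetical, invented here only to show the shape of the workflow:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical function being rewritten: strips leading and trailing white space.
sub trim
{
  my $text = shift;
  $text =~ s/^\s+//;
  $text =~ s/\s+$//;
  return $text;
}

# One assertion per behavior, added as soon as the function exists.
is(trim("  padded  "), "padded", "strips both ends");
is(trim("clean"),      "clean",  "leaves clean text alone");
is(trim("   "),        "",       "all white space becomes empty");
```

Each function you add gets its own handful of assertions, so the test suite grows at the same pace as the rewrite.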


4.2 Why Are You Doing This?
The goals of rewriting are similar to the goals of developing a program from scratch. You must consider these questions: Who are you writing for? How do the demands on your rewrite differ from the demands placed on the original code?

4.2.1 Who's the Audience?
Who is going to see your code? In some environments, code will be inspected by people who will not be involved in maintaining it, may not work in the same department, or may not even be programmers. But their reactions could nevertheless be influential. Determine before you start whether you are in such an environment and find out what you can about the reviewers' tastes. You may be better off coding to a low degree of sophistication so as not to antagonize reviewers who are unable to understand more advanced code constructions.

Even if you're fortunate enough not to be incarcerated in such a politically distorted workplace, you must consider who is going to be maintaining this program. Assuming you're not attempting to ensure job security through creating code so obfuscated that no one else can decipher it, if someone else is going to assume responsibility for your program you will want to minimize the extent to which they have to bug you for help. Therefore, if you know who that person is going to be, you should keep the sophistication of your code to a level that they can assimilate. (Of course, this strategy has its limits; if you're creating an air traffic control system to be maintained by a recent high school graduate, accommodation may not be possible.[1])

[1] If you're creating an air traffic control system in Perl, please drop me a line. If your system is supposed to be maintained by teenagers, please also include a map of the areas it serves; rest assured I will put that map to good use. (To forestall the letters, the system described in [HOGABOOM03] is not used in operations.)

In the absence of more specific information, the most likely maintenance programmer for your program is, of course, yourself. Strange as it might seem at this point, many—if not most—programmers, in some subconscious fit of masochism, still code in a way that causes needless pain for themselves later on. You know who you are.

If you don't know who's going to take over your code, the question arises: What level of sophistication should you code to? There is much debating on this topic, and more than a few rancorous opinions. I believe you should code to your own level, and not try to satisfy the preferences of a purely hypothetical maintenance programmer. There abounds an absurd notion that there is an objective "best standard" to code to for maintenance, and people fritter away endless hours arguing whether a construction such as:


(my $new = $old) =~ s/old/new/;


fits this standard,[2] or whether it is clearer to the hypothetical maintainer to rewrite it as:

[2] Tom Christiansen dubbed this the en passant method of setting a variable to a transformation of an expression.


my $new = $old;
$new =~ s/old/new/;
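Whichever form wins the argument, the two do the same thing. A minimal sketch, assuming a made-up sample string, shows that both leave the original variable untouched and produce the same copy. The third form uses the /r modifier, which arrived in Perl 5.14 and is included only for comparison:

```perl
use strict;
use warnings;

my $old = "old value";    # hypothetical sample data

# En passant form: copy, then substitute on the copy, in one statement.
(my $new1 = $old) =~ s/old/new/;

# Two-statement form.
my $new2 = $old;
$new2 =~ s/old/new/;

# Non-destructive substitution (requires Perl 5.14 or later).
my $new3 = $old =~ s/old/new/r;

print "$_\n" for $old, $new1, $new2, $new3;
```

The first line printed is the unchanged original; the remaining three are identical.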


If you're going to be the maintenance programmer, the only concern that need affect you is whether you will be able to understand the code. That is best determined by how long it took you to write it. If you write something quickly, it is more likely to be readable than something that takes hours of sweating and testing to get right.

If those hours are spent constructing a fiendishly compact brand-new idiom, the odds are you will spend even longer trying to figure it out when you come across it six months later. Leave it out for now and use more obvious code instead; if in some future programming effort the new idiom occurs to you without hours of toil, then you've hit on something useful; but more likely, you will be glad you resisted the temptation to use it.

(Writing something quickly doesn't guarantee that it'll make sense in a larger context later on, though, only that you'll understand the syntax. In other words, you may not be able to see the forest for the trees, but at least each tree will be somewhat familiar.)

If, on the other hand, you spend hours developing a small piece of code that doesn't have any apparently clearer representation, you have code that needs commenting. You understand it right now because of all the auxiliary notes that you took, the tests that you ran (which should be rolled into your testing scripts), the articles or texts that you read, and some complex analysis that you performed, none of which appear in the code. If you needed that supporting material to create the code, you'll need it to understand it, so put it in as comments. Later on I'll show how you can insert lengthier information that can be extracted into nicely formatted documents.

4.2.2 The Ten Levels of Perl Programmers
Several years ago, Tom Christiansen created a somewhat tongue-in-cheek list of different levels of Perl programming sophistication ([CHRISTIANSEN98]). With an eye toward classifying maintenance programmers, here's mine. Later on in this book, I'll occasionally refer to one of these levels when I want to indicate the degree of sophistication of some code or technique. It can also be helpful to assess the level of the original author of a program you are tasked with maintaining.

Level 1:
Hasn't read any book (or anything else for that matter) on Perl; knows that it's a programming language but knows nothing else about it. Can run a Perl program written by someone else and realizes that they can change parts of the behavior—such as the content of printed strings—by editing parts of the program. Doesn't understand why changes to other parts of the program don't work; has no mental model to fit the language in, so can't differentiate the syntax from languages as diverse as COBOL and C++.

Level 2:
Understands the basic block structure syntax, although only to the extent of recognizing that it is similar to a language like JavaScript. Has a notion that blocks create some kind of scoping effect but does not know about lexical variables and has not encountered use strict or use warnings. Can change the sense of a conditional and use basic arithmetic and logical operators. Thinks that everything they need to do can be achieved by small modifications to programs written by others.

Level 3:
Wants to create programs from scratch and realizes that some kind of education is called for; asks people for book recommendations for learning Perl. Someone at this level may acquire the Camel book [WALL00] thinking that it is a tutorial, and attempt to read the whole thing, suffering severe neurological trauma in the process.

Level 4:
Learns for the first time about use strict and use warnings, but thinks they're more trouble than they're worth. Wonders what my means. Discovers that there are modules for solving just about any problem but doesn't know how to acquire and use them. Subscribes to a Perl newsgroup and is overwhelmed by the amount of discussion on topics that make no sense to them.

Level 5:
Has basic comprehension of regular expressions, operators, I/O, and scoping. Uses my, use strict, and use warnings because everyone says so. A large number of people never advance beyond this level because they can do pretty much anything (aside from creating reusable components), albeit often inefficiently. Learns about references either at this level or the next one.

Level 6:
May take a Perl class. Knows how to use objects and realizes that this knowledge and the Comprehensive Perl Archive Network (CPAN) enable them to create powerful programs very rapidly. Wants to advance to the next level to see how far this empowerment trend extends.

Level 7:
Learns how to create their own object-oriented modules and experiences the bliss of code reuse for the first time. Knows about packages and the difference between lexical and dynamic variables. Discovers that Perl's regular expressions aren't regular and are capable of far more than simple text manipulation.

Level 8:
Starts giving back in some fashion: either submitting bug reports or patches, modules to CPAN, documentation suggestions, or help for novices. Discovers advanced features like AUTOLOAD and starts using developer-oriented modules like Class::MethodMaker. Uses complex application modules like DBI or Tk where appropriate; comfortable using CGI.pm to create web-based applications.

Level 9:
Attends a Perl conference, participates in the Perl community in other ways; may frequent www.perlmonks.org or #perl (see Chapter 12). Comfortable with creating code on the fly with eval and manipulating the symbol table. Thinks often of performance considerations when coding, probably more than necessary. Publishes modules subclassing major modules to add significant functionality.

Level 10:
Takes part in Perl obfuscation and "golf" contests;[3] comfortable writing a single regular expression using embedded code to implement a function that most other people would have needed to write an entire program for; may submit patches to the Perl core, contribute major new modules, or otherwise become a well-known name within the Perl community.



[3] Perl golf is a contest of seeing who can solve a problem in the fewest number of characters. The winner is unlikely to be intelligible, but there again, how many other languages allow you to write an entire munitions-grade encryption algorithm (RSA) in three lines that fit on a t-shirt?

Continuing this progression for a few more levels until reaching one where the only inhabitant is Larry Wall is left as an exercise to the reader.

4.2.3 What Are the Requirements?
Before rewriting, you must find out how the requirements for the code have changed since the original program was developed. Don't assume that all you have to do is reproduce the functionality unless you've verified that. You may need to optimize the code for different goals; perhaps you're expected to improve its readability, performance, or portability, or extend its functionality.

An important question to ask the original developer is what, if anything, the code was optimized for. For instance, there may be complicated constructions in it that you would simplify if you didn't know that they're required for acceptable performance.


4.3 Style
In [SCOTT01], Ed Wright and I said, "The only believable comment to make about style is that you should have one." I'll go further than that here. Just as in clothing, having a consistent style doesn't necessarily mean other people won't laugh at you. So, to prevent you from accidentally picking a coding style that's equivalent to a 1970s-era pool hustler, here is a suggested style that's rather more conservative.

Of course, some people intentionally dress like pool sharks and may take umbrage at this disparagement of their sartorial tastes. By all means stick with an idiosyncratic style if you've arrived at it through conscious decision and you can live with the consequences. This section is for everyone else. Like many parts of this book, it incorporates a certain amount of personal taste that is one of a number of ways of getting a job done well; I'm not holding it up as the "One True Way."

You can find comments on style in the perlstyle documentation page that comes with Perl, parts of which are reflected here.

4.3.1 Layout
The best thing you can do for the layout of your program is to indent it properly. I continue to be amazed by how many people in my classes write programs with no indentation whatsoever, as though there were a price on the head of anyone using white space. My own tendencies are so ingrained in the other direction that I have to fight the urge to indent my one-liners! No matter how brief or short-lived the program, you're doing everyone a favor by indenting it from the beginning. Many editors make this process painless.

perlstyle recommends a four-column indent; I use two columns because it's enough for my eyes to recognize the block structure and I don't want deeply nested blocks crowding the right margin. Some people favor an eight-column indent by virtue of using the tab character for each indentation level, and I personally find this far too large. They sometimes claim that they merely need to set the tab stops in their editors to four columns to get good-looking indentation, but this forces maintenance programmers to fiddle with their editor configuration, something many programmers are so possessive of that you might as well ask them to stop breathing.

When it comes to brace styles for code blocks, programmers break out into confrontations that make Jonathan Swift's big-endian/little-endian battles ([SWIFT47]) look like Amish barn-raisings. I prefer a style called the BSD/Allman style:


sub read_dicts
{
  my $dictref = shift;
  my @path = split /:/, $ENV{DICTPATH} || "$ENV{HOME}:.";
  local @ARGV = grep -e, ($DEFDICT, map "$_/.dict" => @path);
  while (<>)
  {
    # Assume the line contains a single word
    chomp;
    $dictref->{$_}++;
  }
}


whereas most of the Perl distribution uses the so-called K&R (Kernighan and Ritchie) style:


sub read_dicts {
  my $dictref = shift;
  my @path = split /:/, $ENV{DICTPATH} || "$ENV{HOME}:.";
  local @ARGV = grep -e, ($DEFDICT, map "$_/.dict" => @path);
  while (<>) {
    # Assume the line contains a single word
    chomp;
    $dictref->{$_}++;
  }
}


which saves a line per block but makes it harder—to this author—to see where the block begins, particularly because a continuation line may be indented. For example:


update($controller->connection, $record[$cur_count]->ticket,
       $form_inputs->{params});


might force you to check the end of the first line to make sure it wasn't ending with a brace. Another example:


if (($best_count == 1 && $self->{single_match})
    || ($best_count > 0 && ! $self->{single_match})) {
    $found_match = 1;
}


which—even with a four-column indentation—looks more confusing to me than


if (($best_count == 1 && $self->{single_match})
 || ($best_count > 0 && ! $self->{single_match}))
{
  $found_match = 1;
}


with only a two-column indentation. (Although to be fair, some K&R adherents suggest this case is worth an exception.) See also how the continuation line was given a nonstandard indentation so the parallel clauses could line up nicely. Don't underestimate the value of using white space to prettify code in this way.[4]

[4] After some conversation with K&R adherents, I am starting to suspect that the difference between them and the BSD/Allman camp is that they tend to look at the ends of lines for semantic clues, whereas we look at the beginnings. Experiment with both styles and see which one you find more appealing.
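The read_dicts routine used in the layout examples above leans on an idiom that is easy to miss: assigning a list of file names to a localized @ARGV makes the <> operator read those files in turn. The following is a minimal sketch of that idiom in isolation, not the original routine. The count_words name, the temporary directory, and the sample word list are all assumptions made up for illustration, and the guard against an empty file list is an added precaution that the original does not show:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory holding one small word list (hypothetical data).
my $dir  = tempdir(CLEANUP => 1);
my $dict = "$dir/.dict";
open my $out, '>', $dict or die "Cannot write $dict: $!";
print $out "apple\nbanana\napple\n";
close $out;

sub count_words
{
  my ($countref, @candidates) = @_;

  # Keep only the files that exist; <> then reads each of them in turn.
  # local restores the caller's @ARGV when the sub returns.
  local @ARGV = grep { -e $_ } @candidates;
  return unless @ARGV;    # an empty @ARGV would make <> read STDIN

  while (<>)
  {
    # Assume the line contains a single word
    chomp;
    $countref->{$_}++;
  }
}

my %count;
count_words(\%count, $dict, "$dir/no-such-file");
print "$_: $count{$_}\n" for sort keys %count;
```

Run as is, this prints "apple: 2" and "banana: 1"; the nonexistent file is filtered out by grep before <> ever sees it.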

Although there are at least two ways of conditionally executing a single statement without putting it in an if block, if that statement is long enough to overflow the line, using a block for it may be clearer than the other approaches (because it looks like multiple statements, even if it isn't):


if (@problems)
{
  error("We found the following problems:\n"
        . (join "\n" => @problems)
       );
}


The non-block ways of conditional statement execution are:

condition and statement

For example:


/(banana|apple|pear)/ and push @fruit, $1;


statement if condition

For example:


warn "Reached line $." if $verbose;
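Both forms can be seen side by side in a small runnable sketch. The input lines and the $verbose flag are hypothetical, chosen only so that each form has something to act on:

```perl
use strict;
use warnings;

my $verbose = 1;                                             # hypothetical flag
my @lines   = ("one apple", "two carrots", "three pears");   # hypothetical input
my @fruit;

for (@lines)
{
  # condition and statement: push only when the match succeeds.
  /(banana|apple|pear)/ and push @fruit, $1;

  # statement if condition: the same kind of test written as a modifier.
  print "Checked: $_\n" if $verbose;
}

print "Fruit found: @fruit\n";
```

The last line printed is "Fruit found: apple pear"; the line mentioning carrots fails the match, so the push is never evaluated for it.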


Code should be as pleasant to look at as you can possibly make it without ornateness that's fragile with respect to modifications—you don't want to shy away from making necessary changes just because they would destroy a beautiful piece of formatting.
