http://www.developer.com/

Thinking In Perl


March 10, 2004

Few people would dispute that Perl is one of the most flexible and useful scripting languages available today. Most, however, cannot look past that to see it as a programming language, and a powerful one at that. No one will argue that they could develop a prototype of a product in Java faster than they could in Perl, yet most would abandon Perl as soon as development "gets serious." After all, the thinking goes, Perl is just a scripting language, and no one in their right mind would attempt to use it for anything besides CGI and gluing other programs together.

The fact is, Perl should be taken more seriously as a programming language. The problem is that most of the Perl code in the world has been written as either a fancy shell script or a stripped-down prototype for C and Java projects. The very things that attract people to using Perl for these quick and dirty hacks also stall their progress in the language. Most instructional Perl material teaches only enough to let an interested programmer write shell or C programs in Perl.

Perl was developed by Larry Wall, a linguist and computer scientist. It started life as a simple scripting language but, under the heavy influence of a linguist and a large community, grew into a language of its own. Today, Perl brings all of the tools of any other language, plus much more, to the table. Most of these tools go unnoticed by beginners, who know only enough about Perl to be put off by its lack of support for typed variables and its maintenance nightmares. That is a very myopic view of Perl as a programming language. Through the eyes of a C programmer or shell scripter, Perl can look ugly and slow. However, if you learn to think in Perl, not in C or shell script, Perl begins to become beautiful and fast.

Perl Style

Perl doesn't force you to program in any particular style, and whitespace is insignificant. Even so, Larry Wall and the community have come up with a set of style guidelines that serve to make Perl look consistent and thus more maintainable. The accepted style for programming looks like this:

my $var = 0;
$var = <STDIN>;
chomp($var);
if($var == 0) {
   # do something
}
else {
   # do something else
}

The key points in Perl style are:

  1. Opening braces ( { ) begin on the same line as the if (or for, etc.).
  2. Closing braces ( } ) are usually found on a line by themselves.
  3. The else is not on the same line as the if's closing brace.

These rules are not enforced by the interpreter, and are by no means "superior" to any other formatting style. They are agreed upon by the vast majority of the Perl community (including its creator) to be the definitive style of Perl programming. That's good enough for most people. See 'perldoc perlstyle' for more.

Perl Pragmas

Pragmas are modules that change the behaviour of the interpreter. Included in the core distribution are several pragmas designed to make Perl more maintainable and a lot smarter. No matter how good a typist or how smart a programmer you are, there is one pragma that should ALWAYS be on. That pragma is "strict" and is activated with the following line of code:

use strict;

Most veteran Perl programmers shudder violently at the sight of a 500-line Perl program without strict activated. No matter how careful the original author of the program was about checking for errors, the people who eventually modified the program probably did not spend as much time debugging. For the sake of maintainability (and performance, to be demonstrated later), ALWAYS use strict. Under the "strict" pragma, all variables must be declared BEFORE they are used.
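
As a rough illustration of what strict buys you, the sketch below compiles a small snippet containing a misspelled variable name and reports the resulting error. The snippet, the variable names, and the use of a string eval to capture the compile error are all assumptions made for demonstration; they are not taken from any real program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical snippet with a typo: $totl instead of $total.
my $snippet = q{
    use strict;
    my $total = 1;
    $totl = $total + 1;    # typo: this variable was never declared
    1;
};

# A string eval compiles the snippet at run time. Under strict the
# compilation fails, eval returns undef, and the error text is left in $@.
my $ok = eval $snippet;
print "strict rejected the snippet:\n$@" unless $ok;
```

Without strict, the misspelled name would silently become a new package variable and the mistake would surface much later, if at all.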

Taking things one step further, we could enable warnings in our Perl program to be made aware of deprecated syntax and functions, as well as other potential problems. Another useful pragma is the diagnostics pragma, which expands the interpreter's terse warning and error messages into the longer explanations found in 'perldoc perldiag'. You may enable these pragmas with the following lines of code:

use warnings;
use diagnostics;

See the Perl documentation for more detailed information.
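
To give a feel for the kind of problem the warnings pragma reports, here is a minimal sketch that uses a variable before it has been given a value. The variable names are made up, and the __WARN__ handler is only there so the example can collect the warning text and print it itself instead of letting it go to STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of sending them to STDERR.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $name;                                   # declared but never assigned
my $greeting = "Hello, " . $name . "\n";    # warns: uninitialized value

print $greeting;
print "warning caught: $_" for @caught;
```

The program still runs to completion; warnings point out suspicious code without stopping it.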

Perl Scope

Using the strict pragma, we need to declare all variables before they're used. There are two scopes in which variables may be declared: package and lexical. How does a programmer decide which scope to use for a variable?

Package Scope

Package Scope is the default variable scope in Perl. If we were to run this program:

#!/usr/bin/perl

$message = "Hello World!\n";
print $message;

The variable $message would be created in the containing package's scope, in this case "main". An entry is created in the symbol table for "$main::message", the "fully qualified package name" for $message. This example would not pass the strict pragma because we have used the variable $message without first declaring it. By using Perl's our() declaration, we can create package variables in the current package:

#!/usr/bin/perl
use strict;

our $message = "Hello World!\n";
print $message;

Package variables are accessible to any package through the use of their fully qualified package name. This name consists of the sigil ($,@,%), a package name, followed by double colons, followed by the variable name. Package variables are also "global" in that their accessibility and existence span the entire program from declaration on. This means that package variables never go out of scope, and thus are not destroyed until program termination.
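
A small sketch of reaching a package variable from another package by its fully qualified name follows. The Counter package and its increment() function are hypothetical names chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Counter;

our $count = 3;    # package variable, fully qualified as $Counter::count

sub increment { return ++$count }

package main;

Counter::increment();

# From another package the variable is reached by its full name.
print "Count is now $Counter::count\n";
```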

Lexical Scope

Lexical scope is better suited to the majority of variables programmers use regularly. Lexical scope allows a variable to exist only within its enclosing block. The widest scope available to a lexical variable is the entire file in which it's declared. No entry is created in the symbol table for lexically scoped variables, and these variables are purged once they're out of scope. This use of Perl's garbage collection can save resources and prevent variable collisions. Lexical scope has to be declared by using Perl's my() declaration. The preceding examples would be more correctly written as:

#!/usr/bin/perl
use strict;

my $message = "Hello World!\n";
print $message;
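
The difference between the two scopes can be sketched as follows. The variable names are invented for the example, and peeking into the %main:: symbol table hash is used here only as a way to show which declaration creates an entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $shared  = "package variable";
my  $private = "lexical variable";

{
    # A new lexical in an inner block hides the outer one until the block ends.
    my $private = "inner lexical";
    print "inside the block:  $private\n";
}
print "outside the block: $private\n";

# %main:: is the symbol table of package main. Only the variable
# declared with our() shows up in it.
for my $name (qw/shared private/) {
    my $state = exists $main::{$name} ? "has a" : "has no";
    print "\$$name $state symbol table entry\n";
}
```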

my() or our()

Unless you are inside a non-main package, always use lexically scoped variables. For most practical applications, the benefits of garbage collection will be more rewarding than a globally accessible scope.

When to Declare

Most beginner-level programming instruction has programmers declaring all of their variables at the top of the file or function. This methodology completely ignores Perl's built-in garbage collector. The Perl way to do things is to declare variables in the smallest scope possible. This way, you can ensure that variables such as loop counters occupy memory only for as long as they are needed. The following code is prevalent in scripts all over:

#!/usr/bin/perl

use strict;

my $i = 0;
my $j = 0;
my $word = '';
my $letter = '';
my @array = ();
my @internalArray = ();
my @ordArray = ();

@array = ('see','spot','run','see','spot','jump');

for($i = 0; $i <= $#array; $i++) {
   $word = $array[$i];
   @internalArray = split '',$word;
   for($j=0; $j <= $#internalArray; $j++) {
      $letter = $internalArray[$j];
      $ordArray[$#ordArray + 1] = ord($letter);
   }
}

While this code is correct and passes the strict pragma, it is wasteful and hard to maintain. If this program were longer, say 6,000 lines, and a programmer needed to track down what's happening to $letter, it would be difficult to tell because the variable is declared at the beginning of the program and not used until much later. This code demonstrates a more Perl-ish way to achieve the same goal:

#!/usr/bin/perl

use strict;

my @array = qw/see spot run see spot jump/;
my @ordArray = ();

foreach my $word (@array) {
   foreach my $letter (split '', $word) {
      my $ordinal = ord($letter);
      push @ordArray, $ordinal;
   }
}

Not all the constructs may make sense to a new Perl programmer, but the snippet is intended to demonstrate where to declare variables. Using my() in the loop declaration scopes the variable to that loop. At the end of the loop, that variable is garbage collected by the Perl interpreter.
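
One way to watch this cleanup happen is to put an object in a loop-scoped lexical and log when Perl destroys it. The Tracker class below is purely hypothetical and exists only to record the moment its DESTROY method runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Tracker;    # hypothetical class used only to observe cleanup

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    push @main::log, "released $self->{name}";
}

package main;

our @log;

foreach my $word (qw/see spot/) {
    my $tracker = Tracker->new($word);    # lives for one iteration only
    push @log, "using $word";
}

print "$_\n" for @log;
```

Each "released" entry appears right after the matching "using" entry, because $tracker goes out of scope at the end of every pass through the loop body.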

Variable Context

One of the most interesting features Perl provides is variable context. This allows a variable to have more than one value, depending on the surrounding code. This is a concept that comes straight from human language and makes a lot of tasks much easier. Variables can be evaluated in scalar (singular) or list (plural) context. Most Perl functions and operators can operate or return in either context. The help files briefly touch on this, but they fail to demonstrate the full power of variable context. Consider the following:

print localtime(time());

The perldoc page for localtime() says it returns a list in list context, and in scalar context it returns a system-formatted date string. To many beginners' surprise, the above code actually invokes localtime() in list context. print() expects a list of arguments, and functions that take a list evaluate their arguments in list context. To get the example to print the system-formatted, human-readable time, one must force scalar context. This is relatively easy with the scalar() function.

print scalar(localtime(time()));
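
The two behaviours of localtime() can be compared side by side by assigning its result to an array and to a scalar. The fixed timestamp below is an arbitrary value assumed for the example so that every run looks at the same moment; the exact string printed depends on the local time zone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $epoch = 1_078_900_000;    # arbitrary fixed timestamp, for illustration only

my @fields = localtime($epoch);    # list context: seconds, minutes, hours, ...
my $stamp  = localtime($epoch);    # scalar context: one formatted string

print "list context gave ", scalar(@fields), " fields\n";
print "scalar context gave '$stamp'\n";
```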

Arrays and hashes are created from lists. Assigning a parenthesized list to an array or hash evaluates its contents in list context, so any arrays and hashes placed inside it are flattened into one list. This is often used to merge two hashes. Hashes are associative arrays, and when one is created from a list containing duplicate keys, the value that appears last in the list wins. If %data contains unique, important data and %filler is a hash that makes sure all the required fields have a value, it is possible to merge them into a hash %complete that contains every key from both %data and %filler, with any key present in both taking its value from %data:

my %filler = ( 'Page' => 1, 'ResultsPerPage' => 50,
               'TotalResults' => 0 );
my %data   = ( 'TotalResults' => 12345 );

my %complete = ( %filler, %data ); # ( Page=>1, ResultsPerPage=>50,
                                   #   TotalResults => 12345 );
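
Because the last value for a key wins, the order of the hashes inside the parentheses matters. The sketch below reuses %filler and %data from the example above; the names %data_wins and %filler_wins are invented for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %filler = ( 'Page' => 1, 'ResultsPerPage' => 50, 'TotalResults' => 0 );
my %data   = ( 'TotalResults' => 12345 );

my %data_wins   = ( %filler, %data );    # %data comes last, so its value is kept
my %filler_wins = ( %data, %filler );    # %filler comes last and overrides %data

print "data last:   $data_wins{TotalResults}\n";      # 12345
print "filler last: $filler_wins{TotalResults}\n";    # 0
```

Putting the defaults first and the real data last is what makes the merge behave as intended.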

Also very useful is that arrays in scalar context return the number of items they contain. This can be used to process an array only if it contains items, and to skip that processing otherwise. Perl does not have a boolean type, but this is far from a handicap when you consider that conditional and comparison operators force scalar context on their operands. This makes testing the number of elements in an array fairly simple:

if(@list > 10) {
   # big list
   print "Big List: " . @list . " items!\n";
}
else {
   # small list
   print "Small List: 10 or fewer items!\n";
}

It was not necessary to force scalar context using the scalar() function in the print() statement because @list is an operand of the string concatenation operator, which forces scalar context. This can be confusing to beginners because few other languages make the programmer aware of the context in which a variable is being accessed. With proper instruction and practice, this becomes second nature to a Perl programmer and often saves many keystrokes. However, consider the output of the following snippet:

my @array = qw/a b c d e f g h i j k l/;

print "List contains " . @array . " items\n";
# prints "List contains 12 items"
print "List contains ", @array, " items\n";
# prints "List contains abcdefghijkl items"
print "List contains " . @array, " items\n";
# prints "List contains 12 items"
print "List contains ", @array . " items\n";
# prints "List contains 12 items"

This is why Perl can be so frustrating for programmers coming from other languages: context may not be a familiar concept. In the first print(), @array is forced into scalar context by the string concatenation operator (.). In the second print(), @array is passed as part of the argument list to the print() function. The comma is one of two list delimiters (, and =>) and maintains the list context of the function call. Even more confusing are the results of the third and fourth print() statements. Replacing just one of the commas with the . operator forces @array to be evaluated in scalar context because the . operator requires a string (scalar) on BOTH sides.

The number of arguments being passed might shed some light on what's happening. In the first example, thanks to everything being concatenated, the list of parameters to the print statement contains only one element, the entire string. The second example actually passes 14 elements to the print function: the two strings at the beginning and end, and the 12 elements in @array. The third example passes only two elements to the print statement: the string formed by the concatenation and the string " items\n". The fourth example passes two parameters as well: the string "List contains " and the string formed by the concatenation.

As with any other programming language, sometimes it is better to spend a few extra keystrokes to ensure readability. In the above snippet, it would be best to wrap @array in a scalar() function call to ensure there's no confusion during future edits.
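
The argument counts described above can be checked with a small helper that reports how many arguments it received. count_args() is a hypothetical function written for this sketch; it stands in for print() so that the counts can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the number of arguments it was called with.
sub count_args { return scalar @_ }

my @array = qw/a b c d e f g h i j k l/;

my @counts = (
    count_args("List contains " . @array . " items\n"),    # 1 argument
    count_args("List contains ", @array, " items\n"),      # 14 arguments
    count_args("List contains " . @array, " items\n"),     # 2 arguments
    count_args("List contains ", @array . " items\n"),     # 2 arguments
);
print "argument counts: @counts\n";

# The explicit form leaves no doubt about the intended context.
print "List contains ", scalar(@array), " items\n";
```

With scalar() spelled out, swapping a comma for a dot (or the reverse) during a later edit no longer changes what gets printed.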

  • A quick guide to what each kind of value means in each context:
    • A scalar in list context is treated as a single-item list.
    • Lists in scalar context evaluate to their last element.
    • Arrays in scalar context evaluate to the number of elements they contain.
  • Learning which functions and constructs affect context can take some time, but here are some basic rules:
    • Conditional operators evaluate expressions in scalar context.
    • Anything inside (..), [..], or {..}, when used to build a list, anonymous array, or anonymous hash, is evaluated in list context.
      • Function argument lists are lists (shocker).
      • (..) is the list operator, and invokes list context on its contents.
    • The right side of an assignment expression is evaluated based on the context of the left side of the expression. Some examples include:
      my @z = qw/a b c/;
      
      my $a = qw/a b c/;    # Scalar = List in Scalar  $a = 'c';
      my ($b) = qw/a b c/;  # List = List              $b = 'a';
      
      my $c = @z;           # Scalar = Array in Scalar $c = 3;
      my ($d) = @z;         # List = Array in List     $d = 'a';
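
Functions you write yourself can follow the same rules. As a minimal sketch, the hypothetical vowels() function below uses Perl's built-in wantarray to return a list of matches in list context and a count in scalar context; the function name and the test words are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical context-aware function: returns the vowels found in a
# word in list context, or how many there are in scalar context.
sub vowels {
    my ($word) = @_;
    my @found = $word =~ /[aeiou]/g;    # a /g match in list context returns all matches
    return wantarray ? @found : scalar @found;
}

my @list    = vowels("iceberg");    # List = List       ('i', 'e', 'e')
my $count   = vowels("iceberg");    # Scalar = Scalar   3
my ($first) = vowels("iceberg");    # List = List       'i'

print "vowels: @list\n";
print "count:  $count\n";
print "first:  $first\n";

# A condition supplies scalar context, so the count doubles as a truth value.
print "no vowels in 'rhythm'\n" unless vowels("rhythm");
```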
      

The Tip of The Iceberg

The examples and concepts in this article barely scratch the surface of the world of Perl programming, and the concepts presented were viewed only from 10,000 feet. Luckily, interested Perl programmers have excellent documentation available at their fingertips through the 'perldoc' command that ships standard with Perl. Running 'perldoc perl' presents a programmer with a long list of manuals and guides that they may drill down into. Relevant topics in the Perl core documentation include 'perldata', 'perlref', and 'perllol'.

About the Author

Brad Lhotsky is a software developer whose focus is primarily web-based applications in Perl and PHP. He has over 5 years' experience developing systems for end users and for system and network administrators. Brad has been active on Perl beginners' mailing lists and forums for years, attempting to give something back to the community.

Brad currently has one module released on the CPAN.
