by Paul Doyle
Before you get into the nitty-gritty of using Perl on World Wide Web servers, you need to take some time to look at Perl itself.
This chapter provides an overview of the Perl language. It is not a detailed course in Perl, but it should be enough to give you the flavor of the language and to help you make sense of the examples in this book. When used with the reference chapters in Part V of this book, this chapter may well give you enough Perl to get by with; you'll probably want to delve more deeply into the language after you've been programming in it for a while.
When you're ready to learn more, you may want to purchase the excellent Programming Perl, by Larry Wall and Randal L. Schwartz (O'Reilly & Associates, Inc.). This book is the definitive work on Perl so far (as you might suspect with Wall's name on the cover). It's readable and humorous yet still sufficiently technical to be of genuine use in everyday Perl programming.
Incidentally, the book is called the "Camel book" after the dromedary that happens to adorn the cover. Because of the ubiquitous nature of the book in Perl-literate circles, this animal has become the emblem of the language.
We're not going to go into too much detail in this chapter; all the gory details are covered in Part V of this book. By the end of this chapter, you should know enough to find your way around the reference chapters for the answers to particular questions. If you already know Perl, you may want to just skim this chapter to refresh your memory of the language and how it works. If you don't already know how to program in at least one language, this book is not the place to start.
This chapter is supposed to be a snappy introduction to the language, so why am I wasting your time with this stuff? The fact is, Perl is a unique language in ways that cannot be conveyed simply by describing the technical details of the language. Perl is a state of mind as much as it is a language grammar. So we'll take a few minutes to look at the external realities that provoked Perl into being; this information should give you some insight into the way that Perl was meant to be used.
Now, here's the interesting bit: Larry could have written a utility to manage the particular job at hand and gotten on with his life. He could see, though, that it wouldn't be long before he'd have to write another special utility to handle something else that the standard tools couldn't quite hack. (He may have realized that most programmers are always writing special utilities to handle things that the standard tools can't quite hack.)
So rather than waste any more of his time, he invented a new language and wrote an interpreter for it. That statement may seem to be a paradox, but it isn't. Setting yourself up with the right tools is always an effort, but if you do it right, the effort pays off.
The new language emphasized system management and text handling. After a few revisions, it could handle regular expressions, signals, and network sockets, too. The language became known as Perl and quickly became popular with frustrated, lazy UNIX programmers-and with the rest of us.
Is it Perl or perl? The definitive word from Larry Wall is that it doesn't matter. Many programmers like to refer to languages with capitalized names (Perl), but the program originated on a UNIX system, on which short lowercase names (awk, sed, and so on) are the norm. As is true of many things about the language, there's no single "right way" to use the term; just use it the way you want. Perl is a tool, after all, and not a dogma.
If you're sufficiently pedantic, you may want to call it [Pp]erl after you read the "Regular Expressions" section later in this chapter.
Perl can handle low-level tasks quite well, particularly since Perl 5, when the whole messy business of references was put on a sound footing. In this sense, it has a great deal in common with C. But Perl handles the internals of data types, memory allocation, and so on automatically and seamlessly.
Perl code also bears a passing resemblance to C code, perhaps because Perl was written in C or perhaps because Larry found some of C's syntactic conventions to be handy. But Perl is less pedantic and much more concise than C is.
This magpie habit of picking up interesting features along the way-regular expressions here, database handling there-has been regularized in Perl 5. Now you can add your favorite bag of tricks to Perl fairly easily by using modules. Many of the added-on features of Perl, such as socket handling, are likely to be dropped from the core of Perl and moved out to modules in time.
Perl is not completely a public-domain product, though, and for very good reason. If the source were completely public-domain, someone could make minor alterations in it, compile it, and then sell it-in other words, rip off its creator. On the other hand, without distributing the source code, it's hard to make sure that everyone who wants to can use Perl.
The GNU General Public License is one way to distribute free software without the danger of being taken advantage of. Under this type of license, source code may be distributed freely and used by anybody, but any programs derived from such code must be released under the same type of license. In other words, if you derive any of your source code from GNU-licensed source code, you have to release your source code to anyone who wants it.
This arrangement is often sufficient to protect the interests of the author, but it can lead to a plethora of derivative versions of the original package, which may deprive the original author of a say in the development of his or her own creation. The situation can also lead to confusion on the part of users-it becomes hard to establish which version of the package is the definitive version, whether a particular script will work with a given version, and so on.
That's why Perl is released under the terms of the Artistic License-a variation on the GNU General Public License that says that anyone who releases a package derived from Perl must make it clear that the package is not actually Perl. All modifications must be clearly flagged; executables must be renamed, if necessary; and the original modules must be distributed along with the modified version. The effect is that the original author is clearly recognized as the owner of the package. The general terms of the GNU General Public License also apply.
The perl distribution comes with a nifty utility called Configure that tweaks the source files and the Makefile for your system. It probes your system software, shell, C compiler, and so on to determine the answers to various questions about how to build Perl-which compiler flags to use, the sizes of fundamental data types, and so on. You can override any of Configure's answers if you disagree with its findings, but it's generally very accurate indeed.
Running Configure before you make perl virtually guarantees you a perl installation that is not only successfully compiled and linked, but also well optimized for your particular system configuration-and with no tweaking or editing of source files on your part. You're more than welcome to tinker with obscure compiler flags if you want, however; that's why GNU C was invented.
Suppose that perl is correctly installed and working on your system. The simplest way to run perl on a Perl program is to invoke the perl interpreter with the name of the Perl program as an argument, as follows:
perl sample.pl
In this example, sample.pl is the name of a Perl file, and perl is the name of the perl interpreter. The example assumes that perl is in the execution path. If it isn't, you need to supply the full path to perl, too, as follows:
/usr/local/bin/perl sample.pl
This syntax is the preferred way of invoking perl, because it eliminates the possibility that you might invoke a copy of perl other than the one you intended to use. Because we'll be working with Web servers in this book, and will therefore be keenly aware of security issues, we'll use the full path from now on.
That much is the same on all systems that have a command-line interface. The following will do the trick in Windows NT, for example:
c:\NTperl\perl sample.pl
#!/usr/local/bin/perl
This line tells UNIX that the rest of this script file is to be interpreted by /usr/local/bin/perl. Next, you make the script itself executable, as follows:
chmod +x sample.pl
Then you can execute the script file directly and have the script file tell the operating system what interpreter to use while running it.
Usually, a few more steps are required to get a Web server to execute Perl programs automatically. Refer to Appendix A, "Perl Acquisition and Installation," for platform-specific instructions on creating associations between scripts and interpreters.
Option | Arguments | Purpose | Notes |
---|---|---|---|
-0 | Octal character code | Specify record separator | Default is new line ( \n ) |
-a | | Automatically split records | Used with -n or -p |
-c | | Check syntax only; do not execute | |
-d | | Run script, using Perl debugger | If Perl debugger is installed |
-D | Flags | Specify debugging behavior | Refer to the PERLDEBUG man page on the CD-ROM that comes with this book |
-e | Command | Pass a command to Perl from the command line | Useful for quick operations; see tip after this table for an example |
-F | Regular expression | Expression to split by if -a is used | Default is white space |
-i | Extension | Replace original file with result | Useful for modifying contents of files; see tip after this table for an example |
-I | Directory | Specify location of include files | |
-l | Octal character code | Drop new lines when used with -n and -p, and use designated character as line-termination character | |
-n | | Process the script, using each specified file as an argument | Used for performing the same set of actions on a set of files |
-p | | Same as -n , but each line is printed | |
-P | | Run the script through the C preprocessor before Perl compiles it | |
-s | | Enable passing of arbitrary switches to Perl | Use -s -what -ever to have the Perl variables $what and $ever defined within your script |
-S | | Tell Perl to look along the path for the script | |
-T | | Use taint checking; don't evaluate expressions supplied in the command line | Very important for Web use |
-u | | Makes Perl dump core after compiling your script; intended to allow for generation of Perl executables | Very messy; wait for the Perl compiler |
-U | | Unsafe mode; overrides Perl's natural caution | Don't use this! |
-v | | Print Perl version number | |
-w | | Print warnings about script syntax | Extremely useful, especially during development; warning messages can confuse browsers if sent raw |
The -e option is handy for quick Perl operations from the command line. Want to change all the foos in wiffle.bat to bars? Try this:
perl -i.old -p -e "s/foo/bar/g" wiffle.bat
This code says, "Take each line of wiffle.bat (-p), keep a copy of the original in wiffle.bat.old (-i.old appends the extension to the original file name), replace all instances of foo with bar (-e), and write the result (-p) to the original file (-i)."
You can supply Perl command-line arguments in the interpreter-invocation line in UNIX scripts. Following is a good start for any Perl script:
#!/usr/local/bin/perl -w -T
The -w switch is best omitted in versions of Perl older than 5.002, because it may produce spurious warnings.
Also, take care when you use the -w switch in scripts that send data to Web browsers. Warning messages sent before the browser receives a content-type line may result in an error message.
Perl code can be quite free-flowing. The broad syntactic rules governing where a statement starts and ends are:
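A minimal sketch of how loose the layout can be, relying only on standard Perl behavior: a semicolon ends each statement, line breaks and extra spaces between tokens are ignored, and a # starts a comment that runs to the end of the line. The variable names are made up for illustration.

```perl
# A comment runs from the "#" to the end of the line.
$greeting = "Hello";            # one statement, ended by a semicolon
$target
    = "world";                  # a statement may span several lines
$message = $greeting . ", " . $target . "\n"; print $message;  # two statements on one line
```

Laying out one statement per line is a readability convention, not something the interpreter requires.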
Here's a Perl statement inspired by Kurt Vonnegut:
print "My name is Yon Yonson\n";
No prizes for guessing what happens when Perl runs this code-it prints My name is Yon Yonson . If the \n doesn't look familiar, don't worry; it simply means that Perl should print a new-line character after the text (or, in other words, go to the start of the next line).
Printing more text is a matter of either stringing together statements like the following or giving multiple arguments to the print function:
print "My name is Yon Yonson,\n";
print "I live in Wisconsin,\n",
"I work in a lumbermill there.\n";
That's right- print is a function. It may not look like one in any of the earlier examples in this chapter, which have no parentheses to delimit the function arguments, but it is a function, and it takes arguments. More accurately, in this example print takes a single argument that consists of an arbitrarily long list.
We'll have much more to say about lists and arrays in "Data Types" later in this chapter. You'll find a few more examples of the more common functions in the remainder of this chapter, but refer to Chapter 15, "Function List," for a complete rundown on Perl's built-in functions.
For now, if you're uncomfortable with functions that take arbitrary numbers of arguments with no parentheses to corral them, pretend that you see parentheses. You can use them in Perl programs, if you like, but it would be better to get used to the idea that Perl syntax is loose and groovy in a way that C, for example, is not.
What does a complete Perl program look like? Here's a trivial UNIX example, complete with the invocation line at the top and a few comments:
#!/usr/local/bin/perl -w # Show warnings
print "My name is Yon Yonson,\n"; # Let's introduce ourselves
print "I live in Wisconsin,\n",
"I work in a lumbermill there.\n"; # Remember the line breaks
This example is not at all typical of a Perl program, though; it's just a linear sequence of commands with no structural complexity. The " Flow Control " section later in this chapter introduces some of the constructs that make Perl what it is and provides a more authentic flavor of what is normal in a Perl program. For now, we'll stick to simple examples like this one for the sake of clarity.
All Perl variable names, including scalars, are case-sensitive. $Name and $name, for example, are completely different quantities.
Perl converts automatically between numbers and strings as required, so that
$a = 2;
$b = 6;
$c = $a . $b; # The "." operator concatenates two strings
$d = $c / 2;
print $d;
yields the result
13
This example involves converting two integers to strings; concatenating the strings into a new string variable; converting this new string to an integer; dividing it by 2; converting the result to a string; and printing it. All these conversions are handled implicitly, leaving the programmer free to concentrate on what needs to be done rather than on the low-level details of how it is to be done.
This situation might be a problem if Perl were regularly used for tasks in which explicit memory offsets were used, for example, and data types were critical. But for the type of task for which Perl is normally used-and certainly for the types of tasks that we'll be using it for in this book-these automatic conversions are smooth, intuitive, and generally a Good Thing.
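As a further hedged illustration of the same behavior, the sketch below uses one value first as a number and then as a string. The variable names are invented for the example.

```perl
$count = "12";                  # starts life as a string
$next  = $count + 1;            # used as a number: 13
$label = $next . " items";      # used as a string again
print "$label\n";
```

No conversion function appears anywhere; the operator (+ or .) decides how each operand is treated.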
We can develop the earlier example script with some string variables, as follows:
#!/usr/local/bin/perl -w # Show warnings
$who = 'Yon Yonson';
$where = 'Wisconsin';
$what = 'in a lumbermill';
print "My name is $who,\n"; # Let's introduce ourselves
print "I live in $where,\n",
"I work $what there.\n"; # Remember the line breaks
print "\nSigned: \t$who,\n\t\t$where.\n";
This script yields the following:
My name is Yon Yonson,
I live in Wisconsin,
I work in a lumbermill there.
Signed: Yon Yonson,
Wisconsin.
Don't worry-it gets better.
@trees = ("Larch", "Hazel", "Oak");
Array subscripts are denoted by brackets. $trees[0] , for example, is the first element of the @trees array. Notice that it's @trees but $trees[0] ; individual array elements are scalars, so they start with $ .
Mixing scalar types in an array is not a problem. The code
@items = (15, '45.67', "case");
print "Take $items[0] $items[2]s at \$$items[1] each.\n";
results in the following:
Take 15 cases at $45.67 each.
All arrays in Perl are dynamic. You never have to worry about memory allocation and management; Perl does all that stuff for you. Combine that with the fact that arrays listed inside another array's definition are merged into it, and you're free to say things like the following:
@A = (1, 2, 3);
@B = (4, 5, 6);
@C = (7, 8, 9);
@D = (@A, @B, @C);
As a result of this code, the array @D contains the numbers 1 through 9. The power of constructs such as the following takes getting used to:
@Annual = (@Spring, @Summer, @Fall, @Winter);
This code example combines arrays that represent some aspect of each of the seasons in a concise and intuitive way. The arrays for the seasons might in turn consist of arrays of months, each of which might consist of an array of daily values. The @Annual array then would consist of a value for each day of the year. By defining your data in chunks such as this, you give yourself the option of handling it on a daily, monthly, or annual basis.
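A small sketch of the seasonal idea, using made-up monthly figures (the numbers and their meaning are purely illustrative). It shows that the combined array is one flat list of twelve values.

```perl
# Hypothetical monthly figures, three per season
@Spring = (61, 54, 48);
@Summer = (35, 22, 27);
@Fall   = (58, 72, 80);
@Winter = (91, 88, 76);

@Annual = (@Spring, @Summer, @Fall, @Winter);   # one flat list of 12 values
$months = scalar(@Annual);                      # number of elements in @Annual

print "Values for the year: $months\n";
print "First summer value: $Annual[3]\n";       # element 3 is the first of @Summer
```

The seasonal arrays remain available on their own, so the same data can still be handled a season at a time.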
An aspect of Perl that often confuses newcomers (and occasionally old hands, too) is the context-sensitive nature of evaluations. Perl keeps track of the context in which an expression is being evaluated and can return a different value in an array context than in a scalar context. In this example, the array @B contains 1-4, whereas $C contains 4 (the number of values in the array):
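A minimal sketch consistent with that description (the name of the source array, @A, is an assumption):

```perl
@A = (1, 2, 3, 4);
@B = @A;        # array context: @B receives a copy of all four values
$C = @A;        # scalar context: $C receives the number of elements

print "B holds @B\n";
print "C holds $C\n";
```

The right-hand side is identical in both assignments; only the kind of variable on the left changes what Perl hands back.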
Many of Perl's built-in functions take arrays as arguments. One example is sort, which takes an array as an argument and returns a sorted copy of it, in alphabetical order by default. The code
print sort ( 'Beta', 'Gamma', 'Alpha' );
prints AlphaBetaGamma.
You can make this code neater by using another built-in function, called join . This function takes two arguments: a string to connect with, and an array of strings to connect. join returns a single string that consists of all elements in the array joined with the connecting string. The code
print join ( ' : ', 'Name', 'Address', 'Phone' );
returns the string Name : Address : Phone.
Because sort returns an array, you can feed its output straight into join. The code
print join( ', ', sort ( 'Beta', 'Gamma', 'Alpha' ) );
prints Alpha, Beta, Gamma.
Notice that this code doesn't separate the initial scalar argument of join from the array that follows it. The first argument is the string to join things with. The rest of the arguments are treated as a single argument: the array to be joined. This is true even if you use parentheses to separate groups of arguments. The code
print join( ': ', ('A', 'B', 'C'), ('D', 'E'), ('F', 'G', 'H', 'I'));
returns A: B: C: D: E: F: G: H: I.
You can use one array or multiple arrays in a context such as this because of the way that Perl treats arrays; adding an array to an array gives you one larger array, not two arrays. In this case, all three arrays are bundled into one.
For even more powerful array-manipulation capabilities, refer to the splice function in Chapter 15, "Function List."
Arrays of the type that you've already seen are lists of values indexed by subscripts . In other words, to get an individual element of an array, you supply a subscript as a reference, as follows:
@fruit = ( "Apple", "Orange", "Banana" );
print $fruit[2];
This example yields Banana, because subscripts start at zero, so 2 is the subscript for the third element of the @fruit array. A reference to $fruit[7] here returns the undefined value, because no array element with that subscript has been defined.
Now, here's the point of all this: Associative arrays are lists of values indexed by strings . Conceptually, that's all there is to them. The implementation of associative arrays is more complex, because all the strings ( keys ) need to be stored in addition to the values to which they refer.
When you want to refer to an element of an associative array, you supply a string (the key) instead of an integer (the subscript). Perl returns the corresponding value. Consider the following example:
%fruit = ("Green", "Apple", "Orange", "Orange", "Yellow", "Banana" );
print $fruit{"Yellow"};
This code prints Banana , as before. The first line defines the associative array in much the same way that you have already defined ordinary arrays; the difference is that instead of listing values, you list key/value pairs. The first value is Apple , and its key is Green . The second value is Orange , which happens to have the same string for both value and key. Finally, the value Banana has the key Yellow .
On a superficial level, you can use string subscripts to provide mnemonics for array references, allowing you to refer to $Total{'June'} instead of $Total[5]. But you wouldn't even be beginning to use the power of associative arrays. Think of the keys of an associative array as you might think of a key that links tables in a relational database, and you're closer to the idea. Consider this example:
%Folk = ( 'YY', 'Yon Yonson',
'TC', 'Terra Cotta',
'RE', 'Ron Everly' );
%State = ( 'YY', 'Wisconsin',
'TC', 'Minnesota',
'RE', 'Bliss' );
%Job = ( 'YY', 'work in a lumbermill',
'TC', 'teach nuclear physics',
'RE', 'watch football');
foreach $person ( 'TC', 'YY', 'RE' ) {
print "My name is $Folk{$person},\n",
"I live in $State{$person},\n",
"I $Job{$person} there.\n\n";
}
We had to sneak the foreach construct in there for that example to work. That construct is explained in full in " Flow Control " later in this chapter. For now, you'll just have to take it on trust that foreach makes Perl execute the three print statements for each of the people in the list after the foreach keyword. Otherwise, you could try executing the code in the sample and see what happens.
You also can treat the keys and values of an associative array as separate (ordinary) arrays by using the keys and values keywords, respectively. The code
print keys %Folk;
print values %State;
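prints a string such as YYRETCWisconsinBlissMinnesota. The order in which the items come back is determined by Perl's internal storage, not by the order in which you defined them, so don't rely on it.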
Looks as though we need to do some more work on string handling. That task is best left until after we cover some flow-control mechanisms, however.
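In the meantime, the sort and join functions shown earlier already go some way toward tidying this output. The following sketch reuses the %Folk array from the example above; sorting the keys also gives a predictable order.

```perl
%Folk = ( 'YY', 'Yon Yonson',
          'TC', 'Terra Cotta',
          'RE', 'Ron Everly' );

$codes = join(', ', sort keys %Folk);    # keys in alphabetical order
print "$codes\n";

foreach $code (sort keys %Folk) {
    print "$code = $Folk{$code}\n";      # look each value up by its key
}
```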
A special associative array called %ENV stores the contents of all environment variables, indexed by variable name. $ENV{'PATH'} , for example, returns the current search path. Following is a way to print the current values of all environment variables, sorted by variable name for good measure:
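One way to do this, sketched with the keys, sort, and foreach constructs already shown in this chapter (the variable names and the output layout are illustrative choices, and the counter is there only to show how many variables were printed):

```perl
$shown = 0;
foreach $name (sort keys %ENV) {
    print "$name=$ENV{$name}\n";    # one variable per line, sorted by name
    $shown++;
}
print "$shown environment variables listed\n";
```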
You can regard a file handle as being a pointer to a file from which Perl is to read or to which it will write. (C programmers are familiar with the concept.) The basic idea is that you associate a handle with a file or device, and then refer to the handle in the code whenever you need to perform a read or write operation.
File handles generally are written in uppercase. Perl has some useful predefined file handles, as Table 1.2 shows.
File Handle | Points to... |
---|---|
STDIN | Standard input (normally, the keyboard) |
STDOUT | Standard output (normally, the console; in many Web applications, the browser) |
STDERR | Device where error messages should be written (normally, the console; in a Web server environment, normally the server-error log file) |
The print statement can take a file handle as its first argument, as follows:
print STDERR "Oops, something broke.\n";
Notice that no comma appears after the file handle in this example. That helps Perl figure out that the STDERR is not something to be printed. If you're uneasy with this implicit list syntax, you can put parentheses around all the print arguments, as follows:
print (STDERR "Oops, something broke.\n");
You still have no comma after the file handle, however.
Use the standard file handles explicitly, especially in complex programs. Redefining the standard input or output device for a while is convenient sometimes; make sure that you don't accidentally wind up writing to a file what should have gone to the screen.
You can use the open function to associate a new file handle with a file, as follows:
open (INDATA, "/etc/stuff/Friday.dat");
open (LOGFILE, ">/etc/logs/reclaim.log");
print LOGFILE "Log of reclaim procedure\n";
By default, open opens files for reading only. If you want to override this default behavior, add to the file name one of the special direction symbols listed in Table 1.3. (The > at the start of the file name in the second open statement of the preceding example, for example, tells Perl that you intend to write to the named file.)
Symbol | Meaning |
---|---|
< | Open the file for reading (the default action) |
> | Open the file for writing |
>> | Open the file for appending |
+< | Open the file for both reading and writing (existing contents are kept) |
+> | Open the file for both reading and writing (the file is emptied first) |
\| (before file name) | Treat file as command into which Perl is to pipe text |
\| (after file name) | Treat file as command from which input is to be piped to Perl |
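The following sketch exercises three of these symbols in turn. The file name is a hypothetical scratch file, created and removed by the example itself so that nothing else is touched.

```perl
$log = "modes_demo.log";    # hypothetical scratch file

open(LOG, ">$log")  || die "Can't write $log ($!)\n";
print LOG "first\n";        # > starts the file afresh
close(LOG);

open(LOG, ">>$log") || die "Can't append to $log ($!)\n";
print LOG "second\n";       # >> adds to the end
close(LOG);

open(LOG, "<$log")  || die "Can't read $log ($!)\n";
@lines = <LOG>;             # read everything back
close(LOG);
unlink($log);               # tidy up

print "Read ", scalar(@lines), " lines\n";
```

Had the second open used > instead of >>, the first line would have been lost.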
To take a more complex example, here's one way to feed output to the mypr printer on a UNIX system:
open (MYLPR, "|lpr -Pmypr");
print MYLPR "A line of output\n";
close MYLPR;
A special Perl operator for reading from files consists of two angle brackets-<>-around the file handle of the file from which you want to read. This operator returns the next line or lines of input from the file or device, depending on whether the operator is used in a scalar or an array context. When no more input remains, the operator returns false .
A construct such as
while (<STDIN>) {
print;
}
simply echoes each line of input back to the console until Ctrl+D (Ctrl+Z in Windows NT) is pressed, because the print function takes the current default argument here: the most recent line of input. For an explanation, see " Special Variables " later in this chapter.
If the user types
A
Bb
Ccc
^D
the screen looks like this:
A
A
Bb
Bb
Ccc
Ccc
^D
Notice that in this case, <STDIN> is in a scalar context, so one line of standard input is returned at a time. Compare that example with the following example:
print <STDIN>;
In this case, because print expects an array of arguments (it can be a single-element array, but it's an array as far as print is concerned), the <> operator obligingly returns all the contents of STDIN as an array, and then print prints it. Because the array is fully built before it is printed, nothing is written to the console until the user presses Ctrl+D:
A
Bb
Ccc
^D
A
Bb
Ccc
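The same contrast can be seen without a keyboard by reading from a file. This is only a sketch: the file name is a hypothetical scratch file that the example writes first and deletes afterward.

```perl
$file = "context_demo.txt";     # hypothetical scratch file
open(OUT, ">$file") || die "Can't write $file ($!)\n";
print OUT "A\nBb\nCcc\n";
close(OUT);

open(IN, $file) || die "Can't read $file ($!)\n";
$first = <IN>;      # scalar context: just the next line
@rest  = <IN>;      # array context: all the lines that remain
close(IN);
unlink($file);

print "First line: $first";
print "Lines left: ", scalar(@rest), "\n";
```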
This script prints out the contents of the file .signature, double-spaced:
open (SIGFILE, ".signature");
while ( <SIGFILE> ) {
print; print "\n";
}
The first print here has no arguments, so it takes the current default argument and prints it. The second print has an argument, so it prints that instead. Perl's habit of using default arguments extends to the <> operator; if that operator is used with no file handle, Perl assumes that <ARGV> is intended. <ARGV> expands to each line in turn of each file listed in the command line.
If no files are listed in the command line, Perl instead assumes that STDIN is intended. The following code, therefore, keeps printing more... as long as something other than Ctrl+D appears in standard input:
while (<>) {
print "more.... ";
}
Perl 5 allows array elements to be references to any data type. As a result, you can build arbitrary data structures of the kind used in C and other high-level languages, but with all the power of Perl. You can have an array of associative arrays, for example.
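A brief sketch of such an array of associative arrays, using Perl 5's anonymous-hash braces and the -> dereferencing arrow. The data and the field names are made up for illustration, and the => used between keys and values is simply a more readable comma.

```perl
# Each element of @people is a reference to an anonymous associative array
@people = (
    { 'name' => 'Yon Yonson',  'state' => 'Wisconsin' },
    { 'name' => 'Terra Cotta', 'state' => 'Minnesota' },
);

foreach $person (@people) {
    print "$person->{'name'} lives in $person->{'state'}\n";
}

$second = $people[1]{'name'};   # the arrow may be omitted between subscripts
print "Second entry: $second\n";
```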
print "Looking for files along the path ($ENV{'PATH'})...\n";
The %ENV array is quite useful in CGI programming, in which parameters are passed from the browser to CGI programs as environment settings.
C programmers, beware: The first element of this array is the first actual argument, not the name of the program. The special variable $0 contains the name of the Perl script that is being executed.
The following code prints the command-line arguments one per line, sorted alphabetically:
print join("\n", sort @ARGV);
The command-line arguments are of limited use in CGI scripts, in which arguments are passed via the environment rather than the command line. These arguments are quite useful in normal Perl work, of course.
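A short sketch that reports what a script was given on the command line. The variable name $argc is borrowed from C purely for illustration; Perl itself defines no such variable.

```perl
$argc = @ARGV;                       # scalar context: number of arguments
print "Script name: $0\n";
print "Number of arguments: $argc\n";
print "First argument: $ARGV[0]\n" if $argc > 0;
```

Run with no arguments, the script prints a count of zero and skips the last line.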
$line=0;
while ( <SOMEFILE> ) {
++$line;
print "Line $line : ", $_;
}
You occasionally need to store the contents of $_ somewhere, as in the following example:
$oldvalue = $_;
But the opposite operation-setting the value of $_ manually-is rarely appropriate, as in this example:
$_ = $oldvalue;
Pattern matching and substitution take place on the contents of this variable unless you specify otherwise. These topics are covered in "Regular Expressions" later in this chapter.
This example reports failure if the open call failed:
open ( INFILE, "./missing.txt") || die "Couldn't open \"./missing.txt\" ($!).\n";
The || here is the Boolean or operator, which is covered in "Flow Control" later in this chapter. die causes Perl to terminate after printing the string given to die as an argument.
If the file does not exist, Perl terminates after displaying something like this:
Couldn't open "./missing.txt" (No such file or directory).
The form and content of error messages vary from one system to the next.
$Weekend = $Saturday || $Sunday;
In the next example, $Solvent is true only if $income is greater than 3 and $debts is less than 10:
$Solvent = ($income > 3) && ($debts < 10);
Now consider the logic of evaluating one of these expressions. It isn't always necessary to evaluate both operands of either an && or a || operator. In the first example earlier in this section, if $Saturday is true, you know that $Weekend will be true, regardless of whether $Sunday is also true (the midnight condition, perhaps?).
This means that when the left side of an or expression is evaluated as true, the right side is not evaluated. Combine this with Perl's easy way with data types, and you can say things like the following:
$value > 10 || print "Oops, low value...\n";
If $value is greater than 10, the right side of the expression is never evaluated, so nothing is printed. If $value is not greater than 10, Perl needs to evaluate the right side, too, so as to decide whether the expression as a whole is true or false. That means that Perl evaluates the print statement, printing out the message.
OK, it's a trick, but it's a very useful one.
Something analogous applies to the && operator. In this case, if the left side of an expression is false, the expression as a whole is false, so Perl does not evaluate the right side. The && operator can, therefore, be used to produce the same kind of effect as the || trick, but with the opposite sense, as in the following example:
$value > 10 && print "OK, value is high enough...\n";
As is true of most Perl constructs, the real power of these tricks comes when you apply a little creative thinking. Remember that the left and right sides of these expressions can be any Perl expressions; think of them as being conjunctions in a sentence rather than logical operators, and you'll get a better feel for how to use them. Expressions such as the following give you a little of the flavor of creative Perl:
$length <= 80 || die "Line too long.\n";
$errorlevel > 3 && warn "Hmmm, strange error level ($errorlevel)...\n";
open ( LOGFILE, ">install.log") || &bust("Log file");
The &bust in this example is a subroutine call, by the way. Refer to "Subroutines" later in this chapter for more information.
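To see the short-circuit behavior at work, the following sketch counts how often the right side actually runs. The check subroutine is a hypothetical helper invented for this purpose; it does nothing but count its own calls.

```perl
$checked = 0;
sub check { $checked++; return 1; }   # hypothetical helper: counts its calls

$value = 15;
$value > 10 || &check();    # left side true:  check is not called
$value > 20 || &check();    # left side false: check is called
$value > 10 && &check();    # left side true:  check is called
$value > 20 && &check();    # left side false: check is not called

print "check ran $checked times\n";
```

Of the four statements, only two reach the right side, so the counter ends at 2.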
open ( INFILE, "./missing.txt") if $missing;
The execution of the statement is contingent upon both the evaluation of the expression and the sense of the operator.
The expression is evaluated as either true or false and can contain any of the relational operators listed in Table 1.4 (although it need not). Following are a few examples of valid expressions:
$full
$a == $b
<STDIN>
Operator | Numeric Context | String Context |
---|---|---|
Equality | == | eq |
Inequality | != | ne |
Inequality with signed result | <=> | cmp |
Greater than | > | gt |
Greater than or equal to | >= | ge |
Less than | < | lt |
Less than or equal to | <= | le |
When we're comparing strings, less than means lexically less than. If $left comes before $right when the two are sorted alphabetically, $left is less than $right.
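The difference between the two columns of the table matters most when values look like numbers. The following sketch (the helper name compare_both is an assumption made for illustration) compares the same pair both ways. Note that string comparison works character by character on character codes, so uppercase letters sort before lowercase ones:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the signed results of the numeric (<=>)
# and string (cmp) comparisons of the same two values.
sub compare_both {
    my ($left, $right) = @_;
    return ($left <=> $right, $left cmp $right);
}

my ($numeric, $string) = compare_both(10, 9);
print "numeric: $numeric, string: $string\n";   # numeric: 1, string: -1

my @sorted = sort { $a cmp $b } qw(pear Apple apple);
print "@sorted\n";                              # Apple apple pear
```

As a number, 10 is greater than 9; as a string, "10" sorts before "9" because the character 1 comes before the character 9.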
Perl has four modifiers, each of which behaves the way that you might expect from the corresponding English word:
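if. The statement is executed if the logical expression is true and is not executed otherwise. Examples:
$max = 100 if $min < 100;
print "Empty!\n" if !$full;
unless. The statement is executed only if the logical expression is false. Examples:
open (ERRLOG, "test.log") unless $NoLog;
print "Success" unless $error>2;
while. The statement is executed repeatedly for as long as the logical expression is true. Examples:
$total -= $decrement while $total > $decrement;
$n=1000; print "$n\n" while $n-- > 0;
until. The statement is executed repeatedly until the logical expression becomes true. Examples:
$total += $value[$count++] until $total > $limit;
print RESULTS "Next value: $value[$n++]" until $value[$n] == -1;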
Notice that the logical expression is evaluated only one time in the case of if and unless, but multiple times in the case of while and until. In other words, the first two are simple conditionals, and the last two are loop constructs.
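The distinction between one-shot and looping modifiers can be seen in a small sketch. The helper names drain and cap are hypothetical; drain counts how many times a while-modified statement runs, and cap shows an if modifier testing its condition a single time:

```perl
use strict;
use warnings;

# Hypothetical helper: the while modifier re-tests its condition before
# every execution of the statement it modifies.
sub drain {
    my ($total, $decrement) = @_;
    my $passes = 0;
    ($total -= $decrement, $passes++) while $total > $decrement;
    return ($total, $passes);
}

# Hypothetical helper: the if modifier tests its condition once only.
sub cap {
    my ($max, $min) = @_;
    $max = 100 if $min < 100;
    return $max;
}

my ($left, $passes) = drain(10, 3);
print "Left $left after $passes passes\n";   # Left 1 after 3 passes
print cap(500, 20), "\n";                    # 100
```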
The following example is somewhat similar to C's if syntax:
if ( ( $total += $value ) > $limit ) {
print LOGFILE "Maximum limit $limit exceeded. Offending value was $value.\n";
close (LOGFILE);
die "Too many! Check the log file for details.\n";
}
The if statement is capable of a little more complexity, with else and elsif clauses, as in the following example:
if ( !open( LOGFILE, ">install.log") ) {
close ( INFILE );
die "Unable to open log file!\n";
}
elsif ( !open( CFGFILE, ">system.cfg") ) {
print LOGFILE "Error during install: Unable to open config file for writing.\n";
close ( LOGFILE );
die "Unable to open config file for writing!\n";
}
else {
print CFGFILE "Your settings go here!\n";
}
until ( $total >= 50 ) {
print "Enter a value: ";
$value = scalar (<STDIN>);
$total += $value;
print "Current total is $total\n";
}
print "Enough!\n";
The while and until statements are described in "Conditional Expressions" earlier in this chapter. The for statement resembles the one in C. The keyword for is followed by an initialization expression, a termination condition, and an iteration expression, all enclosed in parentheses and separated by semicolons, as follows:
for ( $count = 0; $count < 100; $count++ ) {
print "Something";
}
The foreach operator is special; it iterates over the contents of an array and executes the statements in a statement block for each element of the array. Following is a simple example:
@numbers = ("one", "two", "three", "four");
foreach $num ( @numbers ) {
print "Number $num...\n";
}
The variable $num first takes on the value one, then two, and so on. That example looks fairly trivial, but the real power of this operator lies in the fact that it can operate on any array, as follows:
foreach $arg ( @ARGV ) {
print "Argument: \"$arg\".\n";
}
foreach $namekey ( sort keys %surnames ) {
print REPORT "Surname: $surnames{$namekey}.\n",
"Address: $address{$namekey}.\n";
}
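The pattern of walking an associative array in sorted key order is common enough to be worth a self-contained sketch. The helper name report_lines and the sample data are assumptions made for illustration, not taken from the chapter:

```perl
use strict;
use warnings;

# Hypothetical helper: builds one report line per key, in sorted key order.
sub report_lines {
    my (%addr) = @_;
    my @lines;
    foreach my $name ( sort keys %addr ) {
        push @lines, "Surname: $name. Address: $addr{$name}.";
    }
    return @lines;
}

# Made-up sample data.
my %address = ( Smith => "1 Main St", Jones => "9 Elm Rd", Adams => "4 Oak Ave" );
print "$_\n" foreach report_lines(%address);
```

Because keys returns the keys in no particular order, the sort is what makes the output repeatable from run to run.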
RECORD: while ( <INFILE> ) {
$even = !$even;
next RECORD if $even;
print;
}
The three label-control statements are:
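next. Jumps to the next iteration of the loop marked by the label or to the innermost enclosing loop, if no label is specified.
last. Exits the loop marked by the label or the innermost enclosing loop, if no label is specified.
redo. Restarts the current iteration of the loop marked by the label (or of the innermost enclosing loop) without re-evaluating the loop condition.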
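Perl's three label-control statements are next, last, and redo. The following sketch uses all three in one labeled loop; the helper name filter_lines and the STOP marker are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keeps lines up to a STOP marker, ignoring comments
# and leading white space.
sub filter_lines {
    my @lines = @_;
    my @kept;
    LINE: foreach my $line (@lines) {
        redo LINE if $line =~ s/^\s+//;   # stripped leading space: re-examine the same line
        next LINE if $line =~ /^#/;       # comment: move on to the next line
        last LINE if $line eq "STOP";     # marker: leave the loop entirely
        push @kept, $line;
    }
    return @kept;
}

print join(", ", filter_lines("alpha", "# note", "  beta", "STOP", "gamma")), "\n";
# prints "alpha, beta"
```

The redo cannot loop forever here, because the substitution succeeds only while there is leading white space left to remove.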
sub Usage {
print "Usage: \n",
"twiddle [-args] infile outfile\n";
print "Copyleft 1996, Jonathan F. Squirmsby.";
}
Subroutines are called with & , as follows:
sub bust {
print "Oops, some kind of error seems to have occurred.\n";
die "Fatal error, terminating.\n";
}
open ( LOGFILE, ">install.log") || &bust;
In this example, the subroutine was defined before it was called. You can define and call subroutines in any order in Perl; the convention is to define them after the main routine.
open ( LOGFILE, ">install.log") || &bust("Failed to open log file \"install.log\".");
But here is where Perl's subroutine syntax starts to get a little strange; C programmers may want to take a seat before reading on.
All Perl subroutines receive their arguments as an arbitrarily long array of scalars with the special name of @_. There is no mechanism for declaring the arguments when the subroutine is declared. There is no fixed number of arguments. Also, the calling function can pass any mixture of scalars and arrays; they are all treated as one big @_ array when they get to the subroutine.
In the example earlier in this section, in which bust is called with a single argument, you can pick it up in the subroutine and use it to provide a more sensible error message, as in the following example:
sub bust {
($errortext) = @_;
print "Oops, an error occurred ($errortext).\n";
die "Fatal error, terminating.\n";
}
Notice that we went to the trouble of assigning the contents of the argument array @_ to the scalar $errortext. This assignment may seem to be unnecessary; in fact, we could have simply used @_ instead of $errortext in the print statement. Assigning the contents of the @_ array to named variables is much clearer, though, especially when the subroutine takes multiple arguments. Compare the example
print "Error $_[0] opening file $_[1].\n";
with this one:
($errtext, $errfile) = @_;
print "Error $errtext opening file $errfile.\n";
Notice, too, that when we assigned the @_ array to the single variable $errortext in the bust example, we placed $errortext in parentheses. We did so to force an array context, so that what gets assigned to $errortext is the first (and only) value of the @_ array, not the number of values in @_. In effect, we're telling Perl to treat $errortext as a single-element array. The earlier example that uses $errfile and $errtext is a clearer example of an array-to-array assignment.
In "Variable Scope" later in this chapter, you learn how to protect local variables such as $errortext in subroutines by using the local and my keywords.
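The effect of the parentheses is easy to check. In this sketch (the subroutine names count_args and first_arg are hypothetical), the same @_ array is assigned once without parentheses and once with them:

```perl
use strict;
use warnings;

# Without parentheses: scalar context, so $n receives the number of arguments.
sub count_args {
    my $n = @_;
    return $n;
}

# With parentheses: array context, so $first receives the first argument.
sub first_arg {
    my ($first) = @_;
    return $first;
}

print count_args("x", "y", "z"), "\n";   # 3
print first_arg("x", "y", "z"), "\n";    # x
```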
&PrintRes( "alpha", (1, 3, 5, 7), "beta", (2, 4, 6, 8) );
Try to unpack these arguments into the following values as they come into the subroutine:
$p1 = "alpha";
@p2 = (1, 3, 5, 7);
$p3 = "beta";
@p4 = (2, 4, 6, 8);
A statement like
( $p1, @p2, $p3, @p4 ) = @_;
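won't get beyond the second parameter. The following list explains what happens:
$p1 receives the first value in @_, which is "alpha".
@p2 receives every remaining value in @_, because an array on the left side of an assignment absorbs everything that is left.
$p3 and @p4 receive nothing; $p3 is left undefined, and @p4 is left empty.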
There's no point in trying to specify subarrays, as in the following example, because Perl expands the array on the left to the same thing as before:
( $p1, (@p2), $p3, (@p4) ) = @_;
The moral of the story is: Don't pass more than one array into a subroutine. And if you do pass an array, make sure that it's the last argument.
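A short sketch makes the flattening visible. The subroutine names unpack_four and unpack_safely are hypothetical; the first repeats the assignment discussed above, and the second follows the advice of putting the scalars first and a single array last:

```perl
use strict;
use warnings;

# The problematic layout: @p2 absorbs everything after the first argument.
sub unpack_four {
    my ($p1, @p2, $p3, @p4) = @_;
    return ($p1, scalar(@p2), defined($p3) ? "set" : "unset", scalar(@p4));
}

# The safer layout: scalars first, one array last.
sub unpack_safely {
    my ($p1, $p3, @rest) = @_;
    return ($p1, $p3, scalar(@rest));
}

print join(" / ", unpack_four("alpha", (1, 3, 5, 7), "beta", (2, 4, 6, 8))), "\n";
# prints "alpha / 9 / unset / 0"
print join(" / ", unpack_safely("alpha", "beta", 1, 3, 5, 7)), "\n";
# prints "alpha / beta / 4"
```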
sub AddIt {
( $a, $b ) = @_;
$a + $b;
}
That means that the value 7 is substituted for the subroutine call after evaluation. The code
print "Summing 4 and 3 yields ", &AddIt(4, 3), ".\n";
prints the following:
Summing 4 and 3 yields 7.
Notice that we had to keep the subroutine call outside the quotes to allow Perl to recognize & as a subroutine invocation.
It isn't always clear which statement is the last to be executed in a subroutine, particularly if it contains loops or conditional statements. One way to ensure that the correct value is returned is to place a reference to the variable on a line by itself at the end of the subroutine, as follows:
sub Maybe {
# Various loops and conditionals here which set the value of "$result"...
$result;
}
Take care not to add seemingly innocuous statements near the end of a subroutine. A print statement, for example, returns a value of 1 (if successful), so a subroutine that prints something just before it returns always returns 1.
The return value can be a scalar, an array, or an associative array. Listing 1.1 shows a complete example in which a subroutine builds an associative array of names keyed by initials and then returns the associative array. The keys of this array (the initials) are then printed in sorted order. Take your time reading through this example; a lot is going on in there, but it's comprehensively commented.
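The pitfall can be reproduced in a few lines. The subroutine names here are hypothetical, and the third version uses Perl's return statement, which the text above does not rely on but which removes any doubt about what comes back:

```perl
use strict;
use warnings;

# Returns the value of its last expression: the sum.
sub add_it {
    my ($x, $y) = @_;
    $x + $y;
}

# The last statement is a print, so the result of print comes back instead.
sub add_and_print {
    my ($x, $y) = @_;
    my $sum = $x + $y;
    print "Sum is $sum\n";
}

# An explicit return makes the intended value unambiguous.
sub add_explicit {
    my ($x, $y) = @_;
    my $sum = $x + $y;
    print "Sum is $sum\n";
    return $sum;
}

print add_it(4, 3), "\n";          # 7
print add_and_print(4, 3), "\n";   # the result of print, not the sum
print add_explicit(4, 3), "\n";    # 7
```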
Listing 1.1-INITIALS.PL: Returning an Associative Array from a Subroutine
#!/usr/local/bin/perl -w
# Pass the names into the subroutine.
# Store the results in an associative array called "keyedNames".
%keyedNames = &GetInitials("Jane Austen", "Emily Bronte", "Mary Shelley" );
# Print out the initials, sorted:
print "Initials are ", join(', ', sort keys %keyedNames), ".\n";
# The GetInitials subroutine.
sub GetInitials {
    # Let's store the arguments in a "names" array for clarity.
    @names = @_;
    # Process each name in turn:
    foreach $name ( @names ) {
        # The "split" function is explained in Chapter 15, "Function List".
        # In this statement, we're getting split to look for the ' ' in the name;
        # it returns an array of chunks of the original string (i.e. $name) which were
        # separated by spaces, i.e. the forename and surname respectively in our case.
        # The elements of this array are then assigned to "$forename" and "$surname",
        # using parentheses to force an array assignment.
        ( $forename, $surname ) = split( ' ', $name );
        # OK, now we have the forename and surname. We use the "substr" function,
        # also explained in Chapter 15, to extract the first character from each of these.
        # The "." operator concatenates two strings (for example, "aa"."bb" is "aabb")
        # so the variable "$inits" takes on the value of the initials of the name:
        $inits = substr( $forename, 0, 1 ) . substr( $surname, 0, 1 );
        # Now we store the name in an associative array using the initials as the key:
        $NamesByInitials{$inits} = $name;
    }
    # Having built the associative array, we simply refer to it at the end of the
    # subroutine so that its value is the last thing evaluated here. It will then
    # be passed back to the calling function.
    %NamesByInitials;
}
$name = "Dana";
@name = ("Donna", "Dana", "Diana");
%name = ("Donna", "Elephants", "Dana", "Finches", "Diana", "Parakeets");
print "I said $name{$name}, not $name{$name[0]}!\n";
The bad news is that by default, Perl uses just one name space for each data type, for all functions. So if you have a variable called $temp in the main function, and you call a routine that uses another variable called $temp , the value of $temp in the main function gets clobbered. The references to the two variables are in fact two references to the same variable, as far as Perl is concerned.
That's where the local (Perl 4 and 5) and my (Perl 5 only) functions come in. These functions force Perl to treat variables as though they are local to the current code block, whether that block is a loop, an if-block, or a subroutine.
The following example uses two variables called $temp (one outside and one inside a while loop):
$temp = "Still here!\n";
print "Enter a few words at a time, Ctrl+D to terminate:\n";
while (<>) {
local( $temp, @etc ) = split(' ', $_ );
print "You said $temp";
@etc && print " and then you said @etc";
print ". Enter some more, or press Ctrl+D to end:\n";
}
print $temp;
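The difference between local() (Perl 4 and 5) and my() (Perl 5 only) is one of scoping. local gives an existing global variable a temporary value for the duration of the current block, and that temporary value is also seen by any subroutine called from within the block. A my variable is visible only within the block in which it is declared, so it is truly private.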
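The practical difference shows up when a block calls another subroutine. In this sketch all subroutine names are hypothetical, and the global is declared with our so that the code runs under use strict (an assumption about the Perl version in use; our is not available in Perl 4):

```perl
use strict;
use warnings;

our $temp = "global";

# Reads whatever value the global $temp currently holds.
sub show_temp { return $temp; }

# local: the temporary value is visible inside show_temp as well.
sub with_local {
    local $temp = "localized";
    return show_temp();
}

# my: a separate, private variable; show_temp still sees the global.
sub with_my {
    my $temp = "private";
    return show_temp();
}

print with_local(), "\n";   # localized
print with_my(), "\n";      # global
print show_temp(), "\n";    # global (the local value has been undone)
```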
The basic pattern-matching operations discussed in this section are:
Matching, in which we want to know whether a particular string matches a pattern
Substitution, in which we want to replace the portions of a string that match a pattern
The patterns referred to here are more properly known as regular expressions, and we'll start by looking at them.
A few concrete examples usually help after an overblown definition like that one. The regular expression b. matches the strings bovine, above, Bobby, and Bob Jones, but not the strings Bell, b, or Bob. That's because the expression insists that the letter b (lowercase) must be in the string and must be followed immediately by another character.
The regular expression b+, on the other hand, requires the lowercase letter b at least once. This expression matches b and Bob in addition to the example matches for b. in the preceding paragraph. The regular expression b* requires zero or more bs, so it matches any string. That seems to be fairly useless, but it makes more sense as part of a larger regular expression. Bob*y, for example, matches all of Boy, Boby, and Bobby but not Boboby.
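These claims can be checked mechanically. The sketch below (the helper name matching is hypothetical) filters a list of strings through a pattern and prints the ones that match:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the strings that match the given pattern.
sub matching {
    my ($pattern, @strings) = @_;
    return grep { /$pattern/ } @strings;
}

my @words = ("bovine", "above", "Bobby", "Bob Jones", "Bell", "b", "Bob");

print join(", ", matching('b.', @words)), "\n";
# prints "bovine, above, Bobby, Bob Jones"
print join(", ", matching('b+', @words)), "\n";
# prints "bovine, above, Bobby, Bob Jones, b, Bob"
print join(", ", matching('Bob*y', "Boy", "Boby", "Bobby", "Boboby")), "\n";
# prints "Boy, Boby, Bobby"
```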
Assertion | Meaning | Example | Matches | Doesn't Match |
---|---|---|---|---|
^ | Start of string | ^fool | foolish | tomfoolery |
$ | End of string | fool$ | April fool | foolish |
\b | Word boundary | \bside | be side | beside |
\B | Nonword boundary | \Bside | beside | be side |
Atom | Meaning | Example | Matches | Doesn't Match |
---|---|---|---|---|
period (.) | Any character except new line | b.b | bob | bb |
List of characters in brackets | Any one of those characters | ^[Bb] | Bob, bob | Rbob |
Regular expression in parentheses | Anything that regular expression matches | ^a(b.b)c$ | abobc | abbc |
Quantifier | Meaning | Example | Matches | Doesn't Match |
---|---|---|---|---|
* | Zero or more instances of the atom | ab*c | ac, abc | abb |
+ | One or more instances of the atom | ab+c | abc | ac |
? | Zero or one instances of the atom | ab?c | ac, abc | abbc |
{n} | n instances of the atom | ab{2}c | abbc | abbbc |
{n,} | At least n instances of the atom | ab{2,}c | abbc, abbbc | abc |
{n,m} | At least n, at most m instances of the atom | ab{2,3}c | abbc | abbbbc |
Symbol | Meaning | Example | Matches | Doesn't Match |
---|---|---|---|---|
\d | Any digit | b\dd | b4d | bad |
\D | Nondigit | b\Dd | bdd | b4d |
\n | New line | | | |
\r | Carriage return | | | |
\t | Tab | | | |
\f | Form feed | | | |
\s | White-space character | | | |
\S | Non-white-space character | | | |
\w | Alphanumeric character or underscore | a\wb | a2b | a^b |
\W | Nonalphanumeric character | a\Wb | aa^b | aabb |
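Rows from tables like these are easy to verify for yourself. The sketch below is one way to do it, under the assumption that each row is stored as a pattern, a string that should match, and a string that should not. The helper name row_holds is hypothetical, and the code uses Perl 5 array references, which this chapter does not cover:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $yes matches the pattern and $no does not.
sub row_holds {
    my ($re, $yes, $no) = @_;
    return ( $yes =~ /$re/ && $no !~ /$re/ ) ? 1 : 0;
}

# Each row: pattern, a string that should match, a string that should not.
my @rows = (
    [ '^fool',    'foolish',    'tomfoolery' ],
    [ 'fool$',    'April fool', 'foolish'    ],
    [ 'ab?c',     'ac',         'abbc'       ],
    [ 'ab{2,3}c', 'abbc',       'abbbbc'     ],
    [ 'b\dd',     'b4d',        'bad'        ],
    [ 'a\Wb',     'aa^b',       'aabb'       ],
);

foreach my $row (@rows) {
    printf "%-10s %s\n", $row->[0], row_holds(@$row) ? "ok" : "MISMATCH";
}
```

Every row listed here should report ok; a MISMATCH would mean that the table entry and Perl disagree.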
This mechanism is a backslash ( \ ), followed by a numeric quantity. This quantity can take any of the following formats:
matched quantities after a match. These matched quantities are called backreferences and are explained in the following section.
$prompt = "Enter some text or press Ctrl+D to stop: ";
print $prompt;
while (<>) {
/^[aA]/ && print "Starts with a or A. ";
/[0-9]$/ && print "Ends with a digit. ";
/perl/ && print "You said it! ";
print $prompt;
}
$filename =~ /dat$/ && die "Can't use .dat files.\n";
A corresponding operator, !~, has the opposite sense. !~ is true if the first operand does not match on the second, as follows:
$ENV{'PATH'} !~ /perl/ && warn "Not sure if perl is in your path...";
$installpath =~ m!^/usr/local! || warn "The path you have chosen is odd.\n";
warns that "The path you have chosen is odd." if the variable $installpath does not start with /usr/local.
Switch | Meaning |
---|---|
g | Perform global matching |
i | Perform case-insensitive matching |
o | Evaluate the regular expression one time only |
The g switch continues matching even after the first match has been found. This switch is useful when you are using backreferences to examine the matched portions of a string, as described in the " Backreferences " section later in this chapter.
The i switch forces a case-insensitive match.
Finally, the o switch is used inside loops in which a great deal of pattern matching is taking place. This switch tells Perl that the regular expression (the match operator's operand) is to be evaluated one time only. The switch can improve efficiency when the regular expression is fixed for all iterations of the loop that contains it.
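The g and i switches can be combined. This sketch (the helper name count_matches is an assumption) performs the match in an array context, so that Perl returns every match and the matches can be counted:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the occurrences of "perl" in any mix of case.
sub count_matches {
    my ($text) = @_;
    my @hits = ( $text =~ /perl/gi );   # g: find every match; i: ignore case
    return scalar @hits;
}

print count_matches("Perl, perl and PERL"), "\n";   # 3
print count_matches("python"), "\n";                # 0
```

Without the i switch only the lowercase occurrence would match, and without g only the first match would be found.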
while (<>) {
/\b(\S{4})\s(\S{4})\s(\S{4})\b/ && print "Gosh, you said $1 $2 $3!\n";
}
The first four-letter word lies between a word boundary (\b) and some white space (\s), and consists of four non-white-space characters (\S). If there is a match on the expression \b(\S{4})\s (that is, if a four-letter word is found), the matching substring is stored in the special variable \1, and the search continues. When the search is complete, you can refer to the backreferences as $1, $2, and so on.
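The same pattern can be wrapped in a subroutine so that it can be tried on fixed strings rather than on typed input. The name three_words is hypothetical; the pattern is the one from the loop above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first run of three four-character words,
# or undef if the line contains no such run.
sub three_words {
    my ($line) = @_;
    return $line =~ /\b(\S{4})\s(\S{4})\s(\S{4})\b/ ? "$1 $2 $3" : undef;
}

my $found = three_words("I said some four word text");
print "Gosh, you said $found!\n" if defined $found;
# prints "Gosh, you said said some four!"
```

$1, $2, and $3 are read only after the match is known to have succeeded; after a failed match they still hold values from an earlier successful match, if there was one.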
What if you don't know in advance how many matches to expect? Perform the match in an array context; Perl returns the matches in an array. Consider this example:
@hits = ("Yon Yonson, Wisconsin" =~ /(\won)/g);
print "Matched on ", join(', ', @hits), ".\n";
We'll start at the right side and work backward. The regular expression (\won) means that we match any alphanumeric character followed by on and store all three characters. The g option after the // operator means that we want to do this for the entire string, even after we find a match. The =~ operator means that we carry out this operation on a given string ( Yon Yonson, Wisconsin ). Finally, the whole thing is evaluated in an array context, so Perl returns the array of matches, and we store it in the @hits array. Following is the output from this example:
Matched on Yon, Yon, son, con.
The pattern to be replaced goes between the first and second delimiters, and the replacement pattern goes between the second and third delimiters. This simple example changes $house from henhouse to doghouse :
$house = "henhouse";
$house =~ s/hen/dog/;
Notice that it isn't possible to use the =~ operator with a literal string as you can when matching, because you can't modify a literal constant. Instead, store the string in a variable and modify that variable.
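A slightly fuller sketch of substitution follows. The helper name rehouse is hypothetical, and the g switch is applied to the substitution here (an addition to the example above) so that every occurrence is replaced. In a scalar context the s operator returns the number of replacements it made:

```perl
use strict;
use warnings;

# Hypothetical helper: replaces every "hen" with "dog" in a copy of the
# text and reports how many replacements were made.
sub rehouse {
    my ($text) = @_;                        # the copy lives in a variable, so it can be modified
    my $changed = ( $text =~ s/hen/dog/g );
    return ( $text, $changed || 0 );        # a failed substitution returns false, reported as 0
}

my ($result, $count) = rehouse("henhouse for hens");
print "$result ($count changes)\n";         # doghouse for dogs (2 changes)
```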
This book now moves on to Web matters, but look in the following places for more information about Perl:
Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290.
Notice: This material is from Special Edition, Using Perl for Web Programming, ISBN: 0-7897-0659-8. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.