Security Issues in Perl Scripts
A programming language, by design, does not normally constitute a security risk; it is with the programmer that the risk is introduced. Almost every language has certain flaws that may facilitate to some extent the creation of insecure software, but the overall security of a piece of software still depends largely on the knowledge, understanding, and security consciousness of the authors. Perl has its share of security "gotchas", and most Perl programmers are aware of none of them.
In this article, we will look at some of the most widely misused and overlooked features of Perl. We'll see how their incorrect use can pose threats to the security of the system on which they are running as well as to their users. We will show how such weaknesses can be exploited and how to fix or avoid them.
Basic user input vulnerabilities
One big source of security problems in Perl scripts is improperly validated (or unvalidated) user input. Any time your program might take input from an untrusted user, even indirectly, you should be cautious. For example, if you are writing CGI scripts in Perl, expect that malicious users will send you bogus input.
If it is trusted and used without validation, improper user input to such applications can cause many things to go wrong. The most common and obvious mistake is executing other programs with user-provided arguments, without proper validation.
The system() and exec() functions
Perl is famous for its use as a "glue" language: it does an excellent job of calling other programs to do the work for it, carefully coordinating their activities by collecting the output of one program, reformatting it in a particular manner and passing it as input to some other program so everything runs smoothly. As the Perl slogan tells us, there is more than one way to do this.
One way to execute an external program or a system command is by calling the exec() function. When Perl encounters an exec() statement, it looks at the arguments that exec() was invoked with, then replaces the running Perl process with the specified command. If exec() succeeds, control never returns to the program that called it.
Another similar function is system(). system() acts very much like exec(). The only major difference is that Perl first forks off a child from the parent process. The child process runs the command supplied to system(). The parent process waits until the child is done running, and then proceeds with the rest of the program. We will discuss the system() call in greater detail below, but most of the discussion applies to exec() just as well.
The argument given to system() is a list: the first element on the list is the name of the program to be executed and the rest of the elements are passed on as arguments to this program. However, system() behaves differently if there is only one parameter. When that is the case, Perl scans the parameter to see if it contains any shell metacharacters. If it does, then it needs those characters to be interpreted by a shell. Therefore, Perl will spawn a command shell (often the Bourne shell) to do the work. Otherwise, Perl will break up the string into words, and call the more efficient C library call execvp(), which does not understand special shell characters.
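The difference between the two calling conventions can be observed with a harmless command. The following is a minimal sketch, assuming a Unix-like system where "true" is on the PATH and the shell is a Bourne-compatible /bin/sh; the commands are chosen only for illustration.

```perl
use strict;
use warnings;

# One-string form: the ';' is a shell metacharacter, so Perl hands the
# whole string to the shell, which runs "true" and then "exit 3".
my $shell_status = system("true; exit 3") >> 8;

# List form: no shell is involved. "true" receives "; exit 3" as a
# plain argument, ignores it, and exits with status 0.
my $list_status = system("true", "; exit 3") >> 8;

print "one-string form exit status: $shell_status\n";
print "list form exit status: $list_status\n";
```

In the one-string case the text after the semicolon is executed as a second command; in the list case it is only data.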
Now suppose we have a CGI form that asks for a username, and shows some file containing statistics for that user. We might use a system() call to invoke 'cat' for that purpose like this:
system ("cat /usr/stats/$username");
and the $username came from the form:
$username = param ("username");
The user fills in the form, with username = jdimov for example, then submits it. Perl doesn't find any meta-characters in the string "cat /usr/stats/jdimov" so it calls execvp(), which runs "cat" and then returns to our script. This script might look harmless, but it can actually be exploited by a malicious attacker. The problem is that by using special characters in the 'username' field on the form, an attacker can execute any command through the shell. For example, let's say the attacker were to send the string "jdimov; cat /etc/passwd". Perl recognizes the semicolon as a meta-character and passes this to the shell:
cat /usr/stats/jdimov; cat /etc/passwd
The attacker gets both the dummy stats file and the password file. If the attacker is feeling destructive, he could just send "; rm -rf /*".
We mentioned earlier that system() takes a list of parameters and executes the first element as a command, passing it the rest of the elements as arguments. So we change our script a little so that only the program we want gets executed:
system ("cat", "/usr/stats/$username");
Since we specify each argument to the program separately, a shell will never get invoked. Therefore, sending ";rm -rf /*" will not work, because the attack string will be interpreted as a filename only.
This approach is much better than the one argument version, since it avoids use of a shell, but there are still potential pitfalls. In particular, we need to worry about whether the value of $username could ever be used to exploit weaknesses of the program that is being executed (in this case "cat"). For example, an attacker could still exploit our rewritten version of the code to show the system password file by setting $username to the string "../../etc/passwd".
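One way to guard against inputs such as "../../etc/passwd" is to check the value against a pattern of allowed characters before using it. The sketch below is only an illustration: valid_username() is a hypothetical helper, and the allowed character set and length limit are assumptions that would have to match the real naming rules of the site.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only short names made of letters, digits,
# and underscores. The character set and the 32-character limit are
# assumptions, not rules taken from the article.
sub valid_username {
    my ($name) = @_;
    return (defined($name) && $name =~ /\A[A-Za-z0-9_]{1,32}\z/) ? 1 : 0;
}

for my $candidate ("jdimov", "../../etc/passwd", "jdimov; cat /etc/passwd") {
    my $verdict = valid_username($candidate) ? "accepted" : "rejected";
    print "$candidate: $verdict\n";
}
```

In the CGI script, the call system ("cat", "/usr/stats/$username") would then be made only when valid_username($username) returns true. Listing what is allowed tends to be safer than trying to list every dangerous character.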
Many other things can go wrong, depending on the program. For example, some applications interpret special character sequences as requests for executing a shell command. One common problem is that some versions of the Unix "mail" utility will execute a shell command when they see the ~! escape sequence in particular contexts. Thus, user input containing "~!rm -rf *" on a blank line in a message body may cause trouble under certain circumstances.
As far as security is concerned, everything stated above with regard to the system() function applies to exec() too.
The open() function
The open() function in Perl is used to open files. In its most common form, it is used in the following way:
open (FILEHANDLE, "filename");
Used like this, "filename" is opened in read-only mode. If "filename" is prefixed with the ">" sign, it is opened for output, overwriting the file if it already exists. If it is prefixed with ">>" it is opened for appending. The prefix "<" opens the file for input, but this is also the default mode if no prefix is used. Some problems of using unvalidated user input as part of the filename should already be obvious. For example, the directory traversal trick described earlier ("../../etc/passwd") works just as well here.
There are other worries. Let's modify our script to use open() instead of "cat". We would have something like:
open (STATFILE, "/usr/stats/$username");
and then some code to read from the file and show it. The Perl documentation tells us that:
If the filename begins with "|", the filename is interpreted as a command to which output is to be piped, and if the filename ends with a "|", the filename is interpreted as a command which pipes output to us.
The user can then run any command under the /usr/stats directory, just by postfixing a '|'. Backwards directory traversal can allow the user to execute any program on the system.
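The effect of a trailing '|' is easy to reproduce with a harmless command. The following is a minimal sketch, assuming a Unix-like system with "echo" on the PATH; the variable $name stands in for attacker-controlled input.

```perl
use strict;
use warnings;

# Stand-in for attacker-controlled input. With two-argument open(), the
# trailing '|' turns the "filename" into a command whose output we read.
my $name = "echo injected |";

open(my $fh, $name) or die "cannot open '$name': $!";
my $output = <$fh>;
close($fh);

print "read from handle: $output";
```

No file is opened at all here: Perl runs the command and connects the filehandle to its output.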
One way to work around this problem is to always explicitly specify that you want the file open for input by prefixing it with the '<' sign:
open (STATFILE, "</usr/stats/$username");
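Perl 5.6 and later also offer a three-argument form of open() in which the mode is passed separately from the file name, so characters in the name are never interpreted as a mode or a pipe. The sketch below assumes a Unix-like system; the temporary file and its contents are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "42 requests\n";
close($tmp);

# Three-argument open: the mode is a separate argument, so the third
# argument is always treated as a file name.
open(my $in, '<', $path) or die "cannot open $path: $!";
my $line = <$in>;
close($in);
print "read: $line";

# The same call with a pipe-like name simply fails to find such a file
# (assuming no file with that odd name exists in the current directory).
my $opened = open(my $bad, '<', "echo injected |") ? 1 : 0;
print "pipe-like name opened: $opened\n";
```

This form does nothing about directory traversal, so the file name still needs to be validated.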
Sometimes we do want to invoke an external program. For example, let's say that we want to change our script so it reads the old plain-text file /usr/stats/username, but passes it through an HTML filter before showing it to the user. Let's say we have a handy utility sitting around just for this purpose. One approach is to do something like this:
open (HTML, "/usr/bin/txt2html /usr/stats/$username|");
print while <HTML>;
Unfortunately, this still goes through the shell. However, we can use an alternate form of the open() call that will avoid spawning a shell:
open (HTML, "-|") or exec ("/usr/bin/txt2html", "/usr/stats/$username");
print while <HTML>;
When we open a pipe to "-", either for reading ("-|") or for writing ("|-"), Perl forks the current process and returns the PID of the child process to the parent and 0 to the child. The "or" operator is used to decide whether we are in the parent or child process. If we're in the parent (the return value of open() is nonzero) we continue with the print() statement. Otherwise, we're the child, so we execute the txt2html program, using the safe version of exec() with more than one argument to avoid passing anything through the shell. What happens is that the child process prints the output that txt2html produces to its STDOUT and then dies quietly (remember exec() never returns), while in the meantime the parent process reads the results from the HTML filehandle, which is connected to the child's STDOUT. The very same technique can be used for piping output to an external program:
open (PROGRAM, "|-") or exec ("/usr/bin/progname", "$userinput");
print PROGRAM "This is piped to /usr/bin/progname";
These forms of open() should always be preferred to a direct piped open() when pipes are needed, since they don't go through the shell.
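One detail the short idiom glosses over is that open() returns undef when the fork itself fails, which the "or" test cannot tell apart from being in the child. The sketch below shows a slightly more defensive variant of the forking open for reading. read_from_command() is a hypothetical helper, and the example runs the Perl interpreter itself ($^X) as a stand-in for an external program on a Unix-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return the
# lines it prints.
sub read_from_command {
    my @cmd = @_;

    my $pid = open(my $fh, "-|");          # implicit fork
    die "cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: the block form of exec() never uses a shell, even
        # when the command has no arguments.
        exec { $cmd[0] } @cmd;
        die "cannot exec $cmd[0]: $!";
    }

    # Parent: read whatever the child writes to its STDOUT.
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# The argument contains a ';' but is passed to the program as data.
my @out = read_from_command($^X, '-e', 'print "arg: $ARGV[0]\n"',
                            'jdimov; echo injected');
print @out;
```

Perl 5.8 and later on Unix-like systems also accept a list form, open($fh, "-|", $program, @args), which avoids the shell without an explicit exec().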
Now suppose that we converted the statistics files into nicely formatted HTML pages, and for convenience decided to store them in the same directory as the Perl script that shows them. Then our open() statement might look like this:
open (STATFILE, "<$username.html");
When the user passes username=jdimov from the form, the script shows jdimov.html. There is still the possibility of attack here. Unlike C and C++, Perl does not use a null byte to terminate strings. Thus the string "jdimov\0blah" is interpreted as just "jdimov" in most C library calls, but remains "jdimov\0blah" in Perl. The problem arises when Perl passes a string containing a null to something that has been written in C. The UNIX kernel and most UNIX shells are pure C. Perl itself is written primarily in C. What happens when the user calls our script like this: "statscript.pl?username=jdimov%00"? Our script passes the string "jdimov%00.html" (with the %00 decoded to a null byte) to the corresponding system call in order to open it, but since those system calls are coded in C and expect null-terminated strings, they ignore the .html part. The result? The script will just show the file "jdimov" if it exists. It probably doesn't, and even if it does, no big deal. But what if we call the script with:
If the script is in the same directory as our html files, then we can use this input to trick the poor script into showing us all its guts. This may not be too much of a security threat in this case, but it certainly can be for other programs, since it allows an attacker to analyze the source for other exploitable weaknesses.
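The mismatch between Perl's view of a string and C's view of it can be shown without opening any files. The following is a minimal sketch: the null-byte test at the end is a hypothetical check, not something prescribed by the article, and unpack's "Z*" template is used only to approximate what a C routine would see.

```perl
use strict;
use warnings;

# What the CGI layer would hand us after decoding "jdimov%00".
my $username = "jdimov\0";
my $filename = "$username.html";

# Perl keeps the full string, null byte included.
my $perl_length = length($filename);

# unpack's "Z*" template reads up to the first null, which approximates
# how a C routine would see the same bytes.
my $c_view = unpack("Z*", $filename);

# Hypothetical check: refuse any input containing a null byte.
my $rejected = ($username =~ /\0/) ? 1 : 0;

print "Perl sees $perl_length characters\n";
print "C sees: $c_view\n";
print "rejected: $rejected\n";
```

Rejecting input that contains a null byte, or better, accepting only a known set of characters, removes this class of trick before the string ever reaches a system call.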