Thursday, September 6, 2007

Perl as replacement for shell scripting (Part I)

By shell scripting I mean bash, as it is what most (all?) Linux distributions use. Bash can be used as a quite capable programming language. It allows a programmer to build rather complex scripts by using other programs as building blocks. The system comes with a number of such building blocks: find, grep, sed, awk and many others, and unsurprisingly there is a lot you can do with them. But it is often a challenge to write robust shell scripts which work, or at least fail gracefully, for any kind of input. The main reason is that historically shell scripts could use only one data type: the string*. Those building blocks, the external programs you use in shell scripts, have a very restricted interface: program arguments, which are strings; a stream of strings as input; a stream of strings as output; and an exit code.

Even a simple concept like a list has to be emulated. For example, a list of file names is often passed as a string which contains the file names separated by whitespace. But what if one of these file names contains whitespace? You get a problem. To fix it you need to escape whitespace characters in the file name. And it is rather easy to miss places where you have to do the escaping. A slightly contrived example:

rm `ls`
This would delete all files in the current directory .. unless they have whitespace characters in their names. There are many similar cases where an unwary programmer can make a mistake in their shell script. Passing data from one process to another often requires a lot of care, and the simplest code is often wrong. Another problem is that you are very limited in how you can handle errors in shell scripts: you only have the process's exit code to tell you whether it finished successfully. And usually it is just a boolean value telling you whether there was an error or not. Quote from the linked document:
However, many scripts use an exit 1 as a general bailout upon error. Since exit code 1 signifies so many possible errors, this probably would not be helpful in debugging.
If, say, mkdir fails, your script cannot easily tell whether it is because another directory with the same name already exists or because you just don't have permissions for this operation.
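
A minimal sketch of this limitation, under some assumptions: try_mkdir is a hypothetical helper I made up for illustration, and a missing parent directory stands in for the permissions case, because a permission failure depends on which user runs the script (root would not see it). Both failures come back as the same kind of non-zero status, typically 1, so the script cannot tell them apart without parsing the error message.

```shell
#!/bin/sh
# try_mkdir is a hypothetical helper: it runs mkdir quietly and prints its exit status.
try_mkdir() {
    mkdir "$1" 2>/dev/null
    echo $?
}

# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir "$workdir/exists"

# Failure 1: the directory already exists.
status_exists=$(try_mkdir "$workdir/exists")

# Failure 2: the parent directory is missing (stand-in for a permissions error).
status_noparent=$(try_mkdir "$workdir/missing/child")

echo "already exists: $status_exists, missing parent: $status_noparent"

rm -rf "$workdir"
```

The only place where the two causes differ is the text mkdir writes to stderr, which is meant for humans and is not a stable interface.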

So, are there any solutions to this problem? As for myself, the moment I see my shell script getting longer than three lines of code I rewrite the whole thing in Perl. In Perl you don't need to use external programs as often as you do in bash. Therefore you are not limited to their restrictive interfaces (remember, only strings and exit codes for input and output); native Perl APIs can be much more expressive when they need to be.

There is a price though. Perl code is not always as compact as similar shell code for some scripting tasks. This is because shell scripting is optimized very well for handling the interaction of processes, and Perl less so. It is worth mentioning that many things which are taken for granted in shell scripting often require Perl modules, including non-standard CPAN modules. That is not a problem as such, except that not all Perl programmers know where to look for things which are not covered by perlfunc. This is mainly a concern for newbie Perl programmers, but it is still a real problem. Also, using CPAN modules is not always an option.

Of course, in your Perl program you can fall back to using the same external programs you would use in a shell script, but then you lose the advantages of Perl over shell scripting. So .. don't do this if possible. An interesting example of this principle: Perl before version 5.6.0 would fall back to an external shell to execute the glob operation. That caused various problems for Perl developers. For example, I saw Perl programs which used glob fail when run on one tightly secured web hosting server, because the binary Perl was calling had simply been removed from the server for security reasons. In later versions of Perl the implementation of glob was changed: it is now handled inside Perl itself by the standard File::Glob extension and doesn't use external programs.

To be continued in Part II: mapping between common shell operations and corresponding Perl modules.

[*] New versions of bash support arrays. I'd argue that the usefulness of arrays in bash is limited, as programs you call from shell scripts cannot use them to pass output data. You are still limited to string streams and exit codes. Not to mention that arrays are not very portable across different systems.
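
To make the footnote concrete, here is a small sketch which needs bash because of the array. The helper count_args and the two file names are made up for illustration. It handles the same two files three ways: as a whitespace-separated string in the style of rm `ls`, as a bash array, and as an array sent through a pipe to an external program.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Work in a scratch directory holding one plain name and one name with a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "plain.txt" "with space.txt"

# 1. Names passed as one whitespace-separated string, as in rm `ls`:
#    word splitting cuts "with space.txt" into two arguments.
split_count=$(count_args $(ls))

# 2. A bash array filled by a glob keeps every name as a single element.
names=(*)
array_count=$(count_args "${names[@]}")

# 3. An external program cannot receive or return the array itself.
#    Across a pipe it is a stream of characters again, and wc counts words.
piped_words=$(printf '%s\n' "${names[@]}" | wc -w)

echo "string: $split_count, array: $array_count, words after pipe: $piped_words"

# Go back to where we started and clean up.
cd - >/dev/null && rm -rf "$workdir"
```

With two files in the directory, the string form yields three arguments and the array form two. Once the names cross the pipe the structure is gone again and wc sees three words.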


Richie said...

Found your blog post after searching for tips on converting old shell scripts to Perl. I find, as you have posted, that my shell scripts become quite unruly after 10 or so lines of code. I'm hoping that Perl will be able to reduce and simplify my code substantially. Here's hoping you write Part 2.

Wolf said...

Perl is the best scripting language for text processing and handling regexes. I have posted a few articles related to those on my blog.

Also, Perl's CPAN has so much support that I don't even need to think extra while developing a project. I haven't found such help for other programming languages except Java and .NET.