Cronjobs

Cron(1) is one of the most useful unix facilities around. If you need to have something occur on a regular basis, cron is your friend. However, among new users, cron is also one of the most misunderstood unix tools. The sections that follow will try to explain some of the mysteries, and offer guidelines for getting the most out of it.

But, it worked on the command line

This is probably the most frequent difficulty that people encounter when using cron. Your script works fine when you run it from the command line, but behaves unexpectedly when run via cron. This usually comes down to a matter of environment. When writing scripts to be run by cron, the basic rule of thumb is this: assume nothing about the environment. Under cron, you won't get much of one.

To see what I'm talking about, schedule this job as an experiment.

* * * * * /usr/bin/env

Yes, this will send out an email message once a minute. Feel free to remove the job after the first one arrives. The important thing to note here is the contents of the message:

Your "cron" job on hostname
/usr/bin/env

produced the following output:

HOME=/home/srevilak
LOGNAME=srevilak
PATH=/usr/bin:
SHELL=/usr/bin/sh
TZ=US/Eastern

That's not much of an environment, is it? But like it or not, that's what you've got to work with. Again, the rule is "assume nothing about the environment". In particular, be sure to give complete definitions of PATH, and of any other environment variables that your script needs. Note that this is a very good guideline for shell programming in general: the behavior of a shell script should never depend on the user that invokes it. More work, yes, but the reward is that your scripts will always behave consistently and predictably, no matter who uses them.
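
For example, a cron script might begin by defining PATH itself, rather than trusting whatever it inherits. A minimal sketch (the directories listed are only an illustration; use whatever your script actually needs):

#!/bin/sh
# Spell out the environment up front; don't rely on what cron provides.
PATH=/bin:/usr/bin:/usr/local/bin
export PATH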

Dealing with output

One of the biggest mistakes that people make when writing cron scripts is to dump a lot of information to standard output. By default, the invoking user will receive this information via email. If you have one crontab entry that's run once a day, this isn't a bad thing. You might even appreciate the assurance. For example, I have a checkout script that runs each day before I arrive at work. It sends me the output of this command:

cvs -q checkout -P everything

To me this is important -- it allows me to see what files were changed in the last day, and it lets me know if I've got any uncommitted files sitting around in my sandbox. Every day, I look line by line at what cvs checkout said, and I do something with that information.

Now let's look at the other extreme, where this approach doesn't work quite so well. Consider the person who gets root's mail on a network of several dozen machines. Each box runs a nightly maintenance script. Do you really want to arrive in the morning and find 60-70 of these messages waiting for you? More importantly, do you plan on attentively reading each one of them?

If you answered "no" to either of the above questions, then there are two rules of thumb that you should follow:

1. Produce output only when something has gone wrong.
2. Make the script itself examine the results of its work, rather than leaving that chore to a human reader.

Let's go back to our example of the maintenance job that runs each night on 60 boxes. If it works, you get no mail. If one machine has a problem, you get one message. Because it's only one message, the problem is immediately obvious. The challenge here is to make your script intelligent. Check the exit codes of the commands that it runs. Capture output and examine it. This is more work up front, but the time that it saves in the long run is worth the investment.
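
As a sketch of the idea, a nightly job along these lines stays quiet on success and speaks up only on failure (run-backups and the paths are made-up names for illustration):

#!/bin/sh
# Hypothetical nightly job: silent when all is well, noisy otherwise.
logfile=/var/tmp/nightly.$$

/usr/local/sbin/run-backups > $logfile 2>&1
status=$?

if test $status -ne 0; then
  echo "run-backups exited with status $status; its output follows:"
  cat $logfile
fi

rm -f $logfile
exit $status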

But wait a second. There are times when it's useful to see diagnostic output, especially when dealing with intermittent problems. Easy enough. Instead of doing this:

echo "$0: rebuilding updatedb"

do something like this instead:

log () {
  $verbose && echo "$0: $@"
}
  ...
log "rebuilding updatedb"

Here, verbose is a variable whose value is true or false. You can arrange for it to be set via the command line, or simply put the assignment in your script and change it as needed. One can even do clever things like

verbose=false
tty -s && verbose=true

This will give your script "quiet" behavior when run from cron, and "verbose" behavior when run interactively from the command line.
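
And if you'd rather control it explicitly from the command line, a -v flag is easy to arrange (a small sketch):

# Default to quiet; let -v turn on chatty output.
verbose=false
case "$1" in
  -v) verbose=true; shift ;;
esac

Pass -v when you're poking at the script by hand; leave it out of the crontab entry.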

Well, gee, that's fine, but I'd really like to see more detail when something goes wrong. Fair enough. Send the output to a file, and examine the file afterwards. For example, if your script happened to be doing nightly builds of a software product, you might opt for something like this:

me=`basename $0`
logfile=/var/tmp/$me.log
   ...
rm -f $logfile

log "doing build"
ant -f build.xml >> $logfile 2>&1
status=$?
if test $status -ne 0 || grep -i failed $logfile >/dev/null; then
  echo "Build broken" 
  echo "last few lines before error"
  grep -B10 -i failed $logfile
  echo
  echo "full output:"
  cat $logfile
  exit 1
fi

The first few lines establish a place for collecting output. Here, we're just interested in capturing the results of a single invocation, not maintaining history. The next few lines perform some work. Afterwards, our script checks for signs of trouble. If there's trouble, it complains. If not, it just goes silently on its way. Easy, huh?

The Evils of Filtering

One of the biggest mistakes one can make is to have a large number of automated jobs, then attempt to use inbox filtering to separate the successful runs from those that didn't work out as planned. Why is this bad? I know of a development group that did hourly builds of a software product; each build produced a message with one of two subject lines: "build okay" or "build failed", as dictated by the presence of compiler errors in the build output. Folks in this group set their mail clients to drop any messages that didn't have the word "failed" in the subject line.

One day, someone noticed that the mod times of all of the files in the build's target directory were over a week old. A little investigation turned up a long string of messages with one line in the body:

test: argument expected

Of course, in this case, the execution of the build relied on test returning a true value at least some of the time -- but it was conking out long before the compilation stage. Since the compiler never ran, the logic that set the subject based on the compiler output wasn't able to detect the problem, and neither was anyone's mail filter. The result? Two weeks without regular builds, and no one noticed. Ouch :(
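
One way to avoid this trap is to base the subject line on the exit status of the entire job, not just on what the compiler printed. A rough sketch (do-build.sh and the mail address are hypothetical; use mailx if your mail lacks -s):

logfile=/var/tmp/build.log

# The subject reflects the exit status of the whole build, so a
# failure before the compilation stage still reads "build failed".
if sh ./do-build.sh > $logfile 2>&1; then
  subject="build okay"
else
  subject="build failed"
fi
mail -s "$subject" builders@example.com < $logfile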

The moral of the story: only write what you plan to read. And then, be sure to read it.


$Id: cron.html,v 1.2 2004/02/23 01:04:29 srevilak Exp $