Crontab Problem

I had a problem with one of my bash scripts. The script (master-universe) is the starter for another script. It is supposed to run daily at midnight to check whether a child process, which manages that other script, is still running. If the child process has died, it (re)executes the child.

theInduk.sh
	(call)  ---  master-universe.sh
		   (call)  ---   child_proses.sh

I tested the script with bash and /bin/sh, and it seemed to work just fine.

kill_master() {
	for pid in `ps aux| grep -v grep |grep 'master-universe' |awk '{print $2}'`; do
		echo "Killing pid: " $pid
		kill -9 $pid
	done
}

The function above is the source of the problem. When it is executed from crontab, it never enters the for loop, as if the master-universe process did not exist (it does; I checked with ps aux from the CLI).

My first thought was that /bin/sh couldn't interpret the for syntax for whatever reason. But that's not true, because when I ran it from the console

sh ./theInduk.sh

everything seemed all right.

After spending more than an hour debugging and nearly smashing my mouse into the 19" LCD, I finally realized what was going on. I added one line to the code, and it revealed everything.

kill_master() {
	ps aux > /tmp/napeTakSama.txt
	for pid in `ps aux| grep -v grep |grep 'master-universe' |awk '{print $2}'`; do
		echo "Killing pid: " $pid
		kill -9 $pid
	done
}

Result of /tmp/napeTakSama.txt

root    809  0.0  0.0  5688   996  v3  Is+   2Jul09   0:00.00 /usr/libexec/gett
root    810  0.0  0.0  5688   996  v4  Is+   2Jul09   0:00.00 /usr/libexec/gett
root    811  0.0  0.0  5688   996  v5  Is+   2Jul09   0:00.00 /usr/libexec/gett
root    812  0.0  0.0  5688   996  v6  Is+   2Jul09   0:00.00 /usr/libexec/gett
root    813  0.0  0.0  5688   996  v7  Is+   2Jul09   0:00.00 /usr/libexec/gett
root  60580  0.0  0.0 20440  1624  p0  I     6Jan10   0:00.00 su
root  60581  0.0  0.1 10104  2712  p0  I+    6Jan10   0:00.02 _su (csh)
xmen 83289  0.0  0.1  9016  2248  p0  Is   21Jul09   0:00.04 /usr/local/bin/ba
root    941  0.0  0.1 10104  3104  p1  Is    2Jul09   0:00.10 /bin/csh
root  61684  0.0  0.1  9456  2980  p1  S+    6Jan10   0:11.10 tcpdump -avvv -i
root    971  0.0  0.1 10104  3084  p2  Is+   2Jul09   0:00.15 /bin/csh
xmen  3960  0.0  0.1  9016  2176  p3  Is    3Jul09   0:00.01 -bash (bash)
xmen  3966  0.0  0.0  8144  1716  p3  S+    3Jul09   0:23.29 screen -l
xmen  3970  0.0  0.1  9016  2176  p4  Is+   3Jul09   0:00.06 /usr/local/bin/ba
xmen  4029  0.0  0.1  9016  2252  p5  Is    3Jul09   0:00.12 /usr/local/bin/ba

Can you see the problem?

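Yup. Every line of the ps aux output is truncated at roughly 80 characters, so the command column is cut short and my grep can never find the master-universe string. Presumably this is because a cron job has no terminal attached, so ps falls back to a fixed default width instead of the window width it uses on the console.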
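The effect is easy to reproduce without cron. Here is a minimal sketch, assuming a made-up ps line (the PID, path and column layout are illustrative, not copied from the real machine) and using cut at 79 columns as a stand-in for whatever ps did on its own:

```shell
#!/bin/sh
# Illustrative only: a made-up "ps aux"-style line; the PID and path are hypothetical.
line='xmen  4100  0.0  0.1  9016  2252  ??  Is    3Jul09   0:00.12 /bin/sh /home/xmen/bin/master-universe.sh'

# Full line: the script name is present, so grep -c counts one matching line.
full_hits=$(printf '%s\n' "$line" | grep -c 'master-universe' || true)

# Same line cut to 79 columns, mimicking the truncated listing.
# grep exits non-zero when nothing matches, hence the "|| true".
cut_hits=$(printf '%s\n' "$line" | cut -c1-79 | grep -c 'master-universe' || true)

echo "full line matches: $full_hits"
echo "truncated matches: $cut_hits"
```

With this sample line the script name starts past column 79, so the second grep has nothing left to match, and the for loop would get an empty list.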

The Solution?
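Instead of ps aux I just used 'ps ax' and took the PID from the first column. ps ax prints fewer columns than ps aux, so more of the command fits inside the same width limit.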

for pid in `ps ax| grep -v grep |grep 'master-universe' |awk '{print $1}'`; do

The above code worked as I expected.
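It still depends on the command fitting inside the width limit, though. A sturdier variant, as a sketch only: ask ps for unlimited width (the w option given twice) and for just the PID and command columns. The helper filter_pids is a hypothetical name I made up for illustration, and the sketch assumes the function lives in a script whose own command line does not contain the pattern.

```shell
#!/bin/sh
# Sketch of a width-independent kill_master.
# filter_pids is a hypothetical helper, not part of the original script.

# Read "PID COMMAND..." lines on stdin and print the PID of every line
# whose command contains the string given as $1.
filter_pids() {
    # The pattern is passed through the environment so it does not appear
    # on awk's own command line, where the ps listing could match it.
    PATTERN=$1 awk '{
        pid = $1
        sub(/^[ \t]*[0-9]+[ \t]+/, "")   # drop the PID column, keep the command
        if (index($0, ENVIRON["PATTERN"])) print pid
    }'
}

kill_master() {
    # "ww" tells ps to use as many columns as it needs;
    # "-o pid= -o command=" prints only those two columns, with no header.
    for pid in $(ps axww -o pid= -o command= | filter_pids 'master-universe'); do
        echo "Killing pid: $pid"
        kill -9 "$pid"
    done
}
```

Because filter_pids only reads stdin, it can be checked with canned input before it is trusted with kill -9, for example: printf ' 101 /bin/sh master-universe.sh\n' | filter_pids 'master-universe'.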

This happened on FreeBSD 7.2. (I hate BSD!) I tried the original code on Linux (Ubuntu Karmic) and guess what? It behaved exactly as you would expect from a well-behaved, innocent OS.

(Note to myself: Remember, always love Linux).
