I ran into a problem with one of my bash scripts. The script (master-universe) is the starter for another script. It is supposed to run daily at midnight and check whether a child process, which manages that other script, is still running. If the child process has died for some reason, it (re)starts the child.
theInduk.sh (call) --- master-universe.sh (call) --- child_proses.sh
I tested the script with both bash and /bin/sh, and it seemed to work just fine.
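For readers who want to picture the "check the child, restart it if dead" idea, here is a minimal sketch. It is not the original master-universe.sh, which is not shown in full. It swaps in a different technique, a PID file plus kill -0, instead of grepping ps output, and the names is_alive, ensure_running and the PID file path are all hypothetical.

```shell
# Hypothetical helpers: a sketch of the watchdog idea, not the original script.

# is_alive PID: succeed if a process with that PID exists and we may signal it.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# ensure_running PIDFILE COMMAND...: start COMMAND in the background unless the
# PID recorded in PIDFILE is still alive, then record the new PID in PIDFILE.
ensure_running() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && is_alive "$(cat "$pidfile")"; then
        return 0            # child still running, nothing to do
    fi
    "$@" &                  # (re)start the child
    echo $! > "$pidfile"
}

# Intended use (assumed paths, not executed here):
#   ensure_running /tmp/child_proses.pid ./child_proses.sh
```

Signal 0 delivers nothing; it only reports whether the process can be signalled, so the check does not depend on how ps formats its output.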
kill_master() {
    for pid in `ps aux| grep -v grep |grep 'master-universe' |awk '{print $2}'`; do
        echo "Killing pid: " $pid
        kill -9 $pid
    done
}
The function above is the source of the problem. When it was executed from crontab, it refused to enter the for loop, as if the master-universe process did not exist (it did; I checked with ps aux from the CLI).
My first thought was that /bin/sh couldn't interpret the for syntax for whatever reason. But that's not true, because when I ran it from the console with
sh ./theInduk.sh
everything seemed all right.
After spending more than an hour debugging and nearly smashing my mouse into the 19" LCD, I finally figured it out! I added one line to the code, and it revealed everything.
kill_master() {
    ps aux > /tmp/napeTakSama.txt
    for pid in `ps aux| grep -v grep |grep 'master-universe' |awk '{print $2}'`; do
        echo "Killing pid: " $pid
        kill -9 $pid
    done
}
Result of /tmp/napeTakSama.txt
root 809 0.0 0.0 5688 996 v3 Is+ 2Jul09 0:00.00 /usr/libexec/gett
root 810 0.0 0.0 5688 996 v4 Is+ 2Jul09 0:00.00 /usr/libexec/gett
root 811 0.0 0.0 5688 996 v5 Is+ 2Jul09 0:00.00 /usr/libexec/gett
root 812 0.0 0.0 5688 996 v6 Is+ 2Jul09 0:00.00 /usr/libexec/gett
root 813 0.0 0.0 5688 996 v7 Is+ 2Jul09 0:00.00 /usr/libexec/gett
root 60580 0.0 0.0 20440 1624 p0 I 6Jan10 0:00.00 su
root 60581 0.0 0.1 10104 2712 p0 I+ 6Jan10 0:00.02 _su (csh)
xmen 83289 0.0 0.1 9016 2248 p0 Is 21Jul09 0:00.04 /usr/local/bin/ba
root 941 0.0 0.1 10104 3104 p1 Is 2Jul09 0:00.10 /bin/csh
root 61684 0.0 0.1 9456 2980 p1 S+ 6Jan10 0:11.10 tcpdump -avvv -i
root 971 0.0 0.1 10104 3084 p2 Is+ 2Jul09 0:00.15 /bin/csh
xmen 3960 0.0 0.1 9016 2176 p3 Is 3Jul09 0:00.01 -bash (bash)
xmen 3966 0.0 0.0 8144 1716 p3 S+ 3Jul09 0:23.29 screen -l
xmen 3970 0.0 0.1 9016 2176 p4 Is+ 3Jul09 0:00.06 /usr/local/bin/ba
xmen 4029 0.0 0.1 9016 2252 p5 Is 3Jul09 0:00.12 /usr/local/bin/ba
Can you see the problem?
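Yup. The output of ps aux above is somehow truncated to 81 characters per line, so the command column is cut off (/usr/local/bin/ba instead of the full command) and my grep cannot find the master-universe string. My guess at the cause: a cron job has no terminal, so ps falls back to a fixed default width instead of using the window size as it does on the console.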
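The effect is easy to reproduce without cron. The sketch below takes one made-up ps line (the script path is invented for the demo, and the columns are not padded the way real ps output is) and counts grep matches before and after cutting the line to 80 characters with cut. It only imitates the truncation; it is not what ps does internally.

```shell
# Hypothetical ps line: the script name starts past column 80.
line='xmen 83289 0.0 0.1 9016 2248 p0 Is 21Jul09 0:00.04 /usr/local/bin/bash /home/xmen/scripts/master-universe.sh'

# grep -c prints the number of matching lines (it prints 0 when nothing matches).
full=$(printf '%s\n' "$line" | grep -c 'master-universe')
cut80=$(printf '%s\n' "$line" | cut -c1-80 | grep -c 'master-universe')

echo "full line: $full match(es), truncated line: $cut80 match(es)"
```

With the full line grep finds the name; once the line is cut, the name is gone and the for loop has nothing to iterate over, which is exactly what the cron run looked like.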
The Solution?
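Instead of ps aux I just used 'ps ax' and took the first column, which is where ps ax puts the PID. With fewer columns in front of it, more of the command fits on each line.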
for pid in `ps ax| grep -v grep |grep 'master-universe' |awk '{print $1}'`; do
The above code worked as I expected.
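A variation that should be less sensitive to line width, offered as a sketch rather than as the fix I actually deployed: ps(1) documents that giving -w more than once makes ps use as many columns as necessary, and -o lets you ask for only the PID and the command. The helper name pids_matching and the sample data below are hypothetical; the filter is demonstrated on canned input so it can be run anywhere.

```shell
# pids_matching PATTERN
# Reads "PID COMMAND..." lines on stdin and prints the PID of every line whose
# text contains PATTERN. The pattern is passed through the environment so it
# does not show up in awk's own command line.
pids_matching() {
    PATTERN=$1 awk 'index($0, ENVIRON["PATTERN"]) > 0 { print $1 }'
}

# Hypothetical sample in "pid command" layout (the script path is invented).
sample='  941 /bin/csh
83289 /usr/local/bin/bash /home/xmen/master-universe.sh
 3966 screen -l'

printf '%s\n' "$sample" | pids_matching 'master-universe'

# Intended use inside the cron job (assumption, not executed here):
#   ps axww -o pid= -o command= | pids_matching 'master-universe'
```

On the sample this prints 83289. The empty headers (pid= and command=) suppress the header line, so every line of input starts with a PID.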
This happened on FreeBSD 7.2. (I hate BSD!) I tried the original code on Linux (Ubuntu Karmic) and guess what? The result was exactly what you would expect from a well-behaved, innocent OS.
(Note to myself: Remember, always love Linux).