Python Process Watcher

Occasionally an application may crash unexpectedly. Instead of reinventing the wheel, I found a simple unix/linux daemon in Python for the daemon functionality and just added the run section logic. Using the bits below I can monitor, and potentially restart, the failed application. The daemon forks into the background, and its run section checks for the existence of each process using pgrep every 5 seconds in a loop. It reads any number of process names from a file named observe.list. Each line in observe.list should contain one unique process name, for example puppetmasterd on one line and httpd on the next, or whatever you'd like to watch. I tried using logger with SysLogHandler, but I'm not sure it was available with Python 2.4.3, which ships with RHEL 5 and CentOS 5, so the script logs through the syslog module instead.

#!/usr/bin/env python

import sys, time, subprocess, syslog
from daemon import Daemon

# Read the list of processes to observe, one name per line.
# Blank lines are skipped so pgrep is never called with an empty pattern.
process_file = open('observe.list', 'r')
process_list = [line.strip() for line in process_file if line.strip()]
process_file.close()

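def isRunning(process_name):
    # pgrep exits 0 when at least one process matches, 1 when none match,
    # and 2 or 3 on errors, so only an exit status of 0 means "running".
    # Passing an argument list keeps the name away from the shell.
    devnull = open('/dev/null', 'w')
    try:
        status = subprocess.call(['pgrep', process_name], stdout=devnull)
    finally:
        devnull.close()
    return status == 0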

class Observe(Daemon):
    def run(self):
        while True:
            for process in process_list:
                if isRunning(process):
                    syslog.syslog(process + ' is running!')
                else:
                    syslog.syslog(process + ' not running!')
            # Sleep once per pass so each process is checked every 5 seconds,
            # regardless of how many names are in observe.list.
            time.sleep(5)

if __name__ == "__main__":
    daemon = Observe('/tmp/observe.pid')
    if len(sys.argv) == 2:
        if 'start' == sys.argv[1]:
            daemon.start()
        elif 'stop' == sys.argv[1]:
            daemon.stop()
        elif 'restart' == sys.argv[1]:
            daemon.restart()
        else:
            print("Unknown command")
            sys.exit(2)
        sys.exit(0)
    else:
        print("usage: %s start|stop|restart" % sys.argv[0])
        sys.exit(2)
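
The script above only logs a missing process. Since the goal was to potentially restart it as well, here is a minimal sketch of how that step might look. The RESTART_COMMANDS table, the restart_if_down helper and the /sbin/service command are my own assumptions for illustration, not part of the original script, so adjust them to whatever actually starts your services.

```python
import os
import subprocess

# Hypothetical mapping of process name -> restart command. The command is an
# assumption for a RHEL/CentOS style system; replace it with your own.
RESTART_COMMANDS = {
    'httpd': ['/sbin/service', 'httpd', 'restart'],
}


def is_running(process_name):
    """Return True when pgrep finds at least one matching process."""
    devnull = open(os.devnull, 'w')
    try:
        return subprocess.call(['pgrep', process_name], stdout=devnull) == 0
    finally:
        devnull.close()


def restart_if_down(process_name, commands=RESTART_COMMANDS,
                    check=is_running, runner=subprocess.call):
    """Restart process_name if it is not running.

    Returns one of 'running', 'restarted', 'restart failed' or 'no command'.
    The check and runner arguments exist so the logic can be exercised
    without touching real processes.
    """
    if check(process_name):
        return 'running'
    command = commands.get(process_name)
    if command is None:
        # Nothing configured for this name, so leave it alone.
        return 'no command'
    if runner(command) == 0:
        return 'restarted'
    return 'restart failed'
```

Inside the run loop, the returned string could be passed straight to syslog, for example syslog.syslog(process + ': ' + restart_if_down(process)). A process with no configured command is only reported, never restarted.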

Why work doesn't happen at work

Jason Fried: Why work doesn't happen at work

Jason Fried describes the high cost of context switches, meetings and interruptions in the workplace.

Very Useful Oracle Documentation

A friend of mine recently pointed me to an Oracle DBA resource that has one of the most comprehensive documents I've ever seen for getting things done in Oracle. Of particular interest is Oracle DBA Code Examples, which contains hundreds of examples.

Mac OS X Server 10.6 Essentials

I passed Snow 201 today. Slightly more difficult than I expected, mostly due to idiosyncrasies that must be memorized.

Your Score: Pass - 90.66% (82.5 earned out of 91 possible)

Apple certification is a little strange. Passing just 201 doesn't certify me as anything. Passing 101, however, makes one an "Apple Certified Support Professional", while passing both 101 and 201 makes one an Apple Certified Technical Coordinator. To become an Apple Certified System Administrator (not too many of these around), one must pass the 201, 301 (Directory Services), 302 (Deployment) and 303 (Mobility) exams.

I still need to go back and knock out Snow 101, which should be pretty easy with a little studying.

OpenSolaris Dead

Basically, the open source development model has now been axed and OpenSolaris is officially now dead.

OpenSolaris wasn't widely used to begin with; by most definitions it could even be called a non-starter. These non-organic "open" projects, which use peculiar licenses that are generally incompatible with the established community choice, are all destined to fail.