- My recent and current projects.
- My prior projects.
- A little bit about me.
Recent and Current Projects
These are the projects that people seem most interested in, or that are being actively used.
Configuration management with Config/DACS
From 2005 to 2008 "Config" was extensively modified and updated. It is now named DACS (Distribution and Configuration System). In DACS, Subversion replaces CVS as the source code control system, and the feature set is greatly enhanced compared to the original Config. Its home page is at http://www.cs.umb.edu/~rouilj/DACS. The Config/DACS system started from a paper I wrote for LISA in 1994 with the assistance of Rick Martin. It was titled:
The system it described integrated:
- A database mechanism to record information about computer configurations
- A version control system to allow:
  - tracking/auditing and rollback of changes
  - automatic validation of changes
  - access control to parts of the configuration tree
- A file generation mechanism driven from the database to create configuration files from standard formats
- A file distribution mechanism
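To make the idea of database-driven file generation concrete, here is a toy sketch in shell. It assumes a hypothetical pipe-delimited record format (name|ip|aliases); the real DACS database format and its report tools are not shown here and differ from this.

```shell
#!/usr/bin/env bash
# Toy illustration of generating a config file from a "database".
# ASSUMPTION: hypothetical records of the form name|ip|aliases, one host per
# line, with '#' comment lines. This is NOT the DACS database format.
gen_hosts () {
    awk -F'|' '
        /^[ \t]*#/ || NF < 2 { next }          # skip comments and short lines
        {
            line = $2 "\t" $1                   # /etc/hosts order: ip<TAB>name
            if ($3 != "") line = line " " $3    # optional aliases
            print line
        }
    ' "$@"
}
```

Usage would be something like `gen_hosts hosts.db > hosts.new`; in a Config/DACS-style setup, the database and the generated file would both live under version control so changes can be audited and rolled back.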
Real-time log analysis using SEC, the Simple Event Correlator
In November 2004 I presented a paper at the USENIX LISA conference titled:
Real-time log file analysis using the Simple Event Correlator (SEC)
The main page, with links to the paper as published, the original longer author's cut, example rulesets and slides, is located here.
I also created a coursebook for a class on SEC that I taught for the
LISA 2009 conference.
The coursebook is based on TiddlyWiki and provides:
- Textbook
- Quizzes
- Presentations
- Student notebook
Nagios system monitoring software.
I have two published changes for the Nagios network/application monitoring system:
- Addition of advanced correlation capability to Nagios 2 using the Simple Event Correlator
- Patches to the Webinject HTTP testing tool to make it work better with Nagios
Combining SEC and Nagios.
I wrote a patch for Nagios that allows it to use SEC (the Simple Event Correlator) for finer-grained correlation. Nagios has dependencies, but they can only use exit codes to make decisions; they cannot take the type of error generated into account. Also, Nagios flapping service detection never really worked for me. I gave a work-in-progress presentation at LISA 2006. A PDF of the slides and notes is available. Some use cases of this integration are:
- Control over the definition of a flapping service
- Requiring 4 OK states in a row before rearming/clearing a service.
- Different thresholds for a single service. For example, between 7AM and 6PM two processes may run, while outside that range only one process may run.
- Recognizing a known error condition and reporting a different message to make resolving the problem easier.
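Two of the use cases above can be illustrated in plain shell. This is a hypothetical sketch, not SEC rule syntax and not the actual Nagios/SEC patch: one function applies a time-of-day process limit and returns standard Nagios plugin exit codes, the other reads a stream of service states and only reports CLEAR after four consecutive OK states following a problem. All function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Plain-shell illustration of two use cases above. NOT SEC rule syntax and
# NOT the actual Nagios/SEC patch; function names are hypothetical.

STATE_OK=0        # standard Nagios plugin exit codes
STATE_CRITICAL=2

# Time-dependent threshold: 2 processes allowed from 07:00 to 17:59, else 1.
# Usage: check_proc_count COUNT [HOUR]   (HOUR defaults to the current hour)
check_proc_count () {
    local count=$1 hour=${2:-$(date +%H)} limit=1
    hour=$((10#$hour))                 # "08" must not be read as octal
    if [ "$hour" -ge 7 ] && [ "$hour" -lt 18 ]; then
        limit=2
    fi
    if [ "$count" -le "$limit" ]; then
        echo "PROCS OK - $count running (limit $limit)"
        return "$STATE_OK"
    fi
    echo "PROCS CRITICAL - $count running (limit $limit)"
    return "$STATE_CRITICAL"
}

# Read one state per line (OK, WARNING, CRITICAL, ...) and print CLEAR only
# when NEED consecutive OK states have been seen after a problem.
clear_after_n_ok () {
    local need=${1:-4} run=0 problem=0 state
    while read -r state; do
        if [ "$state" = OK ]; then
            run=$((run + 1))
            if [ "$problem" -eq 1 ] && [ "$run" -ge "$need" ]; then
                echo CLEAR
                problem=0
            fi
        else
            run=0
            problem=1
        fi
    done
}
```

For example, `printf '%s\n' CRITICAL OK OK OK OK | clear_after_n_ok 4` prints CLEAR once, while a problem followed by only three OK states prints nothing.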
Nagios patch for webinject.
I made two sets of patches to webinject (http://www.webinject.org/) to better support Nagios. My patches and descriptive text are located here.
TkWatcher
If you are looking for TkWatcher, you have come to the right place. Just click here for its homepage. TkWatcher is a Tcl program that allows monitoring and analysis of program output. It can use any Tcl-based shell, including tclsh, expect, wish, or scotty. It runs with tcl-7.3 and tcl-8.0 and should work with future versions. By default it reports problems by email, but it can also send problem reports to a file or to standard output. The reports can be human- or program-readable. In addition to these reporting modes, it can log errors using external programs to permit syslogging, paging or other real-time notification of errors.
Benefits over Watcher
It was inspired by the program watcher by Kenneth Ingham (not Inghman, as mistyped in the docs these many years; I hope to fix this typo in version 1.5, sorry Mr. Ingham), but adds features that I found lacking in the original watcher. Among those features are the ability to:
- select portions of the control file
- print command headers in the error messages
- select individual lines from a command output stream using absolute positions, or a regular expression
- perform and test calculations based on the input data
- specify multiple tests on a value that are ANDed together to determine if a warning should be issued.
- set a threshold on how many times all the other tests must be positive before a report is issued. This lets the user set test thresholds low enough to catch problems early, while eliminating excessive noise by waiting until enough of those low-threshold tests have passed.
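TkWatcher itself is written in Tcl and is driven by its control file, so the following is only a hypothetical shell illustration of the last two features: two tests on a value ANDed together (above a limit and higher than last time), plus a report threshold that suppresses the warning until the tests have passed several runs in a row. The state-file layout and the choice of tests are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical shell illustration of two TkWatcher ideas (NOT TkWatcher
# syntax): several tests ANDed together, plus a report threshold.
# ASSUMPTION: STATEFILE keeps "last_value hits" between runs.
# Usage: check_value VALUE LIMIT NEEDED STATEFILE
check_value () {
    local value=$1 limit=$2 needed=$3 statefile=$4
    local last=0 hits=0
    if [ -r "$statefile" ]; then
        read -r last hits < "$statefile"
    fi
    # Test 1 AND test 2: above the limit and higher than the previous run.
    if [ "$value" -gt "$limit" ] && [ "$value" -gt "$last" ]; then
        hits=$((hits + 1))
    else
        hits=0
    fi
    echo "$value $hits" > "$statefile"
    # Report only once the ANDed tests have passed NEEDED runs in a row.
    if [ "$hits" -ge "$needed" ]; then
        echo "WARNING: value $value above $limit and rising for $hits runs"
    fi
}
```

Run from cron with values 50, 60 and 70 against a limit of 40 and NEEDED=3, only the third run prints a warning; a drop in the value resets the count.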
It can monitor
This tool has been used to:
- monitor disk space
- look for stalled jobs in the print queue
- look for runaway processes chewing up CPU time
- monitor swap space changes
- look for problems with network interfaces:
  - excessive collision rates
  - bad RPC calls
  - excessive bad XIDs, indicating that an NFS server is overloaded
- monitor for swapping and paging activity
- verify operation of the X Window System font server
- monitor the NTP network for hosts that deviate excessively from established norms
- monitor for excessive ping round trip time
- verify that required daemons are running on the system
- look for excessive copies of certain programs running on the system
- monitor users who keep software licenses for more than a preset amount of time
- watch for changing IP-to-Ethernet mappings in the ARP cache, indicating an ARP attack.
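As a rough shell equivalent of the first item (monitoring disk space), the sketch below reads POSIX `df -P` output and prints every filesystem whose Capacity column exceeds a given percentage. It is an illustration, not TkWatcher code; the function name is hypothetical, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Report filesystems above a usage percentage, reading POSIX `df -P` output
# on stdin (columns: Filesystem, blocks, Used, Available, Capacity, Mounted on).
# Mount points containing spaces are not handled by this simple field split.
disk_over () {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)              # "95%" -> "95"
            if (use + 0 > limit + 0) print $6 " " $5
        }
    '
}
```

Typical use is `df -P | disk_over 90`, which prints lines such as `/ 95%` that could then be mailed or logged.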
A System Administration Template for the Roundup Issue Tracker
This is a tracker last tested with the 0.7.0 (IIRC) release of the Roundup issue tracker. I am working on updating it to work with the 1.4.8 release of Roundup, which is the current stable version as I write this. I wrote this in my spare time, intending to put it in place at the company where I was contracting. Sadly, the contract ended before I could put it into production. The full documentation (in HTML) is located here. The tarball with the 0.7.0 tracker is located here. Note that your mileage may vary, especially if you are using it with a recent version of Roundup. Roundup is an issue-tracking system that is simple to use and install, with command-line, web and e-mail interfaces. It is based on the winning design from Ka-Ping Yee in the Software Carpentry "Track" design competition. Roundup's homepage is at http://roundup-tracker.org/. Roundup has been deployed for:
- bug tracking and TODO list management (the classic installation)
- customer help desk support (with a wizard for the phone answerers, linking to networking, system and development issue trackers)
- issue management for IETF working groups
- sales lead tracking
- conference paper submission and double-blind referee management
- weblogging (well, almost :)
Sysadmin tracker intro
The sysadmin tracker for Roundup is designed to handle issues generated in a system administration or help desk environment. It is more complex than the classic tracker, but it has things that I have wanted in my trackers. Some of the functions were inspired by equivalent functionality in RT, while others are things I have wanted in various trackers (req, reqng, queuemh, rt1, ClearQuest, Remedy) over the years.
Features
Compared to the classic tracker, this tracker has:
- Well-defined relationships between issues, including:
  - dependson - shows dependencies and prevents issues from being resolved if they have any unresolved issues that they depend on.
  - grouping (merging) - used to update multiple similar issues as a group.
  - seealso - used to link issues that are related in some way.
- Two notification (nosy) lists for issues:
  - Watcher list (aka nosy) - This list receives replies to the issue and is intended for non-implementation level conversations with the requestor. It is also used to notify the requestor of milestone achievements.
  - Technical List (aka verynosy) - This list is meant for technical discussion of how to implement solutions, and coordination of implementation as well as detailed implementation notes.
- Two types of messages: replies and comments.
  - reply messages - are used to interact with the requestor and are sent to both the Watcher and Technical lists.
  - comment messages - are used to interact with other technical people and are sent only to the Technical list.
- The repair role has been defined and is used to restrict which users are visible in the assignedto list box, or which users can be assigned to an issue via email. There is also an auditor (not installed by default) that assigns an issue to the first person with the repair role who responds to it. See unused/autotake.py.
- Issues can be assigned to queues. Each queue can have its own list of email addresses for announcing a new unassigned item in the queue.
- Users assigned an issue by a third party (like a manager or first-line support person) are sent email with the initial message and the last three messages to bring them up to speed as quickly as possible.
- Issues support tracking the amount of time spent solving the issue. (Not available via email yet)
- Issues have extra attributes for:
  - scheduling:
    - startdate - date on which work should start
    - leadtime - amount of time the work should take
    - duedate - date by which the work needs to be done
    - workingorder - assigns a numeric order to an issue. Useful to determine which issue gets worked on first when you have multiple issues at the same priority. Must be an integer > 0. The system doesn't care whether you use 1000 or 1 for the first item to be worked; that is up to local policy.
  - summary info:
    - fyi - string used to keep critical info (e.g. server serial numbers, external call ticket numbers) quickly available without having to search through issues.
  - billing:
    - billinfo - text attribute for storing billing information
  - notification:
    - needsreply - indicates on the index when a new message that needs a reply has been received on an issue. There are times when you just can't monitor email to get change notifications; this is a quick and dirty way of being notified via the web interface.
  - automatic actions:
    - actiondate - date on which the action will occur
    - actionstatus - the [optional] new status for the issue after the action triggers.
    - actioncomment - the [optional] comment message that will be added to the issue when the action triggers.
  - changing the user who opened/requested the issue:
    - requestedby - the user who requested this issue. A matching detector sets it to the creator of the issue by default and adds the requestedby person to the nosy list. Unlike the creator property, this attribute can be changed, so it can be set to a different user if you have a help desk that takes calls and enters issues on behalf of another person.
- In the web interface, each displayed message has a replyto link that reloads the issue with the change note text area filled with a quoted, attributed version of the message you are replying to.
- In the web interface each message that is sent can be carbon copied to arbitrary email addresses.
- Wherever issue numbers show up, they also display an indication of the state of the issue. This is settable in the status class.
- The tracker administrator can configure which status transitions are allowed. For example, you can force new to transition only to open, stalled or hold, and then allow those states to transition to resolved, enforcing a particular status path.
- The tracker administrator can limit setting a state to particular users. (Note that the restricted state is still shown in the web interface, but an error is produced when it is set. This is considered a bug.)
- It comes preloaded with searches for displaying (in a limited way) relations between issues, scheduling information such as due dates, and the watcher and technical lists.
- Users can add any query they want by name to their list of queries.
- It has a printable link that makes it easier to print out issues for review.
- It has an auto-refresh mode that refreshes the screen at a user-selectable interval. Great for keeping up to date on new and changed issues.
- All pages have a full text search and goto item forms to ease locating particular issues.
- Index pages (search results) show the number of issues and your position within the list (from a patch by Patrick Ohly).
- The date specification is modified to support yyyy/mm/dd (and, more importantly, mm/dd) dates (from a patch by Luke Opperman).
- The user's item page has a fixed format list of non-resolved issues requested by this user as well as a link to the index page of all issues requested by the user to allow interactive sorting and grouping capability.
- The user index page includes a link to a search for all the user's issues.
- On the index pages, all columns that are links to users are now hyperlinked to the user.
- The ability to do what is requested below, using an implementation by K Schutte from the roundup mailing list.
  On Mon, 18 Aug 2003 09:50 pm, K Schutte wrote:
    How should I set up Roundup such that for any new issue with topic <foo> I am put on the nosy list?
    Does this also work when with an existing issue the topic >foo< is added? How can this be arranged such that any user (that is, including those who can't edit Python) can specify on which topics they will be added on the nosy list?
- Publishing of an RSS newsfeed that keeps you up to date on changes in the tracker.
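The status-transition feature amounts to a table of allowed next states. How this tracker actually stores that table inside Roundup (which is written in Python) is not described here, so the shell sketch below is only a hypothetical illustration of the idea, using the example from the feature list: new may move only to open, stalled or hold, and those may move to resolved.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a status-transition table; NOT Roundup's API
# and NOT how this tracker stores its rules.
allowed_next () {
    case $1 in
        new)               echo "open stalled hold" ;;
        open|stalled|hold) echo "resolved" ;;
        *)                 echo "" ;;
    esac
}

# Succeed (exit 0) if FROM -> TO is an allowed transition, else exit 1.
transition_allowed () {
    local from=$1 to=$2 s
    for s in $(allowed_next "$from"); do
        if [ "$s" = "$to" ]; then
            return 0
        fi
    done
    return 1
}
```

With this table, `transition_allowed new resolved` fails, which is what forces issues through open, stalled or hold first.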
Prior Projects
These are projects that are complete and available for people to use.
Personal LOgging Device modifications.
This tool was written by Hal Pomeranz and presented at LISA 93. The abstract reads:
PLOD (the Personal LOgging Device) is a simple text interface which allows System Administrators (and others) to keep a record of the work they do from day to day. The program was developed in Perl with device independence, flexibility, extensibility, and ease of use in mind. The user-interface is reminiscent of Berkeley mail, complete with many pre-defined tilde-escapes which perform various useful functions. Users may easily extend the program by defining their own personal escape sequences.
Plod is a tool for logging your daily tasks. I have modified the version available from Hal's web site. My modified version allows time tracking and assigning both time and plod notes to particular tasks. I use it for recording the amount of time I spend on reading email, working on a particular trouble ticket, responding to critical incidents, etc. The files are:
- a gzipped tar file with the changes applied.
- The master source of plod in shar format in case Hal's site disappears or a newer version of plod is released.
- a patch file to apply to the master source.
- The changelog file for my patch.
Running

    plod -T -d `date +%m/%d/%Y`

displays the number of minutes spent in each category for today. The shell function

    function timecard () {
        # $2 (e.g. 200511) is deliberately unquoted so it vanishes when omitted
        plod -T -d "$1" -D "$(date +%m/%d/%Y)" $2
        plod -d "$1" -D "$(date +%m/%d/%Y)" -g : $2
    }

displays the timecard (i.e. the number of minutes per category) and all the log entries between the start date and the current day. This is useful for filling out timecards. If the start date and the current date are in different months, it has to be run twice. For example, December 1, 2005 fell on a Thursday, so on December 2nd I would run:

    timecard 11/30/2005 200511
    timecard 12/1/2005

to get the time for the week. Sorry about that; I didn't change the code to make this work in a single command.
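The two-command workaround can be wrapped in a small helper. This is a hypothetical sketch, not part of the plod patch: it assumes the timecard() function above is already defined and that the per-month argument uses the YYYYMM form seen in "200511". It handles at most two months, and its optional second argument only decides how to split the range; plod itself still uses today's date as the end date.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run timecard() once per month when the start date
# falls in the previous month. Assumes timecard() (above) is defined and
# that the month argument looks like 200511. Handles at most two months.
# Usage: timecard_span START_MM/DD/YYYY [TODAY_MM/DD/YYYY]
timecard_span () {
    local start=$1 today=${2:-$(date +%m/%d/%Y)}
    local s_m s_y t_m t_y
    IFS=/ read -r s_m _ s_y <<< "$start"
    IFS=/ read -r t_m _ t_y <<< "$today"
    # 10# forces base 10 so months like 08 are not read as octal.
    s_m=$((10#$s_m)); t_m=$((10#$t_m))
    if [ "$s_y" -eq "$t_y" ] && [ "$s_m" -eq "$t_m" ]; then
        timecard "$start"
    else
        timecard "$start" "$(printf '%04d%02d' "$s_y" "$s_m")"
        timecard "$t_m/1/$t_y"
    fi
}
```

Run on December 2nd, 2005, `timecard_span 11/30/2005` would issue the same two calls shown above.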
Majordomo
I was a significant contributor to the Majordomo mailing list manager in the early 1990s. I was responsible for the current email-based configuration as well as the 1.90 through 1.93 (or was it 1.94?) releases.
Software management with Depot Lite
At the USENIX LISA conference in 1994 I was fortunate enough to have a second paper accepted. It was titled:
Depot-Lite is a software management, packaging and deployment method. It extends the Depot concept of software management with some additions that are useful in an academic environment where students are allowed to install and publish software for others to use.
Cygwin Involvement
I have been using Cygwin for many years. It is the one thing that makes Windows bearable. I am hosting the latest copy I have of Michael A Chase's clean_setup.zip version 1.0700 from July 02, 2003, because it is a useful tool and there doesn't seem to be another downloadable copy on the internet. It used to be at:
http://home.ix.netcom.com/~mchase/zip/clean_setup.zip
Thanks to Angelo Graziosi for providing me with the copy. If you download it from here, please consider posting it on your own web page so that this tool will not once again face extinction. Thanks.
Using ssh and screen together
I use ssh with screen all the time because I work remotely and my ssh sessions drop regularly. I also use ssh-agent on my laptop and forward the agent through my ssh session. Within my screen session, I often ssh to other hosts where I want my ssh-agent to be accessible. When I initially log in, this works fine: the SSH_AUTH_SOCK variable is in screen's environment and is inherited by the sessions under screen, and when I ssh to other systems, SSH_AUTH_SOCK is forwarded along again. However, after the ssh connection drops and is re-established (using autossh or manually), the SSH_AUTH_SOCK variable in my screen sessions points to a dead socket. Fixing this for screen sessions on the ssh target host is easy: save the SSH_AUTH_SOCK variable before invoking screen -Dr, and source the new SSH_AUTH_SOCK into the shell running under screen. For remote ssh sessions it is more difficult, because their SSH_AUTH_SOCK has to be redirected to the newly created socket. To do this, I created ssh_auth_shuffle, a bash script that combs the environments of open ssh sessions and symbolically links their SSH_AUTH_SOCK paths to the newly created socket, restoring access to the ssh-agent. So to recap, given the connection chain:
- desktop -> access_host
- access_host -> host1
- host1 -> host2
If the desktop -> access_host connection drops, reconnect with:

    ssh access_host 'ssh_auth_shuffle && screen -d -r'

The ssh_auth_shuffle script locates all your established ssh client processes and figures out where their end of the ssh-auth tunnel is located. It then links that end to the new ssh-auth tunnel created by the ssh from your desktop to access_host. You can install ssh_auth_shuffle anywhere in your PATH. Some useful aliases are:
- alias auth='. ~/.auth_ssh' lets you run auth manually to update the ssh endpoint information in your shell.
- alias ssh='. ~/.auth_ssh; ssh' makes sure ssh works by pulling in the newest ssh endpoint information before executing ssh.
- alias scp='. ~/.auth_ssh; scp' does the same for scp.
The ~/.auth_ssh file looks like:

    export SSH_AUTH_SOCK=/tmp/ssh-xNQZo23620/agent.23620
    export SSH_CLIENT='::ffff:65.33.255.162 1100 22'
    export SSH_CONNECTION='::ffff:63.33.222.162 1100 ::ffff:192.168.7.14 22'
    export SSH_TTY=/dev/pts/7
    export DISPLAY=localhost:11.0

The values of these environment variables (except SSH_CLIENT) are described in the ssh man page. In addition, the DISPLAY currently used by ssh is exported so that you can use X11 forwarding after reconnecting to access_host, although this does not work for host1 or host2. Note that this script is very Bourne-shell-centric, since the ~/.auth_ssh file uses "export var=value" syntax to set variables. However, modifying it to support csh-style shells should be straightforward.
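The steps ssh_auth_shuffle performs can be sketched in bash. This is a hypothetical reconstruction, not the actual script. It assumes Linux-style /proc/<pid>/environ files and pgrep, and that sshd may already have removed the old socket's directory, so it recreates that directory before linking. It also does not check whether an old socket is still live before replacing it. All function names are made up; the written file uses printf %q quoting, so it differs cosmetically from the sample above.

```shell
#!/usr/bin/env bash
# Sketch only: the real ssh_auth_shuffle may differ. Assumes bash, pgrep and
# a Linux-style /proc/<pid>/environ. Function names are hypothetical.

# Record the current ssh-related variables in a file any Bourne shell can source.
write_auth_ssh () {
    local out=${1:-$HOME/.auth_ssh} var
    : > "$out"
    for var in SSH_AUTH_SOCK SSH_CLIENT SSH_CONNECTION SSH_TTY DISPLAY; do
        # ${!var-} is bash indirect expansion: the value of the variable named $var.
        if [ -n "${!var-}" ]; then
            printf 'export %s=%q\n' "$var" "${!var}" >> "$out"
        fi
    done
}

# Print the SSH_AUTH_SOCK entry from a NUL-separated environ file.
auth_sock_from_environ () {
    tr '\0' '\n' < "$1" | sed -n 's/^SSH_AUTH_SOCK=//p'
}

# Replace an old agent socket path with a symlink to the new socket.
# sshd may have removed the old directory already, so recreate it first.
relink_auth_sock () {
    local old=$1 new=$2
    if [ -z "$old" ] || [ "$old" = "$new" ]; then
        return 0
    fi
    mkdir -p -m 700 "$(dirname "$old")" && ln -sf "$new" "$old"
}

# Relink the agent socket of every ssh client owned by the current user,
# then refresh ~/.auth_ssh.
shuffle_auth_socks () {
    local new=${1:-$SSH_AUTH_SOCK} pid old
    for pid in $(pgrep -u "$(id -u)" -x ssh); do
        [ -r "/proc/$pid/environ" ] || continue
        old=$(auth_sock_from_environ "/proc/$pid/environ")
        relink_auth_sock "$old" "$new"
    done
    write_auth_ssh
}
```

Because /proc/<pid>/environ holds the environment each ssh client was started with, it still names the old, dead socket path, which is exactly the path that needs to be redirected.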
Korn shell semaphore implementation
This was originally from "Implementing Semaphores in the Shell" and implements a fair-queuing semaphore in ksh. I had a couple of problems with it when I used it to control resource usage by BackupPC, so I patched it; the original source plus the patch are available here. It is released under GPL v2, as are my patches to it. I am putting the shell semaphore code here because the link to the article (and in theory the source code) that I found at http://unix.ittoolbox.com/documents/implementing-semaphores-in-the-shell-15726 is dead.
10/6/2004 By Ed Schaefer and John Spurgeon for UNIX Review
Summary: The authors present a Korn shell implementation of a counting semaphore.
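For readers unfamiliar with the technique, here is a minimal counting-semaphore sketch in bash. It relies on mkdir being atomic, so only one process can create a given slot directory. This is not the Schaefer/Spurgeon code and it is not fair queuing: waiters poll and are not served in arrival order. The function names and directory layout are hypothetical.

```shell
#!/usr/bin/env bash
# Counting-semaphore sketch using mkdir as an atomic test-and-set.
# NOT the Schaefer/Spurgeon code and NOT fair queuing: waiters are not
# served in arrival order. Function and directory names are hypothetical.

# Try to take one of COUNT slots under DIR; print the slot number on success.
# Returns 1 if all slots are taken, 2 if DIR cannot be created.
sem_try_acquire () {
    local dir=$1 count=$2 i=1
    mkdir -p "$dir" || return 2
    while [ "$i" -le "$count" ]; do
        if mkdir "$dir/slot.$i" 2>/dev/null; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Block, polling once per second, until a slot is free.
sem_acquire () {
    local slot rc
    while :; do
        rc=0
        slot=$(sem_try_acquire "$1" "$2") || rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$slot"
            return 0
        fi
        [ "$rc" -eq 1 ] || return "$rc"
        sleep 1
    done
}

# Give a slot back.
sem_release () {
    rmdir "$1/slot.$2"
}
```

A job limited to two concurrent runs could use `slot=$(sem_acquire /tmp/backup.sem 2)`, do its work, then call `sem_release /tmp/backup.sem "$slot"`. Note that a process that dies while holding a slot leaves its directory behind; this sketch does not clean up stale slots.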
A little about me
I have posted my resume (in Adobe Acrobat PDF format) and will post it in other formats, along with my picture, here when I get a chance. If I am not working on an ambulance (I've been an emergency medical technician since the mid-1980s), I can be found playing Ultimate Frisbee or giving somebody a massage.
I am currently employed as a system administrator with Renesys Corporation.
Also you may be interested in my profile on LinkedIn.