[hpcdev-discuss] Static DTrace Probes for Grid Engine

Daniel Templeton Dan.Templeton at Sun.COM
Mon Aug 20 17:34:11 PDT 2007


All,

I've been working on replacing the rmon facility in Grid Engine 6.1 with 
user-land static DTrace probes for systems running Solaris Nevada.  It 
actually works really well, and I suspect it's slightly faster than the 
unmodified 6.1 binaries, as the DTrace probes use some tricks to be 
faster than the regular "if" statements that rmon uses.  Because I'm 
using user-land static probes, though, it only works with Solaris Nevada 
builds and distros.  It will not work with Solaris 10.

With my changes, there's a new script in the util directory called 
dld.sh.  dld.sh lets you attach to a running qmaster, scheduler, 
execution daemon, or shadow daemon and print debugging output.  There is 
no need to restart the daemon.  There is no penalty to the daemon when 
the debugging output is not being printed.  All that good DTrace stuff.  
The old method of restarting the daemon also works, but the process is a 
little different.  See the header of the dld.sh script for details.  The 
debug output from dld.sh is the same as the debug output you get today.

Attached is a diff patch for 6.1 to get my changes.  Most of the changes 
are in the supporting scripts and make files.  Very little source code 
changed.  After you have applied the diff patch, here's the process to 
see the new debug output:

1) Build the system normally, i.e. "aimk".
2) Build special DTrace daemon binaries with "aimk -dtrace".  The 
binaries will be built in a NEVADA* build directory.  The -dtrace switch 
will cause aimk to only build the four main daemon binaries.
3) Install the system normally, i.e. "distinst -local -all".
4) Install the special DTrace binaries with "aimk -local -onlydaemons 
snv-(amd64|sparc64|x86)".  The binaries will be installed in the sol-* 
directory, not a snv-* directory, for simplicity's sake.
5) Start the daemons normally, i.e. 
"$SGE_ROOT/$SGE_CELL/common/sgemaster;$SGE_ROOT/$SGE_CELL/common/sgeexecd".
6) Set a debug level normally, e.g. "source $SGE_ROOT/util/dl.csh; dl 2".
7) Connect to the qmaster with "$SGE_ROOT/util/dld.sh qmaster".
8) Watch the debug output scroll by...  Should look familiar.
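
For convenience, the steps above can be strung together in a small 
script.  This is only a sketch, not part of the patch: run(), 
build_and_install(), DRY_RUN and ARCH are illustrative names, the 
/opt/sge and "default" fallbacks are assumptions, and by default it 
only prints the commands instead of running them.  Step 6 is left out 
because dl.csh has to be sourced in your own csh session.

```shell
#!/bin/sh
# Sketch of steps 1-5 and 7.  Prints each command unless DRY_RUN=0.
# run(), build_and_install(), DRY_RUN and ARCH are illustrative names.
DRY_RUN=${DRY_RUN:-1}
ARCH=${ARCH:-snv-amd64}            # one of the three choices in step 4
SGE_ROOT=${SGE_ROOT:-/opt/sge}     # assumed fallback, adjust to your install
SGE_CELL=${SGE_CELL:-default}      # assumed fallback

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"                # dry run: show the command only
    else
        "$@" || exit 1             # real run: stop at the first failure
    fi
}

build_and_install() {
    run aimk                                     # step 1
    run aimk -dtrace                             # step 2
    run distinst -local -all                     # step 3
    run aimk -local -onlydaemons "$ARCH"         # step 4, as given above
    run "$SGE_ROOT/$SGE_CELL/common/sgemaster"   # step 5
    run "$SGE_ROOT/$SGE_CELL/common/sgeexecd"
    # Step 6 (dl 2) is done by hand in the shell that needs it.
    run "$SGE_ROOT/util/dld.sh" qmaster          # step 7
}

build_and_install
```

Run it once as is to review the plan, then with DRY_RUN=0 to actually 
execute the commands.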

The DTrace probes obey the debug level settings as set by the dl 
scripts.  Otherwise, you'd get swamped in a ton of useless output.  In 
order to use dld.sh, you must set the debug level with the dl scripts, 
just like you'd do normally.  Also note that you must be root to run 
dld.sh as only root is allowed to run the dtrace command.
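
The two prerequisites above can be checked before starting dld.sh.  
The following is a minimal sketch, not something dld.sh itself does: 
check_dld_prereqs() is a hypothetical helper, and I'm assuming that 
SGE_DEBUG_LEVEL is the environment variable the dl scripts export.

```shell
#!/bin/sh
# check_dld_prereqs UID LEVEL  (hypothetical helper, not part of dld.sh)
# Prints a reason and returns 1 if dld.sh should not be started.
check_dld_prereqs() {
    uid=$1
    level=$2
    if [ "$uid" != 0 ]; then
        echo "must be root to run dtrace"
        return 1
    fi
    if [ -z "$level" ]; then
        echo "no debug level set; use the dl scripts first"
        return 1
    fi
    echo "ok"
}

# Typical call.  SGE_DEBUG_LEVEL is assumed to be what dl exports.
# "id -u" is the POSIX form; on some Solaris releases the POSIX id
# may live in /usr/xpg4/bin.
if check_dld_prereqs "$(id -u)" "${SGE_DEBUG_LEVEL:-}"; then
    echo "would run: \$SGE_ROOT/util/dld.sh qmaster"
fi
```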

The dual build process is a little bulky.  The reason for it is that 
DTrace has to modify every object file that uses user-land static 
probes.  To avoid having to completely rewrite the build process, I only 
applied the modifications to the four main daemons.  They're the only 
ones for which the DTrace probes make sense anyway.
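
To give an idea of why the extra build pass is needed, here is the 
general shape of a build with user-land static probes.  It is a sketch 
of the usual dtrace -h / dtrace -G flow, not the actual aimk changes: 
usdt_build_plan() and the file names sge_probes.d, daemon.c and rmon.c 
are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Illustrative only: prints the general USDT build flow.
# usdt_build_plan() and all file names here are hypothetical.
usdt_build_plan() {
    provider=$1; shift
    objs=""
    echo "dtrace -h -s $provider"        # generate the header with probe macros
    for src in "$@"; do
        obj="${src%.c}.o"
        echo "cc -c $src -o $obj"        # compile each source that fires probes
        objs="$objs $obj"
    done
    # dtrace -G post-processes the listed object files and writes a
    # provider object named after the .d file.
    echo "dtrace -G -s $provider$objs"
    echo "cc -o daemon ${provider%.d}.o$objs"
}

usdt_build_plan sge_probes.d daemon.c rmon.c
```

The dtrace -G step is the one that touches every object file 
containing probes, which is why it has to sit between compiling and 
linking.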

If this is something that is generally agreed to be useful, I'll talk to 
Andy and Joachim about merging it into the main trunk.  Please do send 
me feedback.

Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ybbs.zip
Type: application/zip
Size: 9186 bytes
Desc: not available
Url : http://mail.opensolaris.org/pipermail/hpcdev-discuss/attachments/20070820/3458fd4f/attachment.zip 

