[fm-discuss] Re: improved platform control

Cynthia McGuire cindi at sun.com
Thu Jul 13 11:50:43 PDT 2006


Hi Garret,

The Solaris Fault Management Daemon, fmd(1M), was designed to take in a stream of error telemetry and dispatch the events to diagnosis software written specifically to determine the underlying problem.  The result of a diagnosis is what we call a fault event.  When the diagnosis software cannot narrow the problem down to a single fault event, it publishes a list of suspected faults.  Agent software can be written to act on the diagnosis (or list of suspected faults), for example by automatically deconfiguring a resource from the system, lighting an LED, or sending an SNMP trap.
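To make that flow concrete, here is a minimal sketch in Python of the path from error events, through diagnosis, to an agent. It is illustrative only: the class names, the event class strings, the threshold rule, and the functions diagnose() and retire_agent() are all invented for this example and are not fmd interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    event_class: str  # made-up class name, e.g. "ereport.example.mem.ce"
    resource: str     # the component the error was observed on


@dataclass(frozen=True)
class Fault:
    fault_class: str  # made-up class name for the diagnosed problem
    resource: str     # the component suspected of being faulty


def diagnose(events, threshold=3):
    """Hypothetical diagnosis engine.

    Counts error events per resource and returns a list of suspected
    faults: one entry when a single resource crosses the threshold,
    several entries when the problem cannot be narrowed down.
    """
    counts = {}
    for ev in events:
        counts[ev.resource] = counts.get(ev.resource, 0) + 1
    return [Fault("fault.example.component", res)
            for res, n in counts.items() if n >= threshold]


def retire_agent(suspects):
    """Hypothetical agent: choose an action for each suspected fault."""
    return [f"deconfigure {fault.resource}" for fault in suspects]


# Three errors on dimm0 and one on dimm1: only dimm0 becomes a suspect.
events = [ErrorEvent("ereport.example.mem.ce", "dimm0")] * 3
events.append(ErrorEvent("ereport.example.mem.ce", "dimm1"))
suspects = diagnose(events)
actions = retire_agent(suspects)
```

The point of the split is that the diagnosis step decides what is wrong and the agent decides what to do about it; neither needs to know how the other works.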

The events you are talking about fall, for the most part, into the realm of sensor data and are not really 'errors' as we have defined them for the Fault Manager.  We are currently in the design stages of a sensor architecture that will allow us to process sensor telemetry much as the fault manager processes errors.  Specially written sensor modules will process the raw data and publish a recommended action, which sensor agent software can then carry out: lighting LEDs, turning up fan speeds, dimming panels, or re-assigning resources for QoS.

Unfortunately, the sensor architecture is not yet available in OpenSolaris.  I don't recommend using PICL: we are moving away from it, and it is not really designed to effect changes in the system.  Using sysevent as the transport between your sensor event producer and your sensor data agent software will also work, but keep in mind that sysevent is only a transport and is not designed to process events.  Another approach (and the one that I recommend) is to use the fault manager for the events that fall into the error category and, over time, migrate the others (the strictly sensor-related ones) to the sensor architecture when it becomes available.
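The distinction between a transport and the software that processes events can be sketched as follows, again in Python and again purely as an illustration. The Transport class below is a stand-in for something like sysevent, not its actual API, and FanAgent, the "sensor.temp" class name, and the 70 C limit are assumptions made up for the example.

```python
from collections import defaultdict


class Transport:
    """Stand-in for a transport: it delivers events and nothing more."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_class, handler):
        self._subscribers[event_class].append(handler)

    def post(self, event_class, payload):
        # No filtering, thresholds, or diagnosis here: just delivery.
        for handler in self._subscribers[event_class]:
            handler(payload)


class FanAgent:
    """Hypothetical sensor agent: all decision logic lives here."""

    def __init__(self, limit_c=70):
        self.limit_c = limit_c
        self.actions = []

    def on_temperature(self, payload):
        if payload["celsius"] > self.limit_c:
            self.actions.append(f"raise fan speed near {payload['sensor']}")


transport = Transport()
agent = FanAgent()
transport.subscribe("sensor.temp", agent.on_temperature)

# The producer posts raw readings; the agent decides what they mean.
transport.post("sensor.temp", {"sensor": "cpu0", "celsius": 65})
transport.post("sensor.temp", {"sensor": "cpu0", "celsius": 80})
```

Any interpretation of the readings has to be written into the agent (or a module in front of it), which is the work a sensor architecture would otherwise take on.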

Cindi
--
This message posted from opensolaris.org


