[fm-discuss] Proposal: libtopo enumeration of power supplies and fans via IPMI
Rob Johnston
Robert.Johnston at Sun.COM
Mon Jan 7 12:49:26 PST 2008
Hello and happy new year!
What follows is a proposal that Eric Schrock and I have been working on
to define a mechanism to enumerate power supplies and fans on platforms
that support IPMI. This is a first step in the larger Sensor
Abstraction Layer project.
Comments and questions are encouraged.
--------------------------------------------------------------------
1. DESCRIPTION
The Solaris FMA framework is designed to diagnose failures in system
components. Currently these components are discovered by probing the
hardware visible to Solaris via standard OS paths (I/O, CPU, DIMMs,
etc.). However, some components that are crucial to the ongoing health
of the system have no connection visible to Solaris. The most common of
these components, and the most likely to fail, are power supplies and
fans.
On low-end hardware, these components are often not observable, and it
is the responsibility of the user to manually detect component failure,
or run custom (Windows) software to observe the system. Higher-end
systems (such as the x4000 series shipped by Sun) have a service
processor that manages the physical components and sensors in the
system. Some systems (such as SPARC) have a custom communications
mechanism between the OS and the SP, but the industry standard is IPMI
(Intelligent Platform Management Interface). Solaris already has the
ability to communicate with the SP over the baseboard management
controller (/dev/bmc), and a basic library (libipmi) already exists.
Integrating support for power supplies and fans within FMA is an
important step in bringing all hardware topology enumeration and
diagnosis under a single infrastructure. Without this ability, users
must manage a separate OS instance (on the SP) with different
configuration, separate management, and separate notification
mechanisms.
This proposal adds basic enumeration support for power supplies and fans
on platforms supporting IPMI. It does not include the ability to
diagnose PSU or fan failures, nor does it provide a way to read
environmental sensors (fan speed, etc.) for these components. This
functionality will be provided by a future project.
2. TOPOLOGY CHANGES
On x86 systems, the root of the hc topology tree is hc:///motherboard=0
(though bay nodes can exist at the root level as well). It doesn't make
sense to have physical components like fans underneath the motherboard,
nor does it make sense to have them directly at the root level. These
components are physically contained within the chassis, and future
projects will add sensors that monitor the chassis itself, so a new
root hc node is created:
hc:///chassis=0
There is only ever a single chassis. Within IPMI, fans and PSUs can be
grouped together into domains that represent a logical unit (typically a
FRU). While uncommon for power supplies, this is quite common for fan
modules or fan trays that contain multiple fans. Therefore a
multi-level topology will be created of the form:
hc:///chassis=0/psu=0
hc:///chassis=0/psu=1
hc:///chassis=0/powermodule=0
hc:///chassis=0/powermodule=0/psu=0
hc:///chassis=0/powermodule=0/psu=1
hc:///chassis=0/fan=0
hc:///chassis=0/fan=1
hc:///chassis=0/fanmodule=0
hc:///chassis=0/fanmodule=0/fan=0
hc:///chassis=0/fanmodule=0/fan=1
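The hc paths above form a parent-linked tree in which each level contributes one name=instance pair. The sketch below, assuming a hypothetical in-memory node type (struct hc_node) and a hypothetical helper hc_fmri(), builds the string form of such a path by recursing up to the root. It only illustrates the shape of the hierarchy; libtopo itself represents FMRIs as name-value lists, not as strings built this way.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node: each node knows its hc name, instance and parent. */
struct hc_node {
	const char *name;		/* e.g. "chassis", "fanmodule", "fan" */
	unsigned int instance;
	const struct hc_node *parent;	/* NULL for the root (chassis=0) */
};

/*
 * Write "hc:///name=inst/name=inst/..." for node into buf, emitting the
 * parent's path first. Returns 0 on success, -1 on error or truncation.
 */
static int
hc_fmri(const struct hc_node *node, char *buf, size_t len)
{
	size_t off;
	int n;

	if (node->parent == NULL) {
		n = snprintf(buf, len, "hc:///%s=%u", node->name,
		    node->instance);
	} else {
		if (hc_fmri(node->parent, buf, len) != 0)
			return (-1);
		off = strlen(buf);	/* parent path fit, so off < len */
		n = snprintf(buf + off, len - off, "/%s=%u", node->name,
		    node->instance);
		len -= off;		/* space that was available to us */
	}
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

For a fan node whose parent is fanmodule=0 under chassis=0, the call fills the buffer with hc:///chassis=0/fanmodule=0/fan=1, matching the last path in the list.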
The IPMI components are technically 'cooling' elements, not fans. For
the systems which currently support Solaris and IPMI, only fans are
supported. In the future, we may be able to detect non-fan cooling
elements by examining the set of associated sensors (such as a
tachometer) and inferring the type of cooling element.
IPMI describes all components, even those that are not currently
present. To allow management software to detect empty component slots,
the FMRIs will always be enumerated, but the is_present method will
return false if the component is not currently present.
3. DYNAMIC ENUMERATION
A new common libtopo module, ipmi, will be provided that will do dynamic
enumeration of IPMI components. While currently only supported on x86
systems, any system supporting IPMI should work, so the module will be
present on all architectures. If future SPARC platforms support IPMI
over /dev/bmc, then everything should "just work".
IPMI has the unusual property that the world is defined solely by
'sensor data records' (SDRs), which may describe sensors, FRUs, etc.
Instead of iterating over entities (the IPMI term for components), one
iterates over all SDR records and infers an entity's existence from the
sensor records that refer to it. The logic to handle this will be
kept within libipmi, and the ipmi enumerator will iterate over all
discovered entities for any 'power domain', 'power supply', 'cooling
domain', or 'cooling unit' entities. Using IPMI entity association
records, libipmi will have already organized these into the appropriate
hierarchy.
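The inference step can be sketched as a single pass over the SDRs that collects each distinct (entity id, entity instance) pair of interest and counts the sensors referring to it. The SDR layout here is a hypothetical reduction to just those two fields, and the entity ID codes are taken from the IPMI entity ID table as assumptions to check against the specification; this is not libipmi's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Entity ID codes (assumed values from the IPMI entity ID table). */
#define	ENTITY_PSU		0x0A	/* power supply */
#define	ENTITY_POWER_DOMAIN	0x13	/* power unit / power domain */
#define	ENTITY_FAN		0x1D	/* fan / cooling device */
#define	ENTITY_COOLING_DOMAIN	0x1E	/* cooling unit / cooling domain */

/* Hypothetical SDR reduced to the entity it refers to. */
struct sdr {
	uint8_t entity_id;
	uint8_t entity_instance;
};

struct entity {
	uint8_t id;
	uint8_t instance;
	unsigned int nsensors;	/* number of SDRs referring to this entity */
};

static int
is_interesting(uint8_t id)
{
	return (id == ENTITY_PSU || id == ENTITY_POWER_DOMAIN ||
	    id == ENTITY_FAN || id == ENTITY_COOLING_DOMAIN);
}

/*
 * Record each distinct (id, instance) pair of interest seen in the SDRs.
 * Returns the number of entities written to out (at most max).
 */
static size_t
infer_entities(const struct sdr *sdrs, size_t nsdrs, struct entity *out,
    size_t max)
{
	size_t i, j, n = 0;

	for (i = 0; i < nsdrs; i++) {
		if (!is_interesting(sdrs[i].entity_id))
			continue;
		for (j = 0; j < n; j++) {
			if (out[j].id == sdrs[i].entity_id &&
			    out[j].instance == sdrs[i].entity_instance)
				break;
		}
		if (j < n) {
			out[j].nsensors++;	/* known entity: add sensor */
		} else if (n < max) {
			out[n].id = sdrs[i].entity_id;
			out[n].instance = sdrs[i].entity_instance;
			out[n].nsensors = 1;
			n++;
		}
	}
	return (n);
}
```

Entities never referenced by any record simply do not appear, which is why the proposal relies on the SDR set being complete.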
The default label for each entity will be based on the entity id and the
entity instance number (which is globally unique). These labels may or
may not correspond to the labels on the chassis, but under a correct
IPMI implementation they will be roughly correct, and there will be a
means to override them on a per-platform basis (see below). For
components with a FRU locator record, it may be possible to assign a
label matching the FRU name, such as 'ft0.fm1.fru', though it's unclear
if this is any better (the naming is entirely up to the SP, and the
'.fru' extension is just a convention used by the current SP firmware).
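A default label built from the entity id and instance, with a per-platform override, might look like the sketch below. The label format ("FAN 1"), the type names, the override table layout and the example chassis label "FT1" are all hypothetical; the proposal does not specify them.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-platform override of a default label. */
struct label_override {
	const char *platform;	/* platform name, e.g. "Sun-Fire-X4500" */
	const char *dflt;	/* default label to replace */
	const char *label;	/* label printed on the chassis */
};

/* Hypothetical short names for the entity types of interest. */
static const char *
entity_type_name(uint8_t id)
{
	switch (id) {
	case 0x0A: return ("PSU");
	case 0x13: return ("POWER DOMAIN");
	case 0x1D: return ("FAN");
	case 0x1E: return ("COOLING DOMAIN");
	default: return ("ENTITY");
	}
}

/*
 * Build "<type> <instance>" as the default label, then apply the first
 * override matching both this platform and that default label.
 */
static void
entity_label(uint8_t id, uint8_t instance, const char *platform,
    const struct label_override *ovr, size_t novr, char *buf, size_t len)
{
	size_t i;

	(void) snprintf(buf, len, "%s %u", entity_type_name(id),
	    (unsigned int)instance);
	for (i = 0; i < novr; i++) {
		if (strcmp(ovr[i].platform, platform) == 0 &&
		    strcmp(ovr[i].dflt, buf) == 0) {
			(void) snprintf(buf, len, "%s", ovr[i].label);
			return;
		}
	}
}
```

Keying overrides on the default label keeps the platform data small: only entities whose generated label is wrong need an entry.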
Each component that is directly under the chassis will be assigned a FRU
matching its resource. Components within an association will default to
the FRU of their parent, unless they have associated FRU locator
records, in which case they will have a distinct FRU matching their
resource.
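The FRU rule above is a short recursion: a node is its own FRU if it sits directly under the chassis or has its own FRU locator record, and otherwise inherits its parent's FRU. The node type below is a hypothetical stand-in, not a libtopo structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node; parent is NULL for nodes directly under chassis=0. */
struct ent_node {
	const char *resource;		/* this node's hc FMRI */
	bool has_fru_locator;		/* a FRU locator record refers to it */
	const struct ent_node *parent;
};

/*
 * Top-level nodes and nodes with their own FRU locator record are their
 * own FRU; any other node inherits the FRU of its parent.
 */
static const char *
ent_fru(const struct ent_node *n)
{
	if (n->parent == NULL || n->has_fru_locator)
		return (n->resource);
	return (ent_fru(n->parent));
}
```

For a fan module whose fan=0 has no locator record and whose fan=1 has one, fan=0 reports the module as its FRU while fan=1 reports itself.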
The sensors associated with the entity will be used to determine
presence as described in the IPMI specification.
4. STATIC ENUMERATION
It would be nice if dynamic enumeration were enough to model any system
supporting IPMI. Unfortunately, as is the case with most platform
technologies (such as SMBIOS), complete support for enumeration is
hampered by limitations of the specification as well as the
implementation. With a proper implementation of the IPMI spec, it is
possible to enumerate all the components, though attaching semantic
meaning to them (labels, failure sensors, etc) is only possible in some
cases.
On top of this, most platforms have an IPMI implementation that leaves
something to be desired. A common problem is the lack of entity
association records, so fans that should be part of a logical module
(even if correctly represented via SDR records) are not associated with
one another. Other problems include presence sensors that reference
incorrect entities, missing or incorrect FRU locator records, etc.
To compensate for both of these problems, libtopo will support dynamic
enumeration, static enumeration, and static assignment of sensors and
properties to dynamically discovered entities.
5. LIBIPMI DETAILS
As part of this work, libipmi will be expanded in several different
capacities, mostly related to parsing SDR records and representing
entities.
The SDR infrastructure will be expanded to support all possible SDR
record types (compact sensor, full sensor, entity association, etc.).
The code will also be simplified to separate out the SDR name (when
available) from the record, since constructing this value is non-trivial
and should not be left to the consumer.
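One reason the SDR name is non-trivial is that it is prefixed by a type/length byte: bits 7:6 select the encoding (Unicode, BCD plus, 6-bit packed ASCII, or 8-bit ASCII plus Latin 1) and bits 4:0 give the length. The sketch below, with a hypothetical sdr_name() helper, decodes only the 8-bit case and rejects the others; the bit layout follows the IPMI specification as I understand it and should be checked against it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SDR_IDTYPE_8BIT	0x3	/* type code 11b: 8-bit ASCII + Latin 1 */

/*
 * Decode an SDR ID string from its type/length byte and the avail bytes
 * that follow it into a NUL-terminated buf. Returns the name length, or
 * -1 for other encodings (not handled in this sketch) or short buffers.
 */
static int
sdr_name(uint8_t typelen, const uint8_t *data, size_t avail, char *buf,
    size_t len)
{
	uint8_t type = (typelen >> 6) & 0x3;	/* bits 7:6: encoding */
	size_t n = typelen & 0x1f;		/* bits 4:0: length */

	if (type != SDR_IDTYPE_8BIT || n > avail || n + 1 > len)
		return (-1);
	memcpy(buf, data, n);
	buf[n] = '\0';
	return ((int)n);
}
```

A full implementation would also unpack the 6-bit and BCD plus forms, which is exactly the work the proposal moves out of consumers and into libipmi.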
New interfaces for gathering sensor readings based on a compact or full
SDR record will be introduced. This consists mainly of a large number
of #defines, code to transform readings based on the linearization
function, and parsing the sensor units. Some of this infrastructure
will not be fully used until future sensor work is complete, but enough
of it is needed at this point (namely parsing sensor-specific state
masks) to warrant its inclusion as part of this project.
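For reference, the IPMI conversion for a full sensor record is y = L[(M * x + B * 10^K1) * 10^K2], where M and B are signed 10-bit fields and K1 (B exponent) and K2 (result exponent) are signed 4-bit fields. The sketch below assumes a linear L (the identity) and an unsigned raw reading; the helper names are hypothetical, and nonlinear L functions and signed analog formats are left out.

```c
#include <stdint.h>

/* Sign-extend the low 'bits' bits of v as a two's complement value. */
static int
sign_extend(unsigned int v, unsigned int bits)
{
	unsigned int sign = 1u << (bits - 1);

	v &= (1u << bits) - 1;
	return ((int)(v ^ sign) - (int)sign);
}

/* 10^e for a small signed exponent, computed without libm. */
static double
pow10i(int e)
{
	double r = 1.0;
	int i, mag = (e < 0) ? -e : e;

	for (i = 0; i < mag; i++)
		r *= 10.0;
	return ((e < 0) ? 1.0 / r : r);
}

/*
 * y = (M * x + B * 10^K1) * 10^K2, assuming L is linear and raw is
 * unsigned. m, b, k1, k2 must already be sign-extended.
 */
static double
sdr_convert_linear(uint8_t raw, int m, int b, int k1, int k2)
{
	return ((m * (double)raw + b * pow10i(k1)) * pow10i(k2));
}
```

For example, M = 2, B = 5, K1 = 1, K2 = 0 maps a raw reading of 100 to 250.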
Based on this new infrastructure, libipmi will be enhanced to have a
native notion of entities, even though entities do not exist as such in
the IPMI
specification. The library will scan the SDR records, detect referenced
entities, group sensors with associated entities, and parse entity
association records to create a hierarchy of entities. This will also
include a function to detect entity presence.
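Building the hierarchy from entity association records amounts to pointing each contained entity at its container. The sketch below models a simplified record holding one container and a short list of contained entities (the range form of the record is not modeled), and it ignores containers that no SDR referenced. All types and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define	MAX_CONTAINED	4	/* arbitrary bound for this sketch */

struct ipmi_ent {
	uint8_t id, instance;
	struct ipmi_ent *parent;	/* NULL: directly under the chassis */
};

/* Simplified association record: a container and its contained list. */
struct assoc_rec {
	uint8_t cont_id, cont_inst;
	size_t ncontained;
	struct { uint8_t id, instance; } contained[MAX_CONTAINED];
};

static struct ipmi_ent *
find_ent(struct ipmi_ent *ents, size_t n, uint8_t id, uint8_t inst)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ents[i].id == id && ents[i].instance == inst)
			return (&ents[i]);
	return (NULL);
}

/* Set each contained entity's parent to the record's container. */
static void
apply_associations(struct ipmi_ent *ents, size_t nents,
    const struct assoc_rec *recs, size_t nrecs)
{
	size_t r, c;

	for (r = 0; r < nrecs; r++) {
		struct ipmi_ent *cont = find_ent(ents, nents,
		    recs[r].cont_id, recs[r].cont_inst);

		if (cont == NULL)
			continue;	/* container unknown: skipped here */
		for (c = 0; c < recs[r].ncontained && c < MAX_CONTAINED; c++) {
			struct ipmi_ent *e = find_ent(ents, nents,
			    recs[r].contained[c].id,
			    recs[r].contained[c].instance);

			if (e != NULL && e != cont)
				e->parent = cont;
		}
	}
}
```

On a platform lacking association records, every fan keeps a NULL parent and lands directly under chassis=0, which is the situation section 4 describes.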
This isolates the details of IPMI entities (of which there are many) to
within libipmi, simplifying the topo enumerator and allowing other
software to be developed on top of it. One of these pieces of software
will be a private utility under /usr/lib/fm, 'ipmitopo', which will
display all IPMI entities (id, type, presence) and sensors associated
with each entity (reading, state, type, etc). This tool is not designed
to replace the open source 'ipmitool' and exists solely to debug the
IPMI topo implementation by leveraging the same code used by libtopo.
6. LIBTOPO ENHANCEMENTS
To make the implementation of this project possible, a handful of
extensions to both the libtopo enumerator module API and XML schema are
necessary.
Currently it is not possible to register module methods on nodes that
are statically enumerated via XML map files. Typically, node methods
are registered onto a node by the enumerator module after the node is
bound to the topology. However, since statically enumerated nodes
aren't created by the enumerator module, this registration doesn't occur.
While there will be cases where we will be forced to statically define
psu and fan topologies via XML, these nodes still need to support the
node methods that are implemented by the ipmi enumerator module. In
order to allow these methods to be registered on statically defined
nodes, the topo_modops_t struct will be extended with a new operation
(tmo_meth_reg) as shown below:
    typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);

    typedef struct topo_modops {
            topo_enum_f     *tmo_enum;      /* enumeration op */
            topo_release_f  *tmo_release;   /* resource release op */
            topo_meth_reg_f *tmo_meth_reg;  /* method registration op */
    } topo_modops_t;
The tmo_meth_reg operation will be optional. Enumerator modules
which implement this operation will register the appropriate set of
methods on the topo node that is passed in.
To provide a connection between this new operation and nodes that are
statically defined in XML, the syntax of the <node> element will be
extended to include a new optional "mod" attribute. The value of this
attribute should be set to the name of an enumerator module, whose methods
should be registered on that node. Below is an example usage of this
new attribute:
    <range name='fan' min='0' max='2'>
        <node instance='0' mod='ipmi'>
            . . .
        </node>
    </range>
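On the framework side, handling the "mod" attribute could amount to looking up the named module and, if it implements the optional tmo_meth_reg operation, passing it the statically defined node. The sketch below uses minimal stand-in types (not libtopo's real definitions) and a hypothetical node_mod_attr() helper; ipmi_meth_reg() only counts the call where a real module would register its node methods.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for libtopo's opaque types; NOT the real definitions. */
typedef struct tnode { int nmethods; } tnode_t;
typedef struct topo_mod topo_mod_t;
typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);

/* Reduced module: a name plus the proposed optional operation. */
struct topo_mod {
	const char *name;
	topo_meth_reg_f *tmo_meth_reg;	/* optional; may be NULL */
};

/* Example op: a real module would register methods (e.g. is_present). */
static int
ipmi_meth_reg(topo_mod_t *mod, tnode_t *node)
{
	(void) mod;
	node->nmethods++;
	return (0);
}

/*
 * For <node ... mod='name'>: find the named module and let it register
 * its methods on the static node. Returns -1 if no such module exists.
 */
static int
node_mod_attr(topo_mod_t *mods, size_t nmods, const char *modname,
    tnode_t *node)
{
	size_t i;

	for (i = 0; i < nmods; i++) {
		if (strcmp(mods[i].name, modname) != 0)
			continue;
		if (mods[i].tmo_meth_reg == NULL)
			return (0);	/* optional op absent: nothing to do */
		return (mods[i].tmo_meth_reg(&mods[i], node));
	}
	return (-1);
}
```

Because a NULL tmo_meth_reg is treated as "nothing to register", existing modules that never set the new field keep working unchanged.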
Additionally, the syntax of the <range> element will also be extended to
allow a new "set" attribute. The intention is to allow for conditional
enumeration of a range of nodes based on the platform type. This is
analogous to the conditional specification of properties which is
currently supported via the <propset> element. Below is an example
usage of this new attribute:
    <range name='fanmodule' min='0' max='4'
        set='Sun-Fire-X4500|Sun-Fire-X4540'>
        . . .
    </range>
In the example above, the <range> element (and all child elements
within) will only be parsed and evaluated if the machine's platform type
matches one of the platforms specified by the "set" attribute's value.
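The "set" test reduces to checking whether the platform name equals one of the '|'-separated names. The sketch below assumes exact, case-sensitive matching with no wildcards; platform_in_set() is a hypothetical helper, not the actual libtopo XML parser code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if platform exactly equals one of the '|'-separated names
 * in set, e.g. set = "Sun-Fire-X4500|Sun-Fire-X4540".
 */
static bool
platform_in_set(const char *set, const char *platform)
{
	size_t plen = strlen(platform);
	const char *p = set;

	for (;;) {
		const char *bar = strchr(p, '|');
		size_t len = (bar != NULL) ? (size_t)(bar - p) : strlen(p);

		/* Compare lengths first so prefixes do not match. */
		if (len == plen && strncmp(p, platform, len) == 0)
			return (true);
		if (bar == NULL)
			return (false);
		p = bar + 1;
	}
}
```

With the set from the example, "Sun-Fire-X4540" matches, while "Sun-Fire-X4" (a prefix) and "Sun-Fire-X4600" do not, so the range would be skipped on those platforms.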
All of the above extensions will be backwards compatible with any
existing map files and enumerator modules.
7. FUTURE WORK
This proposal lays the groundwork for a variety of future work under the
auspices of the FMA Sensor Framework.
The next step will be to include fan and PSU diagnosis. This requires
representing failure sensors within libtopo using the facility nodes
proposed as part of the sensor framework. These sensors are then read
by a sensor-transport module that has a 1:1 correspondence between
ereports and faults.
This will serve as a proof of concept for facility nodes and prepare
the way for the larger sensor and alert framework, while providing the
greatest immediate benefit. Future work will include representing
analog sensors in libtopo, developing an environmental monitor,
detecting fan and PSU hotplug, and creating a persistent alert
framework.
8. REFERENCES
"IPMI v2.0 rev. 1.0 specification markup for IPMI v2.0/v1.5 errata
revision 3"
http://www.intel.com/design/servers/ipmi/pdf/IPMIv2_0_rev1_0_E3_markup.pdf
Sensor Abstraction Layer OpenSolaris Project
http://www.opensolaris.org/os/project/sensors/
Libtopo documentation: FMD Programmer's Reference, Chapter 9
http://www.opensolaris.org/os/community/fm/FMDPRM.pdf