[fm-discuss] Proposal: libtopo enumeration of power supplies and fans via IPMI
Rob Johnston
Robert.Johnston at Sun.COM
Tue Jan 8 13:29:18 PST 2008
Cynthia McGuire wrote:
> Hi Rob,
>
> It's nice to see progress here. See my comments below.
>
> Cindi
>
> Rob Johnston wrote:
>> Hello and happy new year!
>>
>> What follows is a proposal that Eric Schrock and I have been working
>> on to define a mechanism to enumerate power supplies and fans on
>> platforms that support IPMI. This is a first step in the larger
>> Sensor Abstraction Layer project.
>>
>> Comments and questions are encouraged.
>>
>> --------------------------------------------------------------------
>>
>> 1. DESCRIPTION
>>
>> The Solaris FMA framework is designed to diagnose failures in system
>> components. Currently these components are discovered by probing the
>> hardware visible to Solaris via standard OS paths (I/O, CPU, DIMMs,
>> etc). However, there exists a set of components that are crucial to the
>> ongoing health of the system that have no connection visible to Solaris.
>> The most common components, and the most likely to encounter failures,
>> are power supplies and fans.
>>
>> On low-end hardware, these components are often not observable, and it
>> is the responsibility of the user to manually detect component failure,
>> or run custom (Windows) software to observe the system. Higher end
>> systems (such as the x4000 series shipped by Sun) have a service
>> processor that manages the physical components and sensors in the
>> system. Some systems (such as SPARC) have a custom communications
>> mechanism between the OS and the SP, but the industry standard is IPMI
>> (Intelligent Platform Management Interface). Solaris already has the
>> ability to communicate with the SP over the baseboard management
>> controller (/dev/bmc), and a basic library (libipmi) already exists.
>>
>> Integrating support for power supplies and fans within FMA is an
>> important step in bringing all hardware topology enumeration and
>> diagnosis under a single infrastructure. Without this ability, users
>> must manage a separate OS instance (on the SP) with different
>> configuration, separate management, and separate notification
>> mechanisms.
>>
>> This proposal adds basic enumeration support for power supplies and fans
>> on platforms supporting IPMI. It does not include the ability to
>> diagnose psu or fan failures, nor does it provide a way to read
>> environmental sensors (fan speed, etc) for these components. This
>> functionality will be provided by a future project.
>>
>>
>> 2. TOPOLOGY CHANGES
>>
>> On x86 systems, the root of the hc topology tree is hc:///motherboard=0
>> (though bay nodes can exist at the root level as well). It doesn't make
>> sense to have physical components like fans underneath the motherboard,
>> nor does it make sense to have them directly at the root level. Future
>> projects will add sensors that monitor the chassis itself, and the
>> components are contained within the chassis, so a new root hc node is
>> created:
>>
>> hc:///chassis=0
>>
>> There is only ever a single chassis. Within IPMI, fans and psus can be
>> grouped together into domains that represent a logical unit (typically a
>> FRU). While uncommon for power supplies, this is quite common for fan
>> modules or fan trays that contain multiple fans. Therefore a
>> multi-level topology will be created of the form:
>>
>> hc:///chassis=0/psu=0
>> hc:///chassis=0/psu=1
>> hc:///chassis=0/powermodule=0
>> hc:///chassis=0/powermodule=0/psu=0
>> hc:///chassis=0/powermodule=0/psu=1
>>
>> hc:///chassis=0/fan=0
>> hc:///chassis=0/fan=1
>> hc:///chassis=0/fanmodule=0
>> hc:///chassis=0/fanmodule=0/fan=0
>> hc:///chassis=0/fanmodule=0/fan=1
>
> I see no reason to limit the number of chassis to one. There are
> plenty of examples where there is more than one chassis in which
> these (and other) components reside. Also, in the topology, does the
> motherboard become a sub-component (child) of a chassis or is it a
> peer (sibling)?
>
>>
>> The IPMI components are technically 'cooling' elements, not fans. For
>> the systems which currently support Solaris and IPMI, only fans are
>> supported. In the future, we may be able to detect non-fan cooling
>> elements by examining the set of associated sensors (such as a
>> tachometer) and inferring the type of cooling element.
>>
>> With IPMI, we know all components, even if a component is not currently
>> present. To allow management software to detect empty component slots,
>> the FMRIs will always be enumerated, but the is_present method will
>> return false if the component is not currently present.
>>
>>
>> 3. DYNAMIC ENUMERATION
>>
>> A new common libtopo module, ipmi, will be provided that will do dynamic
>> enumeration of IPMI components. While currently only supported on x86
>> systems, any system supporting IPMI should work, so the module will be
>> present on all architectures. If future SPARC platforms support IPMI
>> over /dev/bmc, then everything should "just work".
>>
>> IPMI has the unusual property that the world is defined solely by
>> 'sensor descriptor records' (which may be sensors, FRUs, etc). Instead
>> of iterating over entities (the IPMI term for components), one instead
>> iterates over all SDR records and infers an entity's existence based on
>> the sensor records that refer to it. The logic to handle this will be
>> kept within libipmi, and the ipmi enumerator will iterate over all
>> discovered entities for any 'power domain', 'power supply', 'cooling
>> domain', or 'cooling unit' entities. Using IPMI entity association
>> records, libipmi will have already organized these into the appropriate
>> hierarchy.
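To make the "infer entities from SDRs" idea a bit more concrete, here is a
rough sketch of the scan. The types and names (sdr_rec_t, entity_t,
infer_entities) are made up for illustration and are not the libipmi
interfaces; real SDR records carry far more than the two fields shown.

```c
#include <stddef.h>

/* Hypothetical, simplified SDR record: only the fields needed here. */
typedef struct sdr_rec {
	unsigned char sr_entity_id;		/* IPMI entity id */
	unsigned char sr_entity_instance;	/* instance of that entity */
} sdr_rec_t;

/* Hypothetical entity as inferred from the records that refer to it. */
typedef struct entity {
	unsigned char e_id;
	unsigned char e_instance;
	size_t e_nsensors;		/* number of SDRs referring to it */
} entity_t;

/*
 * Walk the SDR records and infer the set of entities from the records
 * that refer to them. Returns the number of entities stored in 'out'
 * (at most 'max'); records for entities that do not fit are dropped.
 */
size_t
infer_entities(const sdr_rec_t *recs, size_t nrecs, entity_t *out, size_t max)
{
	size_t nent = 0;

	for (size_t i = 0; i < nrecs; i++) {
		size_t j;

		/* Have we already seen this (id, instance) pair? */
		for (j = 0; j < nent; j++) {
			if (out[j].e_id == recs[i].sr_entity_id &&
			    out[j].e_instance == recs[i].sr_entity_instance)
				break;
		}
		if (j < nent) {
			out[j].e_nsensors++;
		} else if (nent < max) {
			out[nent].e_id = recs[i].sr_entity_id;
			out[nent].e_instance = recs[i].sr_entity_instance;
			out[nent].e_nsensors = 1;
			nent++;
		}
	}
	return (nent);
}
```

The enumerator would then only look at the resulting entities whose id is
one of the power or cooling types, rather than at the raw records.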
>>
>> The default label for each entity will be based on the entity id and the
>> entity instance number (which is globally unique). These labels may or
>> may not correspond to the labels on the chassis, but under a correct
>> IPMI implementation they will be roughly correct, and there will be a
>> means to override them on a per-platform basis (see below). For
>> components with a FRU locator record, it may be possible to assign a
>> label matching the FRU name, such as 'ft0.fm1.fru', though it's unclear
>> if this is any better (the naming is entirely up to the SP, and the
>> '.fru' extension is just a convention currently used by the current SP
>> firmware).
>>
>> Each component that is directly under the chassis will be assigned a FRU
>> matching its resource.
>
> What does this statement mean? Are you saying that the FRU property
> for a topo node directly under 'chassis' will be equal to the resource
> property? What does the FRU label property look like?
>
>> Components within an association will default to
>> the FRU of their parent, unless they have associated FRU locator
>> records, in which case they will have a distinct FRU matching their
>> resource.
>
> The paragraphs above use the acronym FRU to mean both the FRU label and
> the FRU FMRI, both of which are properties of a given topo node. Please
> be clear about which you're talking about.
>
>>
>> The sensors associated with the entity will be used to determine
>> presence as described in the IPMI specification.
>>
>> 4. STATIC ENUMERATION
>>
>> It would be nice if dynamic enumeration were enough to model any system
>> supporting IPMI. Unfortunately, as is the case with most platform
>> technologies (such as SMBIOS), complete support for enumeration is
>> hampered by limitations of the specification as well as the
>> implementation. With a proper implementation of the IPMI spec, it is
>> possible to enumerate all the components, though attaching semantic
>> meaning to them (labels, failure sensors, etc) is only possible in some
>> cases.
>>
>> On top of this, most platforms have an IPMI implementation that leaves
>> something to be desired. A common problem is the lack of entity
>> association records, so fans that should be part of a logical module
>> (even if correctly represented via SDR records) are not associated with
>> one another. Other problems include presence sensors that reference
>> incorrect entities, missing or incorrect FRU locator records, etc.
>>
>> To compensate for both of these problems, libtopo will support
>> dynamic enumeration, static enumeration, and static assignment of
>> sensors and properties to dynamically discovered entities.
>>
>>
>> 5. LIBIPMI DETAILS
>>
>> As part of this work, libipmi will be expanded in several different
>> capacities, mostly related to parsing SDR records and representing
>> entities.
>>
>> The SDR infrastructure will be expanded to support all possible SDR
>> record types (compact sensors, full sensor, entity association, etc).
>> The code will also be simplified to separate out the SDR name (when
>> available) from the record, since constructing this value is non-trivial
>> and should not be left to the consumer.
>>
>> New interfaces for gathering sensor readings based on a compact or full
>> SDR record will be introduced. This consists mainly of a large number
>> of #defines, code to transform readings based on the linearization
>> function, and parsing the sensor units. Some of this infrastructure
>> will not be fully used until future sensor work is complete, but enough
>> of it is needed at this point (namely parsing sensor-specific state
>> masks) to warrant its inclusion as part of this project.
>>
>> Based on this new infrastructure, libipmi will be enhanced to have a
>> native notion of entities, even though these do not exist as such in the IPMI
>> specification. The library will scan the SDR records, detect referenced
>> entities, group sensors with associated entities, and parse entity
>> association records to create a hierarchy of entities. This will also
>> include a function to detect entity presence.
>>
>> This isolates the details of IPMI entities (of which there are many) to
>> within libipmi, simplifying the topo enumerator and allowing other
>> software to be developed on top of it. One of these pieces of software
>> will be a private utility under /usr/lib/fm, 'ipmitopo', which will
>> display all IPMI entities (id, type, presence) and sensors associated
>> with each entity (reading, state, type, etc). This tool is not designed
>> to replace the open source 'ipmitool' and exists solely to debug the
>> IPMI topo implementation by leveraging the same code used by libtopo.
>>
>>
>> 6. LIBTOPO ENHANCEMENTS
>>
>> To make the implementation of this project possible, a handful of
>> extensions to both the libtopo enumerator module API and XML schema are
>> necessary.
>>
>> Currently it is not possible to register module methods on nodes that
>> are statically enumerated via XML map files. Typically, node methods
>> are registered onto a node by the enumerator module after the node is
>> bound to the topology. However, since statically enumerated nodes
>> aren't created by the enumerator module, this registration doesn't occur.
>>
>> While there will be cases where we will be forced to statically define
>> psu and fan topologies via XML, these nodes still need to support the
>> node methods that are implemented by the ipmi enumerator module. In
>> order to allow these methods to be registered on statically defined
>> nodes, the topo_modops_t struct will be extended with a new operation
>> (tmo_meth_reg) as shown below:
>>
>>   typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);
>>
>>   typedef struct topo_modops {
>>           topo_enum_f *tmo_enum;          /* enumeration op */
>>           topo_release_f *tmo_release;    /* resource release op */
>>           topo_meth_reg_f *tmo_meth_reg;  /* method registration op */
>>   } topo_modops_t;
>>
>> The tmo_meth_reg operation will be optional. Enumerator modules
>> which implement this operation will register the appropriate set of
>> methods on the topo node that is passed in.
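For what it's worth, "optional" here would just mean a NULL check on the
caller's side. A minimal sketch, assuming a trimmed-down ops vector and a
made-up helper name (modops_sketch_t and static_node_meth_reg are
illustrative only, not existing libtopo code):

```c
#include <stddef.h>

/* Stand-in opaque types; the real ones live in libtopo. */
typedef struct topo_mod topo_mod_t;
typedef struct tnode tnode_t;

typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);

/* Trimmed-down ops vector holding only the op discussed here. */
typedef struct modops_sketch {
	topo_meth_reg_f *tmo_meth_reg;	/* method registration op, may be NULL */
} modops_sketch_t;

/*
 * Hypothetical helper the XML parser could call after creating a static
 * node. Modules that do not implement the op are silently skipped.
 */
int
static_node_meth_reg(const modops_sketch_t *ops, topo_mod_t *mod,
    tnode_t *node)
{
	if (ops == NULL || ops->tmo_meth_reg == NULL)
		return (0);	/* optional op: nothing to register */
	return (ops->tmo_meth_reg(mod, node));
}

/* Example op: only counts how many nodes it was asked to handle. */
int example_calls;

int
example_meth_reg(topo_mod_t *mod, tnode_t *node)
{
	(void) mod;
	(void) node;
	example_calls++;
	return (0);
}
```

Existing modules would leave the new member NULL and behave exactly as
they do today.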
>>
>> To provide a connection between this new operation and nodes that are
>> statically defined in XML, the syntax of the <node> element will be
>> extended to include a new optional "mod" attribute. The value of this
>> attribute should be set to the name of an enumerator module whose
>> methods should be registered on that node. Below is an example usage
>> of this new attribute:
>>
>>   <range name='fan' min='0' max='2'>
>>     <node instance='0' mod='ipmi'>
>>       . . .
>>     </node>
>>   </range>
>>
>
> I don't see a need for the extra registration op nor the new attribute
> name, mod. I prefer to see the XML extended to permit enumerators to
> be called after the XML parser has created the static node.
> The enumerators can then register their methods and perform any
> post-processing of the node (i.e. add other properties). The
> attribute name should be 'enum' as for dynamically created nodes.
Ok - so along those lines an alternative approach could do something like:

  <range name='fan' min='0' max='2'>
    <node instance='0'>
      . . .
    </node>
    ...
    <enum-process name='ipmi' version='1'/>
  </range>

This could invoke a new entry point in the module which would walk the
tree and post-process any nodes that it's responsible for (like fan and
psu nodes). Is this closer to what you're thinking of?
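
To give a rough idea of what I mean by that walk, here is a sketch. The
node type and function names (node_t, ipmi_owns, ipmi_post_process) are
invented for this example and are not the libtopo API; setting a flag
stands in for registering the real node methods.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a topo node; not the real tnode_t. */
typedef struct node {
	const char *n_name;		/* e.g. "chassis", "fan", "psu" */
	int n_instance;
	int n_has_methods;		/* set once methods are registered */
	struct node *n_child;		/* first child */
	struct node *n_sibling;		/* next sibling */
} node_t;

/* In this sketch the ipmi module only claims fan and psu nodes. */
static int
ipmi_owns(const node_t *np)
{
	return (strcmp(np->n_name, "fan") == 0 ||
	    strcmp(np->n_name, "psu") == 0);
}

/*
 * Walk 'np', its siblings, and all of their descendants, and
 * post-process every node the module is responsible for.
 * Returns the number of nodes touched.
 */
int
ipmi_post_process(node_t *np)
{
	int count = 0;

	for (; np != NULL; np = np->n_sibling) {
		if (ipmi_owns(np) && !np->n_has_methods) {
			/* Stands in for registering the node methods. */
			np->n_has_methods = 1;
			count++;
		}
		count += ipmi_post_process(np->n_child);
	}
	return (count);
}
```

Nodes the module doesn't own (chassis, fanmodule, etc.) are left alone,
and running the walk a second time is a no-op.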
>> Additionally, the syntax of the <range> element will also be extended to
>> allow a new "set" attribute. The intention is to allow for conditional
>> enumeration of a range of nodes based on the platform type. This is
>> analogous to the conditional specification of properties which is
>> currently supported via the <propset> element. Below is an example
>> usage of this new attribute:
>>
>>   <range name='fanmodule' min='0' max='4'
>>       set='Sun-Fire-X4500|Sun-Fire-X4540'>
>>     . . .
>>   </range>
>>
>
> This looks backwards. Shouldn't the set contain the range?
Effectively it does. What I was trying to avoid was creating a brand
new element (a la <rangeset>). Making the set that the range applies
to just an attribute of the <range> element greatly simplifies the
actual code changes that I have to make to libtopo's parser.
I suppose an alternative would be to do something like this:

  <rangeset type='product' set='Sun-Fire-X4500|Sun-Fire-X4540'>
    <range name='fanmodule' min='0' max='4'>
      . . .
    </range>
  </rangeset>

However, this ends up being quite a bit more difficult to implement in
the parser without any functional benefit. I also think the XML
looks a bit cleaner the other way.
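
Either way the check against the set string is the same. A minimal
sketch, assuming an exact, case-sensitive match of the product name
against a '|'-separated list (set_contains is a made-up helper name, not
an existing libtopo function):

```c
#include <string.h>

/*
 * Return 1 if 'product' appears in the '|'-separated list 'set',
 * 0 otherwise. Matching is exact and case-sensitive.
 */
int
set_contains(const char *set, const char *product)
{
	size_t plen = strlen(product);

	while (*set != '\0') {
		/* Length of the current member, up to '|' or end of string. */
		size_t len = strcspn(set, "|");

		if (len == plen && strncmp(set, product, len) == 0)
			return (1);
		set += len;
		if (*set == '|')
			set++;		/* step over the separator */
	}
	return (0);
}
```

With set='Sun-Fire-X4500|Sun-Fire-X4540', the range would be enumerated
on either of those two products and skipped everywhere else.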
> You will also need to specify the set type (someplace) just as we do
> in propset.
Agreed - I'll add a type attribute, ala propset
thanks,
rob