[fm-discuss] Proposal: libtopo enumeration of power supplies and fans via IPMI

Rob Johnston Robert.Johnston at Sun.COM
Tue Jan 8 13:29:18 PST 2008


Cynthia McGuire wrote:
> Hi Rob,
>
> It's nice to see progress here.  See my comments below.
>
> Cindi
>
> Rob Johnston wrote:
>> Hello and happy new year!
>>
>> What follows is a proposal that Eric Schrock and I have been working 
>> on to define a mechanism to enumerate power supplies and fans on 
>> platforms that support IPMI.  This is a first step in the larger 
>> Sensor Abstraction Layer project.
>>
>> Comments and questions are encouraged.
>>
>> --------------------------------------------------------------------
>>
>> 1. DESCRIPTION
>>
>> The Solaris FMA framework is designed to diagnose failures in system
>> components.  Currently these components are discovered by probing the
>> hardware visible to Solaris via standard OS paths (I/O, CPU, DIMMs,
>> etc).  However, there exists a set of components that are crucial to the
>> ongoing health of the system that have no connection visible to Solaris.
>> The most common components, and the most likely to encounter failures,
>> are power supplies and fans.
>>
>> On low-end hardware, these components are often not observable, and it
>> is the responsibility of the user to manually detect component failure,
>> or run custom (Windows) software to observe the system.  Higher end
>> systems (such as the x4000 series shipped by Sun) have a service
>> processor that manages the physical components and sensors in the
>> system.  Some systems (such as SPARC) have a custom communications
>> mechanism between the OS and the SP, but the industry standard is IPMI
>> (Intelligent Platform Management Interface).  Solaris already has the
>> ability to communicate with the SP over the baseboard management
>> controller (/dev/bmc), and a basic library (libipmi) already exists.
>>
>> Integrating support for power supplies and fans within FMA is an
>> important step in bringing all hardware topology enumeration and
>> diagnosis under a single infrastructure.  Without this ability, users
>> must manage a separate OS instance (on the SP) with different
>> configuration, separate management, and separate notification
>> mechanisms.
>>
>> This proposal adds basic enumeration support for power supplies and fans
>> on platforms supporting IPMI.  It does not include the ability to
>> diagnose psu or fan failures, nor does it provide a way to read
>> environmental sensors (fan speed, etc) for these components.  This
>> functionality will be provided by a future project.
>>
>>
>> 2. TOPOLOGY CHANGES
>>
>> On x86 systems, the root of the hc topology tree is hc:///motherboard=0
>> (though bay nodes can exist at the root level as well).  It doesn't make
>> sense to have physical components like fans underneath the motherboard,
>> nor does it make sense to have them directly at the root level.  Future
>> projects will add sensors that monitor the chassis itself, and the
>> components are contained within the chassis, so a new root hc node is
>> created:
>>
>>     hc:///chassis=0
>>
>> There is only ever a single chassis.  Within IPMI, fans and psus can be
>> grouped together into domains that represent a logical unit (typically a
>> FRU).  While uncommon for power supplies, this is quite common for fan
>> modules or fan trays that contain multiple fans.  Therefore a
>> multi-level topology will be created of the form:
>>
>>     hc:///chassis=0/psu=0
>>     hc:///chassis=0/psu=1
>>     hc:///chassis=0/powermodule=0
>>     hc:///chassis=0/powermodule=0/psu=0
>>     hc:///chassis=0/powermodule=0/psu=1
>>
>>     hc:///chassis=0/fan=0
>>     hc:///chassis=0/fan=1
>>     hc:///chassis=0/fanmodule=0
>>     hc:///chassis=0/fanmodule=0/fan=0
>>     hc:///chassis=0/fanmodule=0/fan=1
>
> I see no reason to limit the number of chassis to one.  There are 
> plenty of examples where there is more than one chassis in which 
> these (and other) components reside.  Also, in the topology, does the 
> motherboard become a sub-component (child) of a chassis or is it a 
> peer (sibling)?
>
>>
>> The IPMI components are technically 'cooling' elements, not fans. For
>> the systems which currently support Solaris and IPMI, only fans are
>> supported.  In the future, we may be able to detect non-fan cooling
>> elements by examining the set of associated sensors (such as a
>> tachometer) and inferring the type of cooling element.
>>
>> With IPMI, we know all components, even if a component is not currently
>> present.  To allow management software to detect empty component slots,
>> the FMRIs will always be enumerated, but the is_present method will
>> return false if the component is not currently present.
>>
>>
>> 3. DYNAMIC ENUMERATION
>>
>> A new common libtopo module, ipmi, will be provided that will do dynamic
>> enumeration of IPMI components.  While currently only supported on x86
>> systems, any system supporting IPMI should work, so the module will be
>> present on all architectures.  If future SPARC platforms support IPMI
>> over /dev/bmc, then everything should "just work".
>>
>> IPMI has the unusual property that the world is defined solely by
>> 'sensor descriptor records' (which may be sensors, FRUs, etc).  Instead
>> of iterating over entities (the IPMI term for components), one instead
>> iterates over all SDR records and infers an entity's existence based on
>> the sensor records that refer to it.  The logic to handle this will be
>> kept within libipmi, and the ipmi enumerator will iterate over all
>> discovered entities for any 'power domain', 'power supply', 'cooling
>> domain', or 'cooling unit' entities.  Using IPMI entity association
>> records, libipmi will have already organized these into the appropriate
>> hierarchy.
>>
>> The default label for each entity will be based on the entity id and the
>> entity instance number (which is globally unique).  These labels may or
>> may not correspond to the labels on the chassis, but under a correct
>> IPMI implementation they will be roughly correct, and there will be a
>> means to override them on a per-platform basis (see below).  For
>> components with a FRU locator record, it may be possible to assign a
>> label matching the FRU name, such as 'ft0.fm1.fru', though it's unclear
>> if this is any better (the naming is entirely up to the SP, and the
>> '.fru' extension is just a convention used by the current SP
>> firmware).
>>
>> Each component that is directly under the chassis will be assigned a FRU
>> matching its resource.  
>
> What does this statement mean?  Are you saying that the FRU property 
> for a topo node directly under 'chassis' will be equal to the resource 
> property?  What does the FRU label property look like?
>
>> Components within an association will default to
>> the FRU of their parent, unless they have associated FRU locator
>> records, in which case they will have a distinct FRU matching their
>> resource.
>
> The paragraphs above use the acronym, FRU, to mean FRU label and FRU 
> FMRI, both of which are properties of a given topo node.  Please be 
> clear about which you're talking about.
>
>>
>> The sensors associated with the entity will be used to determine
>> presence as described in the IPMI specification.
>>
>> 4. STATIC ENUMERATION
>>
>> It would be nice if dynamic enumeration were enough to model any system
>> supporting IPMI.  Unfortunately, as is the case with most platform
>> technologies (such as SMBIOS), complete support for enumeration is
>> hampered by limitations of the specification as well as the
>> implementation.  With a proper implementation of the IPMI spec, it is
>> possible to enumerate all the components, though attaching semantic
>> meaning to them (labels, failure sensors, etc) is only possible in some
>> cases.
>>
>> On top of this, most platforms have an IPMI implementation that leaves
>> something to be desired.  A common problem is the lack of entity
>> association records, so fans that should be part of a logical module
>> (even if correctly represented via SDR records) are not associated with
>> one another.  Other problems include presence sensors that reference
>> incorrect entities, missing or incorrect FRU locator records, etc.
>>
>> To compensate for both of these problems, libtopo will support dynamic
>> enumeration, static enumeration, and static assignment of sensors and
>> properties to dynamically discovered entities.
>>
>>
>> 5. LIBIPMI DETAILS
>>
>> As part of this work, libipmi will be expanded in several different
>> capacities, mostly related to parsing SDR records and representing
>> entities.
>>
>> The SDR infrastructure will be expanded to support all possible SDR
>> record types (compact sensors, full sensor, entity association, etc).
>> The code will also be simplified to separate out the SDR name (when
>> available) from the record, since constructing this value is non-trivial
>> and should not be left to the consumer.
>>
>> New interfaces for gathering sensor readings based on a compact or full
>> SDR record will be introduced.  This consists mainly of a large number
>> of #defines, code to transform readings based on the linearization
>> function, and parsing the sensor units.  Some of this infrastructure
>> will not be fully used until future sensor work is complete, but enough
>> of it is needed at this point (namely parsing sensor-specific state
>> masks) to warrant its inclusion as part of this project.
>>
>> Based on this new infrastructure, libipmi will be enhanced to have a
>> native notion of entities, even though these do not exist as such in the IPMI
>> specification.  The library will scan the SDR records, detect referenced
>> entities, group sensors with associated entities, and parse entity
>> association records to create a hierarchy of entities.  This will also
>> include a function to detect entity presence.
>>
>> This isolates the details of IPMI entities (of which there are many) to
>> within libipmi, simplifying the topo enumerator and allowing other
>> software to be developed on top of it.  One of these pieces of software
>> will be a private utility under /usr/lib/fm, 'ipmitopo', which will
>> display all IPMI entities (id, type, presence) and sensors associated
>> with each entity (reading, state, type, etc).  This tool is not designed
>> to replace the open source 'ipmitool' and exists solely to debug the
>> IPMI topo implementation by leveraging the same code used by libtopo.
>>
>>
>> 6. LIBTOPO ENHANCEMENTS
>>
>> To make the implementation of this project possible, a handful of
>> extensions to both the libtopo enumerator module API and XML schema are
>> necessary.
>>
>> Currently it is not possible to register module methods on nodes that
>> are statically enumerated via XML map files.  Typically, node methods
>> are registered onto a node by the enumerator module after the node is
>> bound to the topology.  However, since statically enumerated nodes
>> aren't created by the enumerator module, this registration doesn't occur.
>>
>> While there will be cases where we will be forced to statically define
>> psu and fan topologies via XML, these nodes still need to support the
>> node methods that are implemented by the ipmi enumerator module.  In
>> order to allow these methods to be registered on statically defined
>> nodes, the topo_modops_t struct will be extended with a new operation
>> (tmo_meth_reg) as shown below:
>>
>> typedef int topo_meth_reg_f(topo_mod_t *, tnode_t *);
>>
>> typedef struct topo_modops {
>>     topo_enum_f *tmo_enum;        /* enumeration op */
>>     topo_release_f *tmo_release;    /* resource release op */
>>     topo_meth_reg_f *tmo_meth_reg;    /* method registration op */
>> } topo_modops_t;
>>
>> The tmo_meth_reg operation will be optional.  Enumerator modules
>> which implement this operation will register the appropriate set of
>> methods on the topo node that is passed in.
>>
>> To provide a connection between this new operation and nodes that are
>> statically defined in XML, the syntax of the <node> element will be
>> extended to include a new optional "mod" attribute.  The value of this
>> attribute should be set to the name of an enumerator module whose
>> methods should be registered on that node.  Below is an example usage of
>> this new attribute:
>>
>>    <range name='fan' min='0' max='2'>
>>          <node instance='0' mod='ipmi'>
>>              . . .
>>          </node>
>>    </range>
>>
>
> I don't see a need for the extra registration op nor the new attribute 
> name, mod.  I prefer to see the XML extended to permit enumerators to 
> be called after the XML parser has created the static node. 
> The enumerators can then register their methods and perform any 
> post-processing of the node (i.e. add other properties).  The 
> attribute name should be 'enum' as for dynamically created nodes.

Ok - so along those lines an alternative approach could do something like:

<range name='fan' min='0' max='2'>
    <node instance='0'>
        . . .
    </node>
    ...
    <enum-process name='ipmi' version='1'/>
</range>

This could invoke a new entry point in the module which would walk the 
tree and post-process any nodes that it's responsible for (like fan and 
psu nodes).  Is this closer to what you're thinking of?
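
To make the idea a bit more concrete, here is a rough sketch of what such a
post-processing walk might look like.  This is illustration only: node_t,
ipmi_owns() and ipmi_post_process() are made-up names, node_t is a toy
stand-in for tnode_t, and "registering methods" is reduced to setting a
flag.  The real entry point would presumably take a topo_mod_t and use the
existing libtopo interfaces to walk and decorate nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a topo node; this is NOT the real tnode_t. */
typedef struct node {
    const char *name;           /* e.g. "fan", "psu", "chassis" */
    int instance;
    int methods_registered;     /* set once post-processing has run */
    struct node *child;         /* first child, or NULL */
    struct node *sibling;       /* next sibling, or NULL */
} node_t;

/* Node types that the (hypothetical) ipmi module is responsible for. */
static const char *ipmi_types[] = { "fan", "fanmodule", "psu", "powermodule" };

static int
ipmi_owns(const node_t *np)
{
    size_t i;

    assert(np != NULL && np->name != NULL);
    for (i = 0; i < sizeof (ipmi_types) / sizeof (ipmi_types[0]); i++) {
        if (strcmp(np->name, ipmi_types[i]) == 0)
            return (1);
    }
    return (0);
}

/*
 * Walk np, its siblings, and all of their descendants.  Every node the
 * module owns that has not been processed yet is marked as processed
 * (standing in for method registration and property fix-ups).  Returns
 * the number of nodes processed by this call.
 */
static int
ipmi_post_process(node_t *np)
{
    int count = 0;

    for (; np != NULL; np = np->sibling) {
        if (ipmi_owns(np) && !np->methods_registered) {
            np->methods_registered = 1;
            count++;
        }
        count += ipmi_post_process(np->child);
    }
    return (count);
}
```

The walk skips nodes it doesn't own and nodes it has already handled, so
calling it a second time on the same tree is harmless.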


>> Additionally, the syntax of the <range> element will also be extended to
>> allow a new "set" attribute.  The intention is to allow for conditional
>> enumeration of a range of nodes based on the platform type.  This is
>> analogous to the conditional specification of properties which is
>> currently supported via the <propset> element.  Below is an example
>> usage of this new attribute:
>>
>>    <range name='fanmodule' min='0' max='4'
>>           set='Sun-Fire-X4500|Sun-Fire-X4540'>
>>        . . .
>>    </range>
>>
>
> This looks backwards.  Shouldn't the set contain the range?

Effectively it does.  What I was trying to avoid was creating a brand 
new element (ala <rangeset>).  Making the set that the range applies 
to just an attribute of the <range> element greatly simplifies the 
actual code changes that I have to make to libtopo's parser.

I suppose an alternative would be to do something like this:

<rangeset type='product' set='Sun-Fire-X4500|Sun-Fire-X4540'>
    <range name='fanmodule' min='0' max='4'>
        . . .
    </range>
</rangeset>

However, this ends up being quite a bit more difficult to implement in 
the parser without any functional benefit.  I also think the XML 
looks a bit cleaner the other way.
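
Whichever syntax we pick, the parser has to answer the same question: is
the running platform's product name one of the '|'-separated names in the
set?  A minimal sketch of that test follows.  set_contains() is a made-up
helper name, not existing libtopo code, and I'm assuming an exact,
case-sensitive match on each name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if 'platform' exactly matches one of the '|'-separated names
 * in 'set', and 0 otherwise.  An empty platform name never matches.
 * Hypothetical helper for illustration only.
 */
static int
set_contains(const char *set, const char *platform)
{
    size_t plen;

    assert(set != NULL && platform != NULL);
    plen = strlen(platform);
    if (plen == 0)
        return (0);

    while (*set != '\0') {
        /* Length of the current name, up to the next '|' or the end. */
        size_t len = strcspn(set, "|");

        if (len == plen && strncmp(set, platform, len) == 0)
            return (1);

        set += len;
        if (*set == '|')
            set++;      /* step over the separator */
    }
    return (0);
}
```

Comparing lengths before calling strncmp() keeps a prefix such as
'Sun-Fire-X45' from matching 'Sun-Fire-X4500'.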

> You will also need to specify the set type (someplace) just as we do 
> in propset.

Agreed - I'll add a type attribute, ala propset

thanks,

rob


