[fm-discuss] [Fwd: Project Proposal: Sensor Abstraction Layer for the Solaris Fault Manager]

Eric Boutilier Eric.Boutilier at Sun.COM
Fri Apr 27 08:07:22 PDT 2007


Dear FMA Community Group leaders,

The Sensor Abstraction Layer project proposal (copied below) was posted
and seconded yesterday, and will therefore be initiated soon. Please reply
if you think this project might have trouble securing sponsorship from the
FMA Community Group (and therefore from the proposed RAS Community Group,
should that take effect).

For background, see OGB draft proposals 2007/001 and 2007/002:

http://mail.opensolaris.org/pipermail/ogb-discuss/2007-April/000263.html
http://mail.opensolaris.org/pipermail/ogb-discuss/2007-April/000356.html

Eric Boutilier
OpenSolaris

-------------------------------------------------------------------------

From: cindi <cindi at sun.com>
Date: Thu, 26 Apr 2007 14:07:54 -0700
To: opensolaris-discuss at opensolaris.org
Subject: Project Proposal: Sensor Abstraction Layer for the Solaris Fault Manager

The Project

This project proposes extensions to the fault management architecture (FMA)
to support a sensor abstraction layer for the collection and analysis of
sensor based telemetry that can be used in fault and resource management.

The Problem

How do we manage raw telemetry data kept, maintained and exported by
disparate sources for the purposes of fault management, resource management
and budgeting?  Today, the hardware and software export a number of sensor
collection mechanisms.  For the most part, the information they export is
haphazardly presented and accessed through ad hoc operating system
interfaces, per-platform methods or per-subsystem industry standards
(SMBus, SMART and IPMI).  Using this data for fault or resource management
is clumsy and typically requires low-level system knowledge baked into
higher-level management applications.

Key Objectives

With an overall sensor abstraction layer built on our current fault
management architecture, we can solve the problem described above and
provide a better understanding of the overall health and usage of a
system through more sophisticated diagnosis technologies and fine-grained
observability of sensor data via common access methods. A sensor abstraction
layer must possess:

1. the ability to alert the administrator to conditions observed by
    platform sensors that may impact the operational state of the
    platform.

2. the ability to alert the administrator to conditions that resolve
    themselves as observed by platform sensors.

3. the ability to watch one or more sensors and correlate the data for
    predictive fault analysis or resource management.

4. the ability to continuously record sensor data and retrieve it from
    systems for offline analysis, future system design or development of
    more advanced diagnosis algorithms.

5. the ability for administrators and service personnel to manually
    inspect sensor values without having to understand the exact
    implementation (e.g. IPMI or SMBus).

6. the ability to connect sensor data to higher-level diagnosis (e.g.
    SMART disk data to SCSI and ZFS diagnosis engines).

7. the ability to understand and observe performance and power budgets
    based on raw sensor data.
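
As a rough illustration of objectives 1, 2 and 5, the following Python
sketch shows one way a common access method might look. It is only a
sketch under stated assumptions: the Sensor and Monitor classes, the
"cpu0-temp" name, the 90.0 limit and the canned readings are all
hypothetical and are not taken from FMA, IPMI, SMBus or SMART interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sensor:
    """Hypothetical uniform view of one sensor, whatever its backend."""
    name: str                   # illustrative name, e.g. "cpu0-temp"
    units: str                  # illustrative units, e.g. "degC"
    read: Callable[[], float]   # backend-specific read hidden behind a callable
    upper_limit: float          # reading above this counts as a tripped sensor


class Monitor:
    """Polls sensors and reports threshold crossings in both directions."""

    def __init__(self, sensors):
        self.sensors = list(sensors)
        self.tripped = set()    # names of sensors currently over their limit

    def poll(self):
        events = []
        for s in self.sensors:
            value = s.read()
            over = value > s.upper_limit
            if over and s.name not in self.tripped:
                # Objective 1: a condition that may impact the platform.
                self.tripped.add(s.name)
                events.append(("asserted", s.name, value))
            elif not over and s.name in self.tripped:
                # Objective 2: the condition resolved itself.
                self.tripped.discard(s.name)
                events.append(("cleared", s.name, value))
        return events


# Canned readings stand in for a real backend (assumption for the example).
readings = iter([55.0, 92.0, 93.0, 60.0])
cpu_temp = Sensor("cpu0-temp", "degC", lambda: next(readings), upper_limit=90.0)
monitor = Monitor([cpu_temp])
history = [monitor.poll() for _ in range(4)]
```

Because the backend is hidden behind the read callable, the caller
inspects values and receives asserted/cleared events without knowing
whether the data came from IPMI, SMBus or some other source.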

Cindi



