[fm-discuss] [Fwd: [osol-discuss] Project Proposal: Sensor Abstraction Layer for the Solaris Fault Manager]
Garrett D'Amore
garrett at damore.org
Thu Apr 26 14:18:50 PDT 2007
I like the idea of this project, but I'd like it even better if it
broadened its scope just a little by also adding some kind of control
features.
In particular, a lot of platforms have fans for cooling, and fans and
sensors are often closely tied. (For example, the fans need to be
turned on or have their speed adjusted based upon the temperature
reported by thermal sensors. Or, you really, really want to let the
system administrator know if there is a fault in one of the fans that
could ultimately lead to a thermal crisis.)
As another example, you might want to throttle a CPU (instead of
shutting it off entirely) if thermal sensor readings exceed a certain
threshold. At a higher threshold, you might shut it off entirely or
power off the system.
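To make the two examples above a bit more concrete, here is a rough
sketch of the kind of policy I have in mind. Everything in it is
made up for illustration: the function names, the three-level split,
and the temperature values are assumptions, not taken from any existing
Solaris interface or from any real platform.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    THROTTLE = "throttle"
    SHUT_OFF_CPU = "shut_off_cpu"
    POWER_OFF = "power_off_system"


# Hypothetical thresholds in degrees Celsius; real values are platform-specific.
THROTTLE_AT_C = 85.0
SHUT_OFF_AT_C = 95.0
POWER_OFF_AT_C = 105.0


def thermal_action(temp_c: float) -> Action:
    """Return the most severe action whose threshold the reading has reached."""
    if temp_c >= POWER_OFF_AT_C:
        return Action.POWER_OFF
    if temp_c >= SHUT_OFF_AT_C:
        return Action.SHUT_OFF_CPU
    if temp_c >= THROTTLE_AT_C:
        return Action.THROTTLE
    return Action.NONE


def fan_speed_percent(temp_c: float,
                      idle_c: float = 40.0,
                      max_c: float = 80.0,
                      min_pct: float = 20.0) -> float:
    """Ramp fan speed linearly from min_pct at idle_c up to 100% at max_c."""
    if temp_c <= idle_c:
        return min_pct
    if temp_c >= max_c:
        return 100.0
    fraction = (temp_c - idle_c) / (max_c - idle_c)
    return min_pct + fraction * (100.0 - min_pct)


if __name__ == "__main__":
    for reading in (35.0, 60.0, 90.0, 110.0):
        print(reading, fan_speed_percent(reading), thermal_action(reading).value)
```

The point is only that the control decision is a function of sensor
readings, which is why I think sensing and control belong in the same
framework.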
Right now the PICL library and other platform-specific hacks that are
used to solve this are a little unsatisfactory. I'd like to see an
attempt to provide some kind of common central framework for this sort
of thing.
I'd also like to think about ways to integrate this kind of work with
the work being done by the battery and power management folks. For
example, an overtemp alert on a Li-ion battery is certainly a situation
you want to know about! You might also like to be aware of
power-related sensors (voltage levels, current flow, etc.). This is
important information for the administrator. While the battery team is
focused on the mobile market, I think these issues have scope beyond
it. For example, if one of the redundant power supplies on a server in
your data center is offline, you really don't want it to go unnoticed.
Just a thought.
-- Garrett
cindi wrote:
> FYI
>
> ------------------------------------------------------------------------
>
> Subject:
> [osol-discuss] Project Proposal: Sensor Abstraction Layer for the
> Solaris Fault Manager
> From:
> cindi <cindi at sun.com>
> Date:
> Thu, 26 Apr 2007 14:07:54 -0700
> To:
> opensolaris-discuss at opensolaris.org
>
>
>
> The Project
>
> This project proposes extensions to the fault management architecture
> (FMA) to support a sensor abstraction layer for the collection and
> analysis of sensor based telemetry that can be used in fault and
> resource management.
>
> The Problem
>
> How do we manage raw telemetry data kept, maintained and exported by
> disparate sources for the purposes of fault, resource management and
> budgeting? Today, there are a number of sensor collection mechanisms
> exported by the hardware and software. For the most part, the
> information they export is haphazardly presented and accessed
> according to ad-hoc operating system interfaces, per-platform methods
> or per-subsystem industry standards (SMBus, SMART and IPMI). Using
> this data for fault or resource management is clumsy and typically
> requires low-level system knowledge baked into higher-level management
> applications.
>
> Key Objectives
>
> As part of an overall sensor abstraction layer based on our current
> fault management architecture, we can solve the problem described
> above and provide a better understanding of the overall health
> and usage of a system through more sophisticated diagnosis
> technologies and fine-grained observability of sensor data via common
> access methods. A sensor abstraction layer must possess:
>
> 1. the ability to alert the administrator to conditions observed by
> platform sensors that may impact the operational state of the
> platform.
>
> 2. the ability to alert the administrator to conditions that resolve
> themselves as observed by platform sensors.
>
> 3. the ability to watch one or more sensors and correlate the data for
> predictive fault analysis or resource management.
>
> 4. the ability to continuously record sensor data and retrieve it from
> systems for offline analysis, future system design or development of
> more advanced diagnosis algorithms.
>
> 5. the ability for administrators and service personnel to manually
> inspect sensor values without having to understand the exact
> implementation (e.g. IPMI or SMBus).
>
> 6. the ability to connect sensor data to higher-level diagnosis (e.g.
> SMART disk data to SCSI and ZFS diagnosis engines).
>
> 7. the ability to understand and observe performance and power budgets
> based on raw sensor data.
>
> Cindi