FMA Topology & Retire Agent Refinements [PSARC/2008/569 FastTrack]
Tim Haley
Tim.Haley at sun.com
Mon Sep 8 10:36:39 PDT 2008
I am sponsoring the following case on behalf of Gavin Maltby. This
case seeks patch binding for a Solaris 10 update release.
The case extends FMA infrastructure to support fault management in a
hypervisor context. The case is a straightforward addition of private
interfaces, and is part of an approved FMA portfolio (2008.035).
Considering this, we feel the case qualifies for self-review and have
filed it as Closed Approved Automatic. Please let us know if there is
any disagreement and I will convert the case to an open fast-track.
Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
FMA Topology & Retire Agent Refinements
1.2. Name of Document Author/Supplier:
Author: Gavin Maltby
1.3 Date of This Document:
08 September, 2008
4. Technical Description
4.1. Project Summary
4.1.1 Project Description:
This case corresponds to the approved FMA portfolio 2008.035 -
see references below.
The materials of this case arise from the "FMA/xVM" project,
which provides fault management for the Solaris x86 xVM hypervisor
context. This case documents and formalizes the updates to
the common FMA infrastructure introduced by the FMA/xVM project.
To permit fmd retire agents to operate in both native and
virtualized contexts without introducing still more conditional
compilation, interfaces are introduced to allow an agent to
invoke retire/unretire/status operations on a topology library
(libtopo) resource node; the underlying retire/unretire/status
implementation is registered as a libtopo method by the
platform-specific topo enumerator.
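The arrangement described above can be illustrated with a minimal,
self-contained sketch. It does not use libtopo or fmd_api.h; every
name in it (the ex_ prefix, the method signature, the status values,
the "native" back end) is hypothetical and chosen only for
illustration. It shows an enumerator attaching retire/unretire/status
methods to a node, and an agent invoking them by name without knowing
which platform back end sits behind them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status values; the case does not specify real ones. */
enum { EX_OK = 0, EX_NOTSUP = -1 };

#define	EX_MAX_METHODS	4
#define	EX_NRES		8	/* number of resources in this toy model */

/* Hypothetical method signature: operate on one resource id. */
typedef int (*ex_method_f)(int resource_id);

typedef struct ex_method {
	const char *name;	/* "retire", "unretire" or "status" */
	ex_method_f func;
} ex_method_t;

typedef struct ex_node {
	int resource_id;
	size_t nmethods;
	ex_method_t methods[EX_MAX_METHODS];
} ex_node_t;

/* Enumerator side: attach a named method to a node. */
int
ex_node_register(ex_node_t *node, const char *name, ex_method_f func)
{
	if (node->nmethods >= EX_MAX_METHODS)
		return (EX_NOTSUP);
	node->methods[node->nmethods].name = name;
	node->methods[node->nmethods].func = func;
	node->nmethods++;
	return (EX_OK);
}

/* Agent side: look a method up by name and invoke it. */
int
ex_node_invoke(const ex_node_t *node, const char *name)
{
	for (size_t i = 0; i < node->nmethods; i++) {
		if (strcmp(node->methods[i].name, name) == 0)
			return (node->methods[i].func(node->resource_id));
	}
	return (EX_NOTSUP);
}

/* A stand-in "native" back end; 1 means retired, 0 means in service. */
static int ex_native_state[EX_NRES];

static int
ex_native_retire(int id)
{
	if (id < 0 || id >= EX_NRES)
		return (EX_NOTSUP);
	ex_native_state[id] = 1;
	return (EX_OK);
}

static int
ex_native_unretire(int id)
{
	if (id < 0 || id >= EX_NRES)
		return (EX_NOTSUP);
	ex_native_state[id] = 0;
	return (EX_OK);
}

static int
ex_native_status(int id)
{
	if (id < 0 || id >= EX_NRES)
		return (EX_NOTSUP);
	return (ex_native_state[id]);
}

/* The platform-specific enumerator decides which back end to register. */
void
ex_native_enumerate(ex_node_t *node, int id)
{
	node->resource_id = id;
	node->nmethods = 0;
	(void) ex_node_register(node, "retire", ex_native_retire);
	(void) ex_node_register(node, "unretire", ex_native_unretire);
	(void) ex_node_register(node, "status", ex_native_status);
}
```

In this sketch a virtualized platform would supply its own enumerate
routine registering different functions under the same three names;
the agent code calling ex_node_invoke() would not change, which is the
point of avoiding conditional compilation in the agent.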
Userland FMA components need to communicate with the kernel
for various requests, such as to retire a memory page,
retire (offline) a faulted cpu, disable a bad cpu
cacheline, etc. In the past this has been done by adding
ioctls to the /dev/mem driver, by introducing new drivers such
as /dev/mem_cache (for US-IV+ cacheline retire support), and
through subsystem-specific drivers such as /dev/zfs, as used by
the zfs-diagnosis and zfs-retire fmd modules. To avoid further
overloading existing drivers and introducing yet more custom
drivers such as /dev/mem_cache, we introduce /dev/fm as the
conduit for most userland FMA <-> kernel [<-> hypervisor]
communication.
An associated support library, libfmd_agent, is provided to
avoid code duplication and to spare consumers from working with
/dev/fm directly.
The x86 chip topology tree structure is also updated with this
case. The existing topology is chip/cpu where "chip" is the
chip socket instance number (APIC ID chipid field) and "cpu"
enumerates the cpu execution resources of the chip - numbering
across multiple cores of the chip and, if applicable, hardware
strands/threads of each core. This is problematic even on
native systems since it obscures chip structure. The recent
Intel Nehalem support PSARC/2008/527 supplemented chip/cpu
with chip/core/strand for chips with more than 1 thread per core;
the present case eliminates the use of chip/cpu in favour of
chip/core/strand on all x86 chip types.
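To make the relationship between the two schemes concrete, here is a
small sketch that converts a chip/cpu pair into chip/core/strand. It
rests on an assumption that this case does not state: that the old
"cpu" numbers were assigned core by core, so that
cpu = core * strands_per_core + strand. The type and function names
are hypothetical, and the formatted string is only an illustration,
not the exact FMRI syntax.

```c
#include <stdio.h>

/* Hypothetical holder for one strand's position in the new scheme. */
typedef struct ex_strand_path {
	int chip;
	int core;
	int strand;
} ex_strand_path_t;

/*
 * Convert an old-style chip/cpu pair to chip/core/strand.
 * ASSUMPTION: cpu numbers within a chip are core-major, i.e.
 * cpu = core * strands_per_core + strand.
 * Returns 0 on success, -1 on invalid arguments.
 */
int
ex_cpu_to_strand(int chip, int cpu, int strands_per_core,
    ex_strand_path_t *out)
{
	if (chip < 0 || cpu < 0 || strands_per_core < 1 || out == NULL)
		return (-1);
	out->chip = chip;
	out->core = cpu / strands_per_core;
	out->strand = cpu % strands_per_core;
	return (0);
}

/* Render the path as text; returns 0 on success, -1 if buf is too small. */
int
ex_format_path(const ex_strand_path_t *p, char *buf, size_t len)
{
	int n = snprintf(buf, len, "chip=%d/core=%d/strand=%d",
	    p->chip, p->core, p->strand);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Under that assumption, cpu 5 on chip 0 with two strands per core lands
at core 2, strand 1, and a chip with one strand per core always has
strand 0, so core boundaries become visible in the path instead of
being hidden in a flat cpu number.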
4.5 Interfaces
Exported Interfaces
-------------------
New interfaces added to fmd_api.h; these are Contracted
Consolidation Private as all of fmd_api.h is exported at that
level as per the fmd PRM:
Interface Name                          Stability
--------------------------------------- --------------------
fmd_nvl_fmri_retire                     Contracted Consolidation Private
fmd_nvl_fmri_unretire                   Contracted Consolidation Private
New interfaces exported by libfmd_agent in fmd_agent.h:
Interface Name                          Stability
--------------------------------------- --------------------
FMD_AGENT_RETIRE_DONE                   Consolidation Private
FMD_AGENT_RETIRE_ASYNC                  Consolidation Private
FMD_AGENT_RETIRE_FAIL                   Consolidation Private
fmd_agent_open                          Consolidation Private
fmd_agent_close                         Consolidation Private
fmd_agent_errno                         Consolidation Private
fmd_agent_errmsg                        Consolidation Private
fmd_agent_strerr                        Consolidation Private
fmd_agent_page_retire                   Consolidation Private
fmd_agent_page_unretire                 Consolidation Private
fmd_agent_page_isretired                Consolidation Private
fmd_agent_physcpu_info                  Consolidation Private
fmd_agent_cpu_retire                    Consolidation Private
fmd_agent_cpu_unretire                  Consolidation Private
fmd_agent_cpu_isretired                 Consolidation Private
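The three FMD_AGENT_RETIRE_* names suggest that a retire request can
complete immediately, be accepted for later completion, or fail. The
case does not define their values or exact semantics, so the following
is only a sketch of how a consumer might branch on such a three-way
result. The ex_ enums and functions are hypothetical stand-ins and are
not the libfmd_agent definitions.

```c
/* Hypothetical stand-ins for a three-way retire result. */
typedef enum ex_retire_result {
	EX_RETIRE_DONE,		/* assumed: retire completed */
	EX_RETIRE_ASYNC,	/* assumed: retire will complete later */
	EX_RETIRE_FAIL		/* assumed: retire could not be performed */
} ex_retire_result_t;

/* What a hypothetical agent does next. */
typedef enum ex_action {
	EX_ACT_DONE,		/* nothing further to do */
	EX_ACT_POLL_STATUS,	/* check retired status again later */
	EX_ACT_REPORT_ERROR	/* surface the failure to the caller */
} ex_action_t;

/* Map a retire result to the agent's follow-up action. */
ex_action_t
ex_next_action(ex_retire_result_t res)
{
	switch (res) {
	case EX_RETIRE_DONE:
		return (EX_ACT_DONE);
	case EX_RETIRE_ASYNC:
		return (EX_ACT_POLL_STATUS);
	case EX_RETIRE_FAIL:
	default:
		return (EX_ACT_REPORT_ERROR);
	}
}

/* Human-readable label for logging. */
const char *
ex_action_name(ex_action_t act)
{
	switch (act) {
	case EX_ACT_DONE:
		return ("done");
	case EX_ACT_POLL_STATUS:
		return ("poll status");
	case EX_ACT_REPORT_ERROR:
	default:
		return ("report error");
	}
}
```

Treating the asynchronous case separately from failure is the design
point here: an agent that collapsed the two would either report errors
for retires that later succeed or silently ignore real failures.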
New interfaces exported for /dev/fm:
Interface Name                          Stability
--------------------------------------- --------------------
/dev/fm                                 Project Private
<sys/devfm.h>                           Project Private
5. Reference Documents
FMA Portfolio 2008.035:
http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2008.035.XVMTopRetire
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open