FMA Topology & Retire Agent Refinements [PSARC/2008/569 FastTrack]

Tim Haley Tim.Haley at sun.com
Mon Sep 8 10:36:39 PDT 2008


I am sponsoring the following case on behalf of Gavin Maltby.  This
case seeks patch binding for a Solaris 10 update release.

The case extends FMA infrastructure to support fault management in a
hypervisor context.  The case is a straightforward addition of private
interfaces, and is part of an approved FMA portfolio (2008.035).
Considering this, we feel the case qualifies for self-review and have
filed it as Closed Approved Automatic.  Please let us know if there is
any disagreement and I will convert the case to an open fast-track.

Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
	 FMA Topology & Retire Agent Refinements
    1.2. Name of Document Author/Supplier:
	 Author:  Gavin Maltby
    1.3. Date of This Document:
	 08 September, 2008
4. Technical Description

4.1. Project Summary
   4.1.1 Project Description:

	This case corresponds to the approved FMA portfolio 2008.035 -
	see references below.

	The materials of this case arise as part of the "FMA/xVM"
	project, or fault management for the Solaris x86 xVM hypervisor
	context.  This case documents and formalizes the updates to
	the common FMA infrastructure introduced by the FMA/xVM project.

	To permit fmd retire agents to operate in both native and
	virtualized contexts without introducing still more conditional
	compilation, interfaces are introduced to allow an agent to
	invoke retire/unretire/status operations on a topology library
	(libtopo) resource node;  the underlying retire/unretire/status
	implementation is registered as a libtopo method by the
	platform-specific topo enumerator.
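
	The dispatch arrangement described above can be sketched as
	follows.  This is a minimal illustration only:  the res_node_t
	type, the res_node_invoke() helper and the native_* functions
	are hypothetical stand-ins and are not the libtopo or fmd
	interfaces.  The point is that the agent calls an operation by
	name on a resource node and never needs to know which
	platform-specific implementation was registered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the libtopo types. */
typedef enum { RES_OK, RES_RETIRED } res_state_t;

typedef struct res_node res_node_t;
typedef int (*res_method_f)(res_node_t *);

typedef struct {
	const char *name;	/* "retire", "unretire" or "status" */
	res_method_f func;	/* platform-specific implementation */
} res_method_t;

struct res_node {
	const char *label;
	res_state_t state;
	const res_method_t *methods;	/* registered by the enumerator */
	size_t nmethods;
};

/* Implementations that a native-context enumerator might register. */
static int native_retire(res_node_t *n) { n->state = RES_RETIRED; return (0); }
static int native_unretire(res_node_t *n) { n->state = RES_OK; return (0); }
static int native_status(res_node_t *n) { return (n->state == RES_RETIRED); }

static const res_method_t native_methods[] = {
	{ "retire", native_retire },
	{ "unretire", native_unretire },
	{ "status", native_status }
};

/* Enumerator side: create the node and register its methods. */
void
native_enumerate(res_node_t *n, const char *label)
{
	assert(n != NULL);
	n->label = label;
	n->state = RES_OK;
	n->methods = native_methods;
	n->nmethods = sizeof (native_methods) / sizeof (native_methods[0]);
}

/* Agent side: invoke a method by name; -1 if none is registered. */
int
res_node_invoke(res_node_t *n, const char *name)
{
	assert(n != NULL && name != NULL);
	for (size_t i = 0; i < n->nmethods; i++) {
		if (strcmp(n->methods[i].name, name) == 0)
			return (n->methods[i].func(n));
	}
	return (-1);
}
```

	A virtualized-context enumerator would register a different
	method table on the same kind of node, leaving the agent-side
	call unchanged.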

	Userland FMA components need to communicate with the kernel
	for various requests, such as to retire a memory page,
	retire (offline) a cpu that is faulted, disable a bad cpu
	cacheline, etc.  In the past this has been done by adding
	ioctls to the /dev/mem driver, by adding new drivers such as
	/dev/mem_cache (for US-IV+ cacheline retire support), and through
	subsystem-specific drivers such as /dev/zfs, as used by the
	zfs-diagnosis and zfs-retire fmd modules.  To avoid further
	overloading of existing unsuspecting drivers and the
	introduction of more custom drivers such as /dev/mem_cache,
	we introduce /dev/fm to be the conduit for most userland
	FMA <-> kernel [<-> hypervisor] communication.  An associated
	support library, libfmd_agent, is provided to avoid code
	duplication and the need for consumers to work with /dev/fm
	directly.
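
	The shape of such a support library can be sketched as below.
	This is a mock under stated assumptions, not libfmd_agent
	itself:  every sk_* name is hypothetical, the "device" is
	simulated by arrays inside the handle, and the numeric values
	and meanings of the three result codes (done synchronously,
	deferred, failed) are assumed from the names
	FMD_AGENT_RETIRE_DONE, FMD_AGENT_RETIRE_ASYNC and
	FMD_AGENT_RETIRE_FAIL listed under Interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes; values and semantics are assumptions. */
enum { SK_RETIRE_DONE = 0, SK_RETIRE_ASYNC = 1, SK_RETIRE_FAIL = 2 };

#define	SK_NPAGES	8	/* size of the simulated page space */

/* Handle hiding the channel to the (here simulated) device. */
typedef struct {
	int err;			/* error from the last failed call */
	uint8_t retired[SK_NPAGES];	/* simulated kernel retire state */
	uint8_t busy[SK_NPAGES];	/* simulated "page still in use" */
} sk_agent_t;

/* Returns a zeroed handle, or NULL if allocation fails. */
sk_agent_t *
sk_agent_open(void)
{
	return (calloc(1, sizeof (sk_agent_t)));
}

void
sk_agent_close(sk_agent_t *h)
{
	free(h);
}

int
sk_agent_errno(const sk_agent_t *h)
{
	assert(h != NULL);
	return (h->err);
}

/* Request retirement of a page; consumers never touch the device. */
int
sk_agent_page_retire(sk_agent_t *h, unsigned page)
{
	assert(h != NULL);
	if (page >= SK_NPAGES) {
		h->err = EINVAL;
		return (SK_RETIRE_FAIL);
	}
	h->err = 0;
	if (h->busy[page])
		return (SK_RETIRE_ASYNC);	/* deferred until page is free */
	h->retired[page] = 1;
	return (SK_RETIRE_DONE);
}

/* Returns 1 if the page is retired, 0 otherwise. */
int
sk_agent_page_isretired(const sk_agent_t *h, unsigned page)
{
	assert(h != NULL);
	return (page < SK_NPAGES && h->retired[page]);
}
```

	A consumer opens one handle, issues requests through it, reads
	the handle's error state on failure, and closes the handle when
	done;  the ioctl-level details stay inside the library.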

	The x86 chip topology tree structure is also updated with this
	case.  The existing topology is chip/cpu where "chip" is the
	chip socket instance number (APIC ID chipid field) and "cpu"
	enumerates the cpu execution resources of the chip - numbering
	across multiple cores of the chip and, if applicable, hardware
	strands/threads of each core.  This is problematic even on
	native systems since it obscures chip structure.  The recent
	Intel Nehalem support PSARC/2008/527 supplemented chip/cpu
	with chip/core/strand for chips with more than 1 thread per core;
	the present case eliminates the use of chip/cpu in favour of
	chip/core/strand on all x86 chip types.
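
	The difference between the two layouts can be illustrated with
	a small sketch.  It assumes, purely for illustration, that the
	old flat "cpu" number counts strands core by core with a fixed
	number of strands per core, and that nodes are written as
	name=instance pairs;  the function and type names are
	hypothetical and are not part of libtopo.

```c
#include <assert.h>
#include <stdio.h>

/* Position of an execution resource within a chip (illustrative). */
typedef struct {
	unsigned core;
	unsigned strand;
} core_strand_t;

/*
 * Split a flat per-chip cpu number into core and strand, assuming
 * strands of the same core are numbered consecutively.
 */
core_strand_t
cpu_to_core_strand(unsigned cpu, unsigned strands_per_core)
{
	core_strand_t cs;

	assert(strands_per_core > 0);
	cs.core = cpu / strands_per_core;
	cs.strand = cpu % strands_per_core;
	return (cs);
}

/*
 * Format the chip/core/strand form of a node that the flat layout
 * would have called chip/cpu.  Returns the snprintf() result.
 */
int
format_core_strand_path(unsigned chip, unsigned cpu,
    unsigned strands_per_core, char *buf, size_t len)
{
	core_strand_t cs = cpu_to_core_strand(cpu, strands_per_core);

	assert(buf != NULL);
	return (snprintf(buf, len, "chip=%u/core=%u/strand=%u",
	    chip, cs.core, cs.strand));
}
```

	Under these assumptions, cpu 5 on chip 0 with two strands per
	core becomes chip=0/core=2/strand=1, which makes the core
	boundary visible where chip=0/cpu=5 did not.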

4.5 Interfaces

	Exported Interfaces
	-------------------

	New interfaces added to fmd_api.h;  these are Contracted
	Consolidation Private as all of fmd_api.h is exported at that
	level as per the fmd PRM:

	Interface Name				Stability
	--------------------------------------- --------------------
	fmd_nvl_fmri_retire			Contracted Consolidation Private
	fmd_nvl_fmri_unretire			Contracted Consolidation Private

	New interfaces exported by libfmd_agent in fmd_agent.h:

	Interface Name				Stability
	--------------------------------------- --------------------
	FMD_AGENT_RETIRE_DONE			Consolidation Private
	FMD_AGENT_RETIRE_ASYNC			Consolidation Private
	FMD_AGENT_RETIRE_FAIL			Consolidation Private
	fmd_agent_open				Consolidation Private
	fmd_agent_close				Consolidation Private
	fmd_agent_errno				Consolidation Private
	fmd_agent_errmsg			Consolidation Private
	fmd_agent_strerr			Consolidation Private
	fmd_agent_page_retire			Consolidation Private
	fmd_agent_page_unretire			Consolidation Private
	fmd_agent_page_isretired		Consolidation Private
	fmd_agent_physcpu_info			Consolidation Private
	fmd_agent_cpu_retire			Consolidation Private
	fmd_agent_cpu_unretire			Consolidation Private
	fmd_agent_cpu_isretired			Consolidation Private

	New interfaces exported for /dev/fm:

	Interface Name				Stability
	--------------------------------------- --------------------
	/dev/fm					Project Private
	<sys/devfm.h>				Project Private

5. Reference Documents

FMA Portfolio 2008.035:

http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2008.035.XVMTopRetire

6. Resources and Schedule
    6.4. Steering Committee requested information
   	6.4.1. Consolidation C-team Name:
		ON
    6.5. ARC review type: Automatic
    6.6. ARC Exposure: open


