[osol-discuss] Project Proposal: Fault Management Event Registry

Eric Boutilier Eric.Boutilier at Sun.COM
Thu Mar 1 07:41:56 PST 2007


Thanks, Cindi. You have seconds. I'll contact you offline to get you
set up.

Eric

On Tue, 27 Feb 2007, cindi wrote:
>
>
> This project will export the fault management registry of event specifications
> and diagnosis article content.  The initial delivery to the OpenSolaris
> community will include the registry contents, a set of CLIs, and web-based
> browser tools to access event class and payload specifications, diagnosis
> messages, and article details.
>
> The initial target audience of this project is system administrators and
> developers who want a listing of the possible fault diagnosis messages
> and the event class and payload specifications.
>
> For example, the ermsg command may be used to list the diagnosis message IDs
> for all AMD Opteron and Athlon 64 processor diagnoses:
>
>    # ermsg -a AMD
>    Dictionary   Entry No.    ID
>    AMD          1            AMD-8000-1W
>    AMD          2            AMD-8000-2F
>    AMD          3            AMD-8000-3K
>    AMD          4            AMD-8000-48
>    AMD          5            AMD-8000-5M
>    AMD          6            AMD-8000-67
>    AMD          7            AMD-8000-7U
>    AMD          8            AMD-8000-8L
>    AMD          9            AMD-8000-9G
>    AMD          10           AMD-8000-AV
>    AMD          11           AMD-8000-C0
>    AMD          12           AMD-8000-DT
>    AMD          13           AMD-8000-E6
>    AMD          14           AMD-8000-FN
>    AMD          15           AMD-8000-G9
>    ...
>
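A minimal sketch of how a script might consume a listing like the one above. It assumes the whitespace-separated "Dictionary / Entry No. / ID" column layout shown in the sample, with a single header row and an optional trailing "..." line. The real ermsg output format may differ, and `parse_listing` and `MessageEntry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageEntry:
    """One row of the listing: dictionary name, entry number, message ID."""
    dictionary: str
    entry_no: int
    msg_id: str


def parse_listing(text: str) -> list[MessageEntry]:
    """Parse listing text laid out as 'Dictionary  Entry No.  ID' columns.

    Assumption: whitespace-separated columns as in the sample output; the
    real tool's format is not specified in the proposal.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row (4 fields) and the '...' marker.
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        dictionary, entry_no, msg_id = fields
        entries.append(MessageEntry(dictionary, int(entry_no), msg_id))
    return entries
```

Feeding it the AMD listing above yields one `MessageEntry` per row, e.g. `MessageEntry("AMD", 15, "AMD-8000-G9")` for the last row shown.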
> The content of a specific message and its article details may also be displayed:
>
>    # ermsg -a AMD-8000-G9
>    Dictionary   Entry No.    ID
>    AMD          15           AMD-8000-G9
>
>    CPU errors exceeded acceptable levels
>
>    Type
>    Fault
>
>    Severity
>    Major
>
>    Description
>    The number of errors associated with this CPU has exceeded acceptable levels.
>
>    Automated Response
>    An attempt will be made to remove this CPU from service.
>
>    Impact
>    Performance of this system may be affected.
>
>    Suggested Action for System Administrator
>    Schedule a repair procedure to replace the affected CPU.  Use fmdump -v -u
>    <EVENT_ID> to identify the module.
>
>    Details
>    This message indicates that the Solaris Fault Manager has received a report
>    from a CPU indicating that an uncorrectable Level 1 Data Translation
>    Look-aside Buffer error has occurred, and a CPU fault has been diagnosed.
>    System performance may have been affected. Faults of this nature typically
>    result in a system reset and reboot.
>    ...
>
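The article above is a title followed by named sections. A minimal sketch of splitting such text into a title and a heading-to-body mapping is shown below. It assumes the title is the first non-blank line and that each section begins with a line exactly matching one of the heading names seen in the sample. That heading list is taken from this example only, not from a published format, and `parse_article` is a hypothetical name.

```python
# Heading names copied from the sample article; assumed, not specified.
KNOWN_HEADINGS = (
    "Type",
    "Severity",
    "Description",
    "Automated Response",
    "Impact",
    "Suggested Action for System Administrator",
    "Details",
)


def parse_article(text: str) -> tuple[str, dict[str, str]]:
    """Split article text into (title, {heading: body})."""
    title = ""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not title:
            # First non-blank line is treated as the message title.
            title = line
        elif line in KNOWN_HEADINGS:
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Re-join wrapped lines of each section into a single paragraph.
    return title, {h: " ".join(body) for h, body in sections.items()}
```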
> Similarly, FMA event class and payload specifications may be displayed:
>
>    # erevent -L "ereport.io.pci.*"
>    ereport.io.pci.dpe -- Detected data parity error
>    ereport.io.pci.dto -- Master never reissued read
>    ...
>
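The `-L "ereport.io.pci.*"` example filters event classes by a wildcard pattern. The sketch below reproduces that filtering with Python's `fnmatch.fnmatchcase`, assuming shell-style glob semantics. The proposal does not say which matching rules erevent uses, and the `REGISTRY` dictionary and `list_classes` function are hypothetical stand-ins populated from the sample output.

```python
from fnmatch import fnmatchcase

# Hypothetical registry excerpt: class name -> one-line summary, copied
# from the sample output above.
REGISTRY = {
    "ereport.io.pci.dpe": "Detected data parity error",
    "ereport.io.pci.dto": "Master never reissued read",
}


def list_classes(pattern: str, registry: dict[str, str]) -> list[str]:
    """Return 'class -- summary' lines for classes matching a glob pattern.

    Uses case-sensitive shell-style matching. Note that '*' also matches
    '.', so 'ereport.io.*' matches every class below ereport.io.
    """
    return [
        f"{name} -- {summary}"
        for name, summary in sorted(registry.items())
        if fnmatchcase(name, pattern)
    ]
```

Whether `*` should stop at a `.` boundary is a design choice; a stricter matcher would compare dot-separated components one at a time instead.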
>    # erevent -a "ereport.io.pci.dpe"
>    ereport.io.pci.dpe -- Detected data parity error
>
>    Event Payload
>    Name         Type            Description
>    ENA          uint64_t        Error Numeric Association
>    class        string          The event class
>    detector     fmri            The resource that detected the error
>    version      uint8_t         The major version of this event class
>    pci-bdg-cntl uint16_t        PCI bridge control register
>    pci-command  uint16_t        PCI Local Bus configuration command register
>    pci-pa       uint64_t        PCI errant physical address
>    pci-status   uint16_t        PCI Local Bus configuration status register
>
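A payload specification like the one above can double as a checklist for a producer's output. Here is a minimal sketch, assuming a payload is represented as a Python dict and an FMRI is modelled simply as a nested dict. Real FMA payloads are name-value lists, so this is only a simplified model. `PCI_DPE_PAYLOAD`, `check_value`, and `validate` are hypothetical names.

```python
# Payload spec transcribed from the table above: member name -> type name.
PCI_DPE_PAYLOAD = {
    "ENA": "uint64_t",
    "class": "string",
    "detector": "fmri",
    "version": "uint8_t",
    "pci-bdg-cntl": "uint16_t",
    "pci-command": "uint16_t",
    "pci-pa": "uint64_t",
    "pci-status": "uint16_t",
}

# Bit widths of the unsigned integer types used in the spec.
UINT_BITS = {"uint8_t": 8, "uint16_t": 16, "uint64_t": 64}


def check_value(type_name: str, value: object) -> bool:
    """Check one value against a spec type name (simplified model)."""
    if type_name in UINT_BITS:
        # bool is a subclass of int in Python, so reject it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and 0 <= value < 2 ** UINT_BITS[type_name])
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "fmri":
        # Assumption: an FMRI is modelled here as a dict of name/value pairs.
        return isinstance(value, dict)
    raise ValueError(f"unknown type in spec: {type_name}")


def validate(payload: dict, spec: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the payload conforms.

    Members not listed in the spec are ignored rather than rejected.
    """
    problems = [f"missing member: {n}" for n in spec if n not in payload]
    problems += [
        f"bad value for {n}: expected {spec[n]}"
        for n, v in payload.items()
        if n in spec and not check_value(spec[n], v)
    ]
    return problems
```

Ignoring extra members keeps the check tolerant of producers that add diagnostic fields beyond the registered specification.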
> The OpenSolaris event registry source will be updated regularly to coincide
> with updates to message IDs at http://sun.com/msg.  Community contributions
> to the event registry source will be permitted and sponsored for developers
> contributing fault management error handling and diagnosis software for
> FMA-capable and FMA-aware hardware and software components.
>
>


