[fm-discuss] Re: Re: FMRI Reference Documentation?

Cibrario, Robert J. robert.cibrario at gs.com
Wed Sep 27 08:45:03 PDT 2006


I have 4GB (8 x 512MB) installed in banks 0-3 & 16-19:

Part of the prtdiag output:

Bank Table:
-----------------------------------------------------------
           Physical Location
ID       ControllerID  GroupID   Size       Interleave Way
-----------------------------------------------------------
0        0             0         512MB           0,1,2,3
1        0             1         512MB
2        0             1         512MB
3        0             0         512MB
16       1             0         512MB           0,1,2,3
17       1             1         512MB
18       1             1         512MB
19       1             0         512MB
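
The 4GB total can be cross-checked against the table itself. This is only a sketch: bank_total_mb is a made-up helper name, and it assumes the fourth column is always the bank size written as a number followed by "MB", as in the excerpt above. Feed it just the bank table, since other prtdiag sections may have different columns.

```shell
# Hypothetical helper: sum the Size column of a prtdiag bank table.
# Reads the table on stdin; only rows whose 4th field looks like
# "<digits>MB" are counted, so headers and rule lines are skipped.
bank_total_mb() {
    awk '$4 ~ /^[0-9]+MB$/ { sum += $4 + 0 } END { print sum "MB" }'
}

# Usage (bank.txt is an assumed file holding the table above):
#   bank_total_mb < bank.txt
```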


If I do need to adjust unum, please tell me how. Otherwise I'll give it
a shot.
thanks
Rc



-----Original Message-----
From: Gavin.Maltby at Sun.COM [mailto:Gavin.Maltby at Sun.COM] 
Sent: Wednesday, September 27, 2006 10:57 AM
To: Cibrario, Robert J.
Cc: fm-discuss at opensolaris.org
Subject: Re: [fm-discuss] Re: Re: FMRI Reference Documentation?

On 09/27/06 13:48, Cibrario, Robert J. wrote:
> I'm on a Sun Fire V240 - many (many) thanks

Attached is an injection file which can be used to simulate
an uncorrectable memory error for the fault diagnosis engine.
The kernel won't see this at all, and it won't panic
or reboot your system (but note the caveat below).

This should work on your system (I took it off another v240).
There's a chance we may have to vary the "unum" in the
resource if you don't have memory installed in those
slots.

You can aim the injection at the real fmd, or use fmsim to
create a simulated environment (I assume the snmp trap will
still fire from this simulated environment).  Just a quick
disclaimer on injecting errors: they look real and the
diagnosis and response software do not know the difference;
this means that on platforms that support it they may
offline and/or blacklist components from future use.

In one window:

# /usr/lib/fm/fmd/fmsim -i
fmsim: creating simulation world /tmp/fmd.29183 ... done.
fmsim: populating /var ... done.
fmsim: populating /usr/lib/fm from / ... 6624 blocks
fmsim: populating /usr/platform/SUNW,SPARC-Enterprise/lib/fm from / ... 336 blocks
fmsim: populating /usr/platform/SUNW,Sun-Fire-15000/lib/fm from / ... 192 blocks
fmsim: populating /usr/platform/SUNW,Sun-Fire-T200/lib/fm from / ... 16 blocks
fmsim: populating /usr/platform/SUNW,Sun-Fire/lib/fm from / ... 176 blocks
fmsim: populating /usr/platform/SUNW,UltraAX-i2/lib/fm from / ... 1104 blocks
fmsim: populating /usr/platform/sun4u/lib/fm from / ... 1104 blocks
fmsim: populating /usr/platform/sun4v/lib/fm from / ... 1104 blocks
fmsim: populating /usr/lib/locale/C from / ... 1472 blocks
fmsim: populating /usr/sbin from / ... 224 blocks
fmsim: adding customizations: done.
fmsim: generating script ... done.
fmsim: simulation 29183 running fmd(1M) version 1.1
fmd: [ loading modules ... fmd: failed to load /tmp/fmd.29183/usr/platform/sun4u/lib/fm/fmd/plugins/USII-io-diagnosis.so: client handle wasn't initialized by _fmd_init
fmd: failed to load /tmp/fmd.29183/usr/platform/sun4u/lib/fm/fmd/plugins/datapath-retire.so: client handle wasn't initialized by _fmd_init
fmsim: rpc adm requests can rendezvous at 1073741824
fmsim: injectors should use channel com.sun:fm:fmd29183
fmsim: debuggers should attach to PID 29215
fmd: failed to load /tmp/fmd.29183/usr/lib/fm/fmd/plugins/ip-transport.so: client handle wasn't initialized by _fmd_init
done ]
fmd: [ awaiting events ]

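The "injectors should use channel" line gives us a channel to inject on
that won't reach the real fmd; we pass this to fminject with -c.  The
"rendezvous at" number is what we pass with -P to fmadm, fmstat etc.
If you want to target the real fmd, just leave off the -c below and
the -P on fmadm, fmstat etc.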
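
If you save the fmsim output to a file, you can pull both values out instead of copying them by hand. This is a rough sketch, not part of the FMA tools: fmsim_channel and fmsim_rpc_prog are made-up helper names, and the patterns assume the two fmsim lines are worded exactly as in the transcript above.

```shell
# Hypothetical helpers: extract values from saved fmsim output.
# $1 is the file holding the captured output.

# Print the injection channel (the argument for fminject -c).
fmsim_channel() {
    sed -n 's/^fmsim: injectors should use channel \(.*\)$/\1/p' "$1"
}

# Print the RPC program number (the argument for fmadm -P, fmstat -P).
fmsim_rpc_prog() {
    sed -n 's/^fmsim: rpc adm requests can rendezvous at \([0-9][0-9]*\)$/\1/p' "$1"
}

# Usage (fmsim.out and ue.inj are assumed file names):
#   /usr/lib/fm/fmd/fminject -c "$(fmsim_channel fmsim.out)" ue.inj
#   fmadm -P "$(fmsim_rpc_prog fmsim.out)" faulty
```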

In another window:

# /usr/lib/fm/fmd/fminject -c com.sun:fm:fmd29183 ~gavinm/tmp/ue.inj
sending event e1 ... done

Now fmadm faulty aimed at that fmd shows the cached state of the
resource:

# fmadm -P 1073741824 faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///unum=MB/P0/B1:B1/D0,B1/D1
         fa8bfc8d-70cd-6c97-e6ec-9b16ef6281ff
-------- ----------------------------------------------------------------------

It is "degraded" rather than "faulted" because we cannot offline a whole
bank of memory on the v240.

We can fmdump the errlog (all ereports are appended to this):

# fmdump -e /tmp/fmd.29183/var/fm/fmd/errlog
TIME                 CLASS
Sep 27 15:46:46.6578 ereport.cpu.ultraSPARC-IIIi.ue

(use -V and you'll see the ereport the injection file produced).
The -e actually does nothing here since we have specified a log file.

And the fault log (all fault diagnoses are appended to this):

# fmdump -v /tmp/fmd.29183/var/fm/fmd/fltlog
TIME                 UUID                                 SUNW-MSG-ID
Sep 27 15:46:46.6578 fa8bfc8d-70cd-6c97-e6ec-9b16ef6281ff SUN4U-8000-35
    95%  fault.memory.bank

         Problem in: mem:///unum=MB/P0/B1:B1/D0,B1/D1
            Affects: mem:///unum=MB/P0/B1:B1/D0,B1/D1
                FRU: mem:///unum=MB/P0/B1:B1/D0,B1/D1
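
To check which unum a diagnosis actually named (useful when comparing against the slots you have populated), the FMRI can be picked out of the fmdump -v output. This is a sketch under assumptions: fault_resources and fmri_unum are made-up helper names, and they rely on the "Problem in:" line and the mem:///unum= prefix looking as they do in the output above.

```shell
# Hypothetical helper: print the FMRI from each "Problem in:" line
# of fmdump -v fault log output read on stdin.
fault_resources() {
    awk '$1 == "Problem" && $2 == "in:" { print $3 }'
}

# Hypothetical helper: strip the mem:///unum= prefix from FMRIs on
# stdin, leaving just the unum string. Other FMRIs are not printed.
fmri_unum() {
    sed -n 's|^mem:///unum=||p'
}

# Usage:
#   fmdump -v /tmp/fmd.29183/var/fm/fmd/fltlog | fault_resources | fmri_unum
```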

You can run fmadm -P 1073741824 repair on the resource or UUID above if
you wish, but since it is a simulated environment you can also just
ctrl-C fmsim.

Hope that helps

Gavin

