[fm-discuss] having problems with mem option on 6800

Mike LaRosa Mc.Larosa at Sun.COM
Fri Nov 9 07:12:50 PST 2007


Thanks Kenneth,

Since this e-mail I've also managed to produce a SUN4U-8000-35
fault.memory.bank, so getting fault.memory.page and fault.memory.dimm
to produce as well would be great. I'd like to look at your script for
the mtst syntax you used to produce fault.memory.page and
fault.memory.dimm ;)

Thanks Again,

Mike,

Kenneth Wong wrote:
> Michael,
>
> I have written a tool that uses mtst to inject errors and then queries
> the domain and the SC for FMA ereports and faults. Attached is a
> summary of what gets reported. In my testing I have seen the
> fault.memory.page and fault.memory.dimm faults. Note that I developed
> this tool on the OPL platform, but it should work on other platforms.
> Please ignore the first 2 lines regarding the PASS/FAIL status; I
> still need to implement a lookup table that compares what is expected
> with what is generated. You should still be able to follow what
> happens when a particular mtst error is injected. The ereports and
> faults are reported with their timestamps, and each fault is made up
> of the fault class, msg-id, ereport mapping, fru, and uuid. The
> strategy I used is to take a snapshot of the before and after picture
> and process the diff to figure out what happened. It is a Perl/Expect
> script. If you are interested in using this tool, please let me know.
>
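The before/after strategy described above can be sketched in a few lines. This is only an illustration in Python, not Kenneth's Perl/Expect tool: it assumes each snapshot has already been reduced to a plain list of ereport or fault class names, and the function name `new_events` is hypothetical.

```python
from collections import Counter

def new_events(before, after):
    """Return the classes whose count grew between two snapshots.

    `before` and `after` are lists of ereport/fault class names
    (an assumed snapshot format; the real tool's format is not shown).
    """
    # Counter subtraction keeps only positive differences, so classes
    # that were already present and did not increase are dropped.
    delta = Counter(after) - Counter(before)
    return dict(delta)

# Hypothetical snapshots around a single "mtst -v kdue" injection.
before = ["ereport.cpu.SPARC64-VI.ue-mem"]
after = before + ["ereport.cpu.SPARC64-VI.ue-mem", "fault.memory.page"]

print(new_events(before, after))
```

Counting per class rather than comparing sets matters here, because the same ereport class can already be present before the injection and only its count changes.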
> - Report of injecting kdue
> 0. kdue         MEM,UE  Domain : sca-dc3-0-b    SUCCESS         UNKNOWN PASS            PASS            FAIL
> Wed Oct 31 19:58:04 PDT 2007 SC : sca-dc3-0-sc0                 UNKNOWN PASS            PASS            FAIL
> DOMAIN_EREPORT_CNT:  ereport.cpu.SPARC64-VI.ue-mem: 1
> DOMAIN_EREPORT_LIST: Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem
> DOMAIN_FAULT_CNT:   fault.memory.page: 1
> DOMAIN_FAULT_LIST: Oct:31:19:56:50.4059+fault.memory.page+SUN4U-8000-1A+Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem+mem:///unum=/CMU00/MEM03B,MEM02B/physaddr=3cb51d60000+d8449033-dc50-e771-9b3c-bad6c6285b45
> SC_EREPORT_CNT:  ereport.chassis.domain.panic: 1+ereport.chassis.SPARC-Enterprise.mem.block.ue: 2
> SC_EREPORT_LIST: Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:51:20.7347+ereport.chassis.domain.panic
> SC_FAULT_CNT:   fault.chassis.SPARC-Enterprise.memory.block.ue: 2
> SC_FAULT_LIST: Oct:31:19:46:30.1130+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=2+beff2bef-25b3-402e-b422-bece7fa891cc|Oct:31:19:46:27.0714+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=3+43ce7c0f-1ac6-493e-9282-1da3e4493aaf
> SC_SHOWLOGSERROR_CNT:  Msg: XSCF command: System status change (OS panic) (DID#01, path: 00): 1+Msg: DIMM serious error: 2
> SC_SHOWLOGSERROR_LIST: Oct:31:19:46:26.373:PDT:2007+FRU: /CMU#0/MEM#02B+Msg: DIMM serious error|Oct:31:19:46:20.616:PDT:2007+FRU: /CMU#0/MEM#03B+Msg: DIMM serious error|Oct:31:19:46:20.640:PDT:2007+FRU: /UNSPECIFIED,/UNSPECIFIED+Msg: XSCF command: System status change (OS panic) (DID#01, path: 00)
>
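For anyone reading the report by hand, the *_FAULT_LIST values above appear to use "|" between entries and "+" between fields. The sketch below splits the SC_FAULT_LIST value into named fields. The field order is inferred from the sample and from the description "fault class, msg-id, ereport mapping, fru, and uuid"; the names `FaultEntry` and `parse_fault_list` are made up for illustration and are not part of Kenneth's tool.

```python
from typing import NamedTuple

class FaultEntry(NamedTuple):
    # Field names are assumed labels for the seven "+"-separated fields.
    fault_time: str
    fault_class: str
    msg_id: str
    ereport_time: str
    ereport_class: str
    fru: str
    uuid: str

def parse_fault_list(value):
    """Split a *_FAULT_LIST value into entries ('|') and fields ('+')."""
    entries = []
    for raw in value.split("|"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate an empty list value
        fields = raw.split("+")
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {raw!r}")
        entries.append(FaultEntry(*fields))
    return entries

# The SC_FAULT_LIST value from the kdue report, rejoined into one string.
sample = (
    "Oct:31:19:46:30.1130+fault.chassis.SPARC-Enterprise.memory.block.ue"
    "+SCF-8001-0Q+Oct:31:19:46:27.1432"
    "+ereport.chassis.SPARC-Enterprise.mem.block.ue"
    "+hc:///chassis=0/cmu=0/mem=2+beff2bef-25b3-402e-b422-bece7fa891cc"
    "|Oct:31:19:46:27.0714+fault.chassis.SPARC-Enterprise.memory.block.ue"
    "+SCF-8001-0Q+Oct:31:19:46:26.4178"
    "+ereport.chassis.SPARC-Enterprise.mem.block.ue"
    "+hc:///chassis=0/cmu=0/mem=3+43ce7c0f-1ac6-493e-9282-1da3e4493aaf"
)

for entry in parse_fault_list(sample):
    print(entry.msg_id, entry.fru, entry.uuid)
```

Splitting on "+" only works because none of the FRU strings in this report contain a "+"; a format with such FRUs would need a different approach.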
> - The following errors were injected; each has its output in its own directory
> # uname -a
> SunOS sca-v240-0 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V240
> # pwd
> /betlog/sca-dc3-0-sc0/8-sca-dc3-0-sc0_10_31
> # ls
> 0_sca-dc3-0-b_mtst_v_kdue            2_sca-dc3-0-b_mtst_v_udue         ereportFaultTableConverted
> 10_sca-dc3-0-b_mtst_v_ce             3_sca-dc3-0-b_mtst_v_uiue         fh.env
> 11_sca-dc3-0-b_mtst_v_mmisc1=1_ce    4_sca-dc3-0-b_mtst_v_kduetl1      injections
> 12_sca-dc3-0-b_mtst_v_mmisc1=1_kdue  5_sca-dc3-0-b_mtst_v_kiuetl1      keep
> 13_sca-dc3-0-b_mtst_v_mmisc1=1_kiue  6_sca-dc3-0-b_mtst_v_kdmtlb       monitorlog
> 14_sca-dc3-0-b_mtst_v_mmisc1=1_udue  7_sca-dc3-0-b_mtst_v_kimtlb       rerun
> 15_sca-dc3-0-b_mtst_v_mmisc1=1_uiue  8_sca-dc3-0-b_mtst_v_udmtlb       sc_before
> 16_sca-dc3-0-b_mtst_v_pue            9_sca-dc3-0-b_mtst_v_uimtlb       sc_diff
> 17_sca-dc3-0-b_mtst_v_ice            SummaryAll-8-sca-dc3-0-sc0_10_31  sc_final
> 18_sca-dc3-0-b_mtst_v_mmisc1=1_ice   ereportFaultTable                 summary_8-sca-dc3-0-sc0_10_31
> 1_sca-dc3-0-b_mtst_v_kiue            ereportFaultTableCombined
> #
>
> - Each error logs the following files
> # ls 0_sca-dc3-0-b_mtst_v_kdue/
> 20071031_194315                     domain_before          sc_before
> Detail.report.log.20071031_194315   domain_diff            sc_diff
> Summary.report.log.20071031_194315  faultserverlog_after   sgfmaclient.log.20071031_194315
> betconf-mtst_v_kdue                 faultserverlog_before
> domain_after                        sc_after
> #
>
> Thanks.
>
> kenneth
>
> Michael Larosa Jr wrote:
>> all of these commands....
>>
>>         mtst -b cpuid=0 -v kdwdu
>>         mtst -b cpuid=2 -v kdue
>>         mtst -b cpuid=0 -v udedus
>>         mtst -b cpuid=0 -v kdemu
>>         mtst -b cpuid=0 -v kducutl1
>>         mtst -b cpuid=0 -v kiucutl1
>>
>> cause...
>>
>> SUN4U-8000-6H  fault.cpu.ultraSPARC-III.l2cachedata
>>
>> I went through all the uncorrectable commands for mtst, and all I get
>> is SUN4U-8000-6H.
>>
>> Is there a way to get these msg-ids to inject?
>>
>> SUN4U-8000-1A  fault.memory.page
>> SUN4U-8000-2S  fault.memory.dimm
>> SUN4U-8000-35  fault.memory.bank
>>
>> SUN4U-8000-7D  fault.cpu.ultraSPARC-III.l2cachetag
>> SUN4U-8007-1Y  fault.asic.cds.dp
>>
>>
>> thanks,
>>
>> Mike,
>>
>>
>> Rob Johnston wrote:
>>
>>
>>> Michael Larosa Jr wrote:
>>>
>>>
>>>> Morning Rob,
>>>>
>>>> You can say no, but I have to ask ;)
>>>
>>>
>>>> Do you know the mtst syntax that would produce the following FMA
>>>> msg-ids?
>>>>
>>>> SUN4U-8000-1A  fault.memory.page
>>>> SUN4U-8000-2S  fault.memory.dimm
>>>> SUN4U-8000-35  fault.memory.bank
>>>> SUN4U-8000-6H  fault.cpu.ultraSPARC-III.l2cachedata
>>>> SUN4U-8000-7D  fault.cpu.ultraSPARC-III.l2cachetag
>>>> SUN4U-8007-1Y  fault.asic.cds.dp
>>>>
>>>> I have a script written by somebody else that produced specific
>>>> msg-ids. It was for 490/890 (Daktari) machines.
>>>>
>>>> I'm shooting in the dark trying to modify his mtst commands to
>>>> produce specific msg-ids on a 3800-6900 (Serengeti) machine.
>>>
>>>
>>> Hi Mike,
>>>
>>> What specific output are you trying to get examples of?  Do you just 
>>> want to see these diagnosis messages dumped to the console?  If so, 
>>> you can use the attached program  as follows (no need to actually 
>>> inject errors for that):
>>>
>>> ./dump_msg.sparc <MSG-ID>
>>>
>>> e.g.
>>>
>>> ./dump_msg.sparc SUN4U-8000-6H
>>>
>>> rob
>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> fm-discuss mailing list
>>> fm-discuss at opensolaris.org
>>>
>>>
>>
>>
>>