[fm-discuss] having problems with mem option on 6800
Mike LaRosa
Mc.Larosa at Sun.COM
Fri Nov 9 07:12:50 PST 2007
Thanks Kenneth,
Since this e-mail I've also managed to get a SUN4U-8000-35
fault.memory.bank produced, so getting fault.memory.page and
fault.memory.dimm produced as well would be great. I'd like to look at
your script for the mtst syntax you used to produce fault.memory.page
and fault.memory.dimm ;)
Thanks again,
Mike
Kenneth Wong wrote:
> Michael,
>
> I have written a tool that uses mtst to inject errors and then queries
> the domain and the SC for FMA ereports and faults. Attached is a
> summary of what gets reported. In my testing I have seen the
> fault.memory.page and fault.memory.dimm faults. Note that I developed
> this tool on the OPL platform, but it should work on other platforms.
> Please ignore the first 2 lines regarding the PASS/FAIL status. I
> still need to implement a lookup table to compare what is expected
> with what is generated. But you should be able to follow what happens
> when a particular mtst error is injected. The ereports and faults are
> reported with their timestamps. Each fault is made up of the fault
> class, msg-id, ereport mapping, fru, and uuid. The strategy I used is
> to take a snapshot of the before and after state and process the diff
> to figure out what happened. It is a Perl/Expect script. If you are
> interested in using this tool please let me know.
>
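The before/after strategy described above can be sketched roughly as
follows. This is only an illustration in Python, not the Perl/Expect tool
itself: the function names are made up, the "timestamp+class" entry format
is copied from the report lists in this message, and which entries belong
to the "before" snapshot is an assumption chosen for the example.

```python
from collections import Counter


def new_entries(before, after):
    """Return the entries in `after` that are absent from `before`, keeping order."""
    seen = set(before)
    return [entry for entry in after if entry not in seen]


def count_by_class(entries):
    """Count entries per class; each entry is assumed to be 'timestamp+class'."""
    return Counter(entry.split("+", 1)[1] for entry in entries)


# Hypothetical snapshots. The strings come from the SC_EREPORT_LIST in the
# report, but which of them predate the injection is assumed for illustration.
sc_before = [
    "Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue",
]
sc_after = sc_before + [
    "Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue",
    "Oct:31:19:51:20.7347+ereport.chassis.domain.panic",
]

# The diff is what the injection is taken to have caused.
sc_diff = new_entries(sc_before, sc_after)
for cls, n in sorted(count_by_class(sc_diff).items()):
    print(f"{cls}: {n}")
```

Comparing whole entries (timestamp included) keeps two ereports of the same
class apart, which matters when one injection raises the same class twice.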
> - Report of injecting kdue
> 0. kdue MEM,UE Domain : sca-dc3-0-b SUCCESS
> UNKNOWN PASS PASS FAIL
> Wed Oct 31 19:58:04 PDT 2007 SC : sca-dc3-0-sc0
> UNKNOWN PASS PASS FAIL
> DOMAIN_EREPORT_CNT: ereport.cpu.SPARC64-VI.ue-mem: 1
> DOMAIN_EREPORT_LIST: Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem
> DOMAIN_FAULT_CNT: fault.memory.page: 1
> DOMAIN_FAULT_LIST: Oct:31:19:56:50.4059+fault.memory.page+SUN4U-8000-1A+Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem+mem:///unum=/CMU00/MEM03B,MEM02B/physaddr=3cb51d60000+d8449033-dc50-e771-9b3c-bad6c6285b45
> SC_EREPORT_CNT: ereport.chassis.domain.panic: 1+ereport.chassis.SPARC-Enterprise.mem.block.ue: 2
> SC_EREPORT_LIST: Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:51:20.7347+ereport.chassis.domain.panic
> SC_FAULT_CNT: fault.chassis.SPARC-Enterprise.memory.block.ue: 2
> SC_FAULT_LIST: Oct:31:19:46:30.1130+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=2+beff2bef-25b3-402e-b422-bece7fa891cc|Oct:31:19:46:27.0714+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=3+43ce7c0f-1ac6-493e-9282-1da3e4493aaf
> SC_SHOWLOGSERROR_CNT: Msg: XSCF command: System status change (OS panic) (DID#01, path: 00): 1+Msg: DIMM serious error: 2
> SC_SHOWLOGSERROR_LIST: Oct:31:19:46:26.373:PDT:2007+FRU: /CMU#0/MEM#02B+Msg: DIMM serious error|Oct:31:19:46:20.616:PDT:2007+FRU: /CMU#0/MEM#03B+Msg: DIMM serious error|Oct:31:19:46:20.640:PDT:2007+FRU: /UNSPECIFIED,/UNSPECIFIED+Msg: XSCF command: System status change (OS panic) (DID#01, path: 00)
>
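For anyone post-processing a report like the one above, here is a minimal
Python sketch of splitting a *_FAULT_LIST value into its parts. It assumes
what the sample shows: entries are separated by "|", and each entry has
seven "+"-separated fields (fault timestamp, fault class, msg-id, ereport
timestamp, ereport class, fru, uuid). The names Fault and parse_fault_list
are invented for this example, and a FRU string that itself contains "+"
would break this simple split.

```python
from typing import NamedTuple


class Fault(NamedTuple):
    timestamp: str
    fault_class: str
    msg_id: str
    ereport_timestamp: str
    ereport_class: str
    fru: str
    uuid: str


def parse_fault_list(text):
    """Parse a '|'-separated fault list into Fault records."""
    faults = []
    for entry in text.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty list or a trailing separator
        fields = entry.split("+")
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {entry!r}")
        faults.append(Fault(*fields))
    return faults


# DOMAIN_FAULT_LIST value from the kdue report, with the wrapped lines rejoined.
sample = (
    "Oct:31:19:56:50.4059+fault.memory.page+SUN4U-8000-1A+"
    "Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem+"
    "mem:///unum=/CMU00/MEM03B,MEM02B/physaddr=3cb51d60000+"
    "d8449033-dc50-e771-9b3c-bad6c6285b45"
)

faults = parse_fault_list(sample)
for f in faults:
    print(f.msg_id, f.fault_class, f.fru)
```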
> - The following errors were injected; the output of each is in its own directory
> # uname -a
> SunOS sca-v240-0 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V240
> # pwd
> /betlog/sca-dc3-0-sc0/8-sca-dc3-0-sc0_10_31
> # ls
0_sca-dc3-0-b_mtst_v_kdue            2_sca-dc3-0-b_mtst_v_udue         ereportFaultTableConverted
10_sca-dc3-0-b_mtst_v_ce             3_sca-dc3-0-b_mtst_v_uiue         fh.env
11_sca-dc3-0-b_mtst_v_mmisc1=1_ce    4_sca-dc3-0-b_mtst_v_kduetl1      injections
12_sca-dc3-0-b_mtst_v_mmisc1=1_kdue  5_sca-dc3-0-b_mtst_v_kiuetl1      keep
13_sca-dc3-0-b_mtst_v_mmisc1=1_kiue  6_sca-dc3-0-b_mtst_v_kdmtlb       monitorlog
14_sca-dc3-0-b_mtst_v_mmisc1=1_udue  7_sca-dc3-0-b_mtst_v_kimtlb       rerun
15_sca-dc3-0-b_mtst_v_mmisc1=1_uiue  8_sca-dc3-0-b_mtst_v_udmtlb       sc_before
16_sca-dc3-0-b_mtst_v_pue            9_sca-dc3-0-b_mtst_v_uimtlb       sc_diff
17_sca-dc3-0-b_mtst_v_ice            SummaryAll-8-sca-dc3-0-sc0_10_31  sc_final
18_sca-dc3-0-b_mtst_v_mmisc1=1_ice   ereportFaultTable                 summary_8-sca-dc3-0-sc0_10_31
1_sca-dc3-0-b_mtst_v_kiue            ereportFaultTableCombined
> #
>
> - Each error logs the following files
> # ls 0_sca-dc3-0-b_mtst_v_kdue/
20071031_194315                     domain_before          sc_before
Detail.report.log.20071031_194315   domain_diff            sc_diff
Summary.report.log.20071031_194315  faultserverlog_after   sgfmaclient.log.20071031_194315
betconf-mtst_v_kdue                 faultserverlog_before
domain_after                        sc_after
> #
>
> Thanks.
>
> kenneth
>
> Michael Larosa Jr wrote:
>> all of these commands....
>>
>> mtst -b cpuid=0 -v kdwdu
>> mtst -b cpuid=2 -v kdue
>> mtst -b cpuid=0 -v udedus
>> mtst -b cpuid=0 -v kdemu
>> mtst -b cpuid=0 -v kducutl1
>> mtst -b cpuid=0 -v kiucutl1
>>
>> cause...
>>
>> SUN4U-8000-6H fault.cpu.ultraSPARC-III.l2cachedata
>>
>> I went through all the uncorrectable-error commands for mtst, and all
>> I get is SUN4U-8000-6H.
>>
>> Is there a way to inject errors that produce these msg-ids?
>>
>> SUN4U-8000-1A fault.memory.page
>> SUN4U-8000-2S fault.memory.dimm
>> SUN4U-8000-35 fault.memory.bank
>>
>> SUN4U-8000-7D fault.cpu.ultraSPARC-III.l2cachetag
>> SUN4U-8007-1Y fault.asic.cds.dp
>>
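When working through a wish list like this one, it can help to keep the
msg-id to fault class pairs in a small table and check off what has been
seen. The sketch below is illustrative Python only: the pairs are the ones
listed in this thread, while the names MSG_IDS and still_missing and the
"observed" set are assumptions made up for the example.

```python
# msg-id -> fault class, exactly as listed in this thread.
MSG_IDS = {
    "SUN4U-8000-1A": "fault.memory.page",
    "SUN4U-8000-2S": "fault.memory.dimm",
    "SUN4U-8000-35": "fault.memory.bank",
    "SUN4U-8000-6H": "fault.cpu.ultraSPARC-III.l2cachedata",
    "SUN4U-8000-7D": "fault.cpu.ultraSPARC-III.l2cachetag",
    "SUN4U-8007-1Y": "fault.asic.cds.dp",
}


def still_missing(observed_msg_ids):
    """Return the wanted msg-ids that have not been observed yet, sorted."""
    return sorted(set(MSG_IDS) - set(observed_msg_ids))


# Hypothetical run state: only the l2cachedata fault has been produced so far.
observed = {"SUN4U-8000-6H"}
for msg_id in still_missing(observed):
    print(msg_id, MSG_IDS[msg_id])
```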
>>
>> thanks,
>>
>> Mike,
>>
>>
>> Rob Johnston wrote:
>>
>>
>>> Michael Larosa Jr wrote:
>>>
>>>
>>>> Morning Rob,
>>>>
>>>> You can say no, but I have to ask ;)
>>>
>>>
>>>> Do you know the mtst syntax that would produce the following FMA
>>>> msg-ids?
>>>>
>>>> SUN4U-8000-1A fault.memory.page
>>>> SUN4U-8000-2S fault.memory.dimm
>>>> SUN4U-8000-35 fault.memory.bank
>>>> SUN4U-8000-6H fault.cpu.ultraSPARC-III.l2cachedata
>>>> SUN4U-8000-7D fault.cpu.ultraSPARC-III.l2cachetag
>>>> SUN4U-8007-1Y fault.asic.cds.dp
>>>>
>>>> I have a script written by somebody else that produced specific
>>>> msg-ids. It was written for 490/890 (Daktari) machines.
>>>>
>>>> I'm shooting in the dark trying to modify his mtst commands to
>>>> produce specific msg-ids on a 3800-6900 (Serengeti) machine.
>>>
>>>
>>> Hi Mike,
>>>
>>> What specific output are you trying to get examples of? Do you just
>>> want to see these diagnosis messages dumped to the console? If so,
>>> you can use the attached program as follows (no need to actually
>>> inject errors for that):
>>>
>>> ./dump_msg.sparc <MSG-ID>
>>>
>>> e.g.
>>>
>>> ./dump_msg.sparc SUN4U-8000-6H
>>>
>>> rob
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> fm-discuss mailing list
>>> fm-discuss at opensolaris.org
>>>
>>>
>>
>>
>>