[fm-discuss] having problems with mem option on 6800
Mike LaRosa
Mc.Larosa at Sun.COM
Fri Nov 9 07:34:03 PST 2007
Doug Baker - Sun UK - Support Engineer wrote:
> Mike LaRosa wrote:
>> Thanks Kenneth,
>>
>> Since this e-mail I've also managed to produce a SUN4U-8000-35
>> fault.memory.bank, so getting fault.memory.page and
>> fault.memory.dimm to produce would be great.
>> I'd like to look at your script for the
>> mtst syntax you used to produce fault.memory.page and
>> fault.memory.dimm ;)
>
> fault.memory.page requires no user action so would never appear in the
> messages files.
>
> If you have already been injecting multiple CEs you are likely already
> to have generated some of these faults in the fltlog. Try running
> fmadm faulty -a to see the faults present which require no action.
nothing in fmadm faulty -a
>
> fault.memory.dimm on a V490 would require you to generate potentially
> thousands of fault.memory.page faults on a single DIMM which is
> probably impossible using mtst.
That's my experience ;) I ran it 350 times in a script... nothing in
messages or fmdump, although I do see all the entries
in the ereports:
ereport.cpu.ultraSPARC-III.ce
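
For anyone repeating this kind of run, a minimal sketch of a loop like the one described above might look as follows. The helper name repeat_inject is hypothetical, and the mtst invocation in the comment is only assumed by analogy with the commands quoted further down in this thread; whether `ce` is the right error type on a given platform is not confirmed here.

```shell
#!/bin/sh
# Sketch only: run an injection command N times and report how many runs
# exited non-zero. The command to run is supplied by the caller.
repeat_inject() {
    count=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # "$@" is the injection command, for example an mtst invocation.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "runs=$count failures=$failures"
}

# Assumed usage on the test domain (not executed here):
#   repeat_inject 350 mtst -b cpuid=0 -v ce
# Afterwards, `fmdump -e` lists the ereports and `fmadm faulty -a`
# lists any faults that were diagnosed.
```

The loop only counts exit statuses; it says nothing about whether the diagnosis engine turned the resulting ereports into a fault, which still has to be checked in the logs.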
Mike,
>
> Regards,
>
> Douglas
>
>>
>> Thanks Again,
>>
>> Mike,
>>
>> Kenneth Wong wrote:
>>> Michael,
>>>
>>> I have written a tool that uses mtst to inject errors and then queries
>>> the domain and the SC for FMA ereports and faults. Attached is a
>>> summary of what gets reported. In my testing I have seen the
>>> fault.memory.page and fault.memory.dimm faults. Note that I developed
>>> this tool on the OPL platform, but it should work on other platforms.
>>> Please ignore the first 2 lines regarding the PASS/FAIL status. I
>>> still need to implement a lookup table to compare what is expected
>>> with what is generated. But you should be able to follow what
>>> happens when a particular mtst error is injected. The ereports and
>>> faults are reported with their timestamps. Each fault record is made up
>>> of the fault class, msg-id, ereport mapping, fru, and uuid. The
>>> strategy I used is to take a snapshot before and after each injection
>>> and process the diff to figure out what happened. It is a Perl/Expect
>>> script. If you are interested in using this tool please let me know.
>>>
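
The before/after strategy described above can be sketched in a few lines of shell. This is not Kenneth's Perl/Expect tool, only an illustration of the idea; the function name new_entries is hypothetical, and the snapshot files are assumed to hold one log entry per line.

```shell
#!/bin/sh
# Sketch only: print the lines that appear in the "after" snapshot but not
# in the "before" snapshot.
new_entries() {
    before=$1
    after=$2
    tmpdir=$(mktemp -d) || return 1
    # comm requires sorted input; use the same C collation for sort and comm.
    LC_ALL=C sort -u "$before" > "$tmpdir/before.sorted"
    LC_ALL=C sort -u "$after" > "$tmpdir/after.sorted"
    # -13 suppresses lines unique to the first file and lines common to
    # both, leaving only lines unique to the second (after) file.
    LC_ALL=C comm -13 "$tmpdir/before.sorted" "$tmpdir/after.sorted"
    rm -rf "$tmpdir"
}

# Assumed usage, with file names borrowed from the directory listing in
# this thread:
#   new_entries domain_before domain_after > domain_diff
```

Sorting discards the original ordering, so timestamps inside each entry are what preserve the sequence of events.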
>>> - Report of injecting kdue
>>> 0. kdue MEM,UE Domain : sca-dc3-0-b SUCCESS UNKNOWN PASS PASS FAIL
>>> Wed Oct 31 19:58:04 PDT 2007 SC : sca-dc3-0-sc0 UNKNOWN PASS PASS FAIL
>>> DOMAIN_EREPORT_CNT: ereport.cpu.SPARC64-VI.ue-mem: 1
>>> DOMAIN_EREPORT_LIST: Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem
>>> DOMAIN_FAULT_CNT: fault.memory.page: 1
>>> DOMAIN_FAULT_LIST: Oct:31:19:56:50.4059+fault.memory.page+SUN4U-8000-1A+Oct:31:19:43:31.7898+ereport.cpu.SPARC64-VI.ue-mem+mem:///unum=/CMU00/MEM03B,MEM02B/physaddr=3cb51d60000+d8449033-dc50-e771-9b3c-bad6c6285b45
>>> SC_EREPORT_CNT: ereport.chassis.domain.panic: 1+ereport.chassis.SPARC-Enterprise.mem.block.ue: 2
>>> SC_EREPORT_LIST: Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue|Oct:31:19:51:20.7347+ereport.chassis.domain.panic
>>> SC_FAULT_CNT: fault.chassis.SPARC-Enterprise.memory.block.ue: 2
>>> SC_FAULT_LIST: Oct:31:19:46:30.1130+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:27.1432+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=2+beff2bef-25b3-402e-b422-bece7fa891cc|Oct:31:19:46:27.0714+fault.chassis.SPARC-Enterprise.memory.block.ue+SCF-8001-0Q+Oct:31:19:46:26.4178+ereport.chassis.SPARC-Enterprise.mem.block.ue+hc:///chassis=0/cmu=0/mem=3+43ce7c0f-1ac6-493e-9282-1da3e4493aaf
>>> SC_SHOWLOGSERROR_CNT: Msg: XSCF command: System status change (OS panic) (DID#01, path: 00): 1+Msg: DIMM serious error: 2
>>> SC_SHOWLOGSERROR_LIST: Oct:31:19:46:26.373:PDT:2007+FRU: /CMU#0/MEM#02B+Msg: DIMM serious error|Oct:31:19:46:20.616:PDT:2007+FRU: /CMU#0/MEM#03B+Msg: DIMM serious error|Oct:31:19:46:20.640:PDT:2007+FRU: /UNSPECIFIED,/UNSPECIFIED+Msg: XSCF command: System status change (OS panic) (DID#01, path: 00)
>>>
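
The *_FAULT_LIST values in the kdue report pack each fault into '+'-separated fields, with '|' between faults. A small sketch for pulling out the msg-id, fault class and fru follows. The function name fault_summary is hypothetical, and the field order in the comment is inferred from the report and from the description of what a fault record contains, not taken from the tool itself.

```shell
#!/bin/sh
# Sketch only: split a *_FAULT_LIST value into one fault per line and print
# the msg-id, fault class and fru of each.
# Assumed field order, inferred from the kdue report:
#   1 fault time, 2 fault class, 3 msg-id, 4 ereport time,
#   5 ereport class, 6 fru, 7 uuid
fault_summary() {
    # '|' separates faults; '[+]' is a bracket expression so that '+' is
    # treated as a literal field separator by awk.
    tr '|' '\n' | awk -F'[+]' 'NF >= 7 { print $3, $2, $6 }'
}

# Assumed usage, feeding the DOMAIN_FAULT_LIST value on a single line:
#   printf '%s\n' "$domain_fault_list" | fault_summary
```

For the DOMAIN_FAULT_LIST entry in the report this would print the SUN4U-8000-1A msg-id, the fault.memory.page class, and the mem:/// FMRI of the affected DIMM pair.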
>>> - The following errors are injected, each with its output in its own directory
>>> # uname -a
>>> SunOS sca-v240-0 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V240
>>> # pwd
>>> /betlog/sca-dc3-0-sc0/8-sca-dc3-0-sc0_10_31
>>> # ls
>>> 0_sca-dc3-0-b_mtst_v_kdue            2_sca-dc3-0-b_mtst_v_udue         ereportFaultTableConverted
>>> 10_sca-dc3-0-b_mtst_v_ce             3_sca-dc3-0-b_mtst_v_uiue         fh.env
>>> 11_sca-dc3-0-b_mtst_v_mmisc1=1_ce    4_sca-dc3-0-b_mtst_v_kduetl1      injections
>>> 12_sca-dc3-0-b_mtst_v_mmisc1=1_kdue  5_sca-dc3-0-b_mtst_v_kiuetl1      keep
>>> 13_sca-dc3-0-b_mtst_v_mmisc1=1_kiue  6_sca-dc3-0-b_mtst_v_kdmtlb       monitorlog
>>> 14_sca-dc3-0-b_mtst_v_mmisc1=1_udue  7_sca-dc3-0-b_mtst_v_kimtlb       rerun
>>> 15_sca-dc3-0-b_mtst_v_mmisc1=1_uiue  8_sca-dc3-0-b_mtst_v_udmtlb       sc_before
>>> 16_sca-dc3-0-b_mtst_v_pue            9_sca-dc3-0-b_mtst_v_uimtlb       sc_diff
>>> 17_sca-dc3-0-b_mtst_v_ice            SummaryAll-8-sca-dc3-0-sc0_10_31  sc_final
>>> 18_sca-dc3-0-b_mtst_v_mmisc1=1_ice   ereportFaultTable                 summary_8-sca-dc3-0-sc0_10_31
>>> 1_sca-dc3-0-b_mtst_v_kiue            ereportFaultTableCombined
>>> #
>>>
>>> - Each error logs the following files
>>> # ls 0_sca-dc3-0-b_mtst_v_kdue/
>>> 20071031_194315                     domain_before          sc_before
>>> Detail.report.log.20071031_194315   domain_diff            sc_diff
>>> Summary.report.log.20071031_194315  faultserverlog_after   sgfmaclient.log.20071031_194315
>>> betconf-mtst_v_kdue                 faultserverlog_before
>>> domain_after                        sc_after
>>> #
>>>
>>> Thanks.
>>>
>>> kenneth
>>>
>>> Michael Larosa Jr wrote:
>>>> all of these commands....
>>>>
>>>> mtst -b cpuid=0 -v kdwdu
>>>> mtst -b cpuid=2 -v kdue
>>>> mtst -b cpuid=0 -v udedus
>>>> mtst -b cpuid=0 -v kdemu
>>>> mtst -b cpuid=0 -v kducutl1
>>>> mtst -b cpuid=0 -v kiucutl1
>>>>
>>>> cause...
>>>>
>>>> SUN4U-8000-6H fault.cpu.ultraSPARC-III.l2cachedata
>>>>
>>>> I went through all the uncorrectable commands for mtst; all I get is
>>>> SUN4U-8000-6H.
>>>>
>>>> Is there a way to get these msg-ids to inject?
>>>>
>>>> SUN4U-8000-1A fault.memory.page
>>>> SUN4U-8000-2S fault.memory.dimm
>>>> SUN4U-8000-35 fault.memory.bank
>>>>
>>>> SUN4U-8000-7D fault.cpu.ultraSPARC-III.l2cachetag
>>>> SUN4U-8007-1Y fault.asic.cds.dp
>>>>
>>>>
>>>> thanks,
>>>>
>>>> Mike,
>>>>
>>>>
>>>> Rob Johnston wrote:
>>>>
>>>>
>>>>> Michael Larosa Jr wrote:
>>>>>
>>>>>
>>>>>> Morning Rob,
>>>>>>
>>>>>> You can say no but i have to ask ;)
>>>>>
>>>>>> Do you know the mtst syntax that would produce the following fma
>>>>>> msg-id #'s ?
>>>>>>
>>>>>> SUN4U-8000-1A fault.memory.page
>>>>>> SUN4U-8000-2S fault.memory.dimm
>>>>>> SUN4U-8000-35 fault.memory.bank
>>>>>> SUN4U-8000-6H fault.cpu.ultraSPARC-III.l2cachedata
>>>>>> SUN4U-8000-7D fault.cpu.ultraSPARC-III.l2cachetag
>>>>>> SUN4U-8007-1Y fault.asic.cds.dp
>>>>>>
>>>>>> I have a script written by somebody else that produced specific
>>>>>> msg-id's. It was for 490/890
>>>>>> daktari machines.
>>>>>>
>>>>>> I'm shooting in the dark trying to modify his mtst commands to
>>>>>> produce specific msg-id's
>>>>>> on a 3800-6900 machine, serengeti.
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> What specific output are you trying to get examples of? Do you
>>>>> just want to see these diagnosis messages dumped to the console?
>>>>> If so, you can use the attached program as follows (no need to
>>>>> actually inject errors for that):
>>>>>
>>>>> ./dump_msg.sparc <MSG-ID>
>>>>>
>>>>> i.e.
>>>>>
>>>>> ./dump_msg.sparc SUN4U-8000-6H
>>>>>
>>>>> rob
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> fm-discuss mailing list
>>>>> fm-discuss at opensolaris.org
>>>>>
>>>>>
>>>>
>>>>
>>
>
>