[fm-discuss] extracting fm ereport and other fm data from crashdumps
Victor Latushkin
Victor.Latushkin at Sun.COM
Mon Sep 25 14:04:21 PDT 2006
Hello All,
I have a crash dump from an S10 6/06 box that hit a Fatal System Bus
error and panicked. After recovery I cannot find any sign of ereports
saved to persistent storage. I assumed that after a panic the fault
manager would extract such data, but that does not appear to be the
case.
So I had to look for that data in the crash dump manually. I managed to
find and extract the corresponding cb_errstate_t from the crash dump and
analyze it, but I wonder whether there are friendlier tools (mdb macros,
dcmds, etc.) for locating fm data in crash dumps.
While analyzing the cb_errstate_t contents I found a small bug (a
typo). See below:
SolarisCAT(vmcore.0/10U)> sdump 0x2a10739dd98 cb_errstate_t
{
  cb_err_class = 0x1959fb0 "saf.parb" <---- indicates PERR on Leaf B
  cb_bridge_type = 0x1959f68 "xmits"
  cb_csr = 0x155555401a00006
  cb_err = 0xf8000000000003e0
  cb_intr = 0x80000000000fc017
  cb_elog = 0x40000 <---- bit 18 is set, hence PERR on Leaf A
  cb_ecc = 0xe000000000000000
  cb_pcr = 0
  cb_ue_afsr = 0x9405017a
  cb_ue_afar = 0x50600
  cb_ce_afsr = 0x1401011f
  cb_ce_afar = 0x450620
  cb_first_elog = 0x40000
  cb_first_eaddr = 0
  cb_leaf_status = 0
  cb_pbm = [ {
      pbm_err_class = NULL
      pbm_pri = 0
      pbm_log = 0
      pbm_err = 0
      pbm_multi = 0
      pbm_bridge_type = 0x195a568 "xmits"
      pbm_ctl_stat = 0x60000011f003f
      pbm_afsr = 0
      pbm_afar = 0
      pbm_va_log = 0
      pbm_err_sl = 0x6
      pbm_iommu = {
        iommu_stat = 0x70003
        iommu_tfar = 0
      }
      pbm_pcix_stat = 0
      pbm_pcix_pfar = 0
      pbm_pci = {
        pci_err_class = NULL
        pci_cfg_stat = 0x2a0
        pci_cfg_comm = 0x146
        pci_pa = 0
      }
      pbm_terr_class = NULL
    } {
      pbm_err_class = NULL
      pbm_pri = 0
      pbm_log = 0
      pbm_err = 0
      pbm_multi = 0
      pbm_bridge_type = 0x195a568 "xmits"
      pbm_ctl_stat = 0x2011f003f
      pbm_afsr = 0
      pbm_afar = 0
      pbm_va_log = 0
      pbm_err_sl = 0
      pbm_iommu = {
        iommu_stat = 0x70003
        iommu_tfar = 0
      }
      pbm_pcix_stat = 0
      pbm_pcix_pfar = 0
      pbm_pci = {
        pci_err_class = NULL
        pci_cfg_stat = 0x2a0
        pci_cfg_comm = 0x146
        pci_pa = 0
      }
      pbm_terr_class = NULL
    } ]
}
SolarisCAT(vmcore.0/10U)>
Bit 18 in cb_elog is set, which means we have
XMITS_CB_ELOG_PAR_ERR_INT_PCIA, and this contradicts cb_err_class,
which is "saf.parb".
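The bit arithmetic behind that reading can be double-checked with a
small C sketch. The macro name ELOG_PAR_ERR_INT_PCIA_BIT and the helper
reg_bit_is_set() are hypothetical names used only for illustration; the
bit position 18 is taken from this message, not from the driver headers.

```c
#include <stdint.h>

/*
 * Hypothetical name. The position (18) is the one quoted in this
 * message for the Leaf A parity error; it is an assumption here and
 * is not copied from the driver sources.
 */
#define ELOG_PAR_ERR_INT_PCIA_BIT 18

/* Return 1 if bit `bit` (0 = least significant) is set in `reg`. */
int reg_bit_is_set(uint64_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1ULL);
}
```

For the dumped value, reg_bit_is_set(0x40000, ELOG_PAR_ERR_INT_PCIA_BIT)
returns 1, and since 0x40000 equals 1 << 18 no other bit is set.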
The corresponding description strings look like this:
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/sys/fm/io/sun4upci.h
95 #define SAFARI_BAD_CMD_PCIA "saf.bca"
96 #define SAFARI_BAD_CMD_PCIB "saf.bcb"
97 #define SAFARI_PAR_ERR_INT_PCIB "saf.para"
98 #define SAFARI_PAR_ERR_INT_PCIA "saf.parb"
99 #define SAFARI_PAR_ERR_INT_SAF "saf.pars"
100 #define SAFARI_PLL_ERR_PCIB "saf.pllb"
101 #define SAFARI_PLL_ERR_PCIA "saf.plla"
So there is a typo in lines 97 and 98: the "saf.para" and "saf.parb"
strings appear to be swapped. I filed bug 6474487 for this.
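For illustration, here is how the two definitions might look if the
suffixes followed the same A/B pattern as the neighbouring BAD_CMD and
PLL_ERR entries. This is only a sketch of the assumed intent, not the
actual fix for the bug.

```c
/*
 * Assumed correction, inferred from the A/B naming pattern of the
 * surrounding definitions; the real fix may differ.
 */
#define SAFARI_PAR_ERR_INT_PCIB "saf.parb"
#define SAFARI_PAR_ERR_INT_PCIA "saf.para"
```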
Wbr,
Victor Latushkin