[fm-discuss] extracting fm ereport and other fm data from crashdumps

Victor Latushkin Victor.Latushkin at Sun.COM
Mon Sep 25 14:04:21 PDT 2006


Hello All,

I have a crash dump from an S10 6/06 box that experienced a fatal system
bus error and generated a crash dump. After recovery I cannot find any
sign of ereports saved to persistent storage. I expected the fault
manager to extract such data after a panic, but it looks like this is
not the case.

So I had to look for that data in the crash dump manually. I managed to
find and extract the corresponding cb_errstate_t from the crash dump and
analyze it, but I wonder whether there are friendlier tools (mdb macros,
dcmds, etc.) for finding fm data in crash dumps.

While analyzing the cb_errstate contents I found a small bug (a typo).
See below.

SolarisCAT(vmcore.0/10U)> sdump 0x2a10739dd98 cb_errstate_t
{
    cb_err_class = 0x1959fb0 "saf.parb" <---- indicates PERR on Leaf B
    cb_bridge_type = 0x1959f68 "xmits"
    cb_csr = 0x155555401a00006
    cb_err = 0xf8000000000003e0
    cb_intr = 0x80000000000fc017
    cb_elog = 0x40000        <---- bit 18 is set, hence PERR on Leaf A
    cb_ecc = 0xe000000000000000
    cb_pcr = 0
    cb_ue_afsr = 0x9405017a
    cb_ue_afar = 0x50600
    cb_ce_afsr = 0x1401011f
    cb_ce_afar = 0x450620
    cb_first_elog = 0x40000
    cb_first_eaddr = 0
    cb_leaf_status = 0
    cb_pbm = [ {
          pbm_err_class = NULL
          pbm_pri = 0
          pbm_log = 0
          pbm_err = 0
          pbm_multi = 0
          pbm_bridge_type = 0x195a568 "xmits"
          pbm_ctl_stat = 0x60000011f003f
          pbm_afsr = 0
          pbm_afar = 0
          pbm_va_log = 0
          pbm_err_sl = 0x6
          pbm_iommu = {
             iommu_stat = 0x70003
             iommu_tfar = 0
          }
          pbm_pcix_stat = 0
          pbm_pcix_pfar = 0
          pbm_pci = {
             pci_err_class = NULL
             pci_cfg_stat = 0x2a0
             pci_cfg_comm = 0x146
             pci_pa = 0
          }
          pbm_terr_class = NULL
       } {
          pbm_err_class = NULL
          pbm_pri = 0
          pbm_log = 0
          pbm_err = 0
          pbm_multi = 0
          pbm_bridge_type = 0x195a568 "xmits"
          pbm_ctl_stat = 0x2011f003f
          pbm_afsr = 0
          pbm_afar = 0
          pbm_va_log = 0
          pbm_err_sl = 0
          pbm_iommu = {
             iommu_stat = 0x70003
             iommu_tfar = 0
          }
          pbm_pcix_stat = 0
          pbm_pcix_pfar = 0
          pbm_pci = {
             pci_err_class = NULL
             pci_cfg_stat = 0x2a0
             pci_cfg_comm = 0x146
             pci_pa = 0
          }
          pbm_terr_class = NULL
       } ]
}
SolarisCAT(vmcore.0/10U)>

Bit 18 in cb_elog is set, which means we have
XMITS_CB_ELOG_PAR_ERR_INT_PCIA (parity error on leaf A). This
contradicts cb_err_class, which is "saf.parb" (leaf B).

The corresponding class strings are defined here:
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/sys/fm/io/sun4upci.h
      95 #define	SAFARI_BAD_CMD_PCIA		"saf.bca"
      96 #define	SAFARI_BAD_CMD_PCIB		"saf.bcb"
      97 #define	SAFARI_PAR_ERR_INT_PCIB		"saf.para"
      98 #define	SAFARI_PAR_ERR_INT_PCIA		"saf.parb"
      99 #define	SAFARI_PAR_ERR_INT_SAF		"saf.pars"
     100 #define	SAFARI_PLL_ERR_PCIB		"saf.pllb"
     101 #define	SAFARI_PLL_ERR_PCIA		"saf.plla"

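So the strings on lines 97 and 98 are swapped: following the a/b
convention of the other definitions, SAFARI_PAR_ERR_INT_PCIA should be
"saf.para" and SAFARI_PAR_ERR_INT_PCIB should be "saf.parb". I filed
bug #6474487 for this.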


Wbr,
Victor Latushkin


