[fm-discuss] fmd gone insane

Cynthia A. McGuire Cynthia.McGuire at Sun.COM
Mon Nov 19 12:25:58 PST 2007


Actually, I was suggesting that you unload the etm module (I think that's what it's called). ETM is the module responsible for sending ereports from the SP to the Solaris domain. Anyway, one of the T2000 engineers responded to my query about your problem as follows:

"My guess is that the customer has most likely hit PLX switch erratum #50: a "Receiver Error" Correctable Error bit is stuck.

There is a SW workaround. If the customer doesn't want to apply a patch, add the following to /etc/system:
   set pcie:pcie_aer_ce_mask = 1

and reboot the system.

This will mask "Receiver Error" CEs for ALL PCIe devices. The side effect is that you won't see these correctable errors at all. Too many of them may lead to CRC or DLP errors, which will be detected as a UE, so nothing is lost except some history.

I don't remember whether you can confirm this easily by looking at the ereports, and I'm not sure the CE log register will always show something. For s10u4, though, you should be getting a bunch of fire.fabric ereports with msg_id = 30 (which means a CE)."
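To see why the value 1 suppresses only Receiver Errors, it may help to read it as a bitmask. The sketch below is hypothetical and rests on an assumption not stated in the thread: that pcie_aer_ce_mask uses the same bit layout as the PCIe AER Correctable Error Mask register, where bit 0 is Receiver Error. The helper name decode_ce_mask is purely illustrative and is not part of Solaris.

```python
from typing import List

# Bit positions of the PCIe AER Correctable Error Mask register (base set).
# ASSUMPTION: pcie_aer_ce_mask follows this same layout; not confirmed here.
AER_CE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_ce_mask(mask: int) -> List[str]:
    """Hypothetical helper: name the correctable errors a mask value suppresses."""
    return [name for bit, name in sorted(AER_CE_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    # The workaround value: only bit 0 is set.
    print(decode_ce_mask(1))
```

Under that assumption, a value of 1 sets only bit 0, which matches the engineer's description that Receiver Errors are masked while other correctable error types are still reported.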

Cindi
--
This message posted from opensolaris.org

