[fm-discuss] Fmadm faulty not working
Michael Shapiro
mws at zion.eng.sun.com
Fri Feb 9 18:23:42 PST 2007
>
> Howdy,
>
> While doing some testing on a Sun Enterprise 4500, I started to get 100s of messages similar to the following:
>
> Feb 9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb 9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41e258
> Feb 9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb 9 17:16:32 rx last message repeated 1 time
> Feb 9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41e000 removed from service
> Feb 9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb 9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41a258
> Feb 9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb 9 17:16:32 rx last message repeated 19 times
> Feb 9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41a000 removed from service
>
> When I run fmdump to view the error log, I see lots of entries:
> ...
The memory errors there are pre-FMA messages: there is no FMA support
for UltraSPARC-II, so you're seeing the old pre-FMA code. Meanwhile,
your faulty memory is being accessed for DMA by your I/O path, and that
is producing FMA error reports, but our I/O diagnosis code knows that
they are secondary effects, so it ignores them and diagnoses nothing.
If you had FMA for US-II, you'd see a DIMM diagnosis. Anyway,
this is why we did FMA :) But I'm sorry it's not retroactively
available for the really old UltraSPARCs ...
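Because UltraSPARC-II has no FMA diagnosis, the pre-FMA syslog warnings quoted above are the only place the affected module is named. A minimal sketch, assuming Python 3, the wrapped log lines rejoined, and the usual syslogd "last message repeated N times" collapsing, for tallying those warnings per board and socket. The regular expressions are written only against the sample lines in this thread, not against any documented message format; `tally` and `SAMPLE` are hypothetical names, and this is a log-reading aid, not an FMA diagnosis.

```python
import re
from collections import Counter

# Assumption: patterns match only the pre-FMA messages quoted in this thread.
SOFTERR = re.compile(r"Sticky Softerror encountered on Memory Module Board (\d+) (J\d+)")
REPEAT = re.compile(r"last message repeated (\d+) times?")
REMOVED = re.compile(r"Page (0x[0-9a-f]+\.[0-9a-f]+) removed from service")


def tally(lines):
    """Count softerror warnings per (board, socket) and list retired pages."""
    counts = Counter()
    retired = []
    last = None  # location from the previous line, if it was a softerror warning
    for line in lines:
        m = SOFTERR.search(line)
        if m:
            last = (int(m.group(1)), m.group(2))
            counts[last] += 1
            continue
        m = REPEAT.search(line)
        if m:
            # syslogd collapses duplicates of the immediately preceding message.
            if last is not None:
                counts[last] += int(m.group(1))
            continue
        m = REMOVED.search(line)
        if m:
            retired.append(m.group(1))
        last = None  # any other message breaks the repeat chain
    return counts, retired


# Hypothetical sample: the log lines quoted above, with line wraps removed.
WARN = ("Feb 9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] "
        "Sticky Softerror encountered on Memory Module Board 2 J3800")
SAMPLE = [
    WARN,
    "Feb 9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41e258",
    WARN,
    "Feb 9 17:16:32 rx last message repeated 1 time",
    "Feb 9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41e000 removed from service",
    WARN,
    "Feb 9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41a258",
    WARN,
    "Feb 9 17:16:32 rx last message repeated 19 times",
    "Feb 9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41a000 removed from service",
]

if __name__ == "__main__":
    counts, retired = tally(SAMPLE)
    for (board, socket), n in counts.items():
        print(f"Board {board} {socket}: {n} softerrors")
    print("Retired pages:", ", ".join(retired))
```

On the ten log lines quoted above, this counts 24 softerrors, all on Board 2 J3800, and two retired pages. The repeat handling is attributed only to the immediately preceding message, which is why a NOTICE line resets the tracked location.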
-Mike
--
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/