[fm-discuss] Fmadm faulty not working

Michael Shapiro mws at zion.eng.sun.com
Fri Feb 9 18:23:42 PST 2007


> 
> Howdy,
> 
> While doing some testing on a Sun Enterprise 4500, I started to get 100s of
> messages similar to the following:
> 
> Feb  9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb  9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41e258
> Feb  9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb  9 17:16:32 rx last message repeated 1 time
> Feb  9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41e000 removed from service
> Feb  9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb  9 17:16:32 rx unix: NOTICE: Scheduling removal of page 0x00000001.ee41a258
> Feb  9 17:16:32 rx SUNW,UltraSPARC-II: WARNING: [AFT0] Sticky Softerror encountered on Memory Module Board 2 J3800
> Feb  9 17:16:32 rx last message repeated 19 times
> Feb  9 17:16:32 rx unix: NOTICE: Page 0x00000001.ee41a000 removed from service
> 
> When I run fmdump to view the error log, I see lots of entries:
> ...

The memory errors there are pre-FMA messages: there is no FMA support
for UltraSPARC-II, so you're seeing the output of the old pre-FMA code.
Meanwhile, your faulty memory is being accessed for DMA by your I/O path,
and that is producing FMA error reports.  Our I/O diagnosis code knows
that these are secondary effects, so it ignores them and diagnoses nothing,
which leaves fmadm faulty with nothing to list.
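
Since the pre-FMA messages only show up in syslog, one rough way to see
which memory module is implicated is to tally them by hand.  The sketch
below is not part of FMA or any Solaris tool: summarize() is a
hypothetical helper, and its patterns assume one message per line with
the wording shown in the quoted excerpt, which may differ on other
releases or platforms.

```python
import re
from collections import Counter

# Patterns assume the message wording from the quoted excerpt.
SOFTERROR = re.compile(
    r"\[AFT0\] Sticky Softerror encountered on Memory Module (.+)$")
REPEATED = re.compile(r"last message repeated (\d+) times?$")
REMOVED = re.compile(r"NOTICE: Page (0x[0-9a-fA-F.]+) removed from service$")


def summarize(lines):
    """Return (soft errors per memory module, list of retired pages)."""
    errors = Counter()
    retired = []
    last_module = None  # module named by the previous soft-error line
    for line in lines:
        line = line.rstrip()
        match = SOFTERROR.search(line)
        if match:
            last_module = match.group(1)
            errors[last_module] += 1
            continue
        match = REPEATED.search(line)
        if match:
            # syslog collapses duplicates of the immediately preceding
            # message; count them only if that message was a soft error.
            if last_module is not None:
                errors[last_module] += int(match.group(1))
            continue
        match = REMOVED.search(line)
        if match:
            retired.append(match.group(1))
        # Any other message breaks the "repeated" chain.
        last_module = None
    return errors, retired
```

Feeding it the system log, for example summarize(open("/var/adm/messages"))
if that is where syslog writes on your machine, gives a per-module error
count and the pages taken out of service.  A count concentrated on one
module is only a hint about the DIMM to look at, not a diagnosis.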

So if you had FMA for US-II, you'd see a DIMM diagnosis.  Anyway,
this is why we did FMA :)  But I'm sorry it's not retroactively
available for the really old UltraSPARCs ...

-Mike
 
-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/


