[fm-discuss] panic[cpu0]/thread=ffffff000961dc60: Unrecoverable Machine-Check Exception

Gavin Maltby Gavin.Maltby at Sun.COM
Mon Jan 19 16:51:26 PST 2009


Hi,

Ed Kaczmarek wrote:
> 
>>
>> Changing that with kmdb involves setting deferred breakpoints.  We'll cheat
>> by first disabling everything and setting what we want in /etc/system:
>>
>> 1) Boot into kmdb as before (add -kd to unix line in grub).  At the prompt
>>    utter 'cmi_no_init/W1' then ':c' to continue.  We'll boot without loading
>>    any cpu module support, and that should get you booted I think (if not
>>    there are bigger problems)
> 
> I got big problems then...
>
> Welcome to kmdb
> kmdb: unable to determine terminal type: assuming `vt100'
> Loaded modules: [ unix krtld genunix ]
> [0]> cmi_no_init/W1
> cmi_no_init:    0               =       0x1

Good news for me - that switches off just about all my code :-)

> [0]> :c
> SunOS Release 5.11 Version snv_106 64-bit
> Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> WARNING: Time-of-day chip unresponsive; dead batteries?
> Configuring /dev
> \

A few more tests to try:

1) Is this a dual-core system?  If so could you force use of
    just a single cpu as follows and see if we boot ok:

	boot kmdb (-kd on unix line in grub)
	use_mp/W0
	:c

2) We can force the NB watchdog to be disabled when the cpu
    module is loaded (we're getting that far because we
    saw the machine check details).  We have to use
    a deferred breakpoint for this:

	boot kmdb
	::bp cpu_ms.AuthenticAMD.15`ao_ms_init
	:c

    When the breakpoint hits on the boot cpu you'll return to kmdb.  Now:

	ao_nb_watchdog_policy/W1
	:z
	:c

    That sets policy AO_NB_WDOG_DISABLE which will unconditionally disable the
    watchdog.  The :z clears breakpoints so we don't hit them on other cpus.

3) If the BIOS is enabling the watchdog, Solaris does not touch it by default.
    We can force Solaris to apply its chosen watchdog rate (longest possible
    timeout) with AO_NB_WDOG_ENABLE_FORCE_RATE:

	boot kmdb
	::bp cpu_ms.AuthenticAMD.15`ao_ms_init
	:c

    When the breakpoint hits on the boot cpu you'll return to kmdb.  Now:

	ao_nb_watchdog_policy/W3
	:z
	:c

4) Now here's a real stab in the dark.  If your BIOS offers an option
    to present your SATA disks as AHCI devices (rather than the old
    and busted IDE mode) make sure that is set.  I guess ata may still
    be involved if you have an IDE DVD drive, but we may get further.
    Not sure if path to disk devices will change if you do this - machine
    may not boot for other reasons!
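
For reference, here is a rough Python sketch of how the policy values used
in tests 2 and 3 map onto the kmdb write command and onto the /etc/system
line quoted further down.  It only covers the three values mentioned in this
thread (0, 1 and 3); the helper names kmdb_write and etc_system_line are made
up for illustration and are not part of any Solaris tool.

```python
# Policy values named in this thread only; other values are deliberately
# left out rather than guessed at.
WATCHDOG_POLICIES = {
    0: "leave the watchdog as the BIOS had it",
    1: "AO_NB_WDOG_DISABLE: unconditionally disable the watchdog",
    3: "AO_NB_WDOG_ENABLE_FORCE_RATE: force the Solaris-chosen rate",
}

MODULE = "cpu_ms.AuthenticAMD.15"      # module name as used with ::bp above
VARIABLE = "ao_nb_watchdog_policy"     # variable written with /W above


def _check(policy):
    """Reject any policy value that this thread does not describe."""
    if policy not in WATCHDOG_POLICIES:
        raise ValueError(f"policy {policy} is not covered in this thread")


def kmdb_write(policy):
    """Return the kmdb command that writes the policy value (hypothetical helper)."""
    _check(policy)
    return f"{VARIABLE}/W{policy}"


def etc_system_line(policy):
    """Return the /etc/system line, escaping dots in the module name
    the same way the quoted /etc/system example in this thread does."""
    _check(policy)
    escaped_module = MODULE.replace(".", "\\.")
    return f"set {escaped_module}:{VARIABLE}={policy}"


if __name__ == "__main__":
    for value, meaning in WATCHDOG_POLICIES.items():
        print(f"{value}: {meaning}")
        print(f"    kmdb:        {kmdb_write(value)}")
        print(f"    /etc/system: {etc_system_line(value)}")
```

The kmdb form only makes sense once the cpu module is loaded, which is why
tests 2 and 3 go through the deferred breakpoint first.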

Thanks

Gavin

> 
> 
> I searched thru BIOS screens for any mention of any watchdog timer settings.
> None.
> 
> 
>>
>> 2) Append the following to /etc/system:
>>
>>     set cpu_ms\.AuthenticAMD\.15:ao_nb_watchdog_policy=0
>>
>> 3) Reboot normally
>>
>> That will leave the watchdog as the BIOS had it, and I suspect it's
>> off by default, while leaving other functionality operational.
>>
>> I think ata has necessitated this workaround on one or two other
>> motherboards before now.  I don't know the true root cause.
>>
>> Gavin
> 


