[indiana-discuss] System freeze in snv_106

Brian Ruthven - Sun UK Brian.Ruthven at Sun.COM
Thu Feb 5 06:44:46 PST 2009


Hi Chris,

If you are talking about the delay before the clock thread declares the 
system hung, then yes, this is tunable. The variable is snoop_interval, 
and its units are microseconds. The default value is 50000000 
(50 seconds). See 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/conf/param.c 
for the definitions of SNOOP_INTERVAL_DEFAULT and snoop_interval.

For example, put the following lines in /etc/system and reboot:

    set snooping=1
    set snoop_interval = 10000000

This will set the deadman timer to 10 seconds and enable the deadman 
checking code in clock.c. I would not advise setting either variable on 
a live system using mdb.
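Because snoop_interval is given in microseconds, it is easy to drop or add a zero by hand. The sketch below converts a timeout in seconds into the corresponding /etc/system lines. The function names are hypothetical, and the helper simply rejects anything below the 1-second minimum mentioned above rather than assuming how the kernel itself treats an out-of-range value.

```python
# snoop_interval is expressed in microseconds.
USEC_PER_SEC = 1_000_000
# Minimum interval mentioned in the message: 1 second.
SNOOP_MIN_USEC = 1 * USEC_PER_SEC


def snoop_interval_usec(seconds: float) -> int:
    """Convert a deadman timeout in seconds to a snoop_interval value.

    Raises ValueError below the 1-second minimum instead of guessing how
    the kernel would handle such a value (hypothetical helper).
    """
    usec = int(round(seconds * USEC_PER_SEC))
    if usec < SNOOP_MIN_USEC:
        raise ValueError(
            f"snoop_interval must be at least {SNOOP_MIN_USEC} us (1 second)"
        )
    return usec


def etc_system_lines(seconds: float) -> list[str]:
    """Return the /etc/system lines that enable the deadman with this timeout."""
    return [
        "set snooping=1",
        f"set snoop_interval = {snoop_interval_usec(seconds)}",
    ]


if __name__ == "__main__":
    # Reproduces the 10-second example above.
    for line in etc_system_lines(10):
        print(line)
```

Running it with a timeout of 10 prints the same two lines as the example, and snoop_interval_usec(50) gives the 50000000 default.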

The value is a balance between getting the system back quickly and 
accidentally tripping the deadman during a period of inactivity caused by 
hardware intervention (e.g. repeated correction of a system bus event by 
the hardware). For debugging purposes you can set it to whatever you want 
(the minimum value is 1 second), but watch out for false hits if you 
are too aggressive with the setting.
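The trade-off can be made concrete with a toy model of the mechanism described in the quoted text below: a high-priority timer fires periodically and panics if the lower-priority clock has not advanced for a given number of firings. This is a simplified user-space sketch, not the clock.c code; the once-per-second firing rate and all class, field, and function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeadmanModel:
    """Toy deadman: fires once per (assumed) second and checks whether
    the clock tick counter has moved since the previous firing."""
    interval_s: int            # snoop_interval expressed in seconds (hypothetical)
    last_seen_tick: int = -1
    countdown: int = 0

    def __post_init__(self) -> None:
        self.countdown = self.interval_s

    def fire(self, current_tick: int) -> bool:
        """Run one deadman check; return True if the model would panic."""
        if current_tick != self.last_seen_tick:
            # The clock progressed: remember it and reset the countdown.
            self.last_seen_tick = current_tick
            self.countdown = self.interval_s
            return False
        # No progress since the last firing: count down towards a panic.
        self.countdown -= 1
        return self.countdown <= 0


def stall_trips_deadman(interval_s: int, stall_s: int, warmup_s: int = 3) -> bool:
    """Simulate a clock that stalls for stall_s firings; report a panic."""
    dm = DeadmanModel(interval_s)
    tick = 0
    # Healthy period: the clock advances before every firing.
    for _ in range(warmup_s):
        tick += 1
        if dm.fire(tick):
            return True
    # Stall: the deadman keeps firing but the clock does not advance.
    for _ in range(stall_s):
        if dm.fire(tick):
            return True
    # Recovery: the clock resumes.
    tick += 1
    return dm.fire(tick)


if __name__ == "__main__":
    for stall in (9, 10):
        print(stall, stall_trips_deadman(10, stall))
```

With interval_s=10, a 9-second stall (for example, the hardware busy correcting a bus event) is tolerated while a 10-second stall trips the model, which is why a very small interval risks false hits.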

Also bear in mind that on a large system which takes 15 minutes to 
reboot anyway, 50 seconds is not a lot :-)

Hope that helps,
Brian


Chris Ridd wrote:
>>>> Eric Saxe has an in-depth example at http://blogs.sun.com/esaxe/entry/debugging_solaris_scheduling_problems_and
>>>>
>>> Nothing detailed then? :-)
>>>
>> Basically, a very-high-priority timer is set up, and all that's done
>> when it fires is check whether the system clock (which is driven by
>> a lower-priority interrupt) has progressed. If the clock does not
>> progress for a given number of timer firings, we assume the
>> machine is hung and panic it.
>>
>
> Is the given number (50?) tunable at all? It feels like it might make
> sense to have smaller values on a server (non-responsiveness means the
> company is losing money) and higher values on a workstation
> (non-responsiveness means I go and have a cup of tea). Different values,
> anyway.
>

-- 
Brian Ruthven                                        Sun Microsystems UK
Solaris Revenue Product Engineering             Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG
