[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again

Dan Mick dan.mick at sun.com
Fri Dec 15 14:02:43 PST 2006


Jon Haslam wrote:
> Hi,
> 
> Yes, this looks to be an issue. I've had the same problem on some
> dual core M2 systems and so has a colleague.
> 
> By the look of it, our high res timer gets severely out of whack every
> now and then which causes us to think that things have hung. I'll
> let you know when I know more.

If that's the problem, it might be fixed by

6342823 Unable to offline CPU 0 on x86 systems

which is in snv_34, and supposed to be in Update 2.  I don't know if the 
system below is update 2 or not.

(the relevant change was actually a concomitant fix to timestamp.c made under
that bug number, which probably should have had its own bug number; it only
affects multi-CPU machines where the TSCs are very different at boot time,
which has only started happening with the latest Intel Core2 and AMD
RevE/RevF processor BIOSes, apparently... so it's a latent bug exposed by
newer hardware)
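
To illustrate how a boot-time TSC offset could produce this symptom, here is a minimal Python sketch. It is a toy model, not the code in timestamp.c: the function names, the 15-second offset, and the 10-second timeout are all assumptions made for illustration. Only the 2613 MHz clock rate is taken from the psrinfo output quoted below.

```python
NANOSEC = 1_000_000_000
HZ = 2_613_000_000  # 2613 MHz, as reported by psrinfo for this box

# Assumption: CPU 1's TSC came up 15 seconds' worth of ticks ahead of CPU 0's.
BOOT_OFFSET_TICKS = {0: 0, 1: 15 * HZ}

# Assumption: a 10-second deadman timeout (illustrative value for this model).
DEADMAN_TIMEOUT_NS = 10 * NANOSEC


def read_hrtime(cpu, true_elapsed_ns):
    """Nanoseconds since boot as seen by `cpu`, derived from its own TSC
    with no correction for the boot-time offset (an assumption about how
    the bug manifests)."""
    ticks = true_elapsed_ns * HZ // NANOSEC + BOOT_OFFSET_TICKS[cpu]
    return ticks * NANOSEC // HZ


def looks_dead(alive_ns, now_ns, timeout_ns=DEADMAN_TIMEOUT_NS):
    """True if the gap between the last 'alive' stamp and now exceeds the timeout."""
    return now_ns - alive_ns > timeout_ns


# A timer on CPU 0 stamps "alive" 5 s after boot...
alive = read_hrtime(0, 5 * NANOSEC)
# ...and half a second later a probe fires, on either CPU.
now_same_cpu = read_hrtime(0, 5 * NANOSEC + NANOSEC // 2)
now_other_cpu = read_hrtime(1, 5 * NANOSEC + NANOSEC // 2)

print(looks_dead(alive, now_same_cpu))   # False
print(looks_dead(alive, now_other_cpu))  # True
```

In this model only half a second of real time has passed, but the timestamp read on the other CPU makes it look like 15.5 seconds, which would be enough to make an idle system appear hung.
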


> 
> Jon.
> 
>> Hello,
>>
>> we have a 2-way (4-core) Opteron server (x220M2):
>>
>> bash-3.00# uname -a
>> SunOS e2 5.10 Generic_118855-19 i86pc i386 i86pc
>> bash-3.00# psrinfo -v
>> Status of virtual processor 0 as of: 12/14/2006 14:08:12
>>  on-line since 12/13/2006 11:41:29.
>>  The i386 processor operates at 2613 MHz,
>>     and has an i387 compatible floating point processor.
>> Status of virtual processor 1 as of: 12/14/2006 14:08:12
>>  on-line since 12/13/2006 11:41:34.
>>  The i386 processor operates at 2613 MHz,
>>     and has an i387 compatible floating point processor.
>> Status of virtual processor 2 as of: 12/14/2006 14:08:12
>>  on-line since 12/13/2006 11:41:36.
>>  The i386 processor operates at 2613 MHz,
>>     and has an i387 compatible floating point processor.
>> Status of virtual processor 3 as of: 12/14/2006 14:08:12
>>  on-line since 12/13/2006 11:41:38.
>>  The i386 processor operates at 2613 MHz,
>>     and has an i387 compatible floating point processor.
>>
>> I was trying to catch syscalls and got an unexpected message:
>> bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
>> dtrace: description 'syscall::: ' matched 454 probes
>> dtrace: processing aborted: Abort due to systemic unresponsiveness
>>
>> Because the server is not busy and is still being prepared for production,
>> I was surprised and ran dtrace again while watching vmstat. Below is the
>> output from vmstat:
>>
>> [...]
>>
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  0  0  0  0  668  160  316  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  2  2  1  1  671  130  268  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  3  3  1  1  738  250  326  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  2  2  1  1  690  145  284  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  0  0  0  0  651  161  285  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  0  0  0  0  687  284  316  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  0  0  0  0  634  146  268  0  0 100
>> 0 0 0 14546888 7392452 0 0  0  0  0  0  0  0  0  0  0  644  150  271  0  0 100
>> 0 0 0 14546284 7391632 143 288 0 0 0 0  0  0  0  0  0  692  910  342  0  0 100
>> 0 0 0 14459332 7312120 228 3332 0 0 0 0 0  0  0  0  0  884 2050  293  5  4 91        [1]
>> 0 0 0 14459332 7312116 0 1  0  0  0  0  0  0  0  0  0  865  204  286  0  1 99
>> kthr      memory            page            disk          faults      cpu
>> r b w   swap  free  re  mf pi po fr de sr cd cd m2 m3   in   sy   cs us sy id
>> 0 0 0 14459332 7312116 0 0  0  0  0  0  0  0  0  0  0  889  232  306  0  0 100
>> 0 0 0 14459332 7312116 0 0  0  0  0  0  0  1  1  0  0  938  326  353  0  0 100
>> 0 0 0 14546272 7391616 0 1  0  0  0  0  0  0  0  0  0 53402 1389 1109 4  3 93        [2]
>> 0 0 0 14546272 7391616 148 362 0 0 0 0  0  1  1  1  1 1691 3243 1915  3  1 96
>> 0 0 0 14545648 7390776 0 0  0  0  0  0  0  0  0  0  0  635   97  264  0  0 100
>> 0 0 0 14545648 7390776 1 3  0  0  0  0  0  0  0  0  0  687  210  340  0  0 100
>> 0 0 0 14545648 7390772 0 0  0  0  0  0  0  0  0  0  0  718  289  380  0  0 100
>> 0 0 0 14545648 7390772 0 0  0  0  0  0  0  0  0  0  0  676  224  322  0  0 100
>> 0 0 0 14545648 7390772 0 0  0  0  0  0  0  0  0  0  0  641  126  278  0  0 100
>> 0 0 0 14545648 7390772 0 0  0  0  0  0  0  0  0  0  0  642  127  263  0  0 100
>> [...]
>>
>> The [1] is when I ran the dtrace and [2] is when I got the
>> message ("unresponsiveness").
>>
>> I have read the relevant topic:
>> http://www.opensolaris.org/jive/thread.jspa?messageID=15073&
>> and am aware that:
>> - enabling destructive actions (-w)
>> or
>> - tuning the deadman parameters below
>>    dtrace_deadman_user
>>    dtrace_deadman_interval
>>    dtrace_deadman_timeout
>> can be helpful. I agree that all of them are useful when a server is
>> really busy, but I wouldn't expect such behaviour on an idle server!
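
For reference, the roles of -w and the three tunables listed above can be modeled roughly as follows. This is a simplified Python sketch, not the kernel code: the class, method, and function names are hypothetical, and the default values (1 s interval, 10 s timeout, 30 s user limit) are assumptions that should be checked against the DTrace source for your release.

```python
NANOSEC = 1_000_000_000


class DeadmanModel:
    """Toy model of a tracing deadman. All names and defaults are assumptions."""

    def __init__(self, interval=1 * NANOSEC, timeout=10 * NANOSEC,
                 user=30 * NANOSEC, destructive=False):
        self.interval = interval        # how often the deadman timer runs
        self.timeout = timeout          # max tolerated age of the "alive" stamp
        self.user = user                # max tolerated consumer silence
        self.destructive = destructive  # models running with -w
        self.alive = 0
        self.last_status = 0
        self.killed = False

    def consumer_status(self, now):
        """The user-level consumer checked in."""
        self.last_status = now

    def deadman_tick(self, now):
        """Periodic timer: refresh 'alive' unless the consumer has gone silent."""
        if now - self.last_status >= self.user:
            return
        self.alive = now

    def probe_fires(self, now):
        """A probe notices a stale 'alive' stamp and aborts tracing."""
        if now - self.alive > self.timeout and not self.destructive:
            self.killed = True


def simulate(model, duration_ns, timer_blocked=lambda now: False,
             consumer_polls=True):
    """Step the model one interval at a time; return True if tracing was killed."""
    now = 0
    while now < duration_ns:
        now += model.interval
        if consumer_polls:
            model.consumer_status(now)
        if not timer_blocked(now):
            model.deadman_tick(now)
        model.probe_fires(now)
    return model.killed


def stall(now):
    """Hypothetical 15 s window in which the deadman timer cannot run."""
    return 20 * NANOSEC <= now < 35 * NANOSEC


print(simulate(DeadmanModel(), 60 * NANOSEC))                                  # False
print(simulate(DeadmanModel(), 60 * NANOSEC, timer_blocked=stall))             # True
print(simulate(DeadmanModel(destructive=True), 60 * NANOSEC, timer_blocked=stall))  # False
print(simulate(DeadmanModel(), 60 * NANOSEC, consumer_polls=False))            # True
```

In this model, raising the timeout or enabling destructive actions only hides a stale "alive" stamp. Neither explains why the stamp would look stale on an idle machine, which is the question being asked here.
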
>>
>> Is there any way to solve the problem without these tweaks? I would like
>> to understand the nature of the problem better.
>>
>> Regards
>> przemol
>>
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss at opensolaris.org
>>  
>>
> 


