[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Dan Mick
dan.mick at sun.com
Fri Dec 15 14:02:43 PST 2006
Jon Haslam wrote:
> Hi,
>
> Yes, this looks to be an issue. I've had the same problem on some
> dual core M2 systems and so has a colleague.
>
> By the look of it, our high res timer gets severely out of whack every
> now and then which causes us to think that things have hung. I'll
> let you know when I know more.
If that's the problem, it might be fixed by
6342823 Unable to offline CPU 0 on x86 systems
which is in snv_34, and is supposed to be in Update 2. I don't know whether
the system below is Update 2 or not.
(The relevant fix was actually a concomitant change to timestamp.c made under
that bug number, which probably should have had its own bug number. The bug
only affects multi-CPU machines where the TSCs (time stamp counters) are very
different at boot time, which apparently has only started happening with the
latest Intel Core2 and AMD RevE/RevF processor BIOSes, so it's a latent bug
exposed by newer hardware.)
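
To illustrate the failure mode described above, here is a simplified Python
model of a deadman-style check. It is a sketch under stated assumptions, not
the kernel code: the function names are hypothetical, the 10-second value is
assumed to be the default for dtrace_deadman_timeout, and the per-CPU clock
difference is modelled as a fixed offset. The point is only that when one CPU
records a liveness timestamp and another CPU whose clock reads far ahead
compares against it, the gap looks like a hang even on an idle machine.

```python
NANOSEC = 1_000_000_000

# Assumed default for dtrace_deadman_timeout (10 seconds); verify on your release.
DEADMAN_TIMEOUT_NS = 10 * NANOSEC


def looks_hung(alive_ns, now_ns, timeout_ns=DEADMAN_TIMEOUT_NS):
    """Return True if the gap since the last liveness update exceeds the timeout."""
    return now_ns - alive_ns > timeout_ns


def check_across_cpus(skew_ns, real_elapsed_ns=NANOSEC // 2):
    """Model CPU A stamping liveness and CPU B checking it shortly afterwards.

    skew_ns is how far CPU B's clock reads ahead of CPU A's (hypothetical value).
    """
    alive_on_a = 100 * NANOSEC                          # CPU A's clock at the update
    now_on_b = alive_on_a + real_elapsed_ns + skew_ns   # CPU B's view of "now"
    return looks_hung(alive_on_a, now_on_b)


if __name__ == "__main__":
    print(check_across_cpus(skew_ns=0))              # clocks agree
    print(check_across_cpus(skew_ns=15 * NANOSEC))   # CPU B reads 15 s ahead
```

With no skew the check passes; with a 15-second skew it reports a hang
although only half a second has really elapsed.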
>
> Jon.
>
>> Hello,
>>
>> We have a 2-way (4-core) Opteron server (x220M2):
>>
>> bash-3.00# uname -a
>> SunOS e2 5.10 Generic_118855-19 i86pc i386 i86pc
>> bash-3.00# psrinfo -v
>> Status of virtual processor 0 as of: 12/14/2006 14:08:12
>> on-line since 12/13/2006 11:41:29.
>> The i386 processor operates at 2613 MHz,
>> and has an i387 compatible floating point processor.
>> Status of virtual processor 1 as of: 12/14/2006 14:08:12
>> on-line since 12/13/2006 11:41:34.
>> The i386 processor operates at 2613 MHz,
>> and has an i387 compatible floating point processor.
>> Status of virtual processor 2 as of: 12/14/2006 14:08:12
>> on-line since 12/13/2006 11:41:36.
>> The i386 processor operates at 2613 MHz,
>> and has an i387 compatible floating point processor.
>> Status of virtual processor 3 as of: 12/14/2006 14:08:12
>> on-line since 12/13/2006 11:41:38.
>> The i386 processor operates at 2613 MHz,
>> and has an i387 compatible floating point processor.
>>
>> I was trying to catch syscalls and got an unexpected message:
>> bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
>> dtrace: description 'syscall::: ' matched 454 probes
>> dtrace: processing aborted: Abort due to systemic unresponsiveness
>>
>> Because the server is not busy (it is still being prepared for
>> production), I was surprised, so I ran dtrace again while watching
>> vmstat. Below is the vmstat output:
>>
>> [...]
>>
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 0 0 0 0 668 160 316 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 2 2 1 1 671 130 268 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 3 3 1 1 738 250 326 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 2 2 1 1 690 145 284 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 0 0 0 0 651 161 285 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 0 0 0 0 687 284 316 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 0 0 0 0 634 146 268 0 0 100
>> 0 0 0 14546888 7392452 0 0 0 0 0 0 0 0 0 0 0 644 150 271 0 0 100
>> 0 0 0 14546284 7391632 143 288 0 0 0 0 0 0 0 0 0 692 910 342 0 0 100
>> 0 0 0 14459332 7312120 228 3332 0 0 0 0 0 0 0 0 0 884 2050 293 5 4 91 [1]
>> 0 0 0 14459332 7312116 0 1 0 0 0 0 0 0 0 0 0 865 204 286 0 1 99
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr cd cd m2 m3 in sy cs us sy id
>> 0 0 0 14459332 7312116 0 0 0 0 0 0 0 0 0 0 0 889 232 306 0 0 100
>> 0 0 0 14459332 7312116 0 0 0 0 0 0 0 1 1 0 0 938 326 353 0 0 100
>> 0 0 0 14546272 7391616 0 1 0 0 0 0 0 0 0 0 0 53402 1389 1109 4 3 93 [2]
>> 0 0 0 14546272 7391616 148 362 0 0 0 0 0 1 1 1 1 1691 3243 1915 3 1 96
>> 0 0 0 14545648 7390776 0 0 0 0 0 0 0 0 0 0 0 635 97 264 0 0 100
>> 0 0 0 14545648 7390776 1 3 0 0 0 0 0 0 0 0 0 687 210 340 0 0 100
>> 0 0 0 14545648 7390772 0 0 0 0 0 0 0 0 0 0 0 718 289 380 0 0 100
>> 0 0 0 14545648 7390772 0 0 0 0 0 0 0 0 0 0 0 676 224 322 0 0 100
>> 0 0 0 14545648 7390772 0 0 0 0 0 0 0 0 0 0 0 641 126 278 0 0 100
>> 0 0 0 14545648 7390772 0 0 0 0 0 0 0 0 0 0 0 642 127 263 0 0 100
>> [...]
>>
>> [1] marks when I started dtrace and [2] marks when I got the
>> message ("unresponsiveness").
>>
>> I have read the relevant thread:
>> http://www.opensolaris.org/jive/thread.jspa?messageID=15073&
>> and am aware that either:
>> - enabling destructive actions (-w)
>> or
>> - tuning the following deadman parameters:
>> dtrace_deadman_user
>> dtrace_deadman_interval
>> dtrace_deadman_timeout
>> can be helpful. I agree that all of these are useful when a server is
>> really busy, but I wouldn't expect such behaviour on an idle server!
>>
>> Is there any way to solve the problem without these tweaks? I would
>> like to understand the nature of the problem better.
>>
>> Regards
>> przemol
>>
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss at opensolaris.org
>>
>>