[indiana-discuss] spontaneous shutdowns and log messages
Brian Ruthven - Sun UK
Brian.Ruthven at Sun.COM
Wed Apr 1 02:03:20 PDT 2009
Hi Harry,
The usual clue I look for is:
Mar 18 17:21:56 opensol genunix: [ID 672855 kern.notice] syncing file systems...
Mar 18 17:21:57 opensol genunix: [ID 904073 kern.notice] done
Mar 18 17:22:50 opensol genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_110 64-bit
This usually means the kernel was instructed to quit, and did so
relatively cleanly (at least syncing file systems before resetting the
host and rebooting). This could be due to halt, reboot, shutdown or init,
but also to the uadmin command (or a uadmin(2) system call into the
kernel). It can also be due to a panic.
Look at the lines immediately above it, and you will hopefully find
something like this:
Mar 18 17:21:50 opensol syslogd: going down on signal 15
Mar 18 17:21:51 opensol rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.
Again, a clue that a clean shutdown was issued rather than a panic or
power loss. Other clues include lines like:
Apr 1 09:49:35 lab662 reboot: [ID 330035 auth.crit] initiated by root on /dev/console
This was the /usr/sbin/reboot command. A similar line is printed for
/usr/sbin/halt.
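As a rough illustration of "look at the lines immediately above the boot banner", here is a minimal Python sketch. It assumes the log has already been read into a list of lines (e.g. from /var/adm/messages). The marker strings come from the example log lines above; the function name `clues_before_boots`, the 20-line look-back window and the halt pattern are illustrative assumptions, not anything Solaris defines.

```python
import re

# Markers taken from the example log lines quoted in this message. The
# "halt" alternative is an assumption based on the note that halt prints
# a line similar to reboot's.
CLEAN_SHUTDOWN_PATTERNS = [
    ("filesystems synced", re.compile(r"syncing file systems")),
    ("syslogd got SIGTERM", re.compile(r"syslogd: going down on signal 15")),
    ("rpcbind got a signal", re.compile(r"rpcbind terminating on signal")),
    ("reboot/halt command", re.compile(r"\b(?:reboot|halt): .*initiated by")),
]

# Each boot begins with the kernel banner, e.g. "... SunOS Release 5.11 ...".
BOOT_BANNER = "SunOS Release"


def clues_before_boots(lines, window=20):
    """Report clean-shutdown markers found just before each boot banner.

    Returns a list of (banner_line, labels) tuples. An empty label list means
    none of the markers appeared in the window, which is consistent with (but
    does not prove) an unclean stop such as a panic, power loss or reset.
    """
    results = []
    last_banner = -1
    for i, line in enumerate(lines):
        if BOOT_BANNER not in line:
            continue
        # Look back at most `window` lines, never past the previous boot banner.
        start = max(last_banner + 1, i - window)
        labels = []
        for prev in lines[start:i]:
            for label, pattern in CLEAN_SHUTDOWN_PATTERNS:
                if pattern.search(prev) and label not in labels:
                    labels.append(label)
        results.append((line.rstrip("\n"), labels))
        last_banner = i
    return results


if __name__ == "__main__":
    sample = [
        "Mar 18 17:21:50 opensol syslogd: going down on signal 15",
        "Mar 18 17:21:51 opensol rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.",
        "Mar 18 17:21:56 opensol genunix: [ID 672855 kern.notice] syncing file systems...",
        "Mar 18 17:21:57 opensol genunix: [ID 904073 kern.notice] done",
        "Mar 18 17:22:50 opensol genunix: [ID 540533 kern.notice] SunOS Release 5.11 Version snv_110 64-bit",
    ]
    for banner, labels in clues_before_boots(sample):
        print(labels or "no clean-shutdown markers found", "->", banner)
```

An empty result for a given boot only says none of these particular markers was seen; it is a reason to go looking for a panic, power loss or reset, not a conclusion.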
Also, the output from "last -10 reboot" will show roughly what time the
system went down (in case you didn't already know). This should be
accurate to within 60 seconds in the case of an unclean shutdown (e.g. a
crash, power loss, or reset button). Sadly, it cannot currently
distinguish between a clean and an unclean shutdown.
This isn't an exhaustive list of things, but hopefully a starting point.
Other big clues are warnings in /var/adm/messages relating to
temperature, or the string "panic", usually accompanied by a line on
reboot along the lines of "reboot after panic...", and the generation of
a large vmcore.X file in /var/crash/<hostname>.
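A similar sketch for the panic and temperature clues, again in Python and under stated assumptions: matching the bare words "panic" and "temperature" is a crude heuristic (the exact wording of thermal warnings varies by platform and driver), and the crash directory is assumed to be /var/crash/<hostname>, with <hostname> taken from `socket.gethostname()`, which may return a fully qualified name on some systems. The function names `find_crash_clues` and `find_crash_dumps` are hypothetical.

```python
import re
import socket
from pathlib import Path

# "panic" also matches "reboot after panic" lines logged on the next boot.
# "temperature" is a crude heuristic: thermal warning wording varies by
# platform and driver, so treat any hits as leads rather than proof.
PANIC_RE = re.compile(r"panic", re.IGNORECASE)
TEMP_RE = re.compile(r"temperature", re.IGNORECASE)


def find_crash_clues(lines):
    """Split log lines into panic-related and temperature-related hits."""
    panics, temps = [], []
    for line in lines:
        text = line.rstrip("\n")
        if PANIC_RE.search(text):
            panics.append(text)
        if TEMP_RE.search(text):
            temps.append(text)
    return panics, temps


def find_crash_dumps(crash_root="/var/crash", hostname=None):
    """List vmcore.N files under <crash_root>/<hostname> (assumed layout)."""
    host = hostname or socket.gethostname()
    crash_dir = Path(crash_root) / host
    if not crash_dir.is_dir():
        return []
    return sorted(p for p in crash_dir.glob("vmcore.*") if p.is_file())


if __name__ == "__main__":
    try:
        lines = Path("/var/adm/messages").read_text(errors="replace").splitlines()
    except OSError:  # missing file or insufficient privileges
        lines = []
    panics, temps = find_crash_clues(lines)
    print(f"{len(panics)} panic line(s), {len(temps)} temperature line(s)")
    for dump in find_crash_dumps():
        print("crash dump:", dump)
```

A large vmcore file with a matching timestamp is a much stronger signal than a keyword hit in the log.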
The time-slider messages are probably not relevant to this issue (but
there have been bugs against time-slider in the past...).
Hope that helps move you forward.
Brian
Harry Putnam wrote:
> setup:
> Athlon64 2.2GHz 3400+ - AK86-L Aopen mobo (topped out at 3GB RAM)
> 4 500GB IDE drives on IDE controllers
> 2 750GB SATA drives on a PCI SATA controller
> (Adaptec 1205SA [Sil3112A chip])
> Currently: osol-2008.11 build 110
> ===== * ===== * ===== * =====
>
> First, this is not something that I can say is related to build 110.
> It was going on before I upgraded from 109.
>
> I'm experiencing spontaneous shutdowns and am not finding anything in
> the logs /var/adm/messages or /var/log/syslog that I recognize as
> being a clue to why.
>
> I can post an extract including the time frame of shutdown but to me
> it looks totally normal... (I'm not experienced in debugging though)
>
> Also I'm not really sure where to look for clues beyond
> /var/log/syslog and /var/adm/messages.
>
> I've got a hunch this may be about hdd overheating, but that is only
> because I feel what seems to me to be abnormal heat when I touch the
> drives, especially the 2 SATA drives on the Adaptec 1205sa (Sil3112a
> chip).
>
> It may be normal heat... I'm not sure... but I really have no idea
> what else might provoke a shutdown ... not really sure overheating of
> hdd would do that (force a shutdown).
>
> The biggest change I've made most recently was to upgrade the size of
> a mirrored 200gb pool to 750gb. Those drives are on the sata
> controller referenced above. But the 200gb had been running on that
> controller for some time.
>
> I also had to flash the bios of that controller during the upgrade, to
> make it recognize the new 750gb Sata II drives.
>
> I don't remember seeing a spontaneous shutdown before making those
> changes.
>
> However, I am getting some errors from something to do with the
> timeslider mechanism. I see them on boot up from the `startd'
> service. Where
>
> svc:/application/time-slider:default
>
> is moved to maintenance by request of time-slider `frequent' and
> `hourly' services.
>
> Attempting to restart the time-slider service results in it being moved
> to `Maintenance mode' again. The `frequent' timeslider service is not
> finding a crontab, according to that service's log.
>
> That sounds like some kind of permissions problem and not something
> that would invoke a shutdown.
>
> I guess that might be related to the shutdowns though, so I've inlined the
> output of `svcs -vx' below:
>
> If that isn't it, where else should I look for clues, and are there
> other logs I should be examining?
>
> svcs -vx:
> svc:/system/filesystem/zfs/auto-snapshot:frequent (ZFS auto snap..)
>  State: maintenance since Tue Mar 31 12:20:19 2009
> Reason: Maintenance requested by
>         "svc:/system/filesystem/zfs/auto-snapshot:frequent"
>    See: /var/svc/log/system-filesystem-zfs-auto-sn..:frequent.log
>    See: http://sun.com/msg/SMF-8000-R4
>    See: /var/svc/log/system-filesystem-zfs-auto-sn..:frequent.log
> Impact: 1 dependent service is not running:
>         svc:/application/time-slider:default
>
> svc:/system/filesystem/zfs/auto-snapshot:hourly (ZFS auto sn..)
>  State: maintenance since Tue Mar 31 12:20:17 2009
> Reason: Maintenance requested by
>         "svc:/system/filesystem/zfs/auto-snapshot:hourly"
>    See: /var/svc/log/system-filesystem-zfs-auto-snapshot:hourly.log
>    See: http://sun.com/msg/SMF-8000-R4
>    See: /var/svc/log/system-filesystem-zfs-auto-snapshot:hourly.log
> Impact: 1 dependent service is not running:
>         svc:/application/time-slider:default
>
--
Brian Ruthven Sun Microsystems UK
Solaris Revenue Product Engineering Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG