[indiana-discuss] /devices (devfs) access hanging in snv_98 - devfsadmd wedged
Nils Goroll
slink at schokola.de
Tue Nov 18 09:33:10 PST 2008
Hi,
back in August there was a thread on indiana-discuss and sysadmin-discuss named
"/dev/null disappeared and unable to recreate it" (
http://www.opensolaris.org/jive/thread.jspa?messageID=271702 ).
At the time I gave some useless advice, but I already suspected some issues
with the devfs pseudo filesystem.
I am currently working on an issue that looks somewhat similar, on an snv_98
quad-core Xeon machine:
* various zfs snapshot -r cron jobs were all waiting for devfsadm:
core 'core.10516' of 10516: zfs snapshot -r rpool@replication-2008-11-18-15:56:03_GMT
feebf108 door (7, 80438c8, 0, 0, 0, 3)
feeac888 door_call (7, 80438c8) + 78
fe963a6a daemon_call (fe96b784, 804392c) + 17e
fe96379b devlink_create (fe96b784, fedb55e8, 200) + 4f
fe961c0d di_devlink_init_impl (fe96b784, fedb55e8, 1) + 35
fe961c7d di_devlink_init (fedb55e8, 1) + 25
fed5fb13 zvol_create_link_common (807a648, 8045390, 0) + 117
fed5e98c zfs_create_link_cb (807fa48, 8046818) + 7c
fed5cfa0 zfs_iter_filesystems (807fcc8, fed5e910, 8046818) + b0
fed5ebc3 zfs_snapshot (807a648, 8047ed0, 1, 8075f88) + 20f
08057587 zfs_do_snapshot (3, 8047e04) + d7
0805ae31 main (4, 8047e00, 8047e14) + 265
08053bbe _start (4, 8047ec0, 8047ec4, 8047ecd, 8047ed0, 0) + 7a
* syseventd didn't respond to SIGTERM immediately (only after some time)
* metaset -s <set> -r hanging (I could not get a usable core file for this one)
* Access on /devices hanging
server:/dev# find . -ls
2 6 drwxr-xr-x 251 root sys 251 Nov 13 09:30 .
1011727635 2 drwxr-xr-x 4 root root 4 Nov 4 17:11 ./agp
1011727571 1 lrwxrwxrwx 1 root root 52 Nov 4 17:11 ./agp/cpugart0 -> ../../devices/pci@0,0/pci1022,1103@18,3:amd64_gart-0
1011727603 1 lrwxrwxrwx 1 root root 52 Nov 4 17:11 ./agp/cpugart1 -> ../../devices/pci@0,0/pci1022,1103@19,3:amd64_gart-1
^C
^Z
* Access to /dev OK (find /dev without -ls is fine)
server:~# find /dev | head
/dev
/dev/agp
/dev/agp/cpugart0
/dev/agp/cpugart1
/dev/agpgart
/dev/allkmem
/dev/arp
/dev/bl
/dev/bnx
/dev/bnx0
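
One way to check which paths hang without tying up the interactive shell is
to do the lstat in a child process and give up after a timeout. The following
is only a minimal sketch, assuming Python 3 is available on the box; the
helper name probe and the 5 second default are made up for illustration and
are not part of any Solaris tool.

```python
import subprocess
import sys


def probe(path, timeout=5.0):
    """Classify an lstat() of `path` as 'ok', 'error' or 'hang'.

    The lstat runs in a child process so that a call blocked in the
    kernel cannot wedge the caller. `probe` is a hypothetical helper.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.lstat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck inside the kernel may not die,
        # so we deliberately do not wait for it again.
        child.kill()
        return "hang"
    return "ok" if rc == 0 else "error"


if __name__ == "__main__":
    # Paths to check are taken from the command line.
    for p in sys.argv[1:]:
        print(p, probe(p))
```

Run with, for example, /dev and /devices as arguments. If the lstat of a path
blocks the way find -ls did above, the sketch should report "hang" for that
path instead of blocking itself.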
Since I started analyzing the issue, syseventd has eventually been restarted,
but devfsadmd looks wedged:
server:/var/tmp/zfs_snapshot_hang# ps -ef | grep devfs
root 46 1 0 Nov 13 ? 0:04 devfsadmd
root 13451 13443 0 18:23:09 pts/7 0:00 grep devfs
server:/var/tmp/zfs_snapshot_hang# ps -ef | grep syseve
root 10782 1 0 17:01:00 ? 0:00 /usr/lib/sysevent/syseventd
root 13454 13443 0 18:23:22 pts/7 0:00 grep syseve
server:/var/tmp/zfs_snapshot_hang# pstack 46
pstack: cannot examine 46: no such process
server:/var/tmp/zfs_snapshot_hang# pargs 46
pargs: cannot examine 46: no such process
I've written a live crash dump of the system plus cores of the relevant
processes, so I'll hopefully be able to investigate further.
At this point I would like to ask if
- this rings a bell anywhere (searched for devfs bugs, but didn't find anything
pointing in this direction)
- anyone can provide any other helpful pointers
Thank you, Nils