[indiana-discuss] /devices (devfs) access hanging in snv_98 - devfsadmd wedged

Nils Goroll slink at schokola.de
Tue Nov 18 09:33:10 PST 2008


Hi,

back in August there was a thread on indiana-discuss and sysadmin-discuss named 
"/dev/null disappeared and unable to recreate it" ( 
http://www.opensolaris.org/jive/thread.jspa?messageID=271702 ).

At the time, I gave some useless advice, but I already suspected some issues 
with the devfs pseudo filesystem.

I am currently working on an issue which looks somewhat similar, on an snv_98 
quad-core Xeon machine:

* various zfs snapshot -r cron jobs were all waiting for devfsadm:

core 'core.10516' of 10516:     zfs snapshot -r rpool@replication-2008-11-18-15:56:03_GMT
  feebf108 door     (7, 80438c8, 0, 0, 0, 3)
  feeac888 door_call (7, 80438c8) + 78
  fe963a6a daemon_call (fe96b784, 804392c) + 17e
  fe96379b devlink_create (fe96b784, fedb55e8, 200) + 4f
  fe961c0d di_devlink_init_impl (fe96b784, fedb55e8, 1) + 35
  fe961c7d di_devlink_init (fedb55e8, 1) + 25
  fed5fb13 zvol_create_link_common (807a648, 8045390, 0) + 117
  fed5e98c zfs_create_link_cb (807fa48, 8046818) + 7c
  fed5cfa0 zfs_iter_filesystems (807fcc8, fed5e910, 8046818) + b0
  fed5ebc3 zfs_snapshot (807a648, 8047ed0, 1, 8075f88) + 20f
  08057587 zfs_do_snapshot (3, 8047e04) + d7
  0805ae31 main     (4, 8047e00, 8047e14) + 265
  08053bbe _start   (4, 8047ec0, 8047ec4, 8047ecd, 8047ed0, 0) + 7a

* syseventd didn't immediately respond to SIGTERM (but after some time)

* metaset -s <set> -r hanging (could not get a usable core file of this one)

* Access to /devices hanging

server:/dev# find . -ls
     2    6 drwxr-xr-x  251 root     sys           251 Nov 13 09:30 .
1011727635    2 drwxr-xr-x   4 root     root            4 Nov  4 17:11 ./agp
1011727571    1 lrwxrwxrwx   1 root     root           52 Nov  4 17:11 ./agp/cpugart0 -> ../../devices/pci@0,0/pci1022,1103@18,3:amd64_gart-0
1011727603    1 lrwxrwxrwx   1 root     root           52 Nov  4 17:11 ./agp/cpugart1 -> ../../devices/pci@0,0/pci1022,1103@19,3:amd64_gart-1


^C

^Z

* Access to /dev OK (find /dev without -ls is fine)

server:~# find /dev | head
/dev
/dev/agp
/dev/agp/cpugart0
/dev/agp/cpugart1
/dev/agpgart
/dev/allkmem
/dev/arp
/dev/bl
/dev/bnx
/dev/bnx0


Since I started analyzing the issue, syseventd has eventually been restarted, 
but devfsadmd looks wedged:

server:/var/tmp/zfs_snapshot_hang# ps -ef | grep devfs
     root    46     1   0   Nov 13 ?           0:04 devfsadmd
     root 13451 13443   0 18:23:09 pts/7       0:00 grep devfs
server:/var/tmp/zfs_snapshot_hang# ps -ef | grep syseve
     root 10782     1   0 17:01:00 ?           0:00 /usr/lib/sysevent/syseventd
     root 13454 13443   0 18:23:22 pts/7       0:00 grep syseve
server:/var/tmp/zfs_snapshot_hang# pstack 46
pstack: cannot examine 46: no such process
server:/var/tmp/zfs_snapshot_hang# pargs 46
pargs: cannot examine 46: no such process

I've written a live crash dump of the system plus cores of the relevant 
processes, so I'll hopefully be able to investigate further.

At this point I would like to ask if

- this rings a bell anywhere (searched for devfs bugs, but didn't find anything 
pointing in this direction)
- anyone can provide any other helpful pointers

Thank you, Nils


