[zfs-discuss] ZFS still crashing after patch
Rustam
rustam at code.az
Mon May 5 04:45:32 PDT 2008
Hello Robert,
> Which would happen if you have problem with HW and you're getting
> wring checksums on both side of your mirrors. Maybe PS?
>
> Try memtest anyway or sunvts
Unfortunately, SunVTS doesn't run on non-Sun/OEM hardware. And memtest requires too much downtime which I cannot afford right now.
However, I've interesting observations and now I can reproduce crash. It seems that I've bad checksum(s) and ZFS crashes each time when it tries to read it. Below are two cases:
Case1: I've got a checksum error not striped over mirrors, this time it was checksum for a file and not <0x0>. I tried to read file twice. First try returned I/O error, second try caused panic. Here's the log:
core# zpool status -xv
pool: box5
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
box5 ONLINE 0 0 2
mirror ONLINE 0 0 0
c1d0 ONLINE 0 0 0
c2d0 ONLINE 0 0 0
mirror ONLINE 0 0 2
c2d1 ONLINE 0 0 4
c1d1 ONLINE 0 0 4
errors: Permanent errors have been detected in the following files:
box5:<0x0>
/u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# ll /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
-rw------- 1 user group 489 Apr 20 2006 /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
cat: input error on /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file: I/O error
core# zpool status -xv
pool: box5
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
box5 ONLINE 0 0 4
mirror ONLINE 0 0 0
c1d0 ONLINE 0 0 0
c2d0 ONLINE 0 0 0
mirror ONLINE 0 0 4
c2d1 ONLINE 0 0 8
c1d1 ONLINE 0 0 8
errors: Permanent errors have been detected in the following files:
box5:<0x0>
/u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# cat /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
(Kernel Panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000)
...
(after system boot up)
core# rm /u02/domains/somedomain/0/1/5/data/sub1/sub2/1145543794.file
core# zpool status -xv
pool: box5
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
box5 ONLINE 0 0 0
mirror ONLINE 0 0 0
c1d0 ONLINE 0 0 0
c2d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2d1 ONLINE 0 0 0
c1d1 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
box5:<0x0>
box5:<0x4a049a>
core# mdb unix.17 vmcore.17
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp crypto ptm ]
> ::status
debugging crash dump vmcore.17 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000
dump content: kernel pages only
> ::stack
fletcher_2_native+0x13()
zio_checksum_verify+0x27()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_cache_read+0x14c()
vdev_disk_io_start+0x135()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
vdev_io_start+0x12()
zio_vdev_io_start+0x7b()
zio_next_stage_async+0xae()
zio_nowait+9()
vdev_mirror_io_start+0xa9()
zio_vdev_io_start+0x116()
zio_next_stage+0x65()
zio_ready+0xec()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_nowait+9()
arc_read+0x414()
dbuf_read_impl+0x1a0()
dbuf_read+0x95()
dmu_buf_hold_array_by_dnode+0x217()
dmu_buf_hold_array+0x81()
dmu_read_uio+0x49()
zfs_read+0x13c()
fop_read+0x31()
read+0x188()
read32+0xe()
_sys_sysenter_post_swapgs+0x14b()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=ffffffff8b98e240:
BAD TRAP: type=e (#pf Page fault) rp=fffffe8001112490 addr=fffffe80882b7000
cat:
#pf Page fault
Bad kernel fault at addr=0xfffffe80882b7000
pid=17993, pc=0xfffffffff1192923, sp=0xfffffe8001112588, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: fffffe80882b7000 cr3: df609000 cr8: c
rdi: fffffe80882b7000 rsi: fffffe80884ae200 rdx: fffffe80011125c0
rcx: 30fceb4cd1a3ca3f r8: 93e4f7b6a3198113 r9: 73457f74abfdcdf5
rax: dc3bf3d311149f rbx: ffffffff8f68a700 rbp: fffffe8001112630
r10: ffffffff8ecb5c30 r11: 0 r12: fffffe80884ae200
r13: c0 r14: 200200 r15: fffffe80882ae000
fsb: ffffffff80000000 gsb: fffffffffbc24e40 ds: 43
es: 43 fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff1192923
cs: 28 rfl: 10287 rsp: fffffe8001112588
ss: 30
fffffe80011123a0 unix:real_mode_end+71e1 ()
fffffe8001112480 unix:trap+5e6 ()
fffffe8001112490 unix:cmntrap+140 ()
fffffe8001112630 zfs:zfsctl_ops_root+30d9707b ()
fffffe8001112650 zfs:zio_checksum_verify+27 ()
fffffe8001112660 zfs:zio_next_stage+65 ()
fffffe8001112690 zfs:zio_wait_for_children+49 ()
fffffe80011126a0 zfs:zio_wait_children_done+15 ()
fffffe80011126b0 zfs:zio_next_stage+65 ()
fffffe80011126f0 zfs:zio_vdev_io_assess+84 ()
fffffe8001112700 zfs:zio_next_stage+65 ()
fffffe80011127e0 zfs:vdev_cache_read+14c ()
fffffe8001112820 zfs:vdev_disk_io_start+135 ()
fffffe8001112830 zfs:vdev_io_start+12 ()
fffffe8001112860 zfs:zio_vdev_io_start+7b ()
fffffe8001112870 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112880 zfs:zio_nowait+9 ()
fffffe80011128e0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011128f0 zfs:vdev_io_start+12 ()
fffffe8001112920 zfs:zio_vdev_io_start+7b ()
fffffe8001112930 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112940 zfs:zio_nowait+9 ()
fffffe80011129a0 zfs:vdev_mirror_io_start+a9 ()
fffffe80011129d0 zfs:zio_vdev_io_start+116 ()
fffffe80011129e0 zfs:zio_next_stage+65 ()
fffffe8001112a00 zfs:zio_ready+ec ()
fffffe8001112a10 zfs:zio_next_stage+65 ()
fffffe8001112a40 zfs:zio_wait_for_children+49 ()
fffffe8001112a50 zfs:zio_wait_children_ready+15 ()
fffffe8001112a60 zfs:zfsctl_ops_root+30db75c6 ()
fffffe8001112a70 zfs:zio_nowait+9 ()
fffffe8001112af0 zfs:arc_read+414 ()
fffffe8001112b70 zfs:dbuf_read_impl+1a0 ()
fffffe8001112bb0 zfs:zfsctl_ops_root+30d812dd ()
fffffe8001112c20 zfs:dmu_buf_hold_array_by_dnode+217 ()
fffffe8001112c70 zfs:dmu_buf_hold_array+81 ()
fffffe8001112cd0 zfs:zfsctl_ops_root+30d84641 ()
fffffe8001112d30 zfs:zfs_read+13c ()
fffffe8001112d80 genunix:fop_read+31 ()
fffffe8001112eb0 genunix:read+188 ()
fffffe8001112ec0 genunix:read32+e ()
fffffe8001112f10 unix:brand_sys_sysenter+1f2 ()
syncing file systems...
14
10
8
...
8
done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
> ^D
-----------------------------------------
Case2: At this point I've started to thinking how to reproduce such error. Obviously, I need to read all blocks to stumble upon bad checksum(s). So I decided to backup whole box5 pool using zfs send/receive to another pool I had in the system - box7. I've got very interesting results:
# zpool status -v
pool: box5
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
box5 ONLINE 0 0 0
mirror ONLINE 0 0 0
c1d0 ONLINE 0 0 0
c2d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2d1 ONLINE 0 0 0
c1d1 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
box5:<0x0>
pool: box7
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
box7 ONLINE 0 0 0
mirror ONLINE 0 0 0
c3t6d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
box5 175G 282G 175G /u02
box7 4.26G 681G 4.26G /box7
# zfs snapshot box5 at backup1
# zfs send box5 at backup1 | zfs receive box7/backup1 &
[1] 11056
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
box5 175G 282G 175G /u02
box5 at backup1 2.07M - 175G -
box7 4.26G 681G 4.26G /box7
box7/backup1 1.50K 681G 1.50K /box7/backup1
(Kernel Panic. See panic message below)
...
(after system boot up)
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
box5 175G 282G 175G /u02
box5 at backup1 24.2M - 175G -
box7 6.35G 679G 6.35G /box7
# mdb unix.20 vmcore.20
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci fctl nca lofs zfs random nfs ipc sppp crypto ptm ]
> ::status
debugging crash dump vmcore.20 (64-bit) from core
operating system: 5.10 Generic_127128-11 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE contiguous birth=10088934 fill=19 cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
dump content: kernel pages only
> ::stack
vpanic()
0xfffffffff11b1af4()
zio_next_stage+0x65()
zio_wait_for_children+0x49()
zio_wait_children_done+0x15()
zio_next_stage+0x65()
zio_vdev_io_assess+0x84()
zio_next_stage+0x65()
vdev_mirror_io_done+0xc1()
zio_vdev_io_done+0x14()
taskq_thread+0xbc()
thread_start+8()
> ::msgbuf
MESSAGE
...
panic[cpu0]/thread=fffffe800081dc80:
ZFS: bad checksum (read on <unknown> off 0: zio ffffffff8b504c80 [L0 DMU dnode] 4000L/e00P DVA[0]=<0:711cca000:e00> DVA[1]=<1:2535927000:e00> fletcher4 lzjb LE contiguous birth=10088934 fill=19 cksum=dd5d13d3d3:1b19d851501a2:2034a615f84743c:c3ddc5ca046ccb96): error 50
fffffe800081dac0 zfs:zfsctl_ops_root+30db624c ()
fffffe800081dad0 zfs:zio_next_stage+65 ()
fffffe800081db00 zfs:zio_wait_for_children+49 ()
fffffe800081db10 zfs:zio_wait_children_done+15 ()
fffffe800081db20 zfs:zio_next_stage+65 ()
fffffe800081db60 zfs:zio_vdev_io_assess+84 ()
fffffe800081db70 zfs:zio_next_stage+65 ()
fffffe800081dbd0 zfs:vdev_mirror_io_done+c1 ()
fffffe800081dbe0 zfs:zio_vdev_io_done+14 ()
fffffe800081dc60 genunix:taskq_thread+bc ()
fffffe800081dc70 unix:thread_start+8 ()
syncing file systems...
5
3
1
...
1
done (not all i/o completed)
dumping to /dev/dsk/c0d0s1, offset 215547904, content: kernel
>
You can see checksum type fletcher4 (with lzjb compression?) in panic message. However, all of the ZFS file systems are fletcher2 w/o compression. So this is obviously corrupted checksum. I've run backup (send/receive) test described above 3 times. Every time I've got the same error and each time system panic approx. 45 minutes after start.
>From all these I can conclude that it happens when "zfs send" tries to read the same bad checksum. Bad checksum could appear in the result of software or hardware bug. However, it's obvious that zfs cannot properly handle and fix this checksum problem.
I tried also to run zdb, but it run out of the memory:
# zdb box5
version=4
name='box5'
state=0
txg=11057710
pool_guid=989471958079851180
vdev_tree
type='root'
id=0
guid=989471958079851180
children[0]
type='mirror'
id=0
guid=9879820675701757866
metaslab_array=16
metaslab_shift=31
ashift=9
asize=250045530112
children[0]
type='disk'
id=0
guid=6440359594760637663
path='/dev/dsk/c1d0s0'
devid='id1,cmdk at AST3250820AS=____________9QE0LYMD/a'
whole_disk=1
DTL=125
children[1]
type='disk'
id=1
guid=80751879044845160
path='/dev/dsk/c2d0s0'
devid='id1,cmdk at AST3250824AS=____________5ND3M6BQ/a'
whole_disk=1
DTL=23
children[1]
type='mirror'
id=1
guid=4781215615782453677
whole_disk=0
metaslab_array=13
metaslab_shift=31
ashift=9
asize=250045530112
children[0]
type='disk'
id=0
guid=4849048357332929360
path='/dev/dsk/c2d1s0'
devid='id1,cmdk at AST3250824AS=____________5ND3LQLB/a'
whole_disk=1
DTL=21
children[1]
type='disk'
id=1
guid=15140711491939156235
path='/dev/dsk/c1d1s0'
devid='id1,cmdk at AST3250820AS=____________3QE01C2H/a'
whole_disk=1
DTL=19
Uberblock
magic = 0000000000bab10c
version = 4
txg = 11061627
guid_sum = 5267891425222527909
timestamp = 1209925302 UTC = Sun May 4 23:21:42 2008
Dataset mos [META], ID 0, cr_txg 4, 176M, 201 objects
Dataset box5 at backup1 [ZPL], ID 199, cr_txg 11060758, 175G, 4997266 objects
Dataset box5 [ZPL], ID 5, cr_txg 4, 175G, 4998785 objects
Traversing all blocks to verify checksums and verify nothing leaked ...
out of memory -- generating core dump
Abort (core dumped)
So what I'm supposed to do now with all these 5 million objects? I cannot even backup them.
Regards,
Rustam.
This message posted from opensolaris.org
More information about the zfs-discuss
mailing list