[zfs-discuss] ZFS Honesty after a power failure
Dennis Clarke
dclarke at blastwave.org
Tue Mar 24 10:37:44 PDT 2009
I'm happy to see that someone else brought up this topic. I had a nasty
long power failure last night that drained the APC/UPS batteries dry.[1]
:-(
I changed the subject line somewhat because I feel that the issue is one
of honesty as opposed to reliability.
I *feel* that ZFS is reliable out past six nines ( rho=0.999999 ) for two
reasons: first, I have never seen it fail me, even though I have pounded
it with some fairly offensive abuse under terrible conditions[2]; and
second, everyone in the computer industry is trying to
steal^H^H^H^H^Himplement it into their OS of choice. There must be a
reason for that.
However, I have repeatedly run into problems when I need to boot after a
power failure. I see vdevs being marked as FAULTED regardless of whether
any hard errors are actually reported by the on-disk SMART firmware. I am
able to remove these FAULTED devices temporarily, re-insert the same disk,
and then run fine for months. Until the next long power failure.
This is where "honesty" becomes a question because I have to question the
severity of the FAULT when I know from past experience that the disk(s) in
question can be removed and then re-inserted and life is fine for months.
Were harddisk manufacturers involved in this error message logic? :-P
A power failure, a really nice long one, happened last night and again
when I boot up I see nasty error messages.
Here is *precisely* what I saw last night :
{3} ok boot -s
Resetting ...
Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #53264354.
Ethernet address 0:3:ba:2c:bf:e2, Host ID: 832cbfe2.
Rebooting with command: boot -s
Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@w21000004cfb6f0ff,0:a
File and args: -s
SunOS Release 5.10 Version Generic_138888-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Hostname: jupiter
Requesting System Maintenance Mode
SINGLE USER MODE
Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Mar 24 01:28:04 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
#
/***************************************************/
/* the very first thing I check is zpool fibre0 */
/***************************************************/
# zpool status fibre0
pool: fibre0
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
fibre0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c2t17d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 AVAIL
errors: No known data errors
*************************************************
* everything looks fine, okay, thank you to ZFS *
* ... and then I try to boot to full init 3
*
*************************************************
# exit
svc.startd: Returning to milestone all.
Reading ZFS config: done.
Mounting ZFS filesystems: (1/51)
jupiter console l(51/51)
root
Password:
Last login: Sat Mar 7 19:39:00 on console
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
# zpool status fibre0
pool: fibre0
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
fibre0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c2t17d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 AVAIL
errors: No known data errors
* everything STILL looks fine, and only seconds have passed.
* Then .. I get bombarded with SEVERITY: Major faults
#
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3780a2dd-7381-c053-e186-8112b463c2b7
DESC: The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD
for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 146dad1d-f195-c2d6-c630-c1adcd58b288
DESC: The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD
for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
********************************************************
I know that I have been here before after a power failure
with similar messages. They were not entirely honest about
the SEVERITY of the device faults.
The faults are certainly not "Major faults"
*********************************************************
# zpool status fibre0
pool: fibre0
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scrub: resilver in progress for 0h0m, 0.02% done, 21h7m to go
config:
NAME STATE READ WRITE CKSUM
fibre0 DEGRADED 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c2t17d0 FAULTED 0 0 0 too many errors
c2t22d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 INUSE currently in use
errors: No known data errors
# Mar 24 01:29:53 jupiter ntpdate[733]: no server suitable for synchronization found
***********************************************************
* at this point I go look at my cisco routers and check my AC
* and get things booting, I also curse my new APC gear for not
* signaling a power failure ... but that is another story.
***********************************************************
So can I *trust* what I am seeing?
Do I really believe that I have a SEVERE fault in a disk? Last time I did
this ( last month actually ) there were two disks faulted. Today there is
just one.
As usual I will NOT order a new replacement disk.
I just let that ZPool sort itself out. It will take an hour or so to sync
up that hot spare.
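For anyone who would rather not read the whole status listing by eye, here
is a minimal Python sketch that pulls out the devices reported in a failed
state. It is not part of the original session: the function name
failed_devices and the set of states it flags are my own assumptions, and it
assumes the column layout shown above (NAME STATE READ WRITE CKSUM plus an
optional trailing note).

```python
# Sketch only: scan `zpool status` text for vdevs in a failed state.
# FAILED_STATES is an assumption about which states are worth flagging.
FAILED_STATES = {"FAULTED", "UNAVAIL", "OFFLINE", "REMOVED"}

# A few lines copied from the status output in this message.
SAMPLE = """\
pool: fibre0
state: DEGRADED
config:
NAME STATE READ WRITE CKSUM
fibre0 DEGRADED 0 0 0
mirror DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c2t17d0 FAULTED 0 0 0 too many errors
c2t22d0 ONLINE 0 0 0
spares
c2t22d0 INUSE currently in use
errors: No known data errors
"""


def failed_devices(status_text):
    """Return (name, state, note) for each vdev listed in a failed state."""
    found = []
    in_config = False
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "config:":
            in_config = True       # device table starts after this line
            continue
        if tokens[0] == "errors:":
            in_config = False      # device table for this pool has ended
            continue
        if in_config and len(tokens) >= 2 and tokens[1] in FAILED_STATES:
            # Columns 2-4 are READ/WRITE/CKSUM; anything after is the note.
            note = " ".join(tokens[5:])
            found.append((tokens[0], tokens[1], note))
    return found


if __name__ == "__main__":
    for name, state, note in failed_devices(SAMPLE):
        print(name, state, note)
```

Note that the READ, WRITE and CKSUM counters for the faulted disk are all
zero in the listing, which is part of why the "too many errors" note is hard
to take at face value.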
The machine in question is a production Solaris 10 server :
# uname -a
SunOS jupiter 5.10 Generic_138888-03 sun4u sparc SUNW,Sun-Fire-480R
# cat /etc/release
Solaris 10 5/08 s10s_u5wos_10 SPARC
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 24 March 2008
The zpool in question looks like so :
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
fibre0 680G 536G 144G 78% DEGRADED -
z0 40.2G 103K 40.2G 0% ONLINE -
# zpool status fibre0
pool: fibre0
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scrub: resilver completed after 1h35m with 0 errors on Tue Mar 24 03:04:49 2009
config:
NAME STATE READ WRITE CKSUM
fibre0 DEGRADED 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror DEGRADED 0 0 0
c5t1d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c2t17d0 FAULTED 0 0 0 too many errors
c2t22d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 INUSE currently in use
errors: No known data errors
Is there *really* a severe fault in that disk ?
# luxadm -v display 21000018625d599d
Displaying information for: 21000018625d599d
Searching directory /dev/es for links to enclosures
DEVICE PROPERTIES for disk: 21000018625d599d
Vendor: HPQ
Product ID: BD1465822C
Revision: HP04
Serial Num: 3KS36V5N000076218F5R
Unformatted capacity: 140014.406 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0xffff
Device Type: Disk device
Path(s):
/dev/rdsk/c2t17d0s2
/devices/pci@8,600000/SUNW,qlc@1/fp@0,0/ssd@w21000018625d599d,0:c,raw
LUN path port WWN: 21000018625d599d
Host controller port WWN: 210000e08b08f1a1
Path status: O.K.
What does the SMART Firmware say ?
# /root/bin/smartctl -a /dev/rdsk/c2t17d0s0
smartctl version 5.33 [sparc-sun-solaris2.8] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: HPQ BD1465822C Version: HP04
Serial number: 3KS36V5N000076218F5R
Device type: disk
Transport protocol: IEEE 1394 (SBP-2)
Local Time is: Tue Mar 24 14:09:07 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 33 C
Drive Trip Temperature: 68 C
Vendor (Seagate) cache information
Blocks sent to initiator = 615507364
Blocks received from initiator = 3004562974
Blocks read from cache and sent to initiator = 94569699
Number of read and write commands whose size <= segment size = 185763910
Number of read and write commands whose size > segment size = 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               EEC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   8952309        0        0   8952309    8952309        999.277           0
write:        0        0        0         0         12       1328.105           0
verify:  934290        0        0    934290     934290        146.816           0
Non-medium error count: 1
Error Events logging not supported
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -      31                 - [-   -    -]
It is hard to see, but the total of uncorrected errors is zero for read,
write and verify.
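Since that last column is easy to lose when the table wraps, here is a small
Python sketch that extracts it. This is my own illustration, not something
smartctl provides: uncorrected_errors is a hypothetical helper, and it
assumes each of the read/write/verify rows carries seven numeric fields with
"total uncorrected errors" last, as in the output above.

```python
# Sketch only: pull the "total uncorrected errors" column out of the
# smartctl SCSI error counter log, even when the last field has wrapped
# onto its own line (as it did in this mail).
ROW_LABELS = ("read:", "write:", "verify:")
COLUMNS = 7  # assumed number of numeric fields per row

# Rows copied from the smartctl output in this message, wrapped as mailed.
SMART_SAMPLE = """\
Error counter log:
read: 8952309 0 0 8952309 8952309 999.277
0
write: 0 0 0 0 12 1328.105
0
verify: 934290 0 0 934290 934290 146.816
0
Non-medium error count: 1
"""


def uncorrected_errors(smart_text):
    """Map 'read'/'write'/'verify' to the last field of that row."""
    # Splitting on any whitespace makes line wrapping irrelevant.
    tokens = smart_text.split()
    result = {}
    for i, tok in enumerate(tokens):
        if tok in ROW_LABELS:
            values = tokens[i + 1 : i + 1 + COLUMNS]
            if len(values) == COLUMNS and values[-1].isdigit():
                result[tok.rstrip(":")] = int(values[-1])
    return result


if __name__ == "__main__":
    print(uncorrected_errors(SMART_SAMPLE))
```

All three rows come back as zero for this disk, which is the point being
made: the drive itself reports corrected errors but nothing uncorrected.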
***********************************************
* So let's just correct the "SEVERE" fault.
***********************************************
# zpool detach fibre0 c2t17d0
# zpool detach fibre0 c2t22d0
# zpool status fibre0
pool: fibre0
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: resilver completed after 1h35m with 0 errors on Tue Mar 24 03:04:49 2009
config:
NAME STATE READ WRITE CKSUM
fibre0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
errors: No known data errors
# zpool attach fibre0 c5t1d0 c2t17d0
# zpool add fibre0 spare c2t22d0
# zpool status fibre0
pool: fibre0
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h2m, 2.86% done, 1h18m to go
config:
NAME STATE READ WRITE CKSUM
fibre0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c2t17d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 AVAIL
errors: No known data errors
#
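The sequence used above can be written down as a dry-run plan. This is a
sketch, not a recommendation: reattach_plan is a hypothetical helper that
only builds the four command lines already shown in this session, and it
does not execute anything.

```python
# Sketch only: build (but do not run) the detach/attach sequence used above.
# Every argument name here is illustrative; nothing is executed.
def reattach_plan(pool, healthy, faulted, spare):
    """Return the four zpool command lines from the session, in order."""
    return [
        ["zpool", "detach", pool, faulted],           # drop the FAULTED disk
        ["zpool", "detach", pool, spare],             # release the hot spare
        ["zpool", "attach", pool, healthy, faulted],  # re-mirror onto same disk
        ["zpool", "add", pool, "spare", spare],       # put the spare back
    ]


if __name__ == "__main__":
    for cmd in reattach_plan("fibre0", "c5t1d0", "c2t17d0", "c2t22d0"):
        print(" ".join(cmd))
```

Between the second detach and the end of the new resilver, c5t1d0 appears in
the status output without a mirror partner, so that vdev has no redundancy
during that window. The status output's own action line also mentions
'zpool clear' as a way to mark the device repaired.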
I have also learned that you cannot trust that resilver progress report
either. It will not take 1h18m to complete. If I wait 20 minutes I'll get
*nearly* the same estimate. The process must not be deterministic in
nature.
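One possible reading of those numbers, offered as an assumption and not as a
statement about how zpool computes them: if the "to go" figure is a
straight-line extrapolation from elapsed time and percent done, it will keep
moving whenever the copy rate changes. The Python sketch below applies that
extrapolation to the two progress lines quoted in this message (0h2m at
2.86% and 0h39m at 34.24%). The function names are mine.

```python
# Sketch only: straight-line "time to go" from elapsed time and percent done.
# Whether zpool uses exactly this formula is an assumption.
def linear_eta_minutes(elapsed_minutes, percent_done):
    """Remaining minutes if the work continues at the average rate so far."""
    if not 0 < percent_done < 100:
        raise ValueError("percent_done must be between 0 and 100, exclusive")
    return elapsed_minutes * (100.0 - percent_done) / percent_done


def format_hm(minutes):
    """Format minutes the way the status line does, e.g. 1h15m."""
    total = int(round(minutes))
    return f"{total // 60}h{total % 60}m"


if __name__ == "__main__":
    for elapsed, pct in ((2, 2.86), (39, 34.24)):
        print(elapsed, pct, format_hm(linear_eta_minutes(elapsed, pct)))
```

For 39 minutes at 34.24% this gives 1h15m, the same as the reported figure.
For 2 minutes at 2.86% it gives 1h8m against a reported 1h18m, which could
simply mean the real elapsed time was a bit over two minutes. Either way the
implied total grew from roughly 1h20m to roughly 1h54m, so the average rate
dropped as the resilver went on. The earlier resilver shows the same effect
in the other direction: 21h7m to go at 0.02%, yet it completed in 1h35m.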
# zpool status
pool: fibre0
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h39m, 34.24% done, 1h15m to go
config:
NAME STATE READ WRITE CKSUM
fibre0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t16d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c2t17d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c2t18d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t20d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t21d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
spares
c2t22d0 AVAIL
errors: No known data errors
pool: z0
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
z0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c1t0d0s7 ONLINE 0 0 0
c1t1d0s7 ONLINE 0 0 0
errors: No known data errors
# fmadm faulty -afg
#
I do TOTALLY trust that last line that says "No known data errors" which
makes me wonder if the Severe FAULTs are for unknown data errors :-)
--
Dennis Clarke
sig du jour : "An appeaser is one who feeds a crocodile, hoping it will
eat him last.", Winston Churchill
[1] I really want to know where PowerChute for Solaris went to.
[2] I would create a ZPool of striped mirrors based on multiple USB keys
and on disks on IDE/SATA with or without compression and with
copies={1|2|3} and while running an ON compile I'd pull the USB keys out
and yank the power on the IDE/SATA or fibre disks. ZFS would not throw a
fatal error nor drop a bit of data. Performance suffered but data did not.