[zfs-discuss] [on-discuss] Reliability at power failure?

Uwe Dippel udippel at gmail.com
Sun Apr 19 07:38:47 PDT 2009


Casper.Dik at Sun.COM wrote:
>> We are back at square one; or, at the subject line.
>> I did a zpool status -v, everything was hunky dory.
>> Next, a power failure, 2 hours later, and this is what zpool status -v 
>> thinks:
>>
>> zpool status -v
>>  pool: rpool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>    corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>    entire pool from backup.
>>   see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: none requested
>> config:
>>
>>    NAME        STATE     READ WRITE CKSUM
>>    rpool       ONLINE       0     0     0
>>      c1d0s0    ONLINE       0     0     0
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>        //etc/svc/repository-boot-20090419_174236
>>
>> I know, the hard-core defenders of ZFS will repeat for the umpteenth 
>> time that I should be grateful that ZFS can NOTICE and inform me about 
>> the problem.
>>     
>
> :-)
>
> The file is created on boot and I assume this was created directly after 
> the boot after the  power-failure.
>
> Am I correct in thinking that:
> 	the last boot happened on 2009/04/19_17:42:36
> 	the system hasn't rebooted since that time
>   

Good guess, but wrong. Another two to go ...   :)
>   
>> Others might want to repeat that this is not supposed to happen in the 
>> first place.
>>     
>
> ZFS guarantees that this cannot happen, unless the hardware is bad.  Bad 
> means here "the hardware doesn't promise what ZFS believes the hardware 
> promises".
>
> But anything can cause this:
>
> 	hardware problems:
> 		- bad memory
> 		- bad disk
> 		- bad disk controller
> 		- bad power supply
> 		
> 	software problem
> 		- memory corruption through any odd driver
> 		- any part of the zfs stack
>
> My money would still be on a hardware problem.  I remember a particular 
> case where ZFS continuously found checksum errors; replacing the power 
> supply fixed that.
>   

Chances are. Except that Ubuntu, dual-booted on the same machine, never 
finds anything wrong, never crashes, etc.
And again, someone will inform me that this is the beauty of ZFS: that I 
know of the corruption.

After a scrub, what I see is:

 zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h48m with 1 errors on Sun Apr 19 19:09:26 2009
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     1
      c1d0s0    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        <0xa6>:<0x4f002>

Which file to replace?
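
For reference, an entry of the form <0xa6>:<0x4f002> appears to be what 
zpool status prints when the damaged object can no longer be mapped to a 
path name; as far as is known, the first number is the dataset ID and the 
second the object ID, both in hex. A minimal Python sketch that separates 
such entries from ordinary paths in the "errors:" section follows. The 
function name parse_error_entries is hypothetical and purely illustrative; 
it is not part of any ZFS tool, and it only parses text, it repairs 
nothing.

```python
import re

# Matches an unresolved entry such as <0xa6>:<0x4f002>
# (assumed meaning: <dataset id>:<object id>, both hexadecimal).
OBJ_REF = re.compile(r"^<(0x[0-9a-fA-F]+)>:<(0x[0-9a-fA-F]+)>$")


def parse_error_entries(status_text):
    """Split the 'errors:' section of `zpool status -v` text into
    (paths, object_refs). object_refs holds (dataset_id, object_id)
    integer pairs for entries that were not resolved to a path."""
    paths, refs = [], []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            in_errors = True
            continue
        if not in_errors or not stripped:
            continue
        match = OBJ_REF.match(stripped)
        if match:
            refs.append((int(match.group(1), 16), int(match.group(2), 16)))
        else:
            paths.append(stripped)
    return paths, refs
```

Fed the two status outputs in this mail, it would report one path for the 
first and one (dataset, object) pair for the second, which at least makes 
explicit that there is no file name left to restore.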

Seriously, what would a normal user be expected to do here? No, I don't 
have a backup of a file that was only recently created, true, at 17:42 on 
April 19th.
Reinstall? When everything was okay 12 hours ago, after some 30 crashes 
due to power failures that were, until recently, rectified by the usual 
cycle of crash at boot, Failsafe, reboot?
A system that has been going up and down without much hassle for 1.5 
years, both under OpenSolaris on UFS and under Ubuntu?

(Let's not forget the thread started with my question "Why do I have to 
Failsafe so frequently after a power failure, to correct a corrupted 
bootarchive?")

Uwe




