[zfs-auto-snapshot] Code review request for time-slider defect 8667

Niall Power Niall.Power at Sun.COM
Thu Jul 2 01:29:06 PDT 2009


Hi,

I'd like to put back the following fix for defect 8667:
http://defect.opensolaris.org/bz/show_bug.cgi?id=8667

This fix attempts to substantially overhaul the zfs.py module and the 
cleanup
manager code by:

- Caching results for later reuse, instead of calling zfs(1m) multiple 
times unnecessarily
- Obtaining more data up front for commonly used properties in a large 
batch instead of
  invoking multiple zfs(1m) commands on each dataset

I've tested the changes for the last several days including leaving the 
service running
on my machine and have not found any regressionss. Areas of test 
coverage include:
 - Running of the zfs.py __main__ module testing function which 
excercises most of the
   functionality in zfs.

 - Running time-slider-setup, time-slider-delete and time-slider-cleanup 
(in various scenarios)
   to make sure everything still functioned as expected.

The goal of this fix is to reduce the number of spawned processes 
created by time-slider,
particularly by time-slider-cleanup. I profiled the existing code by 
identifying all popen()
calls made by the zfs.py module and kept a running total of the number 
of popen() calls
by the various time-slider commands. The comparison between the old and 
newly modifie
code is listed below. While not 100% clinically precise and reproducible 
(particularily the
time-slider-cleanup code) I think the results give some positive 
indications.


System: 9 filesystems, 2 volumes (dump & swap), 37 - 40 snapshots

Test: Startup of time-slider-delete
# of spawned processes by zfs.py            OLD: 3            NEW:3

Test: Startup of time-slider-setup
# of spawned processes by zfs.py            OLD:20           NEW:10

Test: running time-slider-cleanup with no zero sized snapshots deleted 
capacity OK.
# of spawned processes by zfs.py            OLD:78           NEW:15

Test: running time-slider-cleanup, 2 zero sized snapshots destroyed, 
capacity OK.
# of spawned processes by zfs.py             OLD:87          NEW:24

Test: running time-slider-cleanup, capacity exceeding warning level by 
1.5% of pool size
# of spawned processes by zfs.py             OLD:327 (8 snapshots destroyed)
                                                                   
NEW:70  (18 snapshots destroyed)

Test: running time-slider-cleanup, capacity exceeding warning level, but 
with no available snapshots to destroy
# of spawned processes by zfs.py             OLD:154          NEW:22


Test: running zfs.py test (invokes it's test function)
# of spawned processes by zfs.py             OLD: 53           NEW:16



The webrev of the changes is available here:

http://cr.opensolaris.org/~niall/time-slider-8667/


Overview of code changes:
-------------------------------------------

cleanupmanager.py
-------------------------------
Removed several unnecessary calls to zpool.get_capacity() in 
run_[warning/critical/emergency]_cleanup()
The call is an unecessary duplicate because the same check is performed 
before in the calling function.
Elimanates up to 4 unnecessary spawned processes.
Removed "percentover" and "sizeover" variables whose values are never 
used. Eliminates 2 spawned processes.
Minor code refactoring to correspond to changes to zfs.py API

deletegui.py
-------------------
Minor code refactoring to correspond to changes to zfs.py API

setupgui.py
------------------
Removed references to zfscontroller.py. Now uses zfs.py exclusively for 
info about zfs filesystems.
Minor code refactoring to correspond to changes to zfs.py API

zfscontroller.py
-------------------------
Removed. Had minimal funcionality that really belonged to or already 
provided by zfs.py instead anyway.

zfs.py
---------
New class definition "Datasets" which as a toplevel data store for all 
snapshots, filesystems and volumes
on the system. All other classes (ReadWritableDataset, Filesystem, 
Volume, Snapshot) with list/accesor
methods build their own private lists from this store instead of calling 
the zfs(1m) command. The lists
get populated on demand on the first invocation of 
list_filesystems/volumes/snapshots().
The snapshot list can be purged to trigger a rescan following the 
deletion of a snapshot by Snapshot.destroy_snapshot().
The lists kept in "Datasets" are class wide and shared between all 
instances to reduce memory footprint
(similar to "static" in C)

Zpool.get_health() is dropped and instead a persistent value is stored 
in ZPool.health after initial invocation of
Zpool.__get_health() in the constructor.

Zpool.get_capacity() is now a bit smarter and obtains both "used" and 
"available" values in a single zfs(1M)
invocation. This halves the number of spawned commands previously necessary.

Zpool.get_available_size(): eliminated call to Zpool.list_filesystems() 
since the root filesystem matching the
pool name always exists.

Zpool.get_total_size(): Removed because it wasn't being used except by 
the __main__ test function.

Zpool.list_filesystems(): Cache filesystems internally for later reuse. 
Construct a regular expression filter
instead of piping through grep on the command line to filter results.

Zpool.list_volumes(): Same as for Zpool.list_filesystems()

Zpool.list_snapshots(): Same as for Zpool.list_filesystems()

ReadableDataset.__init__()
 Simplified creation time property initialisation. Always assign it to 
the optional "creation" parameter which
 defaults to None.

Snapshot.destroy_snapshot()
 Call zfs.Datasets.refresh_snapshots()  which forces the cache to be 
purged and rescanned on the next invocation
 of Datasets.list_snapshots().

Snapshot.__str__()
Tidy up to fit inside 80 character line width boundary.

ReadWritableDataset.__init__(): Add a private store for snapshots 
(.__snapshots) in constructor.

ReadWritableDataset.list_snapshots(): As for Zpool.list_snapshots(), 
generate an internal cache derived
from Datasets.list_snapshots() and use a regular expression to filter 
the results.

Filesystem.__init__(): Initialise with a pre-supplied mountpoint if 
available. Avoids the need to spawn
a zfs(1M) call later if asked for later. Very commonly requested.

Volume.__init__() Was incorrectly calling it's grandparents constructor 
instead of it's direct parent's
  (ReadableDataset, should be ReadWritableDataset.__init__()

zfs.list_filesysystems/volumes/snapshots() are now methods of 
zfs.Datasets class instead of functions.

Cheers,
Niall






More information about the zfs-auto-snapshot mailing list