[zfs-auto-snapshot] Code review request for time-slider defect 8667
Niall Power
Niall.Power at Sun.COM
Thu Jul 2 01:29:06 PDT 2009
Hi,
I'd like to put back the following fix for defect 8667:
http://defect.opensolaris.org/bz/show_bug.cgi?id=8667
This fix attempts to substantially overhaul the zfs.py module and the
cleanup
manager code by:
- Caching results for later reuse, instead of calling zfs(1m) multiple
times unnecessarily
- Obtaining more data up front for commonly used properties in a large
batch instead of
invoking multiple zfs(1m) commands on each dataset
I've tested the changes for the last several days including leaving the
service running
on my machine and have not found any regressionss. Areas of test
coverage include:
- Running of the zfs.py __main__ module testing function which
excercises most of the
functionality in zfs.
- Running time-slider-setup, time-slider-delete and time-slider-cleanup
(in various scenarios)
to make sure everything still functioned as expected.
The goal of this fix is to reduce the number of spawned processes
created by time-slider,
particularly by time-slider-cleanup. I profiled the existing code by
identifying all popen()
calls made by the zfs.py module and kept a running total of the number
of popen() calls
by the various time-slider commands. The comparison between the old and
newly modifie
code is listed below. While not 100% clinically precise and reproducible
(particularily the
time-slider-cleanup code) I think the results give some positive
indications.
System: 9 filesystems, 2 volumes (dump & swap), 37 - 40 snapshots
Test: Startup of time-slider-delete
# of spawned processes by zfs.py OLD: 3 NEW:3
Test: Startup of time-slider-setup
# of spawned processes by zfs.py OLD:20 NEW:10
Test: running time-slider-cleanup with no zero sized snapshots deleted
capacity OK.
# of spawned processes by zfs.py OLD:78 NEW:15
Test: running time-slider-cleanup, 2 zero sized snapshots destroyed,
capacity OK.
# of spawned processes by zfs.py OLD:87 NEW:24
Test: running time-slider-cleanup, capacity exceeding warning level by
1.5% of pool size
# of spawned processes by zfs.py OLD:327 (8 snapshots destroyed)
NEW:70 (18 snapshots destroyed)
Test: running time-slider-cleanup, capacity exceeding warning level, but
with no available snapshots to destroy
# of spawned processes by zfs.py OLD:154 NEW:22
Test: running zfs.py test (invokes it's test function)
# of spawned processes by zfs.py OLD: 53 NEW:16
The webrev of the changes is available here:
http://cr.opensolaris.org/~niall/time-slider-8667/
Overview of code changes:
-------------------------------------------
cleanupmanager.py
-------------------------------
Removed several unnecessary calls to zpool.get_capacity() in
run_[warning/critical/emergency]_cleanup()
The call is an unecessary duplicate because the same check is performed
before in the calling function.
Elimanates up to 4 unnecessary spawned processes.
Removed "percentover" and "sizeover" variables whose values are never
used. Eliminates 2 spawned processes.
Minor code refactoring to correspond to changes to zfs.py API
deletegui.py
-------------------
Minor code refactoring to correspond to changes to zfs.py API
setupgui.py
------------------
Removed references to zfscontroller.py. Now uses zfs.py exclusively for
info about zfs filesystems.
Minor code refactoring to correspond to changes to zfs.py API
zfscontroller.py
-------------------------
Removed. Had minimal funcionality that really belonged to or already
provided by zfs.py instead anyway.
zfs.py
---------
New class definition "Datasets" which as a toplevel data store for all
snapshots, filesystems and volumes
on the system. All other classes (ReadWritableDataset, Filesystem,
Volume, Snapshot) with list/accesor
methods build their own private lists from this store instead of calling
the zfs(1m) command. The lists
get populated on demand on the first invocation of
list_filesystems/volumes/snapshots().
The snapshot list can be purged to trigger a rescan following the
deletion of a snapshot by Snapshot.destroy_snapshot().
The lists kept in "Datasets" are class wide and shared between all
instances to reduce memory footprint
(similar to "static" in C)
Zpool.get_health() is dropped and instead a persistent value is stored
in ZPool.health after initial invocation of
Zpool.__get_health() in the constructor.
Zpool.get_capacity() is now a bit smarter and obtains both "used" and
"available" values in a single zfs(1M)
invocation. This halves the number of spawned commands previously necessary.
Zpool.get_available_size(): eliminated call to Zpool.list_filesystems()
since the root filesystem matching the
pool name always exists.
Zpool.get_total_size(): Removed because it wasn't being used except by
the __main__ test function.
Zpool.list_filesystems(): Cache filesystems internally for later reuse.
Construct a regular expression filter
instead of piping through grep on the command line to filter results.
Zpool.list_volumes(): Same as for Zpool.list_filesystems()
Zpool.list_snapshots(): Same as for Zpool.list_filesystems()
ReadableDataset.__init__()
Simplified creation time property initialisation. Always assign it to
the optional "creation" parameter which
defaults to None.
Snapshot.destroy_snapshot()
Call zfs.Datasets.refresh_snapshots() which forces the cache to be
purged and rescanned on the next invocation
of Datasets.list_snapshots().
Snapshot.__str__()
Tidy up to fit inside 80 character line width boundary.
ReadWritableDataset.__init__(): Add a private store for snapshots
(.__snapshots) in constructor.
ReadWritableDataset.list_snapshots(): As for Zpool.list_snapshots(),
generate an internal cache derived
from Datasets.list_snapshots() and use a regular expression to filter
the results.
Filesystem.__init__(): Initialise with a pre-supplied mountpoint if
available. Avoids the need to spawn
a zfs(1M) call later if asked for later. Very commonly requested.
Volume.__init__() Was incorrectly calling it's grandparents constructor
instead of it's direct parent's
(ReadableDataset, should be ReadWritableDataset.__init__()
zfs.list_filesysystems/volumes/snapshots() are now methods of
zfs.Datasets class instead of functions.
Cheers,
Niall
More information about the zfs-auto-snapshot
mailing list