[zfs-discuss] dedup question

Victor Latushkin Victor.Latushkin at Sun.COM
Mon Nov 2 09:07:36 PST 2009


Enda O'Connor wrote:
> It works at a pool-wide level, with the ability to exclude at a dataset
> level. The converse also works: if dedup is set to off at the top-level
> dataset, you can still set lower-level datasets to on, i.e. one can
> include and exclude depending on the datasets' contents.
> 
> So largefile will get deduped in the example below.
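
To make the pool-wide versus per-dataset distinction concrete, here is a toy model in Python. It is only a sketch: the Pool class, its methods and the "tank/other" dataset are made up for illustration and have nothing to do with the real ZFS code. It assumes the dedup property is inherited from the nearest ancestor that sets it (default off), and that all datasets with dedup=on share a single dedup table.

```python
import hashlib


class Pool:
    """Toy pool with one dedup table shared by all datasets (hypothetical)."""

    def __init__(self):
        self.ddt = {}        # block checksum -> reference count
        self.props = {}      # dataset name -> locally set dedup value
        self.allocated = 0   # number of blocks actually stored

    def set_dedup(self, dataset, value):
        self.props[dataset] = value

    def effective_dedup(self, dataset):
        # Walk up toward the pool root until a local setting is found.
        name = dataset
        while True:
            if name in self.props:
                return self.props[name]
            if "/" not in name:
                return "off"  # assumed default when nothing is set
            name = name.rsplit("/", 1)[0]

    def write(self, dataset, block):
        if self.effective_dedup(dataset) == "on":
            key = hashlib.sha256(block).hexdigest()
            if key in self.ddt:
                self.ddt[key] += 1   # duplicate: bump refcount, store nothing
                return
            self.ddt[key] = 1
        self.allocated += 1


pool = Pool()
pool.set_dedup("tank", "off")
for ds in ("tank/dir1", "tank/dir2", "tank/dir3"):
    pool.set_dedup(ds, "on")
    pool.write(ds, b"largefile contents")

# A dataset that inherits dedup=off from tank stores its own copy.
pool.write("tank/other", b"largefile contents")

print(pool.allocated)            # blocks stored
print(list(pool.ddt.values()))   # refcounts in the shared table
```

In this model the three copies under dir1, dir2 and dir3 collapse into one stored block with a reference count of 3, while the copy written through the dedup=off dataset is stored separately.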

And you can use 'zdb -S' (which is a lot better now than it was before
dedup) to see how much benefit you would get, without even turning
dedup on:

bash-3.2# zdb -S rpool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
      1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
      2     9.8K    184M    132M    132M    20.7K    386M    277M    277M
      4    1.21K   16.6M   10.8M   10.8M    5.71K   76.9M   48.6M   48.6M
      8      395    764K    745K    745K    3.75K   6.90M   6.69M   6.69M
     16      125   2.71M    888K    888K    2.60K   54.2M   17.9M   17.9M
     32       56   2.10M    750K    750K    2.33K   85.6M   29.8M   29.8M
     64        9   22.0K   22.0K   22.0K      778   2.04M   2.04M   2.04M
    128        4   6.00K   6.00K   6.00K      594    853K    853K    853K
    256        2      8K      8K      8K      711   2.78M   2.78M   2.78M
    512        2   4.50K   4.50K   4.50K    1.47K   3.52M   3.52M   3.52M
     8K        1    128K    128K    128K    15.9K   1.99G   1.99G   1.99G
    16K        2      8K      8K      8K    50.7K    203M    203M    203M
  Total     637K   10.1G   8.04G   8.04G     730K   12.7G   10.5G   10.5G

dedup = 1.30, compress = 1.22, copies = 1.00, dedup * compress / copies = 1.58

bash-3.2#

Be careful - 'zdb -S' can eat lots of RAM!
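
For anyone wondering where the summary line under the histogram comes from, it can be approximated from the Total row. The sketch below is in Python and assumes the ratios are defined as referenced DSIZE over allocated DSIZE (dedup), referenced LSIZE over referenced PSIZE (compress) and referenced DSIZE over referenced PSIZE (copies). The function name ddt_ratios is made up for illustration. The inputs are the rounded figures printed in the table, so the results can differ from zdb's own numbers in the second decimal place.

```python
# Values copied from the "Total" row of the zdb -S output above, as printed.
allocated = {"LSIZE": 10.1, "PSIZE": 8.04, "DSIZE": 8.04}
referenced = {"LSIZE": 12.7, "PSIZE": 10.5, "DSIZE": 10.5}


def ddt_ratios(allocated, referenced):
    """Return (dedup, compress, copies) under the definitions assumed above."""
    dedup = referenced["DSIZE"] / allocated["DSIZE"]
    compress = referenced["LSIZE"] / referenced["PSIZE"]
    copies = referenced["DSIZE"] / referenced["PSIZE"]
    return dedup, compress, copies


dedup, compress, copies = ddt_ratios(allocated, referenced)
total = dedup * compress / copies

print(f"dedup = {dedup:.2f}, compress = {compress:.2f}, "
      f"copies = {copies:.2f}, total = {total:.2f}")
```

The combined figure is the overall space reduction you could expect from dedup and compression together on this pool.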


Many thanks to Jeff and all the team!

Regards,
Victor

> Enda
> 
> Breandan Dezendorf wrote:
>> Does dedup work at the pool level or the filesystem/dataset level?
>> For example, if I were to do this:
>>
>> bash-3.2$ mkfile 100m /tmp/largefile
>> bash-3.2$ zfs set dedup=off tank
>> bash-3.2$ zfs set dedup=on tank/dir1
>> bash-3.2$ zfs set dedup=on tank/dir2
>> bash-3.2$ zfs set dedup=on tank/dir3
>> bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
>> bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
>> bash-3.2$ cp /tmp/largefile /tank/dir3/largefile
>>
>> Would largefile get dedup'ed?  Would I need to set dedup on for the
>> pool, and then disable where it isn't wanted/needed?
>>
>> Also, will we need to move our data around (send/recv or whatever your
>> preferred method is) to take advantage of dedup?  I was hoping the
>> blockpointer rewrite code would allow an admin to simply turn on dedup
>> and let ZFS process the pool, eliminating excess redundancy as it
>> went.
>>
> 


