[PC-BSD Dev] ZFS dedup
kris at pcbsd.org
Tue Jul 23 05:15:59 PDT 2013
On 07/23/2013 07:16, Pavel Arefiev wrote:
> On 22 July 2013 16:05:31, Kris Moore wrote:
>> On 07/22/2013 15:31, Radio młodych bandytów wrote:
>>> On 22/07/2013 21:02, Kris Moore wrote:
>>>> Plus everything I hear is to avoid dedup at all costs right now.
>> From my understanding, dedup stores the matching blocks list entirely
>> inside of memory. ZFS being rather memory hungry, it's possible to exhaust
>> this memory pool *very* quickly. The result can be a panic, and losing
>> your data on disk, since the memory tables are now lost :(
>> For more details you'd have to ask some of the FreeNAS ZFS guys, but
>> that was the gist of it. It was a serious enough problem that they
>> recommended we not even offer the option, until a solution can be found.
> Well, I've heard the same things; that's why dedup may be something we should
> avoid using right now. I'll try a manual installation with dedup set on the
> root dataset and see if there is any gain.
Here's what I got back from Xin Li, our iX resident ZFS expert:
You were misinformed, but you are right: ZFS dedup is not intended for
users who use only hard drives and have a tight budget (e.g. if they
care about a few GB of hard drive space, then dedup is not for them),
and that's the case for most desktop users nowadays. Dedup is useful
for users who have plenty of memory and SSDs.
Dedup itself does not cause data loss or panics, but it does increase
the cache requirement significantly, which means more memory
consumption, and without proper system configuration, that would mean
Note that when the system does not have enough memory to load the
whole DDT (dedup table) into memory, recovery (e.g. after a panic
reboot) could be very slow, or even impossible if the system does not
have enough main memory. We did see unimportable ZFS pools at
customers who refused to buy enough memory and turned on dedup
themselves; we ended up leasing them additional memory to get their
data back.
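To put rough numbers on the memory requirement: the DDT (dedup table) needs about one entry per unique block, so its size scales with the amount of unique data divided by the average block size. This is a back-of-the-envelope sketch, not ZFS code. The ~320 bytes per in-memory DDT entry is an assumed rule of thumb (actual per-entry cost varies by implementation and version), and `ddt_ram_bytes` is a hypothetical helper.

```python
# Back-of-the-envelope DDT memory estimate (illustration, not ZFS code).
# ASSUMPTION: ~320 bytes of RAM per DDT entry, a commonly quoted rule of
# thumb; the real per-entry cost depends on the ZFS version and pool.
# ASSUMPTION: worst case, every block is unique, so one entry per block.

BYTES_PER_DDT_ENTRY = 320  # hypothetical rule-of-thumb constant


def ddt_ram_bytes(unique_data_bytes: int, avg_block_size: int,
                  bytes_per_entry: int = BYTES_PER_DDT_ENTRY) -> int:
    """Estimate the RAM needed to hold the whole DDT in memory."""
    if avg_block_size <= 0:
        raise ValueError("avg_block_size must be positive")
    entries = -(-unique_data_bytes // avg_block_size)  # ceiling division
    return entries * bytes_per_entry


if __name__ == "__main__":
    TiB = 1024 ** 4
    GiB = 1024 ** 3
    for block_kib in (128, 64, 8):
        need = ddt_ram_bytes(1 * TiB, block_kib * 1024)
        print(f"1 TiB unique data, {block_kib} KiB blocks: ~{need / GiB:.1f} GiB of DDT")
```

Smaller blocks multiply the entry count (8 KiB blocks need 16 times as many entries as 128 KiB blocks for the same data), which is why the estimate grows so quickly. On an existing pool, `zdb -S <pool>` can simulate dedup and display the resulting DDT before you enable it.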
And yes, the DDT is stored in multiple copies. However, by its nature,
dedup leaves you with less data redundancy.
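The redundancy point can be illustrated with a toy model: dedup keeps one physical copy of each unique block plus a reference count, so damage to that one copy affects every logical reference to it. This is a simplified sketch, not the ZFS on-disk format; `ToyDedupStore`, `DDTEntry` and `lose_block` are hypothetical names, and the model does not include the extra copies ZFS keeps of the DDT itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DDTEntry:
    data: bytes    # the single physical copy of this content
    refcount: int  # how many logical blocks point at it


class ToyDedupStore:
    """Simplified content-addressed store; NOT the real ZFS DDT layout."""

    def __init__(self) -> None:
        self.ddt: dict[str, DDTEntry] = {}  # checksum -> entry
        self.physical_writes = 0

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        entry = self.ddt.get(key)
        if entry is None:
            # New content: store one physical copy.
            self.ddt[key] = DDTEntry(data, 1)
            self.physical_writes += 1
        else:
            # Duplicate content: only bump the reference count.
            entry.refcount += 1
        return key

    def lose_block(self, key: str) -> int:
        """Simulate losing one physical block; return the logical references affected."""
        return self.ddt.pop(key).refcount


if __name__ == "__main__":
    store = ToyDedupStore()
    keys = [store.write(b"same 4K of data") for _ in range(10)]
    print("logical writes:", len(keys))
    print("physical writes:", store.physical_writes)
    print("references lost with one block:", store.lose_block(keys[0]))
```

Ten logical writes of identical content produce a single physical block, so losing that block affects all ten references at once; without dedup, each of the ten would have its own copy.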
Dedup also increases I/O demand: for instance, if verify is requested,
it needs additional read I/O. It does not necessarily help performance
on rotating disks, because dedup also reduces the likelihood that the
file system stores related data (e.g. blocks belonging to one file)
together, so it needs a larger cache or more seeks, depending on the
configuration.
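The extra read I/O from verify can be shown with a toy model: with verify enabled, every checksum match forces a read of the already-stored block and a byte-for-byte comparison before the write is deduplicated. In ZFS this corresponds to the `verify` option of the `dedup` property (e.g. `dedup=sha256,verify`); the class below is a hypothetical illustration of the cost, not how ZFS implements it.

```python
import hashlib


class VerifyingDedupStore:
    """Toy dedup store that counts physical reads and writes (illustration only)."""

    def __init__(self, verify: bool) -> None:
        self.verify = verify
        self.blocks: dict[str, bytes] = {}   # checksum -> stored block
        self.refcounts: dict[str, int] = {}  # checksum -> logical references
        self.reads = 0
        self.writes = 0

    def write(self, data: bytes) -> None:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            # First time this content is seen: store it physically.
            self.blocks[key] = data
            self.refcounts[key] = 1
            self.writes += 1
            return
        if self.verify:
            # Checksum hit: read the stored block back to compare bytes.
            self.reads += 1
            if self.blocks[key] != data:
                # Genuine collision (astronomically unlikely with SHA-256):
                # write a separate, non-deduplicated copy instead.
                self.writes += 1
                return
        # Deduplicated: only the reference count changes, no data is written.
        self.refcounts[key] += 1


if __name__ == "__main__":
    for verify in (False, True):
        store = VerifyingDedupStore(verify)
        for _ in range(1000):
            store.write(b"the same block over and over")
        print(f"verify={verify}: writes={store.writes}, reads={store.reads}")
```

With 1,000 identical writes, both variants perform one physical write, but the verifying store also performs 999 reads. That is the trade verify makes: protection against checksum collisions in exchange for extra I/O, which hurts most on rotating disks.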