[PC-BSD Dev] ZFS dedup
Radio młodych bandytów
radiomlodychbandytow at o2.pl
Tue Jul 23 10:44:59 PDT 2013
Thanks for the answer.
The performance issues apply only to the case when dedupe is on. When
it's off, already deduped data only improves performance by improving
When it comes to reduced redundancy, yes, it's lower. I wonder why
dedupditto values lower than 100 are prohibited....
Overall would it be good? I think yes, but I'm not sure. Is it worth
bothering? I think no, but again I'm not sure. That's why I asked for
numbers and haven't tested it myself.
On 23/07/2013 14:15, Kris Moore wrote:
> On 07/23/2013 07:16, Pavel Arefiev wrote:
>> In a message dated 22 July 2013 16:05:31, Kris Moore wrote:
>>> On 07/22/2013 15:31, Radio młodych bandytów wrote:
>>>> On 22/07/2013 21:02, Kris Moore wrote:
>>>>> Plus everything I hear is to avoid dedup at all costs right now.
>>> From my understanding, dedup stores the matching blocks list entirely
>>> inside of memory. ZFS being rather memory hungry, it's possible to exhaust
>>> this memory pool *very* quickly. The result can be a panic, and losing
>>> your data on disk, since the memory tables are now lost :(
>>> For more details you'd have to ask some of the FreeNAS ZFS guys, but
>>> that was the gist of it. It was a serious enough problem that they
>>> recommended we not even offer the option, until a solution can be found.
>> Well, I've heard the same things; that's why dedup may be something we should
>> avoid using right now. I'll try a manual installation with dedup set on the
>> root dataset and see if there is any gain.
> Here's what I got back from Xin Li, our iX resident ZFS expert:
> You were misinformed but you are right, ZFS dedup is not intended for
> users who use only hard drives and have a tight budget (e.g. if they
> care about a few GBs of hard drive, then dedup is not for them) and
> that's the case for most desktop users nowadays. Dedup is useful for
> users who have plenty of memory and SSDs.
> Dedup itself does not cause data loss, nor does it cause panics, but it
> does increase the cache requirement significantly, which means more memory
> consumption, and without proper system configuration that would mean
> more panics.
> Note that when the system does not have enough memory to load the whole
> DDT into memory, the recovery (e.g. after a panic reboot) could be
> very slow, or even not possible if the system does not have enough main
> memory. We did see unimportable ZFS pools at a customer who refused to
> buy enough memory and turned on dedup themselves; we ended up
> leasing them additional memory to get their data back.
> And yes, DDT is stored with multiple copies. However, by its nature,
> by doing dedup you have less data redundancy.
> Dedup also increases the need for I/O; for instance, if verify is
> requested, it needs additional read I/O. It does not necessarily
> help performance on rotating disks, because dedup also reduces
> the likelihood that the file system stores related data (e.g. blocks
> belonging to one file) together; this needs a larger cache or more seeks,
> depending on the configuration.
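
The point above about the whole DDT needing to fit in memory can be put
into rough numbers. The sketch below is only a back-of-the-envelope
estimate. It assumes about 320 bytes per in-core DDT entry (a commonly
quoted rule of thumb, not a figure from this thread), one entry per unique
block, and a uniform average block size. The function name
estimate_ddt_bytes is made up for illustration.

```python
# Rough estimate of the RAM needed to keep a ZFS dedup table (DDT) in memory.
# ASSUMPTIONS (not from the thread): ~320 bytes per in-core DDT entry,
# one entry per unique block, and a uniform average block size.

BYTES_PER_DDT_ENTRY = 320  # commonly quoted ballpark, not an exact figure


def estimate_ddt_bytes(unique_data_bytes, avg_block_bytes=64 * 1024,
                       bytes_per_entry=BYTES_PER_DDT_ENTRY):
    """Return an approximate in-memory DDT size in bytes."""
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # Ceiling division: a partially filled block still needs its own entry.
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    TIB = 1024 ** 4
    GIB = 1024 ** 3
    for block_kib in (128, 64, 8):
        size = estimate_ddt_bytes(1 * TIB, avg_block_bytes=block_kib * 1024)
        print(f"1 TiB unique data, {block_kib} KiB blocks: "
              f"{size / GIB:.2f} GiB of DDT")
```

Under these assumptions, 1 TiB of unique data works out to roughly 2.5 GiB
of DDT at 128 KiB blocks and about 40 GiB at 8 KiB blocks, so the average
block size matters far more than the pool size alone. Real pools will
differ; treat this as an order-of-magnitude check only.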