
ZFS stalls on gzip-9 while writing incompressible files #7373

Open
RubenKelevra opened this issue Mar 30, 2018 · 17 comments

Labels
Type: Feature Feature request or new feature

Comments

@RubenKelevra commented Mar 30, 2018

Hey guys,

I've just started to switch my notebook's second hard drive (an old rusty spindle) from BTRFS to ZFS. It's mainly media storage with a picture backup, some YouTube videos etc., mostly not that great to compress, but there are also git repository backups of many larger projects, text files etc., so compressing those would be nice...

I've used BTRFS before with compression on and had no issues, apart from some kernel panics over the last year which remounted it read-only or made it very flaky, so I had to reboot from time to time just because BTRFS is still unstable and unreliable. I've used ZFS, on the other hand, many times before on small to large servers, so I know it is rock solid.

System information

Type Version/Name
Distribution Name ArchLinux
Distribution Version latest
Linux Kernel 4.15.13-1-ARCH
Architecture x64 (Intel)
ZFS Version 0.7.7-1 (dkms)
SPL Version 0.7.7-1 (dkms)

Describe the problem you're observing

I used a USB2-SATA connector and a 1TB spare Western Digital Black drive to connect a third hard drive to the notebook, created a ZFS pool, and set atime to off, checksum to edonr, recordsize to 1M, dnodesize to auto, sync to disabled and compression to gzip-9.

Then I started rsync (with --progress), which after a while began to stall from time to time for several seconds, while three z_wr_iss tasks were cooking my CPU.

Is ZFS really trying to compress 1MB blocks of MP4 data and discarding all the work if the block isn't smaller than before compression? I remember reading that there's a smart decision about whether the compression effort is worth it, so that incompressible blocks are skipped.

I'm also experiencing complete stalls of rsync, which can only be resolved by restarting rsync or waiting 2-3 minutes (no I/O on either device and no apparent CPU usage by zfs or btrfs observable via atop), but I have no idea whether this is due to a btrfs bug or a zfs one. When I restart rsync it sometimes outputs an error message about a "broken pipe" and a "write error", but then immediately continues to sync fine.

Describe how to reproduce the problem

Copy MP4 video files onto a ZFS filesystem with compression=gzip-9 set. This gives very high CPU usage and spiky write performance - sometimes fast, sometimes very slow.

Possible solution

Since we're talking about a file system, it's maybe worth considering detecting the file type by magic and keeping a blacklist of MIME types not worth trying to compress - if detecting compressibility isn't that easy?

@ahrens (Member) commented Mar 30, 2018

@RubenKelevra Welcome!

I think @skiselkov did some work to make compression bail out early if the file looks incompressible. Maybe we can encourage him to open a PR :-)

Aside from that, there are a few other solutions:

  1. create multiple filesystems - one for your videos that isn't compressed.
  2. use lz4, not gzip-9
  3. use ZSTD (with a moderate compression level), which is coming "soon"

Is ZFS really trying to compress ..., and discard all the work if the block is larger than before the compression?

Yes.

@RubenKelevra (Author) commented Mar 30, 2018

Thanks for the hints. I was wondering about a more "automatic" solution since I also store my work projects there, which contain a very mixed set of file types.

Here are some numbers reported by rsync (with --progress):

 12.03M 100%   11.92MB/s
 14.91M 100%   10.29MB/s
 12.74M 100%    1.13MB/s 
 16.80M 100%   25.39MB/s 
339.13M 100%   16.98MB/s
 65.05M 100%   53.80MB/s
 63.94M 100%    4.79MB/s
100.96M 100%    6.95MB/s
 86.46M 100%   75.02MB/s 
 92.13M 100%    7.13MB/s 

We're just talking about a folder of mp4s here...

@DeHackEd (Contributor)

Actually the cutoff is ~87.5% of the original size, but that's still a lot of wasted effort, especially with gzip-9 set.

lz4 is about as close as you're getting to "automatic". Besides being crazy fast by itself, it specifically detects incompressible content and avoids spending CPU cycles analyzing it. But it does have a compression ratio of somewhere around gzip-1.

Go with the multiple filesystem method and set different compression settings on each.
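
For reference, a minimal user-space sketch of the rule described above: the block is compressed first, and the result is kept only if it is at least 1/8 smaller than the input. This is not the actual OpenZFS code - the function name and buffer handling are illustrative, and zlib's compress2() stands in for the in-kernel gzip-9 path.

```c
/*
 * Minimal sketch of the compress-then-check-cutoff behaviour discussed
 * above (illustrative only, not the actual OpenZFS implementation).
 */
#include <stddef.h>
#include <zlib.h>

/* Returns the compressed length, or 0 meaning "store the block uncompressed". */
size_t
try_compress_block(const void *src, size_t srclen, void *dst, size_t dstlen)
{
	/* The compressed copy must fit into 7/8 (87.5%) of the original size. */
	size_t threshold = srclen - (srclen >> 3);
	uLongf clen = dstlen;

	/* The expensive gzip-9 pass always runs first... */
	if (compress2(dst, &clen, src, srclen, 9) != Z_OK)
		return (0);

	/* ...and all of that work is simply discarded if the savings
	 * come in under the cutoff. */
	if ((size_t)clen > threshold)
		return (0);

	return ((size_t)clen);
}
```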

@RubenKelevra (Author)

Using lz4 might help by reducing the CPU usage, but I guess it's just masking the actual issue - that files are often considered compressible but in the end aren't, and a lot of CPU power is used to determine this. Is there a way to see how often the heuristic (if there is any) for this decision is "right"? :)

@RubenKelevra (Author)

Thanks a lot @ahrens, I've created a feature request ticket as a successor to this one, based on the idea I wrote down at the end here.

You might want to head over there:

#7403

@behlendorf (Contributor) commented Apr 7, 2018

Reopening; this is still a real issue. If someone has the time to tackle this I'd also love to see @skiselkov's proposed solution opened as a PR. Here's a link to his slides and talk:

http://open-zfs.org/w/images/4/4d/Compression-Saso_Kiselkov.pdf
http://www.youtube.com/watch?v=TZF92taa_us

@behlendorf behlendorf reopened this Apr 7, 2018
@behlendorf behlendorf added the Type: Feature Feature request or new feature label Apr 7, 2018
@rbrewer123

Looking at the numbers in the slides (I didn't watch the video), in the case where high compression levels are desired (e.g. gzip 6-9), perhaps lz4 could serve as the heuristic to determine whether the data is compressible.

The algorithm would be to first attempt compression with lz4. If that meets a compression ratio threshold, go back and run gzip for the actual compression, which will likely compress much better. If instead the threshold isn't met by lz4, bail out. This could be sensible since lz4 is so much faster than gzip. It only adds 11% time (compared to gzip alone) in the compressible case (and gzip is already slow), but saves 90% time in the incompressible case.

This change might be very localized and could be independent of @skiselkov's adaptive proposal.

@behlendorf (Contributor)

@rbrewer123 that's an interesting idea. You're right, it should be possible to implement what you're suggesting as a relatively small change in zio_compress_data(). Leveraging the lz4 compressibility check for gzip 6-9 could potentially get us the majority of the benefits with very little additional complexity. If you're game, I'd love to see performance results from this approach.
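
As a rough illustration of what such a change might look like, here is a hedged user-space sketch of the lz4-prefilter idea. The helper name, buffer handling and probe threshold are assumptions; the real change would live in zio_compress_data(), and liblz4/zlib stand in for the in-kernel compressors.

```c
/*
 * Sketch of the proposed two-pass approach (assumptions only, not the
 * actual zio_compress_data() code): probe with lz4 first and only spend
 * the expensive gzip-9 pass if the probe suggests the block is
 * compressible.
 */
#include <stddef.h>
#include <stdlib.h>
#include <lz4.h>
#include <zlib.h>

/* Returns the gzip-9 compressed length, or 0 meaning "store uncompressed". */
size_t
gzip9_with_lz4_prefilter(const void *src, size_t srclen, void *dst, size_t dstlen)
{
	size_t threshold = srclen - (srclen >> 3);	/* the 7/8 cutoff */
	int bound = LZ4_compressBound((int)srclen);
	char *probe = malloc(bound);
	uLongf clen = dstlen;
	int lz4len;

	if (probe == NULL)
		return (0);

	/* Cheap probe: if even lz4 cannot shrink the block below the
	 * cutoff, assume gzip-9 will not be worth its CPU cost either. */
	lz4len = LZ4_compress_default(src, probe, (int)srclen, bound);
	free(probe);
	if (lz4len <= 0 || (size_t)lz4len > threshold)
		return (0);

	/* The block looks compressible: now spend the gzip-9 cycles. */
	if (compress2(dst, &clen, src, srclen, 9) != Z_OK ||
	    (size_t)clen > threshold)
		return (0);

	return ((size_t)clen);
}
```

Whether the probe should use the same 7/8 cutoff as the final check, or a looser one (gzip usually compresses better than lz4), is exactly the kind of tuning question the benchmarks would need to answer.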

@RubenKelevra (Author)

Great idea @rbrewer123, this would even make much slower compression reasonable in the future.

@rbrewer123

@behlendorf @RubenKelevra thanks for the encouragement. I'll work on getting a PR together to play around with. As this is my first time building a dev version of zfs, it may take me a week or two to get something set up. The tip on the function to look at really helps too.

@rincebrain (Contributor)

https://reviews.csiden.org/r/266/ seems to have been discarded without an obvious reason ~2 years ago, so unless someone is really interested, it's unlikely it'll suddenly end up upstream. So either @rbrewer123's idea, or #5928 if it gets rebased against recent git, is likely to be the way forward here.

@rbrewer123

I added an initial PR at #7481. I haven't done any benchmarks yet but will post as I do them.

@RubenKelevra (Author) commented Apr 27, 2018

@rincebrain auto compression is actually a completely different idea. It just chooses on the fly, based on the system load, whether there's enough CPU power left to do the compression and how demanding it may be. A fast bailout for incompressible content would still help a lot, even if auto compression gets merged in the future.

@RubenKelevra (Author)

@kpande you seem to be confused, I responded to rincebrain's comment. :)

@RubenKelevra (Author) commented May 5, 2018

@kpande well, no. Sure, the decision starts at this point: if the device is very fast and there's no queue build-up, it will never even bother with compression. But in most cases the write speed to a device is the bottleneck these days.

Then it measures the average latency introduced by compression, which depends on the current CPU load, and with this metric it decides which compression level is adequate at the moment.

To get back to my original point: the proposed change by @rbrewer123 does not interfere with auto compression - it makes it even better, since the delay of gzip compression is only incurred for compressible data, leaving the CPU more cycles to invest in actually compressible blocks, to do other work, or to idle and save energy.

On the other hand, I still don't get your point: if you don't want to use auto compression (if it's ever merged), just don't use it.
