Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compression=lz4hc support #1900

Open
ZeroChaos- opened this issue Nov 25, 2013 · 27 comments
Open

compression=lz4hc support #1900

ZeroChaos- opened this issue Nov 25, 2013 · 27 comments
Labels
Type: Feature Feature request or new feature

Comments

@ZeroChaos-
Copy link

The title says it all. I have a lot more processing power than IO throughput and I believe lz4hc would be advantageous over lz4. Can support for compression=lz4hc be added?

@DeHackEd
Copy link
Contributor

If processing power is really that much higher, I'd suggest just using gzip instead. No one's done lz4hc, mainly because the compression speed is staggeringly worse than lz4 itself.

See http://wiki.illumos.org/display/illumos/LZ4+Compression -- The "LZ4 Performance" section links to a spreadsheet detailing compression statistics. Other than decompression speed, using gzip-6 (the default compression level) is superior to lz4hc. Is decompression speed still a big concern?

@ZeroChaos-
Copy link
Author

it seemed like it would be a good thing, but if it is a lot of effort for no expected return then feel free to close. If it is perceived a trivial task though, it may be worth it for the testing alone.

@aarcane
Copy link

aarcane commented Nov 26, 2013

This would be useful for a number of situations. Namely, the WORM cases.
Lz4hc does more work on compression to create a smaller, completely binary
compatible, compressed chunk that decompressed even faster than lz4.

This would be useful for a number of situations. The main ones being
virtual machine distribution, vm master images, and long term data
archival.
I envision setting compression=lz4hc, writing out the basic data of an
image, setting compression to lz4, then cloning to create a live
representation.

Because the on disk format is binary compatible, there's no reason to
differentiate between lz4 and lz4hc when storing data to disk and every
reason not to.

I think the WORM cases provide enough merit to undertake this feature,
assuming it's not difficult to accomplish.
On Nov 25, 2013 6:46 PM, "ZeroChaos" notifications@github.com wrote:

it seemed like it would be a good thing, but if it is a lot of effort for
no expected return then feel free to close. If it is perceived a trivial
task though, it may be worth it for the testing alone.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1900#issuecomment-29263450
.

@behlendorf
Copy link
Contributor

I've no objection to leaving this open as possible feature if someone would like to work on it.

@DeHackEd
Copy link
Contributor

DeHackEd commented Dec 3, 2013

I'll give it a spin if nobody else wants to.

First step is determining if I need a new feature flag or if I can manage without one.

@aarcane
Copy link

aarcane commented Dec 3, 2013

The only thing you need to change on disk is you need to somehow store
whether you're using lz4 or lz4hc, but that can be a user property which
will require no feature flag.
On Dec 3, 2013 1:50 PM, "DeHackEd" notifications@github.com wrote:

I'll give it a spin if nobody else wants to.

First step is determining if I need a new feature flag or if I can manage
without one.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1900#issuecomment-29755094
.

@behlendorf behlendorf removed this from the 0.7.0 milestone Oct 29, 2014
@Sachiru
Copy link

Sachiru commented Oct 1, 2015

I'd like to add another use case for lz4HC:

/, /bin, /boot and /lib

These are generally immutable and rarely updated, however an increase in compression ratio will always be welcome. Having rootFS and libraries/binaries on LZ4HC would be a godsend because of smaller transfers.

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 4, 2015

I've made a naive port of LZ4HC (looking at how lz4 was introduced and doing the same) in https://github.com/vozhyk-/zfs/tree/lz4hc. What is left to do is to clean up the code (which isn't fully adapted to the illumos || Linux coding style, and contains parts of the code in lz4.c, lz4.h), test it properly and update the manpages.

So far the compression seems to work correctly (diff not finding any differences between copies of a Linux tree compressed with lz4/lz4hc, and no problems with correctness observed otherwise).
Tests done with a 5317MiB (lz4-compressed size) game (The Walking Dead) (with echo 2 > drop_caches done between tests until ARC size dropped to 32M):

  • lz4hc-compressed copy takes up 4630MiB
  • making a copy of the lz4-compressed directory with lz4 takes ~12min, with lz4hc - ~13min
  • the game seems to start a bit faster when it's compressed with lz4hc (max. ~9s difference)

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 4, 2015

The current code doesn't add any feature flags, but adds a compression=lz4hc property. I'm not sure if this is good, since other ZFS implementations would probably get confused when seeing an unknown compression type. There doesn't seem to be a way to set compression=lz4hc, but have the disk format tell that the blocks can be decompressed with lz4 in current ZFS.
Also, LZ4HC supports specifying the compression level (1..16), but currently the default (9) is used.

@behlendorf
Copy link
Contributor

@vozhyk- thanks for working on this. Yes, we may need to add another feature flag to support this even through the existing lz4 implementation can technically decompress it. We want to be very careful about breaking the on-disk format in any way. Although I admit it would be nice to somehow cleanly avoid the need for another feature flag. Clever ideas welcome.

If you could finish any remaining cleanup and open a pull request we can try and get you some feedback.

@ilovezfs
Copy link
Contributor

ilovezfs commented Oct 5, 2015

It seems like the most important thing would be replicating the "gzip | gzip-N" semantics with "lz4 | lz4hc | lz4hc-N."

@DeHackEd
Copy link
Contributor

DeHackEd commented Oct 6, 2015

A dataset property like lz4mode=[lz4 | lz4hc-9 | lz4gc-15] might make sense in that regard. Users may want to choose what compression speeds are on a per-dataset basis.

A module parameter to control L2ARC compression strength is where I was thinking of going with it.

@ilovezfs
Copy link
Contributor

ilovezfs commented Oct 6, 2015

@DeHackEd So legal values involving lz4 would be
on (default is now lz4)
lz4
lz4hc (== lz4hc-9)
lz4hc-1, lz4hc-2, …, lz4hc-16

Any others?

@ilovezfs
Copy link
Contributor

ilovezfs commented Oct 6, 2015

@DeHackEd The CLI should match the CLI for gzip, not introduce a separate mode, unless the same change is also being made for gzip and we're introducing a separate gzip mode dataset property, too.

@DeHackEd
Copy link
Contributor

DeHackEd commented Oct 6, 2015

In that case you'll need a feature flag because the table of compression algorithms used internally will change.

(Not that my suggestion wasn't a dirty hack to begin with. Anything that avoids feature flags is going to be messy).

I'm stepping out.

@Bronek
Copy link

Bronek commented Oct 6, 2015

@DeHackEd @ilovezfs I was afraid of that, but there does not seem to be sane way out.

@ilovezfs
Copy link
Contributor

ilovezfs commented Oct 6, 2015

@DeHackEd As you said doing it as dataset property could work (assuming it's truly always on-disk compatible), but I'd hope that would be invisible to the user and that zfs create -o compression=lz4hc-13 foo/bar would do the right thing (set compression=lz4 and lz4mode=lz4hc-13), and that zfs get compression foo/bar would return lz4hc-13 not lz4.

@Bronek
Copy link

Bronek commented Oct 6, 2015

Here is an idea : introduce a new property compressionlevel and make it apply both to gzip and lz4 compression. Actual semantics and validity of compressionlevel will be determined by compression format in context of which it is set.

If compression=gzip-N is set, enforce that compressionlevel is either absent (older pools) or set to the same N (we might also agree on different reconciliation rule, e.g. smaller number wins). If compression=gzip compressionlevel=N is set, make it valid and equivalent to compression=gzip-N .

For lz4 , let the compressionlevel alone decide compression level and style: 0 is lz4 , anything greater than 0 is lz4hc (for example, you might set compression=lz4 compressionlevel=9 to get strong lz4hc compression ; or compression=lz4 compressionlevel=0 to get the default, fastest compression available to ZFS)

For compression formats which do not support choice levels (ie. anything that is not gzip or lz4), this property would be ignored, thus allowing for compression override in parts of filesystem without the necessity to update compressionlevel

The fundamental property of this model is that it decouples compression algorithm (called "level") from compression format , i.e. you can have data with compatible compression format (e.g. gzip, lz4 etc) but written by different algorithms (i.e. "levels", identified by number specific to format)

One downside of this proposal is that we will have to live, for some time, with duplication in gzip compression levels (which I explain at the start). In the face of this downside, perhaps it would make sense to deprecate gzip-N in some future version and at that time issue a new feature flag (i.e. allow import of a pool with "legacy" ``compression=gzip-Nset, but update tocompression=gzip compressionlevel=N` property when the pool is upgraded). It would be good to discuss this with other OpenZFS members, if the idea makes sense at all.

Basically, this allows us to postpone creating new feature flag for some time (giving more time for discussion), while allowing to enjoy lz4hc without it.

@ilovezfs
Copy link
Contributor

ilovezfs commented Oct 6, 2015

I think to most users this will feel like a regression:

old syntax:
zfs create -o compression=gzip-9 foo/bar

new syntax:
zfs create -o compression=gzip -o compressionlevel=9 foo/bar

Call me crazy.

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 6, 2015

That's what I've been thinking about too (avoiding the use of a feature flag).
It appears that properties with : in them can be set without adding any incompatibilities:

zfs set foo.bar:aoeu=htns hts/test

works.

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 6, 2015

By the way, now I've implemented compression level setting for lz4hc the way it's done for gzip.

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 6, 2015

A rough benchmark of different lz4hc compression levels and lz4: https://gist.github.com/vozhyk-/1dd8eb04613b641deab2 .

@Bronek
Copy link

Bronek commented Oct 7, 2015

If we are going to add compression=lz4hc and/or compression=lz4hc-N then new feature flag should be added as well, correct?

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 7, 2015

We could use something like a org.zfsonlinux:lz4_mode property instead, as @DeHackEd suggested.

Looks like this property can be accessed in zio_write_bp_init() and a different ZIO_COMPRESS_* value passed to zio_compress_data() according to its value (with something like another zio_compress_table).

Anyway, I'm going to clean up the code as much as possible and open a pull request (still without adding a feature flag or doing it this way).

@DeHackEd
Copy link
Contributor

DeHackEd commented Oct 8, 2015

Off-topic, please don't use the @-username notation if you aren't trying to actually get someone's attention with your post. That's what it's for, not for just making a link to a user's profile.

@vozhyk-
Copy link
Contributor

vozhyk- commented Oct 8, 2015

@DeHackEd
Sorry about that, will think before making an @-mention next time.

vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 8, 2015
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 10, 2015
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 10, 2015
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 31, 2015
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 31, 2015
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 31, 2015
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Apr 30, 2016
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue May 1, 2016
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Aug 6, 2016
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Aug 20, 2016
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Oct 30, 2016
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
vozhyk- added a commit to vozhyk-/zfs that referenced this issue Mar 1, 2017
Add "compression = lz4hc | lz4hc-{1..16}" property values.

Issue openzfs#1900
@rincebrain
Copy link
Contributor

Having looked at this, it could be done, but with the integration of zstd, I don't actually know of any use case in my own experiments or seen for anyone else where lz4hc-N or lz4-N would be a better outcome.

It could be done, and refresh the lz4 integrated version for compression while we're at it, but while the decompressor update is a nice improvement, the compression improvements seemed somewhat negligible in all my tests at lz4-0 with 1.9.3, in speed and outcome, and all the nonzero values are, to my recollection, worse than just using an existing zstd value, as a tradeoff.

If people still think this is a useful option and have an example use case where zstd works worse for them, maybe I'll go take another look, but while the work to add it wouldn't be that involved, I couldn't immediately find a use case where it worked better than the existing configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

No branches or pull requests

9 participants