compression=lz4hc support #1900
Comments
If processing power is really that much higher, I'd suggest just using gzip instead. No one's done lz4hc, mainly because the compression speed is staggeringly worse than lz4 itself. See http://wiki.illumos.org/display/illumos/LZ4+Compression -- The "LZ4 Performance" section links to a spreadsheet detailing compression statistics. Other than decompression speed, using gzip-6 (the default compression level) is superior to lz4hc. Is decompression speed still a big concern? |
It seemed like it would be a good thing, but if it is a lot of effort for no expected return then feel free to close. If it is perceived as a trivial task though, it may be worth it for the testing alone. |
This would be useful for a number of situations, namely the WORM cases. The main ones being Because the on disk format is binary compatible, there's no reason to I think the WORM cases provide enough merit to undertake this feature,
|
I've no objection to leaving this open as possible feature if someone would like to work on it. |
I'll give it a spin if nobody else wants to. First step is determining if I need a new feature flag or if I can manage without one. |
The only thing you need to change on disk is you need to somehow store
|
I'd like to add another use case for lz4HC: /, /bin, /boot and /lib These are generally immutable and rarely updated, however an increase in compression ratio will always be welcome. Having rootFS and libraries/binaries on LZ4HC would be a godsend because of smaller transfers. |
I've made a naive port of LZ4HC (looking at how lz4 was introduced and doing the same) in https://github.com/vozhyk-/zfs/tree/lz4hc. What is left to do is to clean up the code (which isn't fully adapted to the illumos || Linux coding style, and contains parts of the code in So far the compression seems to work correctly (diff not finding any differences between copies of a Linux tree compressed with lz4/lz4hc, and no problems with correctness observed otherwise).
|
The current code doesn't add any feature flags, but adds a |
@vozhyk- thanks for working on this. Yes, we may need to add another feature flag to support this even though the existing lz4 implementation can technically decompress it. We want to be very careful about breaking the on-disk format in any way. Although I admit it would be nice to somehow cleanly avoid the need for another feature flag. Clever ideas welcome. If you could finish any remaining cleanup and open a pull request we can try and get you some feedback. |
It seems like the most important thing would be replicating the "gzip | gzip-N" semantics with "lz4 | lz4hc | lz4hc-N." |
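To make the "gzip | gzip-N" analogy concrete, here is a minimal sketch of how such property values could be parsed. It is illustrative Python, not the ZFS property code: `parse_compression`, `LEVEL_RANGES` and `NO_LEVEL` are hypothetical names, the lz4hc range of 1..16 is taken from the commit titles linked to this issue, and a bare `gzip` or `lz4hc` is returned with no level rather than guessing a default.

```python
import re

# Families that accept a "-N" suffix, with their inclusive level ranges.
# gzip-1..gzip-9 is existing behaviour; lz4hc-1..lz4hc-16 is the range
# named in the commits linked to this issue (an assumption of this sketch).
LEVEL_RANGES = {"gzip": (1, 9), "lz4hc": (1, 16)}

# Families that take no level suffix at all.
NO_LEVEL = {"lz4"}


def parse_compression(value):
    """Split 'name' or 'name-N' into (name, level); level is None if omitted."""
    match = re.fullmatch(r"([a-z0-9]+)(?:-(\d+))?", value)
    if match is None:
        raise ValueError(f"malformed compression value: {value!r}")
    name, level = match.group(1), match.group(2)

    if name in NO_LEVEL:
        if level is not None:
            raise ValueError(f"{name} does not take a level: {value!r}")
        return name, None

    if name not in LEVEL_RANGES:
        raise ValueError(f"unknown compression family: {name!r}")
    if level is None:
        return name, None  # caller decides what the family default is

    low, high = LEVEL_RANGES[name]
    number = int(level)
    if not low <= number <= high:
        raise ValueError(f"{name} level must be in {low}..{high}: {value!r}")
    return name, number
```

With this shape, `lz4hc-13` and `gzip-6` go through the same code path, which is the point of matching the gzip CLI instead of adding a separate mode.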
A dataset property like A module parameter to control L2ARC compression strength is where I was thinking of going with it. |
@DeHackEd So legal values involving lz4 would be Any others? |
@DeHackEd The CLI should match the CLI for gzip, not introduce a separate mode, unless the same change is also being made for gzip and we're introducing a separate gzip mode dataset property, too. |
In that case you'll need a feature flag because the table of compression algorithms used internally will change. (Not that my suggestion wasn't a dirty hack to begin with. Anything that avoids feature flags is going to be messy). I'm stepping out. |
@DeHackEd As you said doing it as dataset property could work (assuming it's truly always on-disk compatible), but I'd hope that would be invisible to the user and that zfs create -o compression=lz4hc-13 foo/bar would do the right thing (set compression=lz4 and lz4mode=lz4hc-13), and that zfs get compression foo/bar would return lz4hc-13 not lz4. |
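The round trip described in the comment above can be sketched as follows. This is illustrative Python only, assuming the hypothetical `lz4mode` dataset property from that comment; `to_stored` and `to_displayed` are made-up helper names, not ZFS functions.

```python
def to_stored(user_value):
    """Map the value the user typed to (compression, lz4mode).

    lz4hc variants are stored as compression=lz4 plus a separate
    lz4mode value (hypothetical property); everything else is
    stored unchanged with no lz4mode.
    """
    if user_value == "lz4hc" or user_value.startswith("lz4hc-"):
        return "lz4", user_value
    return user_value, None


def to_displayed(compression, lz4mode):
    """Map the stored pair back to what `zfs get compression` would show."""
    if compression == "lz4" and lz4mode is not None:
        return lz4mode
    return compression


# Round trip for the example from the comment above.
stored = to_stored("lz4hc-13")
print(stored)                 # ('lz4', 'lz4hc-13')
print(to_displayed(*stored))  # lz4hc-13
```

The split stays invisible to the user as long as every read of the property goes through the display mapping.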
Here is an idea: introduce a new property If For lz4, let the For compression formats which do not support choice levels (i.e. anything that is not gzip or lz4), this property would be ignored, thus allowing for compression override in parts of filesystem without the necessity to update The fundamental property of this model is that it decouples compression algorithm (called "level") from compression format, i.e. you can have data with compatible compression format (e.g. gzip, lz4, etc.) but written by different algorithms (i.e. "levels", identified by number specific to format). One downside of this proposal is that we will have to live, for some time, with duplication in gzip compression levels (which I explain at the start). In the face of this downside, perhaps it would make sense to deprecate Basically, this allows us to postpone creating a new feature flag for some time (giving more time for discussion), while letting us enjoy lz4hc without it. |
I think to most users this will feel like a regression: old syntax: new syntax: Call me crazy. |
That's what I've been thinking about too (avoiding the use of a feature flag).
works. |
By the way, now I've implemented compression level setting for |
A rough benchmark of different lz4hc compression levels and lz4: https://gist.github.com/vozhyk-/1dd8eb04613b641deab2 . |
If we are going to add |
We could use something like a Looks like this property can be accessed in Anyway, I'm going to clean up the code as much as possible and open a pull request (still without adding a feature flag or doing it this way). |
Off-topic, please don't use the @-username notation if you aren't trying to actually get someone's attention with your post. That's what it's for, not for just making a link to a user's profile. |
@DeHackEd |
Add "compression = lz4hc | lz4hc-{1..16}" property values. Issue openzfs#1900
Having looked at this, it could be done, but with the integration of zstd, I don't actually know of any use case, in my own experiments or from anyone else, where lz4hc-N or lz4-N would give a better outcome. We could add it, and refresh the integrated lz4 version while we're at it, but while the decompressor update is a nice improvement, the compression improvements seemed somewhat negligible in all my tests at lz4-0 with 1.9.3, in both speed and outcome, and all the nonzero values are, to my recollection, a worse tradeoff than just using an existing zstd value. If people still think this is a useful option and have an example use case where zstd works worse for them, maybe I'll go take another look, but while the work to add it wouldn't be that involved, I couldn't immediately find a use case where it worked better than the existing configuration. |
The title says it all. I have a lot more processing power than IO throughput and I believe lz4hc would be advantageous over lz4. Can support for compression=lz4hc be added?