Add lz4hc compression type #3908
Conversation
I've written a hack that allows LZ4HC to be used while retaining full compatibility with existing implementations (the lz4hc-hack branch).
include/sys/zio.h
@@ -114,6 +115,22 @@ enum zio_compress {
	ZIO_COMPRESS_GZIP_9,
	ZIO_COMPRESS_ZLE,
	ZIO_COMPRESS_LZ4,
	ZIO_COMPRESS_LZ4HC_1,
Unfortunately, this is not the best way to do it (GZIP_# is also not great). I assume that (like gzip) any lz4hc_# can be decompressed by one algorithm. Adding all these values eats into the limited namespace of BP_GET_COMPRESS() (128 values). We should differentiate between the values that are set on the "compression" property (which could be lz4hc-2 or lz4hc-3, etc.) and the values that are stored in the BP (where we would add at most one value).
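To make the concern concrete, here is a sketch of how per-level entries consume the small on-disk compression namespace even though they all decompress the same way. The enum and value names are hypothetical, not identifiers from this patch or from ZFS.

```c
/*
 * Sketch only: hypothetical names. The BP's compression field can hold
 * roughly 128 distinct values, so every per-level entry added to the
 * on-disk enum permanently consumes one of them.
 */
enum ondisk_compress_sketch {
	SK_COMPRESS_GZIP_1,	/* ... */
	SK_COMPRESS_GZIP_9,	/* nine values spent on one decompressor */
	SK_COMPRESS_LZ4,
	SK_COMPRESS_LZ4HC_1,	/* ... */
	SK_COMPRESS_LZ4HC_16,	/* sixteen more, all decompressed by lz4 */
};
```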
Doing this would be great, and would provide a generic way (instead of the hack that works only for LZ4HC) to store e.g. an 'LZ4' value instead of an 'LZ4HC-n' value in the BP.
Is having many enum values like this still acceptable if they are not stored in the BP but only used for the property values?
@gerty3000 do you want to take a look at this, and/or counter-propose your hackathon results? (see also my existing comments)
The lz4hc-hack branch seems correct, although, as you said, a little hacky. If it's valuable to have full backwards compatibility, I think something along those lines would be the way to go.

An alternative that might be a cleaner implementation would be to have only read-only backwards compatibility. Then you could add the new "compression" property values while not changing the values in the BP. Essentially you would replace the single "enum zio_compress" with 2 enums: one for values in the BP and one for values of the property. Then you just have to find the right place to translate from the prop val to the BP val; zio_write_bp_init (where you're essentially doing it now in the lz4hc-hack branch) seems reasonable.
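A minimal sketch of that translation step, with made-up names (the actual patch would hook this in where zio_write_bp_init() picks the write compression): the property value selects the compressor, while the BP records only what a reader needs in order to decompress.

```c
/* Hypothetical names; for illustration only. */
enum prop_val { PV_LZ4, PV_LZ4HC_1, PV_LZ4HC_9, PV_LZ4HC_16, PV_GZIP_6 };
enum bp_val   { BV_LZ4, BV_GZIP };

/* Map a "compression" property value to the value stored in the BP. */
static enum bp_val
prop_to_bp_compress(enum prop_val pv)
{
	switch (pv) {
	case PV_LZ4:
	case PV_LZ4HC_1:
	case PV_LZ4HC_9:
	case PV_LZ4HC_16:
		return (BV_LZ4);	/* every lz4hc level decompresses as lz4 */
	case PV_GZIP_6:
	default:
		return (BV_GZIP);	/* gzip levels would share one BP value too */
	}
}
```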
@ahrens
Force-pushed from b6b2885 to 1288561.
Updated; now the compression/decompression enums/tables are separate.

There is one issue with the current first commit: it seems to me that L2ARC writes the compression value to disk, but I haven't found the place in the code where that happens or where it could be changed. It shouldn't currently be a problem, though. Now only the feature flag needs to be added.
@vozhyk- wrote:
That's because it doesn't happen :-) L2ARC stores the compression function in memory only (remember, L2ARC is not persistent).
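For illustration, a sketch of how the compression function can live only in the in-core header, so nothing about it has to be written to the cache device. The field names are hypothetical, not the real l2arc_buf_hdr_t layout.

```c
#include <stdint.h>

/* Hypothetical in-core L2ARC header slice; field names are illustrative. */
typedef struct l2arc_hdr_sketch {
	uint64_t	sk_daddr;	/* offset of the buffer on the cache device */
	uint32_t	sk_asize;	/* size as written (possibly compressed) */
	uint8_t		sk_compress;	/* compression function, kept in RAM only */
} l2arc_hdr_sketch_t;
```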
@ahrens
Force-pushed from 909bcf7 to 98fca30.
My lz4hc branch now has the LZ4HC_COMPRESS feature flag.
Any updates here since last year? I would really like to see it in 0.7.
None, except for some in https://github.com/vozhyk-/zfs/commits/lz4hc. I'll try to find some time to finish this soon, then.
@dankimmel Care to review this? I think you implemented something similar at a hackathon.
Do I understand correctly that the compression values in dnodes can only be < LZ4?
Rebased on master.
Updated.
@sempervictus I'll look at it when I get the time, probably in 2 weeks (but it may take 1-2 more weeks). I don't know much about ABD, so if it really is non-trivial, I won't be able to do much right away. It'll be interesting to learn about it and adapt my changes, though.
@vozhyk-: thank you much
@vozhyk- Can I ask about the status? Did you find some time to work on it? What about Dan Kimmel's prototype? Many thanks.
@jumbi77 So far I haven't. I also haven't asked @dankimmel about his prototype yet. These are the things that remain to be done:
Force-pushed from 0bfb258 to 260b0a3.
A small update: added the new feature flag. A few buildbots still fail, and I don't know the reason; this doesn't seem related to this pull request, though.
The next time you rebase, can you also extend the existing compression test cases so they cover this?
lz4 is now available in version 1.7.3 (https://github.com/lz4/lz4/releases).
@vozhyk-: with ABD merged in, could you please rebase the patch against current master? Thank you
Store decompression values in BPs instead of compression values. Change zio_compress_table. Add zio_decompress_table. Add enum bp_compress.
Add "compression = lz4hc | lz4hc-{1..16}" property values. Issue openzfs#1900
Require the LZ4HC_COMPRESS feature to be enabled for compression=lz4hc*. Activate the LZ4HC feature if needed when syncing the objset. Add the LZ4HC feature to the test config.
Add zio_compress_info_t.ci_feature
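A rough sketch of what a compression-table entry carrying a feature requirement could look like, in the spirit of the ci_feature commit above. The surrounding field layout, types, and the "-1 means no feature needed" convention are assumptions made for this illustration, not necessarily what the patch does.

```c
/* Sketch only: illustrative shape of a per-algorithm table entry. */
typedef struct compress_info_sketch {
	const char	*ci_name;	/* e.g. "lz4hc-9" */
	int		ci_level;	/* level handed to the compressor */
	int		ci_feature;	/* required pool feature, or -1 for none */
} compress_info_sketch_t;

static const compress_info_sketch_t compress_table_sketch[] = {
	{ "lz4",      0,  -1 },	/* no extra feature needed */
	{ "lz4hc-1",  1,   0 },	/* 0 = hypothetical LZ4HC feature index */
	{ "lz4hc-16", 16,  0 },
};
```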
@sempervictus Rebased. It compiles, but I haven't tested it at all. I also have to look into how it interacts with compressed ARC. It seems I haven't introduced any regressions in compressed ARC, but I still have to verify that. I've made no changes to the code or the tests besides rebasing.
I can't think of any reason compressed ARC would break as a result of adding a new algorithm; however, you may need to add a new ZFS send feature flag for this compression algorithm if you want to use compressed send/receive with data in this format.
> I can't think of any reason compressed ARC would break as a result of adding a new algorithm

Here I also make compression (the `compression` option value) and decompression (stored in BPs) values separate enums. This could be a problem if the compressed ARC changes made L2ARC start storing the compression algorithm in the BP instead of keeping it in the ARC header in memory. From what I've seen, this is not the case, so it shouldn't be a problem.

> however you may need to add a new ZFS send feature flag for this compression algorithm if you want to use compressed send / receive with data in this format.

Thanks, I'll look into this.

The data can be decompressed with plain LZ4. The only difference between a dataset compressed with LZ4 and one compressed with LZ4HC is the `compression` option value. The rest of the data on disk says it's LZ4.
@vozhyk- @dankimmel @grwilson I think that separating the BP compression enum (which specifies the decompression function) from the compression property enum (which specifies the compression function) breaks the L2ARC handling when the ARC holds uncompressed data.

@vozhyk- At a minimum you should change the ARC code to reflect the fact that it is storing the decompression (BP) value rather than the compression value.
@vozhyk- The best solution I can come up with for the problem (uncompressed ARC + L2ARC + separate compression/decompression enums) is to keep the checksum of the uncompressed data in memory when the data is in the L2ARC. This is similar to how it used to be before compressed ARC; the downside is that it uses more memory to manage the L2ARC. When the data is compressed on disk, uncompressed in the ARC, and stored in the L2ARC, we allocate space to hold that checksum.

Sorry the design I suggested (of separate compression/decompression enums) turns out to have some complications. Are you interested in implementing the scheme I described above?
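A minimal sketch of that scheme, with hypothetical names and checksum type: remember a checksum of the uncompressed bytes while the buffer is in the L2ARC, so a later L2ARC read can be verified even though the BP no longer says which compressor produced the on-disk copy.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 256-bit checksum container, for illustration. */
typedef struct cksum_sketch { uint64_t c_word[4]; } cksum_sketch_t;

/* Hypothetical slice of an ARC header; real headers hold much more. */
typedef struct arc_hdr_sketch {
	void		*h_data;	/* uncompressed data held in the ARC */
	uint64_t	h_lsize;	/* logical (uncompressed) size */
	cksum_sketch_t	*h_l2_cksum;	/* allocated only while in the L2ARC */
} arc_hdr_sketch_t;

/*
 * Called when the buffer is written to the L2ARC: record a checksum of
 * the uncompressed bytes so the L2ARC read path can verify them later.
 * (Error handling omitted for brevity.)
 */
static void
l2arc_remember_cksum(arc_hdr_sketch_t *hdr,
    void (*cksum_fn)(const void *, uint64_t, cksum_sketch_t *))
{
	if (hdr->h_l2_cksum == NULL)
		hdr->h_l2_cksum = malloc(sizeof (cksum_sketch_t));
	cksum_fn(hdr->h_data, hdr->h_lsize, hdr->h_l2_cksum);
}
```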
For the moment, I'm going to close this PR out as stale, though finalizing this functionality would still be welcome! @vozhyk- if you find the time and interest to revisit this, please go ahead and open a new PR.
Adds LZ4HC compression, enabled with a `compression=lz4hc | lz4hc-{1..16}` property.

Still doesn't add any feature flags and doesn't update the man pages, as the way this should be done hasn't been decided yet. Currently 2 possibilities have been suggested: `compression=lz4` and adding a dataset property to choose between `lz4`/`lz4hc` (`lz4mode`, `compressionlevel`, etc.).

How the code can be further cleaned up:
- Replacing the `lz4` code with the new upstream code, which would decrease the amount of duplicated code (`lz4.c` and `lz4hc.c` have many common bits, but `lz4.c` has them as macros while in `lz4hc.c` they are functions).
- Merging the `lz4[hc]_compress_zfs` functions, which contain almost the same code.

Issue #1900