
Add lz4hc compression type #3908

Status: Closed. vozhyk- wants to merge 4 commits.

Conversation

@vozhyk- (Contributor) commented Oct 10, 2015

Adds LZ4HC compression, enabled with the compression=lz4hc | lz4hc-{1..16} property values.

This still doesn't add any feature flags and doesn't update the man pages, as the way this should be done hasn't been decided yet. Two possibilities have been suggested so far:

  • Adding a feature flag
  • Using compression=lz4 and adding a dataset property to choose between lz4/lz4hc (lz4mode, compressionlevel, etc.)

How the code can be further cleaned up:

  • By replacing the lz4 code with the new upstream code, which would decrease the amount of duplicated code (lz4.c and lz4hc.c have many common bits, but lz4.c has them as macros while lz4hc.c has them as functions).
  • By unifying the lz4[hc]_compress_zfs functions, which contain almost the same code.

Issue #1900

@vozhyk- (Author) commented Oct 21, 2015

I've written a hack that makes it possible to use LZ4HC while retaining full compatibility with existing implementations (the lz4hc-hack branch).
It uses an lz4mode = lz4 | lz4hc | lz4hc-{1..16} property for choosing between LZ4/LZ4HC-*.
Better ways to do that might be:

  • to add a generic way to write a different compression value for selected algorithms (e.g. to add another field to zio_compress_table; see the sketch after this list)
  • to use a different property than lz4mode (a property used by only one compression algorithm doesn't look good)
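
A minimal sketch of the extra-field idea from the first bullet; zio_compress_info_t is a real ZFS struct, but ci_store_as is an invented name and the real struct has more members than shown:

```c
/*
 * Hypothetical sketch only: ci_store_as is a made-up field, and the
 * real zio_compress_info_t in zio_compress.h carries more members.
 */
typedef struct zio_compress_info {
	const char	*ci_name;	/* e.g. "lz4hc-9" */
	int		ci_level;	/* level passed to the compressor */
	int		ci_store_as;	/* enum zio_compress value to write
					 * into the BP; every lz4hc-* entry
					 * would store plain lz4 here */
} zio_compress_info_t;
```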

```diff
@@ -114,6 +115,22 @@ enum zio_compress {
 	ZIO_COMPRESS_GZIP_9,
 	ZIO_COMPRESS_ZLE,
 	ZIO_COMPRESS_LZ4,
+	ZIO_COMPRESS_LZ4HC_1,
```
A Member commented on the diff:

Unfortunately, this is not the best way to do it (GZIP_# is also not great). I assume that (like gzip), any lz4hc_# can be decompressed by one algorithm. Adding all these values eats into the limited namespace of BP_GET_COMPRESS() (128 values). We should differentiate between the values that are set on the "compression" property (which could be lz4hc-2 or lz4hc-3, etc.) and the values that are stored in the BP (where we would add at most one value).
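
For context, the 128-value limit comes from the width of the compression bitfield in the block pointer. A paraphrase of the spa.h macros (treat the exact bit offset here as an assumption of this sketch, not quoted code):

```c
/*
 * Paraphrased from spa.h: the BP encodes the compression function in a
 * 7-bit slice of blk_prop, so at most 128 values can ever appear on
 * disk.  BF64_GET is ZFS's generic 64-bit bitfield accessor.
 */
#define	SPA_COMPRESSBITS	7
#define	BP_GET_COMPRESS(bp)	\
	BF64_GET((bp)->blk_prop, 32, SPA_COMPRESSBITS)
```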

@vozhyk- (Author) replied:

Doing this would be great; it would provide a generic way (instead of the hack that works only for LZ4HC) to store e.g. the 'LZ4' value instead of 'LZ4HC-n' in the BP.
Is having many enum values like this still fine if they are not stored in the BP but only used for the property values?

@ahrens (Member) commented Oct 22, 2015

@gerty3000 do you want to take a look at this, and/or counter-propose your hackathon results? (see also my existing comments)

@ahrens (Member) commented Oct 26, 2015

The lz4hc-hack branch seems correct, although, as you said, a little hacky. If it's valuable to have full backwards compatibility, I think something along those lines would be the way to go.

An alternative that might be a cleaner implementation would be to only have readonly backwards compatibility. Then you could add the new "compression" property values while not changing the values in the BP. Essentially you would replace the single "enum zio_compress" with 2 enums: one for values in the BP and one for values of the property. Then you just have to find the right place to translate from the prop val to the BP val. zio_write_bp_init (where you're essentially doing it now in the lz4hc-hack branch) seems reasonable.
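
A compact sketch of that two-enum scheme (the real enums list every algorithm; members beyond those named in the thread, and the translation helper, are illustrative assumptions):

```c
/* Property values: one per compression function/level. */
enum zio_compress {
	ZIO_COMPRESS_OFF = 0,
	ZIO_COMPRESS_LZJB,
	ZIO_COMPRESS_LZ4,
	ZIO_COMPRESS_LZ4HC_1,
	/* ... levels 2 through 15 ... */
	ZIO_COMPRESS_LZ4HC_16 = ZIO_COMPRESS_LZ4HC_1 + 15,
	ZIO_COMPRESS_FUNCTIONS
};

/* BP values: one per decompression function; lz4hc adds no new entry. */
enum bp_compress {
	BP_COMPRESS_OFF = 0,
	BP_COMPRESS_LZJB,
	BP_COMPRESS_LZ4,	/* decompresses lz4 and every lz4hc-* */
	BP_COMPRESS_VALUES
};

/*
 * Translate the property value into the on-disk value; per the comment
 * above, zio_write_bp_init() would be a natural home for this call.
 */
static enum bp_compress
zio_compress_to_bp_compress(enum zio_compress c)
{
	if (c >= ZIO_COMPRESS_LZ4HC_1 && c <= ZIO_COMPRESS_LZ4HC_16)
		return (BP_COMPRESS_LZ4);
	return ((enum bp_compress)c);	/* legacy values map 1:1 */
}
```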

@vozhyk- (Author) commented Oct 26, 2015

@ahrens
Thank you for the advice. I'm planning on doing it this way.
As for the feature flag (for the compression=lz4hc* property values), it can also be activated when the property is set to lz4hc* and deactivated when it's set to a different value.
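
A rough sketch of that activate/deactivate rule (the helper names, their signatures, SPA_FEATURE_LZ4HC_COMPRESS, and the ZFS internal types are all assumptions for illustration, not code from the branch):

```c
/*
 * Hypothetical: run while syncing the property change.  Marks the
 * per-dataset feature active when compression moves to lz4hc-*, and
 * drops it again when the property moves away.
 */
static void
lz4hc_feature_update(dsl_dataset_t *ds, enum zio_compress prop, dmu_tx_t *tx)
{
	boolean_t wants = (prop >= ZIO_COMPRESS_LZ4HC_1 &&
	    prop <= ZIO_COMPRESS_LZ4HC_16);

	if (wants) {
		/* property now set to lz4hc*: bump the feature refcount */
		dsl_dataset_activate_feature(ds,
		    SPA_FEATURE_LZ4HC_COMPRESS, tx);
	} else {
		/* property moved away from lz4hc*: drop the refcount */
		dsl_dataset_deactivate_feature(ds,
		    SPA_FEATURE_LZ4HC_COMPRESS, tx);
	}
}
```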

@vozhyk- force-pushed the lz4hc-pull branch 2 times, most recently from b6b2885 to 1288561 on October 31, 2015
@vozhyk- (Author) commented Oct 31, 2015

Updated; the compression/decompression enums/tables are now separate, and BP_COMPRESS_LZ4 is written for ZIO_COMPRESS_LZ4HC_* automatically.

--- BEGIN low-level code comments
I also thought of naming the "new" values ZIO_DECOMPRESS_* instead; both variants look bad in some parts of the code (i.e. it's not readily apparent that BP_COMPRESS_* is used for decompression, or, for the second variant, that you have to use BP_SET_COMPRESS(ZIO_DECOMPRESS_*)).

I also could try to retain a single zio_compress_table instead of splitting it in two, but that would result in either

  • slower decompression-function lookups (walking the table to find a matching decompression value), or
  • forcing any new compression algorithm added after lz4hc to break the existing on-disk format (if I append decompress_table + compress_table), or
  • limiting the number of compression algorithms (if the compression values start at a larger offset into the table than BP_COMPRESS_VALUES).

ZIO_COMPRESS_LEGACY_FUNCTIONS hasn't been renamed, since it's for both compression and decompression.
--- END low-level code comments
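
To make the chosen split concrete, a sketch of the resulting two-table layout (struct and field names here are illustrative assumptions, not the patch itself):

```c
#include <stddef.h>

/* Indexed by enum zio_compress: everything the property can name. */
typedef struct zio_compress_info {
	const char	*ci_name;
	int		ci_level;
	size_t		(*ci_compress)(void *src, void *dst,
			    size_t s_len, size_t d_len, int level);
} zio_compress_info_t;

/* Indexed by enum bp_compress: one entry per on-disk format. */
typedef struct zio_decompress_info {
	const char	*di_name;
	int		(*di_decompress)(void *src, void *dst,
			    size_t s_len, size_t d_len, int level);
} zio_decompress_info_t;

extern zio_compress_info_t	zio_compress_table[];	/* property-sized */
extern zio_decompress_info_t	zio_decompress_table[];	/* BP-sized */
```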

There is one issue with the current first commit - it seems to me that L2ARC writes the compression value to disk, but I haven't found the place in the code where this happens / where it could be changed. It shouldn't currently be a problem, since BP_COMPRESS_LZ4 == ZIO_COMPRESS_LZ4, but it's still wrong.

Now only the feature flag needs to be added.

@ahrens (Member) commented Oct 31, 2015

@vozhyk- wrote:

> it seems to me that L2ARC writes the compression value to disk, but I haven't found the place in code where it happens

That's because it doesn't happen :-) L2ARC stores the compression function in memory only (remember, L2ARC is not persistent). See l2arc_buf_hdr_t:b_compress.

@vozhyk- (Author) commented Oct 31, 2015

@ahrens Good to hear :-). Then I'll update the commit to make the changes more minimal (to get rid of the added L2ARC_IS_VALID_DECOMPRESS).

@vozhyk- force-pushed the lz4hc-pull branch 2 times, most recently from 909bcf7 to 98fca30 on November 1, 2015
@vozhyk- (Author) commented Nov 3, 2015

My lz4hc branch now has the LZ4HC_COMPRESS feature flag, which is now required for setting compression=lz4hc* and activated on the dataset when the property is set (the latter untested, due to my ThinkPad T61 refusing to boot after a reboot to test this, likely because of the Nvidia card breaking). Some more work remains to be done: making zfs deactivate the feature when compression is set to a different value, adding zio_compress_to_feature(), and updating the documentation (and doing some of what the first post says).

@athei commented Mar 19, 2016

Any updates here since last year? I would really like to see it in 0.7.

@vozhyk- (Author) commented Mar 19, 2016

None, except for some in https://github.com/vozhyk-/zfs/commits/lz4hc. I'll try to find some time to finish this soon, then.

@ahrens (Member) commented Mar 19, 2016

@dankimmel Care to review this? I think you implemented something similar at a hackathon.

@vozhyk- (Author) commented Apr 30, 2016

Do I understand correctly that the compression values in dnodes can only be <LZ4 (so when some data on a dataset has been compressed with lz4hc and compress=lz4 is set, the LZ4HC values don't exist anywhere on the dataset)? If so, deactivating the LZ4HC pool feature when compress is set to a different value seems suitable.

@vozhyk- (Author) commented Apr 30, 2016

Rebased on master, added the LZ4HC_COMPRESS feature flag and some documentation.

@vozhyk- (Author) commented May 1, 2016

Updated - added a dependency on extensible_dataset to the lz4hc_compress feature flag. It turns out that "per-dataset" feature flags need it.

@sempervictus (Contributor) commented:

@vozhyk- I'd like to apply this PR to several physical hosts in our environment which are running ABD, and at first glance it looks like the merge conflict is non-trivial.
Is there a chance we could ask you to look into the possibility of merging a branch with ABD (or, even better, #4439)? Thank you

@vozhyk- (Author) commented May 31, 2016

@sempervictus I'll look at it when I get the time - probably in 2 weeks (though it may be 1-2 weeks more than that). I don't know much about ABD, so if it really is non-trivial, I won't be able to do much right away. It'll be interesting to learn about it and adapt my changes, though.

@sempervictus (Contributor) commented:

@vozhyk-: thank you much

@jumbi77 (Contributor) commented Aug 5, 2016

@vozhyk- Can I ask about the status? Did you find some time to work on it? What about Dan Kimmel's prototype? Many thanks.

@vozhyk- (Author) commented Aug 6, 2016

@jumbi77 So far I haven't. I also haven't asked @dankimmel about his prototype yet.

These are the things that remain to be done:

  • Rebasing on master
  • Investigating/fixing the buildbot test failures (the zpool_get_002_pos test)
  • Finishing the work on deactivating the feature flag when no datasets have compression=lz4hc* set (my first attempt at making this work has failed)
  • Rebasing on ABD, if possible

@vozhyk- (Author) commented Aug 20, 2016

A small update - added the new feature flag to the zpool_get test's config, so the test no longer fails upon seeing an unexpected pool property.

A few buildbots still fail, and I don't know why; the failures don't seem related to this pull request, though.

@behlendorf (Contributor) left a review comment:

The next time you rebase, can you also extend the existing compression test cases so they cover this?

@behlendorf added the "Status: Work in Progress" (Not yet ready for general review) label on Oct 7, 2016
@jumbi77 (Contributor) commented Nov 17, 2016

lz4 is now available in version 1.7.3 (https://github.com/lz4/lz4/releases).
Which version is your work based on? Since r131 there have been 350+ commits.
Do you plan to update lz4hc and lz4 to the new version? (@vozhyk-)

EDIT:
Anyway, thanks for your work. It would be great to be able to choose a stronger compression algorithm than the current lz4. But according to the GitHub NEWS page, lz4hc is really slow at compression (see the updated table with lz4hc -9 in v1.7.3). Can you confirm that with your tests? By comparison, zstandard is much faster in compression throughput and gets a better ratio. Decompression is still acceptable, I guess. I saw that @skiselkov implemented zstd at Hackathon 2016, right? Are you planning to publish your code, Saso? 👍

@sempervictus (Contributor) commented:

@vozhyk-: with ABD merged in, could you please rebase the patch against current master? Thank you

Commits:

  • Store decompression values in BPs instead of compression values.
    Change zio_compress_table; add zio_decompress_table; add enum bp_compress.
  • Add "compression = lz4hc | lz4hc-{1..16}" property values.
    Issue openzfs#1900
  • Add the LZ4HC_COMPRESS feature flag.
    Require it being enabled for compression=lz4hc*; activate the LZ4HC feature if needed when syncing the objset.
  • Add the LZ4HC feature to the test config; add zio_compress_info_t.ci_feature.
@vozhyk- (Author) commented Mar 1, 2017

@sempervictus Rebased. It compiles, but I haven't tested it at all.

I also have to look into how it interacts with compressed ARC. It seems I haven't introduced any regressions in compressed ARC, but I still have to verify it.

I've made no changes to the code or the tests besides rebasing.

@dankimmel (Contributor) commented:

I can't think of any reason compressed ARC would break as a result of adding a new algorithm. However, you may need to add a new ZFS send feature flag for this compression algorithm if you want to use compressed send/receive with data in this format.
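
A sketch of what such a send-stream flag could look like; DMU_BACKUP_FEATURE_COMPRESSED is the existing flag for compressed streams, while the LZ4HC bit and its position are invented for illustration:

```c
/*
 * Hypothetical: a stream carrying lz4hc-compressed records would set a
 * new feature bit, so old receivers fail cleanly instead of storing
 * data they cannot decompress.  The LZ4HC bit position is made up.
 */
#define	DMU_BACKUP_FEATURE_COMPRESSED	(1 << 22)	/* existing flag */
#define	DMU_BACKUP_FEATURE_LZ4HC	(1 << 25)	/* invented here */
```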

@vozhyk- (Author) commented Mar 26, 2017 via email

@ahrens (Member) commented Mar 27, 2017

@vozhyk- @dankimmel @grwilson I think that separating the BP compression enum (which specifies the decompression function) from the compression property enum (which specifies the compression function) breaks arc_cksum_is_equal(), because the ARC only knows the decompression function (from the BP), but it needs to regenerate the exact data that is stored on disk by recompressing it.

@vozhyk- At a minimum you should change the ARC code to reflect the fact that it is storing the enum bp_compress rather than the enum zio_compress.

@ahrens (Member) commented Mar 28, 2017

@vozhyk- The best solution I can come up with for the problem (uncompressed ARC + L2ARC + separate compression/decompression enums) is to keep the checksum of the uncompressed data in memory when the data is in L2ARC. This is similar to how it used to be before compressed ARC; the downside is that it uses more memory to manage the L2ARC. When the data is compressed on disk, uncompressed in the ARC, and stored in the L2ARC, we allocate a zio_cksum_t and have a new field in the arc_buf_hdr_t point to it.

Sorry the design I suggested (of separate compression/decompression enums) turns out to have some complications. Are you interested in implementing the scheme I described above?
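
A sketch of that bookkeeping (the field name is invented, zio_cksum_t is paraphrased from spa.h, and the real arc_buf_hdr_t has many more fields):

```c
#include <stdint.h>

/* Paraphrased: the real zio_cksum_t lives in spa.h. */
typedef struct zio_cksum {
	uint64_t	zc_word[4];
} zio_cksum_t;

/*
 * Hypothetical addition to arc_buf_hdr_t: while a buffer that is
 * compressed on disk but cached uncompressed is also resident in
 * L2ARC, point at a checksum of the uncompressed bytes so L2ARC reads
 * can be verified without knowing the exact compressor (and level)
 * that produced the on-disk data.
 */
typedef struct arc_buf_hdr {
	/* ...existing fields elided... */
	zio_cksum_t	*b_l2_uncompressed_cksum;	/* invented name;
							 * NULL unless the
							 * buf is in L2ARC */
} arc_buf_hdr_t;
```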

@behlendorf (Contributor) commented:

> Sorry the design I suggested (of separate compression/decompression enums) turns out to have some complications. Are you interested in implementing the scheme I described above?

For the moment, I'm going to close this PR out as stale, though finalizing this functionality would still be welcome! @vozhyk-, if you find the time and interest to revisit this, please go ahead and open a new PR.

Labels: Status: Work in Progress (Not yet ready for general review); Type: Feature (Feature request or new feature)