lz4fast compression #5927

Closed
wants to merge 1 commit into from



@n1kl n1kl commented Mar 27, 2017

As part of my master's thesis at the University of Hamburg, I worked on improving ZFS through compression. I would now like to share my three feature branches with the community:

  1. lz4fast (this PR)
  2. autocompression: auto compression #5928
  3. qos: Quality of service for ZFS + improvement through compression #5929

This patch updates the lz4 code *1 to version 1.7.3 to make use of LZ4 fast (accelerated) compression.
The lz4 code is based on a separate project for updating lz4 inside the Linux kernel, where a few changes were made for a clean implementation and improved speed; those changes are currently in review *2.

*1: https://github.com/lz4/lz4
*2: https://patchwork.kernel.org/patch/9574745/

Description

LZ4 fast compression is now available:
zfs set compression=lz4fast-N
where N is 1-20, or 30-100 in steps of 10.
Higher values give faster compression at the cost of less effective compression (larger compressed output).

Motivation and Context

LZ4 fast trades compression ratio for speed. This gives more flexibility in environments with either limited CPU power or many fast SSDs/HDDs, where lz4 compression is the limiting factor.
The autocompression and qos branches can also benefit from the additional lz4fast levels.

How Has This Been Tested?

Checksums were compared to verify full compatibility between files compressed with the old and the new lz4 code.

Benchmark

A file was copied from tmpfs to ZFS, with the ZFS pool itself also on tmpfs to simulate high disk throughput.

Name          Ratio   MB/s
lz4           0.58    228
lz4fast-2     0.62    249
lz4fast-3     0.65    266
lz4fast-4     0.68    282
lz4fast-5     0.71    298
lz4fast-7     0.76    329
lz4fast-10    0.80    370
lz4fast-20    0.97    469
lz4fast-30    0.98    546
lz4fast-50    0.98    634
lz4fast-100   0.99    690
off           1       744

(Ratio: compressed size / original size, lower is better. MB/s: copy throughput.)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Branch overlapping changes (feature, compress values)

The patch is read-only backward compatible via the new SPA_FEATURE_LZ4FAST_COMPRESS feature. The feature activation procedure is the same as in my other branches.
Because BP_GET_COMPRESS() has a limited namespace (128 values), the first part of the zio_compress enum holds values valid in both block pointers and dataset properties, and the second part holds dataset-only values. Or should I use a new property instead? This is an alternative proposal to #3908.

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Change has been approved by a ZFS on Linux member.

@mention-bot

@n1kl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @edillmann and @mkjorling to be potential reviewers.

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Mar 27, 2017
Member

@ahrens ahrens left a comment


I would suggest using separate enums for the blkptr_t's compression field, and the property value's compression field, as is done in #3908

.TE

\fBlz4fast\fR adds an acceleration n factor to \fBlz4\fR,
reducung the time to compress by about 1.03^n.
Member


typo: reducing

{ "lz4fast-70", ZIO_COMPRESS_LZ4FAST_70 },
{ "lz4fast-80", ZIO_COMPRESS_LZ4FAST_80 },
{ "lz4fast-90", ZIO_COMPRESS_LZ4FAST_90 },
{ "lz4fast-100", ZIO_COMPRESS_LZ4FAST_100 },
Member


Does this mean that not all values of N between 1 and 100 are supported for compression=lz4fast-N?

Author


Above N=20 I only support values in steps of 10, because the additional levels have little impact on the compression ratio. I can extend the table if required. Keeping values up to 100 gives more flexibility when tuning #define COMPRESSIONLEVEL 12 for CPUs with a large L1 cache.

@behlendorf behlendorf added the Type: Feature Feature request or new feature label Mar 27, 2017
Contributor

@brad-lewis brad-lewis left a comment


I only looked at dsl_dataset code. It looks good with a couple of nits.

dsl_dataset_actv_lz4fast_compress_sync, (void *)ddname, 0,
ZFS_SPACE_CHECK_RESERVED);

if (error == EALREADY)
Contributor


Is it possible to hit this condition? It seems like this might be unnecessary, as I don't see dsl_dataset_actv_lz4fast_compress_check() returning EALREADY.

char *ddname = (char *)arg;
dsl_pool_t *dp = dmu_tx_pool(tx);
dsl_dataset_t *ds;
spa_feature_t f = SPA_FEATURE_LZ4FAST_COMPRESS;
Contributor


Nit: it might be better to use SPA_FEATURE_LZ4FAST_COMPRESS directly instead of introducing the variable f. That helps tools like grok that show a single matching line, and it is consistent with the use of SPA_FEATURE_* elsewhere in this file.

@ahrens
Member

ahrens commented Jun 9, 2018

@allanjude may be interested in this

@behlendorf behlendorf added Status: Work in Progress Not yet ready for general review and removed Type: Performance Performance improvement or performance problem labels Jul 30, 2018
@behlendorf behlendorf added Status: Inactive Not being actively updated and removed Status: Work in Progress Not yet ready for general review labels Sep 25, 2018
@ahrens ahrens added the Status: Revision Needed Changes are required for the PR to be accepted label Sep 27, 2018
@interduo

@n1kl, will you resolve the conflicts and make the suggested corrections? We would be happy to see lz4fast in ZFS.

@ahrens
Member

ahrens commented Jun 4, 2021

I noticed that this PR hasn’t been updated in some time. We would like to see this feature added, but it seems that it isn’t quite complete, so we’re going to close it for now. When you have time to revisit this work, feel free to reopen this PR or open a new one. Thanks for your contribution to OpenZFS and please continue to open PRs to address issues that you encounter!

@ahrens ahrens closed this Jun 4, 2021
Labels
Status: Inactive Not being actively updated Status: Revision Needed Changes are required for the PR to be accepted Type: Feature Feature request or new feature

6 participants