Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ZStandard Compression #9024

Closed
wants to merge 20 commits into from

Conversation

allanjude
Copy link
Contributor

@allanjude allanjude commented Jul 12, 2019

Warning: The on-disk layout has not been finalized yet. Please only use this on test pools, otherwise you may not be able to import your pool with the final version.

Note: This pull request has been broken into a few separate commits, mostly for ease of review (separate out the mechanical changes from the substantive ones). I can reformat this as separate pull requests if desired. We can decide if it makes sense to squash them after review.

Motivation and Context

ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off.

Description

Add the new compression type, zstd, which defaults to compression level 3.
Available compression levels are zstd-1 through zstd-19 (the extreme levels were excluded due to their memory requirements. They are not useful with ZFS block sizes). In addition, ZStandard supports faster levels of compression, implemented as zstd-fast-. Supported levels are 1-10, 20-100 in steps of 10, 500, and 1000.

Rather than the approached used when gzip was added, of using a separate compression type for each different level, I instead added a new hidden property, compression_level. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function. However, in some cases, it was necessary to know what level a block was compressed with in other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may be additional places where this needs to be added still.

This new property is hidden from the user, and the user simply does zfs set compress=zstd-10 dataset and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well.

How Has This Been Tested?

This has been tested mostly on FreeBSD, although some Linux users have tested it via a port of an earlier version of this work (Pull Requests #8044 and #8941)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@allanjude
Copy link
Contributor Author

I need to investigate a possible issue with property inheritance.
If you have a dataset with compression set locally, then you zfs inherit compress dataset I need to confirm this can't result in the hidden compression_level property having the wrong value.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Jul 12, 2019
@sempervictus
Copy link
Contributor

Seems like the same approach would work for the lz4hc work proposed a whole back. Thank you

@allanjude
Copy link
Contributor Author

Seems like the same approach would work for the lz4hc work proposed a whole back. Thank you

Yes, I changed the design to be more generic last year after some consideration.
Specifically, I want to handle: upgrade the version of LZ4 we use (20+% performance improvement on decompression in a recent update, need to benchmark what other gains there have been over the years). Also considering how to manage versions of the compression, since we'll want to keep the old LZ4 compression function for users of nop-write, dedup, etc. LZ4 technically has levels too.

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

It might also be worth looking at if LZ4HC is still useful, compared to lower levels of zstd and zstd-fast.

@gmelikov
Copy link
Member

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

If you're already create new (but hidden) property - I think it's a good idea to have a compressor_version or similar property just for this case with version changes. So it would be possible to decompress with needed version. Thoughts?

@allanjude
Copy link
Contributor Author

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

If you're already create new (but hidden) property - I think it's a good idea to have a compressor_version or similar property just for this case with version changes. So it would be possible to decompress with needed version. Thoughts?

I need to investigate how much trouble having the separate properties is (specifically for inheritance when you rename to a new parent etc) before deciding if we should do additional extra hidden properties, or just use the high bits of the 'compress_level' property to store the version.

Note: in general, you can always use the newest decompressor, even on data compressed with the older version, so we likely only need to keep the different versions of the compression function around, for use-cases like nop-write where we need to recompress data and get the same checksum.

@sempervictus
Copy link
Contributor

IIRC, LZ4HC does use the same decompressor, and it has some very very fast decompression rates - just shy of 2G/s according to http://fastcompression.blogspot.com/2015/01/zstd-stronger-compression-algorithm.html... So for WORM patterns, this is quite desirable.

@allanjude
Copy link
Contributor Author

allanjude commented Jul 15, 2019

IIRC, LZ4HC does use the same decompressor, and it has some very very fast decompression rates - just shy of 2G/s according to http://fastcompression.blogspot.com/2015/01/zstd-stronger-compression-algorithm.html... So for WORM patterns, this is quite desirable.

Some raw compression benchmarks from my workstation (these are not in a ZFS context, just general comparative information):

hw.model: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hw.ncpu: 8

# lz4 -b1 -e9
*** LZ4 command line interface 64-bits v1.9.1, by Yann Collet ***
Benchmarking levels from 1 to 9
 1#Synthetic 50%     :  10000000 ->   5859357 (1.707),1280.0 MB/s ,6720.0 MB/s 
 2#Synthetic 50%     :  10000000 ->   5859357 (1.707),1280.0 MB/s ,6720.0 MB/s 
 3#Synthetic 50%     :  10000000 ->   4560720 (2.193),  75.3 MB/s ,3840.0 MB/s 
 4#Synthetic 50%     :  10000000 ->   4534386 (2.205),  61.0 MB/s ,3840.0 MB/s 
 5#Synthetic 50%     :  10000000 ->   4529772 (2.208),  55.7 MB/s ,3840.0 MB/s 
 6#Synthetic 50%     :  10000000 ->   4529581 (2.208),  55.7 MB/s ,3840.0 MB/s 
 7#Synthetic 50%     :  10000000 ->   4529569 (2.208),  54.5 MB/s ,3840.0 MB/s 
 8#Synthetic 50%     :  10000000 ->   4529565 (2.208),  55.7 MB/s ,3840.0 MB/s 
 9#Synthetic 50%     :  10000000 ->   4529565 (2.208),  55.7 MB/s ,3840.0 MB/s 

# zstd -b1 -e10 --fast=10 --priority=rt
*** zstd command line interface 64-bits v1.4.0, by Yann Collet ***
Note : switching to real-time priority 
-10#Synthetic 50%     :  10000000 ->   6110186 (1.637),1283.8 MB/s ,4428.0 MB/s 
-9#Synthetic 50%     :  10000000 ->   6046532 (1.654),1178.1 MB/s ,4265.9 MB/s 
-8#Synthetic 50%     :  10000000 ->   5974710 (1.674),1192.7 MB/s ,4083.0 MB/s 
-7#Synthetic 50%     :  10000000 ->   5900567 (1.695),1133.8 MB/s ,4017.0 MB/s 
-6#Synthetic 50%     :  10000000 ->   5832577 (1.715),1087.6 MB/s ,3882.2 MB/s 
-5#Synthetic 50%     :  10000000 ->   5739376 (1.742),1016.9 MB/s ,3716.1 MB/s 
-4#Synthetic 50%     :  10000000 ->   5637044 (1.774), 937.8 MB/s ,3505.5 MB/s 
-3#Synthetic 50%     :  10000000 ->   5538415 (1.806), 836.9 MB/s ,3248.9 MB/s 
-2#Synthetic 50%     :  10000000 ->   5424064 (1.844), 737.7 MB/s ,3050.1 MB/s 
-1#Synthetic 50%     :  10000000 ->   5381043 (1.858), 612.9 MB/s ,2959.6 MB/s 
 0#Synthetic 50%     :  10000000 ->   3222956 (3.103), 167.0 MB/s , 892.7 MB/s 
 1#Synthetic 50%     :  10000000 ->   3137011 (3.188), 430.8 MB/s ,1224.8 MB/s 
 2#Synthetic 50%     :  10000000 ->   3114376 (3.211), 317.0 MB/s ,1182.5 MB/s 
 3#Synthetic 50%     :  10000000 ->   3222956 (3.103), 166.7 MB/s , 888.6 MB/s 
 4#Synthetic 50%     :  10000000 ->   3339323 (2.995), 109.6 MB/s , 663.8 MB/s 
 5#Synthetic 50%     :  10000000 ->   3327445 (3.005),  50.6 MB/s , 703.0 MB/s 
 6#Synthetic 50%     :  10000000 ->   3331451 (3.002),  23.5 MB/s , 541.5 MB/s 
 7#Synthetic 50%     :  10000000 ->   3315227 (3.016),  38.2 MB/s , 715.9 MB/s 
 8#Synthetic 50%     :  10000000 ->   3306698 (3.024),  33.2 MB/s , 717.2 MB/s 
 9#Synthetic 50%     :  10000000 ->   3310350 (3.021),  24.4 MB/s , 707.7 MB/s 
10#Synthetic 50%     :  10000000 ->   3344322 (2.990),  17.4 MB/s , 663.1 MB/s 

So it seems like like zstd-fast-1 might be a better substitute for LZ4 if you want to decompress at 2GB/sec, although the compression level is not as good. It will really depend on your workload, and exactly how compressible your data is.

As you can see, all of the zstd-fast-* levels are 3GB/sec or better decompression (per core). And even the high levels of zstd still maintain 700MB/sec per core for decompression. Not as impressive as LZ4's 6+ GB/sec, but compression levels of 3.0 vs 1.7 mean you might actually save time by reading less off the disk.

@sempervictus
Copy link
Contributor

Not championing one vs the other - I've seen enough corner cases where "workload determines" over the years to be a firm believer in giving everyone enough rope/choice. Its dataset & descendant level property anyway.

@allanjude
Copy link
Contributor Author

Not championing one vs the other - I've seen enough corner cases where "workload determines" over the years to be a firm believer in giving everyone enough rope/choice. Its dataset & descendant level property anyway.

It was one of the things that drew me to ZSTD for ZFS, giving the admin a choice between an array of compression levels from many GB/sec compression and decompress, down to slower than gzip but even better compression (although ZSTD maintains decompression speed even with its highest compression levels)

@johnnyjacq16
Copy link

I am not sure if this pull request should be operational on linux, the older pull request #8941 is operational. With this pull request the zstd_zfs modules is not found hence cannot be loaded.

modinfo zfs
filename: /lib/modules/5.1.18-4-uksm/extra/zfs/zfs/zfs.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: ZFS
alias: devname:zfs
alias: char-major-10-249
srcversion: 4F872C88591BB3692E87278
depends: spl,znvpair,icp,zlua,zunicode,zcommon,zstd_zfs,zavl

modinfo zstd_zfs
modinfo: ERROR: Module zstd_zfs not found.

All other modules loaded:

modinfo spl
filename: /lib/modules/5.1.18-4-uksm/extra/spl/spl/spl.ko.xz
version: 0.8.0-1
license: GPL
author: OpenZFS on Linux
description: Solaris Porting Layer
srcversion: 738D9BB40B25892B95D18A8
depends:
retpoline: Y
name: spl

modinfo znvpair
filename: /lib/modules/5.1.18-4-uksm/extra/nvpair/znvpair/znvpair.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Generic name/value pair implementation
srcversion: 1E8C1B92D74D8298998A5AD
depends: spl
retpoline: Y
name: znvpair

modinfo icp
filename: /lib/modules/5.1.18-4-uksm/extra/icp/icp/icp.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
srcversion: 521F98D25594D17E0227929
depends: spl
retpoline: Y
name: icp

modinfo zlua
filename: /lib/modules/5.1.18-4-uksm/extra/lua/zlua/zlua.ko.xz
version: 0.8.0-1
license: MIT
author: Lua.org
description: Lua Interpreter for ZFS
srcversion: 14B2EB141354468DBB3680B
depends:
retpoline: Y
name: zlua

modinfo zunicode
filename: /lib/modules/5.1.18-4-uksm/extra/unicode/zunicode/zunicode.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Unicode implementation
srcversion: CB246334CE6CB1934133745
depends:
retpoline: Y
name: zunicode

modinfo zavl
filename: /lib/modules/5.1.18-4-uksm/extra/avl/avl/zavl.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Generic AVL tree implementation
srcversion: 98E85778E754CF75DEF9E8E
depends: spl
retpoline: Y
name: zavl

@allanjude
Copy link
Contributor Author

I am not sure if this pull request should be operational on linux, the older pull request #8941 is operational. With this pull request the zstd_zfs modules is not found hence cannot be loaded.

Are there errors during the build? This branch is working for me on CentOS 7 following the instructions from the ZoL Wiki.

@johnnyjacq16
Copy link

Hi @allanjude the problem still persist and there are no compilation errors that prevent zfs from compiling.
My configuration code to compile zfs looks like this:

./autogen.sh

./configure --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --with-mounthelperdir=/usr/bin \
             --libdir=/usr/lib --datadir=/usr/share --includedir=/usr/include \
             --with-udevdir=/lib/udev --libexecdir=/usr/lib/zfs-zstd \
             --with-config=user --enable-systemd --enable-pyzfs

I tried using --with-config=srpm as used in the CentOS configuration shown below:

./autogen.sh

./configure --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --with-mounthelperdir=/usr/bin \
                --libdir=/usr/lib --datadir=/usr/share --includedir=/usr/include \
                --with-udevdir=/lib/udev --libexecdir=/usr/lib/zfs-zstd \
                --with-config=srpm --with-config=user --enable-systemd --enable-pyzfs

however the zstd_zfs module for some reason doesn't get built when creating the dkms package.

@allanjude
Copy link
Contributor Author

Hi @allanjude the problem still persist and there are no compilation errors that prevent zfs from compiling.

however the zstd_zfs module for some reason doesn't get built when creating the dkms package.

I've only tested with the ./scripts/zfs.sh script or using 'make install'. I've not tested with dkms or rpms, there may be entries I need to add to manifests or something.

@c0d3z3r0
Copy link
Contributor

zstd 1.4.1 got released few hours ago improving performance a bit and fixing some bugs https://github.com/facebook/zstd/releases/tag/v1.4.1

@lnicola lnicola mentioned this pull request Jul 21, 2019
13 tasks
@ahrens ahrens added Status: Design Review Needed Architecture or design is under discussion and removed Status: Work in Progress Not yet ready for general review labels Jul 23, 2019
@Conan-Kudo
Copy link
Contributor

@allanjude Could you please rebase your PR against current master? There's a conflict which makes the buildbot infra unhappy...

Copy link
Contributor

@Conan-Kudo Conan-Kudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple review based on a look through the code turned up a couple of questions...

@@ -0,0 +1,7 @@
/* This file is in the public domain */
/* $FreeBSD$ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need these relatively useless $FreeBSD$ markers? They don't expand in this source tree...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files were taken verbatim from FreeBSD. We can remove these lines if that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd prefer them gone, as they look silly unexpanded.

@@ -0,0 +1,3 @@
/* This file is in the public domain */
/* $FreeBSD$ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -0,0 +1,3 @@
/* This file is in the public domain */
/* $FreeBSD$ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -0,0 +1,3 @@
/* This file is in the public domain */
/* $FreeBSD$ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -0,0 +1,3 @@
/* This file is in the public domain */
/* $FreeBSD$ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

module/zfs/dmu.c Outdated
@@ -2164,6 +2166,11 @@ dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp, zio_prop_t *zp)
} else {
compress = zio_compress_select(os->os_spa, dn->dn_compress,
compress);
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we including a code block that is entirely disabled with #if 0? Do you feel this will get used at all at some point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question of if this block of code is needed is still outstanding. In the end it will either be enabled or removed.

@Conan-Kudo
Copy link
Contributor

@allanjude

So this is a common failure I'm seeing:

make[7]: *** No rule to make target `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd/common/zstd_common.o', needed by `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd_zfs.o'.  Stop.
make[6]: *** [/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs] Error 2

This seems like there's a dependency problem here making parallel make fail... Have you tried testing locally with make -j6 or higher?

@allanjude
Copy link
Contributor Author

@allanjude

So this is a common failure I'm seeing:

make[7]: *** No rule to make target `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd/common/zstd_common.o', needed by `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd_zfs.o'.  Stop.
make[6]: *** [/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs] Error 2

This seems like there's a dependency problem here making parallel make fail... Have you tried testing locally with make -j6 or higher?

I usually use -j8, but I can try some other numbers.

Can you see anything obvious in the Makefile that will cause this? The automake stuff is not my strong suit, we don't use it in the FreeBSD kernel. I don't see where to make the dependencies more explicit.

@lnicola
Copy link
Contributor

lnicola commented Jul 24, 2019

@Conan-Kudo When the dependency tracking doesn't work, running make a second time usually makes progress. Do you see that on this branch?

@@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE)
# Includes kernel code generate warnings for large stack frames
AM_CFLAGS += $(FRAME_LARGER_THAN)

AM_CFLAGS += -DLIB_ZPOOL_BUILD
AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Wframe-larger-than=20480 should get removed here; see #8044 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're keeping the pristine copy of libzstd, so this flag is still required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if you keep zstd as is. half of the code of zstd is unused. if you keep the unused code out. the whole line is obsolete. see my variant of the zstd patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for us to mangle the zstd dependency, and it would make it harder to upgrade it over time. Leaving it as is makes the maintenance burden very minor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full ACK. Libs should be kept vanilla whenever possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrainSlayer I assume this should have gone here...

BrainSlayer commented 14 minutes ago
i would ack too. but there is a counter argument. the code you compile contains alot of unused code. its about 300 kb bigger than it should and the unused code cannot even be used due the high stack size usage. you compile basicly broken code which will crash any kernel if used.

Am 28.07.2019 um 10:14 schrieb Michael Niewöhner:… @.**** commented on this pull request. ------------------------------------------------------------------------ In lib/libzpool/Makefile.am <#9024 (comment)>: > @@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE) # Includes kernel code generate warnings for large stack frames AM_CFLAGS += $(FRAME_LARGER_THAN) -AM_CFLAGS += -DLIB_ZPOOL_BUILD +AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized Full ACK. Libs should be kept vanilla whenever possible — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#9024?email_source=notifications&email_token=AB2WNE5YMXU2NJC46J7Y5C3QBVIP5A5CNFSM4IB75NFKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB7Y2PGA#discussion_r307990857>, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2WNEZEQML3MMAYX5R6YJ3QBVIP5ANCNFSM4IB75NFA.
--

@@ -6,6 +6,7 @@ subdir-m += spl
subdir-m += unicode
subdir-m += zcommon
subdir-m += zfs
subdir-m += zstd_zfs
Copy link
Contributor

@c0d3z3r0 c0d3z3r0 Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this go before zfs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, it should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am willing to bet this is the one that is causing the concurrency issues...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allanjude pretty sure, yes

Copy link
Contributor

@c0d3z3r0 c0d3z3r0 Jul 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and now you've got that error in the tests as well... looks like zstd_zfs must go before zcommon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with zstd_zfs before zcommon, same build failure. I cannot recreate this on a completely fresh build, but somehow can get it to happen when rebuilding incrementally later.

@c0d3z3r0
Copy link
Contributor

@allanjude DKMS config entries are missing:

diff --git a/scripts/dkms.mkconf b/scripts/dkms.mkconf
index e1a49dca1..85fbb7c58 100755
--- a/scripts/dkms.mkconf
+++ b/scripts/dkms.mkconf
@@ -89,6 +89,7 @@ STRIP[4]="\${STRIP[0]}"
 STRIP[5]="\${STRIP[0]}"
 STRIP[6]="\${STRIP[0]}"
 STRIP[7]="\${STRIP[0]}"
+STRIP[8]="\${STRIP[0]}"
 BUILT_MODULE_NAME[0]="zavl"
 BUILT_MODULE_LOCATION[0]="module/avl/"
 DEST_MODULE_LOCATION[0]="/extra/avl/avl"
@@ -113,4 +114,7 @@ DEST_MODULE_LOCATION[6]="/extra/lua/zlua"
 BUILT_MODULE_NAME[7]="spl"
 BUILT_MODULE_LOCATION[7]="module/spl/"
 DEST_MODULE_LOCATION[7]="/extra/spl/spl"
+BUILT_MODULE_NAME[8]="zstd_zfs"
+BUILT_MODULE_LOCATION[8]="module/zstd_zfs/"
+DEST_MODULE_LOCATION[8]="/extra/zstd_zfs/zstd_zfs"
 EOF

@BrainSlayer
Copy link
Contributor

BrainSlayer commented Jul 28, 2019 via email

allanjude and others added 6 commits August 13, 2020 04:52
Extend APIs that deal with compression to also pass the compression level

Signed-off-by: Allan Jude <allanjude@freebsd.org>
Extend APIs that deals with compression to also pass the compression
level.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
If a compression algorithm requires a feature flag, activate the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Sponsored-by: The FreeBSD Foundation
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This imports the unmodified ZStandard single-file library which will be
used by ZFS. This code shall not be modified in any way to keep it
easily updatable.

The library is excluded from codecov calculation as dependencies don't
need full codecov. It's also added to the cppcheck suppressions list.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This adds headers to be able to:

a) build ZFS with ZSTD on various platforms/architectures (Linux, FreeBSD, aarch64)
b) build ZFS with ZSTD for userspace

without modification of the original ZSTD library.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This adds dummy defines for `__init`, `__exit` to be able to compile kernel
code for userspace.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
allanjude and others added 13 commits August 13, 2020 18:50
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Add zstd support to zfs

ZStandard is a modern, high performance, general compression algorithm
originally developed by Yann Collet (the author of LZ4). It aims to
provide similar or better compression levels to GZIP, but with much
better performance. ZStandard provides a large selection of compression
levels to allow a storage administrator to select the preferred
performance/compression trade-off.

This PR adds the new compression types: zstd and zstd-fast, which is
basically a faster "negative" level of zstd.

Available compression levels for zstd are zstd-1 through zstd-19. In
addition, faster levels supported by zstd-fast are 1-10, 20-100 (in
increments of 10), 500 and 1000. As zstd-fast is basically a "negative"
level of zstd, it gets faster with every level in stark contrast with
"normal" zstd which gets slower with every level.

Rather than the approach used when gzip was added, of using a separate
compression type for each different level, this PR added a new hidden
property, compression_level. All of the zstd levels use the single entry
in the zio_compress enum, since all levels use the same decompression
function.

However, in some cases, it was necessary to know what level a block was
compressed within other parts of the code, so the compression level is
stored on disk inline with the data, after the compression header
(uncompressed size). The level is plumbed through arc_buf_hdr_t,
objset_t, and zio_prop_t. There may still be additional places where
this needs to be added.

This new property is hidden from the user, and the user simply does zfs
set compress=zstd-10 dataset and zfs_ioctl.c splits this single value
into the two values to be stored in the dataset properties. This logic
has been extended to Channel Programs as well.

Closes: openzfs#6247
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This add a new build intermediary `libzstd`, that allows us to keep zstd
cleanly separated from libzpool and to override the frame-size compiler
option for zstd only. (This is needed due to unused code in the
intentionally unmodified zstd source.)

Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This will allow us to measure the number of times compression is
skipped due to a memory allocation failure, as well as the number of
times that we have used the fallback decompression context.

It also counts all other failure types:
- (de)compression level invalid
- (de)compression memory allocation failed
- (de)compression failed (zstd error)
- Failed to decode zstd_header for decompression (invalid size)

Sponsored By: The FreeBSD Foundation
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
- Add zstd to the test suite
  - Add zstd to history_002_pos.ksh
  - Add random levels of zstd to history_002_pos.ksh
  - Add zstd-fast to history_002_pos.ksh
  - Add random levels of zstd-fast to history_002_pos.ksh

- Add documentation
  - Add man page content
  - Add README for module/zstd

- Add missing persons to AUTHORS

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
zdb can now decode the ZSTD compression header and inspect the size,
version and compression level saved in that header.

This will be used to create a test that will ensure this information
is saved and restored correctly.

Example usage:
```
obj=2 dataset=zof/zstd/10 path=/allanbsd.txt type=19 bonustype=44

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         2    1   128K  8.50K      0     512  8.50K  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                               168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
        dnode maxblkid: 0
        uid     0
        gid     0
        atime   Sat Jun 13 16:03:26 2020
        mtime   Sat Jun 13 16:03:26 2020
        ctime   Sat Jun 13 16:03:26 2020
        crtime  Sat Jun 13 16:03:26 2020
        gen     3046
        mode    100644
        size    8193
        parent  34
        links   1
        pflags  40800000004
Indirect blocks:
               0 L0 EMBEDDED et=0 2200L/22P B=3046 ZSTD:size=26:version=10405:level=10:EMBEDDED

                segment [0000000000000000, 0000000000002200) size 8.50K
```

For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Restore previous output format for zdb

Only show the dnode compression details if they are interesting,
or if the verbosity level is high enough

This solves an issue with the cli_root/zdb/zdb_object_range_pos test

Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
1. Add a test case for the ZSTD header: Ensure the actual psize and
   level are saved and restored correctly.

2. Add a test that compression survives replication

3. Add a new test to confirm that the ZSTD feature flag is activated.

   When the compression property on a dataset is set to zstd, the
   feature flag is not changed from 'enabled' to 'active' until the
   first block is born.

   If the pool is exported after `zfs set` but before a block is written
   then imported on a system with an older version of ZFS that does not
   understand ZSTD, it will cause the userland utilities to panic.

This also moves the test data to a common location as it is used by
multiple tests.

Sponsored-by: The FreeBSD Foundation
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Sponsored-by: The FreeBSD Foundation
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Sponsored-by: The FreeBSD Foundation
Signed-off-by: Allan Jude <allanjude@freebsd.org>
When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.

Sponsored-by: The FreeBSD Foundation
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
@ghost ghost added Status: Work in Progress Not yet ready for general review and removed Status: Design Review Needed Architecture or design is under discussion labels Aug 14, 2020
@PrivatePuffin
Copy link
Contributor

PrivatePuffin commented Aug 14, 2020

  • I'm an idiot and misread. Please ignore.

c0d3z3r0 added a commit to c0d3z3r0/zfs that referenced this pull request Aug 19, 2020
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Closes: openzfs#6247
Closes: openzfs#10277
Closes: openzfs#9024
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
c0d3z3r0 added a commit to c0d3z3r0/zfs that referenced this pull request Aug 19, 2020
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Closes: openzfs#6247
Closes: openzfs#10277
Closes: openzfs#9024
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes openzfs#6247
Closes openzfs#9024
Closes openzfs#10277
Closes openzfs#10278
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes openzfs#6247
Closes openzfs#9024
Closes openzfs#10277
Closes openzfs#10278
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.