Implement ZStandard Compression #9024

allanjude · 2019-07-12T03:13:00Z

Warning: The on-disk layout has not been finalized yet. Please only use this on test pools, otherwise you may not be able to import your pool with the final version.

Note: This pull request has been broken into a few separate commits, mostly for ease of review (separate out the mechanical changes from the substantive ones). I can reformat this as separate pull requests if desired. We can decide if it makes sense to squash them after review.

Motivation and Context

ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off.

Description

Add the new compression type, zstd, which defaults to compression level 3.
Available compression levels are zstd-1 through zstd-19 (the extreme levels were excluded due to their memory requirements. They are not useful with ZFS block sizes). In addition, ZStandard supports faster levels of compression, implemented as zstd-fast-. Supported levels are 1-10, 20-100 in steps of 10, 500, and 1000.

Rather than the approached used when gzip was added, of using a separate compression type for each different level, I instead added a new hidden property, compression_level. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function. However, in some cases, it was necessary to know what level a block was compressed with in other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may be additional places where this needs to be added still.

This new property is hidden from the user, and the user simply does zfs set compress=zstd-10 dataset and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well.

How Has This Been Tested?

This has been tested mostly on FreeBSD, although some Linux users have tested it via a port of an earlier version of this work (Pull Requests #8044 and #8941)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

allanjude · 2019-07-12T03:16:34Z

I need to investigate a possible issue with property inheritance.
If you have a dataset with compression set locally, then you zfs inherit compress dataset I need to confirm this can't result in the hidden compression_level property having the wrong value.

sempervictus · 2019-07-14T04:36:28Z

Seems like the same approach would work for the lz4hc work proposed a whole back. Thank you

allanjude · 2019-07-14T04:51:07Z

Seems like the same approach would work for the lz4hc work proposed a whole back. Thank you

Yes, I changed the design to be more generic last year after some consideration.
Specifically, I want to handle: upgrade the version of LZ4 we use (20+% performance improvement on decompression in a recent update, need to benchmark what other gains there have been over the years). Also considering how to manage versions of the compression, since we'll want to keep the old LZ4 compression function for users of nop-write, dedup, etc. LZ4 technically has levels too.

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

It might also be worth looking at if LZ4HC is still useful, compared to lower levels of zstd and zstd-fast.

gmelikov · 2019-07-14T09:38:13Z

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

If you're already create new (but hidden) property - I think it's a good idea to have a compressor_version or similar property just for this case with version changes. So it would be possible to decompress with needed version. Thoughts?

allanjude · 2019-07-14T14:54:53Z

So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes?

If you're already create new (but hidden) property - I think it's a good idea to have a compressor_version or similar property just for this case with version changes. So it would be possible to decompress with needed version. Thoughts?

I need to investigate how much trouble having the separate properties is (specifically for inheritance when you rename to a new parent etc) before deciding if we should do additional extra hidden properties, or just use the high bits of the 'compress_level' property to store the version.

Note: in general, you can always use the newest decompressor, even on data compressed with the older version, so we likely only need to keep the different versions of the compression function around, for use-cases like nop-write where we need to recompress data and get the same checksum.

sempervictus · 2019-07-15T22:11:05Z

IIRC, LZ4HC does use the same decompressor, and it has some very very fast decompression rates - just shy of 2G/s according to http://fastcompression.blogspot.com/2015/01/zstd-stronger-compression-algorithm.html... So for WORM patterns, this is quite desirable.

allanjude · 2019-07-15T22:40:00Z

IIRC, LZ4HC does use the same decompressor, and it has some very very fast decompression rates - just shy of 2G/s according to http://fastcompression.blogspot.com/2015/01/zstd-stronger-compression-algorithm.html... So for WORM patterns, this is quite desirable.

Some raw compression benchmarks from my workstation (these are not in a ZFS context, just general comparative information):

hw.model: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hw.ncpu: 8

# lz4 -b1 -e9
*** LZ4 command line interface 64-bits v1.9.1, by Yann Collet ***
Benchmarking levels from 1 to 9
 1#Synthetic 50%     :  10000000 ->   5859357 (1.707),1280.0 MB/s ,6720.0 MB/s 
 2#Synthetic 50%     :  10000000 ->   5859357 (1.707),1280.0 MB/s ,6720.0 MB/s 
 3#Synthetic 50%     :  10000000 ->   4560720 (2.193),  75.3 MB/s ,3840.0 MB/s 
 4#Synthetic 50%     :  10000000 ->   4534386 (2.205),  61.0 MB/s ,3840.0 MB/s 
 5#Synthetic 50%     :  10000000 ->   4529772 (2.208),  55.7 MB/s ,3840.0 MB/s 
 6#Synthetic 50%     :  10000000 ->   4529581 (2.208),  55.7 MB/s ,3840.0 MB/s 
 7#Synthetic 50%     :  10000000 ->   4529569 (2.208),  54.5 MB/s ,3840.0 MB/s 
 8#Synthetic 50%     :  10000000 ->   4529565 (2.208),  55.7 MB/s ,3840.0 MB/s 
 9#Synthetic 50%     :  10000000 ->   4529565 (2.208),  55.7 MB/s ,3840.0 MB/s 

# zstd -b1 -e10 --fast=10 --priority=rt
*** zstd command line interface 64-bits v1.4.0, by Yann Collet ***
Note : switching to real-time priority 
-10#Synthetic 50%     :  10000000 ->   6110186 (1.637),1283.8 MB/s ,4428.0 MB/s 
-9#Synthetic 50%     :  10000000 ->   6046532 (1.654),1178.1 MB/s ,4265.9 MB/s 
-8#Synthetic 50%     :  10000000 ->   5974710 (1.674),1192.7 MB/s ,4083.0 MB/s 
-7#Synthetic 50%     :  10000000 ->   5900567 (1.695),1133.8 MB/s ,4017.0 MB/s 
-6#Synthetic 50%     :  10000000 ->   5832577 (1.715),1087.6 MB/s ,3882.2 MB/s 
-5#Synthetic 50%     :  10000000 ->   5739376 (1.742),1016.9 MB/s ,3716.1 MB/s 
-4#Synthetic 50%     :  10000000 ->   5637044 (1.774), 937.8 MB/s ,3505.5 MB/s 
-3#Synthetic 50%     :  10000000 ->   5538415 (1.806), 836.9 MB/s ,3248.9 MB/s 
-2#Synthetic 50%     :  10000000 ->   5424064 (1.844), 737.7 MB/s ,3050.1 MB/s 
-1#Synthetic 50%     :  10000000 ->   5381043 (1.858), 612.9 MB/s ,2959.6 MB/s 
 0#Synthetic 50%     :  10000000 ->   3222956 (3.103), 167.0 MB/s , 892.7 MB/s 
 1#Synthetic 50%     :  10000000 ->   3137011 (3.188), 430.8 MB/s ,1224.8 MB/s 
 2#Synthetic 50%     :  10000000 ->   3114376 (3.211), 317.0 MB/s ,1182.5 MB/s 
 3#Synthetic 50%     :  10000000 ->   3222956 (3.103), 166.7 MB/s , 888.6 MB/s 
 4#Synthetic 50%     :  10000000 ->   3339323 (2.995), 109.6 MB/s , 663.8 MB/s 
 5#Synthetic 50%     :  10000000 ->   3327445 (3.005),  50.6 MB/s , 703.0 MB/s 
 6#Synthetic 50%     :  10000000 ->   3331451 (3.002),  23.5 MB/s , 541.5 MB/s 
 7#Synthetic 50%     :  10000000 ->   3315227 (3.016),  38.2 MB/s , 715.9 MB/s 
 8#Synthetic 50%     :  10000000 ->   3306698 (3.024),  33.2 MB/s , 717.2 MB/s 
 9#Synthetic 50%     :  10000000 ->   3310350 (3.021),  24.4 MB/s , 707.7 MB/s 
10#Synthetic 50%     :  10000000 ->   3344322 (2.990),  17.4 MB/s , 663.1 MB/s

So it seems like like zstd-fast-1 might be a better substitute for LZ4 if you want to decompress at 2GB/sec, although the compression level is not as good. It will really depend on your workload, and exactly how compressible your data is.

As you can see, all of the zstd-fast-* levels are 3GB/sec or better decompression (per core). And even the high levels of zstd still maintain 700MB/sec per core for decompression. Not as impressive as LZ4's 6+ GB/sec, but compression levels of 3.0 vs 1.7 mean you might actually save time by reading less off the disk.

sempervictus · 2019-07-15T22:51:47Z

Not championing one vs the other - I've seen enough corner cases where "workload determines" over the years to be a firm believer in giving everyone enough rope/choice. Its dataset & descendant level property anyway.

allanjude · 2019-07-15T23:06:39Z

Not championing one vs the other - I've seen enough corner cases where "workload determines" over the years to be a firm believer in giving everyone enough rope/choice. Its dataset & descendant level property anyway.

It was one of the things that drew me to ZSTD for ZFS, giving the admin a choice between an array of compression levels from many GB/sec compression and decompress, down to slower than gzip but even better compression (although ZSTD maintains decompression speed even with its highest compression levels)

johnnyjacq16 · 2019-07-17T11:27:55Z

I am not sure if this pull request should be operational on linux, the older pull request #8941 is operational. With this pull request the zstd_zfs modules is not found hence cannot be loaded.

modinfo zfs
filename: /lib/modules/5.1.18-4-uksm/extra/zfs/zfs/zfs.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: ZFS
alias: devname:zfs
alias: char-major-10-249
srcversion: 4F872C88591BB3692E87278
depends: spl,znvpair,icp,zlua,zunicode,zcommon,zstd_zfs,zavl

modinfo zstd_zfs
modinfo: ERROR: Module zstd_zfs not found.

All other modules loaded:

modinfo spl
filename: /lib/modules/5.1.18-4-uksm/extra/spl/spl/spl.ko.xz
version: 0.8.0-1
license: GPL
author: OpenZFS on Linux
description: Solaris Porting Layer
srcversion: 738D9BB40B25892B95D18A8
depends:
retpoline: Y
name: spl

modinfo znvpair
filename: /lib/modules/5.1.18-4-uksm/extra/nvpair/znvpair/znvpair.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Generic name/value pair implementation
srcversion: 1E8C1B92D74D8298998A5AD
depends: spl
retpoline: Y
name: znvpair

modinfo icp
filename: /lib/modules/5.1.18-4-uksm/extra/icp/icp/icp.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
srcversion: 521F98D25594D17E0227929
depends: spl
retpoline: Y
name: icp

modinfo zlua
filename: /lib/modules/5.1.18-4-uksm/extra/lua/zlua/zlua.ko.xz
version: 0.8.0-1
license: MIT
author: Lua.org
description: Lua Interpreter for ZFS
srcversion: 14B2EB141354468DBB3680B
depends:
retpoline: Y
name: zlua

modinfo zunicode
filename: /lib/modules/5.1.18-4-uksm/extra/unicode/zunicode/zunicode.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Unicode implementation
srcversion: CB246334CE6CB1934133745
depends:
retpoline: Y
name: zunicode

modinfo zavl
filename: /lib/modules/5.1.18-4-uksm/extra/avl/avl/zavl.ko.xz
version: 0.8.0-1
license: CDDL
author: OpenZFS on Linux
description: Generic AVL tree implementation
srcversion: 98E85778E754CF75DEF9E8E
depends: spl
retpoline: Y
name: zavl

allanjude · 2019-07-17T16:01:11Z

I am not sure if this pull request should be operational on linux, the older pull request #8941 is operational. With this pull request the zstd_zfs modules is not found hence cannot be loaded.

Are there errors during the build? This branch is working for me on CentOS 7 following the instructions from the ZoL Wiki.

johnnyjacq16 · 2019-07-18T15:23:05Z

Hi @allanjude the problem still persist and there are no compilation errors that prevent zfs from compiling.
My configuration code to compile zfs looks like this:

./autogen.sh

./configure --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --with-mounthelperdir=/usr/bin \
             --libdir=/usr/lib --datadir=/usr/share --includedir=/usr/include \
             --with-udevdir=/lib/udev --libexecdir=/usr/lib/zfs-zstd \
             --with-config=user --enable-systemd --enable-pyzfs

I tried using --with-config=srpm as used in the CentOS configuration shown below:

./autogen.sh

./configure --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --with-mounthelperdir=/usr/bin \
                --libdir=/usr/lib --datadir=/usr/share --includedir=/usr/include \
                --with-udevdir=/lib/udev --libexecdir=/usr/lib/zfs-zstd \
                --with-config=srpm --with-config=user --enable-systemd --enable-pyzfs

however the zstd_zfs module for some reason doesn't get built when creating the dkms package.

allanjude · 2019-07-18T15:41:51Z

Hi @allanjude the problem still persist and there are no compilation errors that prevent zfs from compiling.

however the zstd_zfs module for some reason doesn't get built when creating the dkms package.

I've only tested with the ./scripts/zfs.sh script or using 'make install'. I've not tested with dkms or rpms, there may be entries I need to add to manifests or something.

c0d3z3r0 · 2019-07-19T22:23:19Z

zstd 1.4.1 got released few hours ago improving performance a bit and fixing some bugs https://github.com/facebook/zstd/releases/tag/v1.4.1

Conan-Kudo · 2019-07-23T18:55:48Z

@allanjude Could you please rebase your PR against current master? There's a conflict which makes the buildbot infra unhappy...

Conan-Kudo

Simple review based on a look through the code turned up a couple of questions...

Conan-Kudo · 2019-07-23T19:03:09Z

module/zstd_zfs/include/limits.h

@@ -0,0 +1,7 @@
+/* This file is in the public domain */
+/* $FreeBSD$ */


Do we really need these relatively useless $FreeBSD$ markers? They don't expand in this source tree...

These files were taken verbatim from FreeBSD. We can remove these lines if that makes sense.

Personally, I'd prefer them gone, as they look silly unexpanded.

Conan-Kudo · 2019-07-23T19:06:40Z

module/zstd_zfs/include/stddef.h

@@ -0,0 +1,3 @@
+/* This file is in the public domain */
+/* $FreeBSD$ */


Conan-Kudo · 2019-07-23T19:06:43Z

module/zstd_zfs/include/stdint.h

@@ -0,0 +1,3 @@
+/* This file is in the public domain */
+/* $FreeBSD$ */


Conan-Kudo · 2019-07-23T19:06:47Z

module/zstd_zfs/include/stdio.h

@@ -0,0 +1,3 @@
+/* This file is in the public domain */
+/* $FreeBSD$ */


Conan-Kudo · 2019-07-23T19:06:52Z

module/zstd_zfs/include/stdlib.h

+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$


Conan-Kudo · 2019-07-23T19:06:56Z

module/zstd_zfs/include/string.h

@@ -0,0 +1,3 @@
+/* This file is in the public domain */
+/* $FreeBSD$ */


Conan-Kudo · 2019-07-23T19:07:00Z

module/zstd_zfs/include/zstd_kfreebsd.c

+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$


Conan-Kudo · 2019-07-23T19:07:07Z

module/zstd_zfs/include/zstd_kfreebsd.h

+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$


Conan-Kudo · 2019-07-23T19:07:12Z

module/zstd_zfs/include/zstd_kmalloc.c

+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$


Conan-Kudo · 2019-07-23T19:10:53Z

module/zfs/dmu.c

@@ -2164,6 +2166,11 @@ dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp, zio_prop_t *zp)
 	} else {
 		compress = zio_compress_select(os->os_spa, dn->dn_compress,
 		    compress);
+#if 0


Why are we including a code block that is entirely disabled with #if 0? Do you feel this will get used at all at some point?

The question of if this block of code is needed is still outstanding. In the end it will either be enabled or removed.

Conan-Kudo · 2019-07-23T19:27:42Z

@allanjude

So this is a common failure I'm seeing:

make[7]: *** No rule to make target `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd/common/zstd_common.o', needed by `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd_zfs.o'.  Stop.
make[6]: *** [/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs] Error 2

This seems like there's a dependency problem here making parallel make fail... Have you tried testing locally with make -j6 or higher?

allanjude · 2019-07-24T17:09:22Z

@allanjude

So this is a common failure I'm seeing:
make[7]: *** No rule to make target `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd/common/zstd_common.o', needed by `/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs/zstd_zfs.o'.  Stop.
make[6]: *** [/tmp/zfs-build-buildbot-a64JhwVJ/BUILD/zfs-0.8.0/module/zstd_zfs] Error 2
This seems like there's a dependency problem here making parallel make fail... Have you tried testing locally with make -j6 or higher?

I usually use -j8, but I can try some other numbers.

Can you see anything obvious in the Makefile that will cause this? The automake stuff is not my strong suit, we don't use it in the FreeBSD kernel. I don't see where to make the dependencies more explicit.

lnicola · 2019-07-24T17:34:07Z

@Conan-Kudo When the dependency tracking doesn't work, running make a second time usually makes progress. Do you see that on this branch?

c0d3z3r0 · 2019-07-26T19:31:50Z

lib/libzpool/Makefile.am

@@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE)
 # Includes kernel code generate warnings for large stack frames
 AM_CFLAGS += $(FRAME_LARGER_THAN)

-AM_CFLAGS += -DLIB_ZPOOL_BUILD
+AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized


-Wframe-larger-than=20480 should get removed here; see #8044 (comment)

We're keeping the pristine copy of libzstd, so this flag is still required.

only if you keep zstd as is. half of the code of zstd is unused. if you keep the unused code out. the whole line is obsolete. see my variant of the zstd patch

There's no reason for us to mangle the zstd dependency, and it would make it harder to upgrade it over time. Leaving it as is makes the maintenance burden very minor.

Full ACK. Libs should be kept vanilla whenever possible

@BrainSlayer I assume this should have gone here...

BrainSlayer commented 14 minutes ago
i would ack too. but there is a counter argument. the code you compile contains alot of unused code. its about 300 kb bigger than it should and the unused code cannot even be used due the high stack size usage. you compile basicly broken code which will crash any kernel if used.

Am 28.07.2019 um 10:14 schrieb Michael Niewöhner:… @.**** commented on this pull request. ------------------------------------------------------------------------ In lib/libzpool/Makefile.am <#9024 (comment)>: > @@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE) # Includes kernel code generate warnings for large stack frames AM_CFLAGS += $(FRAME_LARGER_THAN) -AM_CFLAGS += -DLIB_ZPOOL_BUILD +AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized Full ACK. Libs should be kept vanilla whenever possible — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#9024?email_source=notifications&email_token=AB2WNE5YMXU2NJC46J7Y5C3QBVIP5A5CNFSM4IB75NFKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB7Y2PGA#discussion_r307990857>, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2WNEZEQML3MMAYX5R6YJ3QBVIP5ANCNFSM4IB75NFA.
--

c0d3z3r0 · 2019-07-26T19:56:29Z

module/Makefile.in

@@ -6,6 +6,7 @@ subdir-m += spl
 subdir-m += unicode
 subdir-m += zcommon
 subdir-m += zfs
+subdir-m += zstd_zfs


shouldn't this go before zfs?

You're correct, it should.

I am willing to bet this is the one that is causing the concurrency issues...

@allanjude pretty sure, yes

and now you've got that error in the tests as well... looks like zstd_zfs must go before zcommon

Even with zstd_zfs before zcommon, same build failure. I cannot recreate this on a completely fresh build, but somehow can get it to happen when rebuilding incrementally later.

c0d3z3r0 · 2019-07-26T19:58:00Z

@allanjude DKMS config entries are missing:

diff --git a/scripts/dkms.mkconf b/scripts/dkms.mkconf
index e1a49dca1..85fbb7c58 100755
--- a/scripts/dkms.mkconf
+++ b/scripts/dkms.mkconf
@@ -89,6 +89,7 @@ STRIP[4]="\${STRIP[0]}"
 STRIP[5]="\${STRIP[0]}"
 STRIP[6]="\${STRIP[0]}"
 STRIP[7]="\${STRIP[0]}"
+STRIP[8]="\${STRIP[0]}"
 BUILT_MODULE_NAME[0]="zavl"
 BUILT_MODULE_LOCATION[0]="module/avl/"
 DEST_MODULE_LOCATION[0]="/extra/avl/avl"
@@ -113,4 +114,7 @@ DEST_MODULE_LOCATION[6]="/extra/lua/zlua"
 BUILT_MODULE_NAME[7]="spl"
 BUILT_MODULE_LOCATION[7]="module/spl/"
 DEST_MODULE_LOCATION[7]="/extra/spl/spl"
+BUILT_MODULE_NAME[8]="zstd_zfs"
+BUILT_MODULE_LOCATION[8]="module/zstd_zfs/"
+DEST_MODULE_LOCATION[8]="/extra/zstd_zfs/zstd_zfs"
 EOF

BrainSlayer · 2019-07-28T12:48:57Z

i would ack too. but there is a counter argument. the code you compile contains alot of unused code. its about 300 kb bigger than it should and the unused code cannot even be used due the high stack size usage. you compile basicly broken code which will crash any kernel if used. Am 28.07.2019 um 10:14 schrieb Michael Niewöhner:

…

***@***.**** commented on this pull request. ------------------------------------------------------------------------ In lib/libzpool/Makefile.am <#9024 (comment)>: > @@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE) # Includes kernel code generate warnings for large stack frames AM_CFLAGS += $(FRAME_LARGER_THAN) -AM_CFLAGS += -DLIB_ZPOOL_BUILD +AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized Full ACK. Libs should be kept vanilla whenever possible — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#9024>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB2WNEZEQML3MMAYX5R6YJ3QBVIP5ANCNFSM4IB75NFA>.

Extend APIs that deal with compression to also pass the compression level Signed-off-by: Allan Jude <allanjude@freebsd.org>

Extend APIs that deals with compression to also pass the compression level. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

If a compression algorithm requires a feature flag, activate the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Sponsored-by: The FreeBSD Foundation Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This imports the unmodified ZStandard single-file library which will be used by ZFS. This code shall not be modified in any way to keep it easily updatable. The library is excluded from codecov calculation as dependencies don't need full codecov. It's also added to the cppcheck suppressions list. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This adds headers to be able to: a) build ZFS with ZSTD on various platforms/architectures (Linux, FreeBSD, aarch64) b) build ZFS with ZSTD for userspace without modification of the original ZSTD library. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This adds dummy defines for `__init`, `__exit` to be able to compile kernel code for userspace. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

Signed-off-by: Allan Jude <allanjude@freebsd.org>

Add zstd support to zfs ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off. This PR adds the new compression types: zstd and zstd-fast, which is basically a faster "negative" level of zstd. Available compression levels for zstd are zstd-1 through zstd-19. In addition, faster levels supported by zstd-fast are 1-10, 20-100 (in increments of 10), 500 and 1000. As zstd-fast is basically a "negative" level of zstd, it gets faster with every level in stark contrast with "normal" zstd which gets slower with every level. Rather than the approach used when gzip was added, of using a separate compression type for each different level, this PR added a new hidden property, compression_level. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function. However, in some cases, it was necessary to know what level a block was compressed within other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may still be additional places where this needs to be added. This new property is hidden from the user, and the user simply does zfs set compress=zstd-10 dataset and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well. Closes: openzfs#6247 Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This add a new build intermediary `libzstd`, that allows us to keep zstd cleanly separated from libzpool and to override the frame-size compiler option for zstd only. (This is needed due to unused code in the intentionally unmodified zstd source.) Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This will allow us to measure the number of times compression is skipped due to a memory allocation failure, as well as the number of times that we have used the fallback decompression context. It also counts all other failure types: - (de)compression level invalid - (de)compression memory allocation failed - (de)compression failed (zstd error) - Failed to decode zstd_header for decompression (invalid size) Sponsored By: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

- Add zstd to the test suite - Add zstd to history_002_pos.ksh - Add random levels of zstd to history_002_pos.ksh - Add zstd-fast to history_002_pos.ksh - Add random levels of zstd-fast to history_002_pos.ksh - Add documentation - Add man page content - Add README for module/zstd - Add missing persons to AUTHORS Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

zdb can now decode the ZSTD compression header and inspect the size, version and compression level saved in that header. This will be used to create a test that will ensure this information is saved and restored correctly. Example usage: ``` obj=2 dataset=zof/zstd/10 path=/allanbsd.txt type=19 bonustype=44 Object lvl iblk dblk dsize dnsize lsize %full type 2 1 128K 8.50K 0 512 8.50K 100.00 ZFS plain file (K=inherit) (Z=inherit) 168 bonus System attributes dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED dnode maxblkid: 0 uid 0 gid 0 atime Sat Jun 13 16:03:26 2020 mtime Sat Jun 13 16:03:26 2020 ctime Sat Jun 13 16:03:26 2020 crtime Sat Jun 13 16:03:26 2020 gen 3046 mode 100644 size 8193 parent 34 links 1 pflags 40800000004 Indirect blocks: 0 L0 EMBEDDED et=0 2200L/22P B=3046 ZSTD:size=26:version=10405:level=10:EMBEDDED segment [0000000000000000, 0000000000002200) size 8.50K ``` For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

Restore previous output format for zdb Only show the dnode compression details if they are interesting, or if the verbosity level is high enough This solves an issue with the cli_root/zdb/zdb_object_range_pos test Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

1. Add a test case for the ZSTD header: Ensure the actual psize and level are saved and restored correctly. 2. Add a test that compression survives replication 3. Add a new test to confirm that the ZSTD feature flag is activated. When the compression property on a dataset is set to zstd, the feature flag is not changed from 'enabled' to 'active' until the first block is born. If the pool is exported after `zfs set` but before a block is written then imported on a system with an older version of ZFS that does not understand ZSTD, it will cause the userland utilities to panic. This also moves the test data to a common location as it is used by multiple tests. Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>

When reading compressed blocks from the L2ARC, with compressed ARC disabled, arc_hdr_size() returns LSIZE rather than PSIZE, but the actual read is PSIZE. This causes l2arc_read_done() to compare the checksum against the wrong size, resulting in checksum failure. This manifests as an increase in the kstat l2_cksum_bad and the read being retried from the main pool, making the L2ARC ineffective. Add new L2ARC tests with Compressed ARC enabled/disabled Blocks are handled differently depending on the state of the zfs_compressed_arc_enabled tunable. If a block is compressed on-disk, and compressed_arc is enabled: - the block is read from disk - It is NOT decompressed - It is added to the ARC in its compressed form - l2arc_write_buffers() may write it to the L2ARC (as is) - l2arc_read_done() compares the checksum to the BP (compressed) However, if compressed_arc is disabled: - the block is read from disk - It is decompressed - It is added to the ARC (uncompressed) - l2arc_write_buffers() will use l2arc_apply_transforms() to recompress the block, before writing it to the L2ARC - l2arc_read_done() compares the checksum to the BP (compressed) - l2arc_read_done() will use l2arc_untransform() to uncompress it This test writes out a test file to a pool consisting of one disk and one cache device, then randomly reads from it. Since the arc_max in the tests is low, this will feed the L2ARC, and result in reads from the L2ARC. We compare the value of the kstat l2_cksum_bad before and after to determine if any blocks failed to survive the trip through the L2ARC. Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>

Signed-off-by: Allan Jude <allanjude@freebsd.org>

PrivatePuffin · 2020-08-14T10:37:32Z

I'm an idiot and misread. Please ignore.

This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Closes: openzfs#6247 Closes: openzfs#10277 Closes: openzfs#9024 Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>

This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes openzfs#6247 Closes openzfs#9024 Closes openzfs#10277 Closes openzfs#10278

allanjude mentioned this pull request Jul 12, 2019

Support zstd compression (port of Allan Judes patch from FreeBSD) #8941

Closed

12 tasks

behlendorf added the Status: Work in Progress Not yet ready for general review label Jul 12, 2019

lnicola mentioned this pull request Jul 21, 2019

[WIP] gzip compression heuristic #7481

Closed

13 tasks

ahrens added Status: Design Review Needed Architecture or design is under discussion and removed Status: Work in Progress Not yet ready for general review labels Jul 23, 2019

allanjude force-pushed the org.freebsd_zstd_compress branch from 9158460 to 90a0ebe Compare July 23, 2019 18:57

Conan-Kudo suggested changes Jul 23, 2019

View reviewed changes

allanjude force-pushed the org.freebsd_zstd_compress branch from 90a0ebe to 3f7f843 Compare July 23, 2019 19:14

c0d3z3r0 reviewed Jul 26, 2019

View reviewed changes

allanjude and others added 6 commits August 13, 2020 04:52

Track compression level through dataset, zio, and arc header fields

85e37b8

Extend APIs that deal with compression to also pass the compression level Signed-off-by: Allan Jude <allanjude@freebsd.org>

allanjude force-pushed the org.freebsd_zstd_compress branch from c48a095 to aa4dcc7 Compare August 13, 2020 17:06

allanjude and others added 13 commits August 13, 2020 18:50

Add ZStandard compression support

e48bc2b

Signed-off-by: Allan Jude <allanjude@freebsd.org>

Don't skip encryption stage because of insufficient compression gains

5c489af

Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>

Add new tests for encrypted L2ARC w/ and w/o Compressed ARC

a3635ee

Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>

Disable an unused function to pass the compile tests

f2f2f68

Signed-off-by: Allan Jude <allanjude@freebsd.org>

Cleanup

50fb4c3

Signed-off-by: Allan Jude <allanjude@freebsd.org>

allanjude force-pushed the org.freebsd_zstd_compress branch from aa4dcc7 to 50fb4c3 Compare August 13, 2020 18:51

[squash me] Add tests to makefile

d1e89a7

Signed-off-by: Allan Jude <allanjude@freebsd.org>

ghost added Status: Work in Progress Not yet ready for general review and removed Status: Design Review Needed Architecture or design is under discussion labels Aug 14, 2020

behlendorf closed this in 10b3c7f Aug 20, 2020

		@@ -0,0 +1,7 @@
		/* This file is in the public domain */
		/* $FreeBSD$ */

		@@ -0,0 +1,3 @@
		/* This file is in the public domain */
		/* $FreeBSD$ */

Implement ZStandard Compression #9024

Implement ZStandard Compression #9024

Conversation

allanjude commented Jul 12, 2019 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

allanjude commented Jul 12, 2019

sempervictus commented Jul 14, 2019

allanjude commented Jul 14, 2019

gmelikov commented Jul 14, 2019

allanjude commented Jul 14, 2019

sempervictus commented Jul 15, 2019

allanjude commented Jul 15, 2019 • edited Loading

sempervictus commented Jul 15, 2019

allanjude commented Jul 15, 2019

johnnyjacq16 commented Jul 17, 2019

allanjude commented Jul 17, 2019

johnnyjacq16 commented Jul 18, 2019

allanjude commented Jul 18, 2019

c0d3z3r0 commented Jul 19, 2019

Conan-Kudo commented Jul 23, 2019

Conan-Kudo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Conan-Kudo commented Jul 23, 2019

allanjude commented Jul 24, 2019

lnicola commented Jul 24, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c0d3z3r0 Jul 26, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c0d3z3r0 Jul 28, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c0d3z3r0 commented Jul 26, 2019

BrainSlayer commented Jul 28, 2019 via email

PrivatePuffin commented Aug 14, 2020 • edited Loading

allanjude commented Jul 12, 2019 •

edited

Loading

allanjude commented Jul 15, 2019 •

edited

Loading

c0d3z3r0 Jul 26, 2019 •

edited

Loading

c0d3z3r0 Jul 28, 2019 •

edited

Loading

PrivatePuffin commented Aug 14, 2020 •

edited

Loading