-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ZStandard Compression #9024
Conversation
I need to investigate a possible issue with property inheritance. |
Seems like the same approach would work for the lz4hc work proposed a whole back. Thank you |
Yes, I changed the design to be more generic last year after some consideration. So I was thinking: add a 'new lz4' or 'lz4-2019' or something. Also, does LZ4HC use the same decompression function? is it one pass or do you have to unwind all of the passes? It might also be worth looking at if LZ4HC is still useful, compared to lower levels of zstd and zstd-fast. |
If you're already create new (but hidden) property - I think it's a good idea to have a |
I need to investigate how much trouble having the separate properties is (specifically for inheritance when you rename to a new parent etc) before deciding if we should do additional extra hidden properties, or just use the high bits of the 'compress_level' property to store the version. Note: in general, you can always use the newest decompressor, even on data compressed with the older version, so we likely only need to keep the different versions of the compression function around, for use-cases like nop-write where we need to recompress data and get the same checksum. |
IIRC, LZ4HC does use the same decompressor, and it has some very very fast decompression rates - just shy of 2G/s according to http://fastcompression.blogspot.com/2015/01/zstd-stronger-compression-algorithm.html... So for WORM patterns, this is quite desirable. |
Some raw compression benchmarks from my workstation (these are not in a ZFS context, just general comparative information):
So it seems like like zstd-fast-1 might be a better substitute for LZ4 if you want to decompress at 2GB/sec, although the compression level is not as good. It will really depend on your workload, and exactly how compressible your data is. As you can see, all of the zstd-fast-* levels are 3GB/sec or better decompression (per core). And even the high levels of zstd still maintain 700MB/sec per core for decompression. Not as impressive as LZ4's 6+ GB/sec, but compression levels of 3.0 vs 1.7 mean you might actually save time by reading less off the disk. |
Not championing one vs the other - I've seen enough corner cases where "workload determines" over the years to be a firm believer in giving everyone enough rope/choice. Its dataset & descendant level property anyway. |
It was one of the things that drew me to ZSTD for ZFS, giving the admin a choice between an array of compression levels from many GB/sec compression and decompress, down to slower than gzip but even better compression (although ZSTD maintains decompression speed even with its highest compression levels) |
I am not sure if this pull request should be operational on linux, the older pull request #8941 is operational. With this pull request the zstd_zfs modules is not found hence cannot be loaded.
All other modules loaded:
|
Are there errors during the build? This branch is working for me on CentOS 7 following the instructions from the ZoL Wiki. |
Hi @allanjude the problem still persist and there are no compilation errors that prevent zfs from compiling.
I tried using
however the zstd_zfs module for some reason doesn't get built when creating the dkms package. |
I've only tested with the ./scripts/zfs.sh script or using 'make install'. I've not tested with dkms or rpms, there may be entries I need to add to manifests or something. |
zstd 1.4.1 got released few hours ago improving performance a bit and fixing some bugs https://github.com/facebook/zstd/releases/tag/v1.4.1 |
@allanjude Could you please rebase your PR against current master? There's a conflict which makes the buildbot infra unhappy... |
9158460
to
90a0ebe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple review based on a look through the code turned up a couple of questions...
module/zstd_zfs/include/limits.h
Outdated
@@ -0,0 +1,7 @@ | |||
/* This file is in the public domain */ | |||
/* $FreeBSD$ */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need these relatively useless $FreeBSD$
markers? They don't expand in this source tree...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These files were taken verbatim from FreeBSD. We can remove these lines if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd prefer them gone, as they look silly unexpanded.
module/zstd_zfs/include/stddef.h
Outdated
@@ -0,0 +1,3 @@ | |||
/* This file is in the public domain */ | |||
/* $FreeBSD$ */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
module/zstd_zfs/include/stdint.h
Outdated
@@ -0,0 +1,3 @@ | |||
/* This file is in the public domain */ | |||
/* $FreeBSD$ */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
module/zstd_zfs/include/stdio.h
Outdated
@@ -0,0 +1,3 @@ | |||
/* This file is in the public domain */ | |||
/* $FreeBSD$ */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
module/zstd_zfs/include/stdlib.h
Outdated
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | ||
* SUCH DAMAGE. | ||
* | ||
* $FreeBSD$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
module/zstd_zfs/include/string.h
Outdated
@@ -0,0 +1,3 @@ | |||
/* This file is in the public domain */ | |||
/* $FreeBSD$ */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | ||
* SUCH DAMAGE. | ||
* | ||
* $FreeBSD$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | ||
* SUCH DAMAGE. | ||
* | ||
* $FreeBSD$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | ||
* SUCH DAMAGE. | ||
* | ||
* $FreeBSD$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
module/zfs/dmu.c
Outdated
@@ -2164,6 +2166,11 @@ dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp, zio_prop_t *zp) | |||
} else { | |||
compress = zio_compress_select(os->os_spa, dn->dn_compress, | |||
compress); | |||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we including a code block that is entirely disabled with #if 0
? Do you feel this will get used at all at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question of if this block of code is needed is still outstanding. In the end it will either be enabled or removed.
90a0ebe
to
3f7f843
Compare
So this is a common failure I'm seeing:
This seems like there's a dependency problem here making parallel make fail... Have you tried testing locally with |
I usually use -j8, but I can try some other numbers. Can you see anything obvious in the Makefile that will cause this? The automake stuff is not my strong suit, we don't use it in the FreeBSD kernel. I don't see where to make the dependencies more explicit. |
@Conan-Kudo When the dependency tracking doesn't work, running |
lib/libzpool/Makefile.am
Outdated
@@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE) | |||
# Includes kernel code generate warnings for large stack frames | |||
AM_CFLAGS += $(FRAME_LARGER_THAN) | |||
|
|||
AM_CFLAGS += -DLIB_ZPOOL_BUILD | |||
AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-Wframe-larger-than=20480
should get removed here; see #8044 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're keeping the pristine copy of libzstd, so this flag is still required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only if you keep zstd as is. half of the code of zstd is unused. if you keep the unused code out. the whole line is obsolete. see my variant of the zstd patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason for us to mangle the zstd dependency, and it would make it harder to upgrade it over time. Leaving it as is makes the maintenance burden very minor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full ACK. Libs should be kept vanilla whenever possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BrainSlayer I assume this should have gone here...
BrainSlayer commented 14 minutes ago
i would ack too. but there is a counter argument. the code you compile contains alot of unused code. its about 300 kb bigger than it should and the unused code cannot even be used due the high stack size usage. you compile basicly broken code which will crash any kernel if used.
Am 28.07.2019 um 10:14 schrieb Michael Niewöhner:… @.**** commented on this pull request. ------------------------------------------------------------------------ In lib/libzpool/Makefile.am <#9024 (comment)>: > @@ -12,11 +16,15 @@ AM_CFLAGS +=
$(NO_UNUSED_BUT_SET_VARIABLE) # Includes kernel code generate warnings for large stack frames AM_CFLAGS += $ (FRAME_LARGER_THAN) -AM_CFLAGS += -DLIB_ZPOOL_BUILD +AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized Full ACK. Libs should be kept vanilla whenever possible — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#9024?email_source=notifications&email_token=AB2WNE5YMXU2NJC46J7Y5C3QBVIP5A5CNFSM4IB75NFKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB7Y2PGA#discussion_r307990857>, or mute the thread https://github.com/notifications/unsubscribe-auth/AB2WNEZEQML3MMAYX5R6YJ3QBVIP5ANCNFSM4IB75NFA.
--
module/Makefile.in
Outdated
@@ -6,6 +6,7 @@ subdir-m += spl | |||
subdir-m += unicode | |||
subdir-m += zcommon | |||
subdir-m += zfs | |||
subdir-m += zstd_zfs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this go before zfs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am willing to bet this is the one that is causing the concurrency issues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@allanjude pretty sure, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and now you've got that error in the tests as well... looks like zstd_zfs must go before zcommon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with zstd_zfs before zcommon, same build failure. I cannot recreate this on a completely fresh build, but somehow can get it to happen when rebuilding incrementally later.
@allanjude DKMS config entries are missing: diff --git a/scripts/dkms.mkconf b/scripts/dkms.mkconf
index e1a49dca1..85fbb7c58 100755
--- a/scripts/dkms.mkconf
+++ b/scripts/dkms.mkconf
@@ -89,6 +89,7 @@ STRIP[4]="\${STRIP[0]}"
STRIP[5]="\${STRIP[0]}"
STRIP[6]="\${STRIP[0]}"
STRIP[7]="\${STRIP[0]}"
+STRIP[8]="\${STRIP[0]}"
BUILT_MODULE_NAME[0]="zavl"
BUILT_MODULE_LOCATION[0]="module/avl/"
DEST_MODULE_LOCATION[0]="/extra/avl/avl"
@@ -113,4 +114,7 @@ DEST_MODULE_LOCATION[6]="/extra/lua/zlua"
BUILT_MODULE_NAME[7]="spl"
BUILT_MODULE_LOCATION[7]="module/spl/"
DEST_MODULE_LOCATION[7]="/extra/spl/spl"
+BUILT_MODULE_NAME[8]="zstd_zfs"
+BUILT_MODULE_LOCATION[8]="module/zstd_zfs/"
+DEST_MODULE_LOCATION[8]="/extra/zstd_zfs/zstd_zfs"
EOF |
i would ack too. but there is a counter argument. the code you compile
contains alot of unused code. its about 300 kb bigger than it should
and the unused code cannot even be used due the high stack size usage.
you compile basicly broken code which will crash any kernel if used.
Am 28.07.2019 um 10:14 schrieb Michael Niewöhner:
…
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In lib/libzpool/Makefile.am
<#9024 (comment)>:
> @@ -12,11 +16,15 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE)
# Includes kernel code generate warnings for large stack frames
AM_CFLAGS += $(FRAME_LARGER_THAN)
-AM_CFLAGS += -DLIB_ZPOOL_BUILD
+AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=20480 -Wno-uninitialized
Full ACK. Libs should be kept vanilla whenever possible
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#9024>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2WNEZEQML3MMAYX5R6YJ3QBVIP5ANCNFSM4IB75NFA>.
|
Extend APIs that deal with compression to also pass the compression level Signed-off-by: Allan Jude <allanjude@freebsd.org>
Extend APIs that deals with compression to also pass the compression level. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
If a compression algorithm requires a feature flag, activate the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Sponsored-by: The FreeBSD Foundation Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This imports the unmodified ZStandard single-file library which will be used by ZFS. This code shall not be modified in any way to keep it easily updatable. The library is excluded from codecov calculation as dependencies don't need full codecov. It's also added to the cppcheck suppressions list. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This adds headers to be able to: a) build ZFS with ZSTD on various platforms/architectures (Linux, FreeBSD, aarch64) b) build ZFS with ZSTD for userspace without modification of the original ZSTD library. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This adds dummy defines for `__init`, `__exit` to be able to compile kernel code for userspace. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
c48a095
to
aa4dcc7
Compare
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Add zstd support to zfs ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off. This PR adds the new compression types: zstd and zstd-fast, which is basically a faster "negative" level of zstd. Available compression levels for zstd are zstd-1 through zstd-19. In addition, faster levels supported by zstd-fast are 1-10, 20-100 (in increments of 10), 500 and 1000. As zstd-fast is basically a "negative" level of zstd, it gets faster with every level in stark contrast with "normal" zstd which gets slower with every level. Rather than the approach used when gzip was added, of using a separate compression type for each different level, this PR added a new hidden property, compression_level. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function. However, in some cases, it was necessary to know what level a block was compressed within other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may still be additional places where this needs to be added. This new property is hidden from the user, and the user simply does zfs set compress=zstd-10 dataset and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well. Closes: openzfs#6247 Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This add a new build intermediary `libzstd`, that allows us to keep zstd cleanly separated from libzpool and to override the frame-size compiler option for zstd only. (This is needed due to unused code in the intentionally unmodified zstd source.) Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This will allow us to measure the number of times compression is skipped due to a memory allocation failure, as well as the number of times that we have used the fallback decompression context. It also counts all other failure types: - (de)compression level invalid - (de)compression memory allocation failed - (de)compression failed (zstd error) - Failed to decode zstd_header for decompression (invalid size) Sponsored By: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
- Add zstd to the test suite - Add zstd to history_002_pos.ksh - Add random levels of zstd to history_002_pos.ksh - Add zstd-fast to history_002_pos.ksh - Add random levels of zstd-fast to history_002_pos.ksh - Add documentation - Add man page content - Add README for module/zstd - Add missing persons to AUTHORS Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
zdb can now decode the ZSTD compression header and inspect the size, version and compression level saved in that header. This will be used to create a test that will ensure this information is saved and restored correctly. Example usage: ``` obj=2 dataset=zof/zstd/10 path=/allanbsd.txt type=19 bonustype=44 Object lvl iblk dblk dsize dnsize lsize %full type 2 1 128K 8.50K 0 512 8.50K 100.00 ZFS plain file (K=inherit) (Z=inherit) 168 bonus System attributes dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED dnode maxblkid: 0 uid 0 gid 0 atime Sat Jun 13 16:03:26 2020 mtime Sat Jun 13 16:03:26 2020 ctime Sat Jun 13 16:03:26 2020 crtime Sat Jun 13 16:03:26 2020 gen 3046 mode 100644 size 8193 parent 34 links 1 pflags 40800000004 Indirect blocks: 0 L0 EMBEDDED et=0 2200L/22P B=3046 ZSTD:size=26:version=10405:level=10:EMBEDDED segment [0000000000000000, 0000000000002200) size 8.50K ``` For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Restore previous output format for zdb Only show the dnode compression details if they are interesting, or if the verbosity level is high enough This solves an issue with the cli_root/zdb/zdb_object_range_pos test Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
1. Add a test case for the ZSTD header: Ensure the actual psize and level are saved and restored correctly. 2. Add a test that compression survives replication 3. Add a new test to confirm that the ZSTD feature flag is activated. When the compression property on a dataset is set to zstd, the feature flag is not changed from 'enabled' to 'active' until the first block is born. If the pool is exported after `zfs set` but before a block is written then imported on a system with an older version of ZFS that does not understand ZSTD, it will cause the userland utilities to panic. This also moves the test data to a common location as it is used by multiple tests. Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>
Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>
When reading compressed blocks from the L2ARC, with compressed ARC disabled, arc_hdr_size() returns LSIZE rather than PSIZE, but the actual read is PSIZE. This causes l2arc_read_done() to compare the checksum against the wrong size, resulting in checksum failure. This manifests as an increase in the kstat l2_cksum_bad and the read being retried from the main pool, making the L2ARC ineffective. Add new L2ARC tests with Compressed ARC enabled/disabled Blocks are handled differently depending on the state of the zfs_compressed_arc_enabled tunable. If a block is compressed on-disk, and compressed_arc is enabled: - the block is read from disk - It is NOT decompressed - It is added to the ARC in its compressed form - l2arc_write_buffers() may write it to the L2ARC (as is) - l2arc_read_done() compares the checksum to the BP (compressed) However, if compressed_arc is disabled: - the block is read from disk - It is decompressed - It is added to the ARC (uncompressed) - l2arc_write_buffers() will use l2arc_apply_transforms() to recompress the block, before writing it to the L2ARC - l2arc_read_done() compares the checksum to the BP (compressed) - l2arc_read_done() will use l2arc_untransform() to uncompress it This test writes out a test file to a pool consisting of one disk and one cache device, then randomly reads from it. Since the arc_max in the tests is low, this will feed the L2ARC, and result in reads from the L2ARC. We compare the value of the kstat l2_cksum_bad before and after to determine if any blocks failed to survive the trip through the L2ARC. Sponsored-by: The FreeBSD Foundation Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
aa4dcc7
to
50fb4c3
Compare
Signed-off-by: Allan Jude <allanjude@freebsd.org>
|
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Closes: openzfs#6247 Closes: openzfs#10277 Closes: openzfs#9024 Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Closes: openzfs#6247 Closes: openzfs#10277 Closes: openzfs#9024 Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes openzfs#6247 Closes openzfs#9024 Closes openzfs#10277 Closes openzfs#10278
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes openzfs#6247 Closes openzfs#9024 Closes openzfs#10277 Closes openzfs#10278
Warning: The on-disk layout has not been finalized yet. Please only use this on test pools, otherwise you may not be able to import your pool with the final version.
Note: This pull request has been broken into a few separate commits, mostly for ease of review (separate out the mechanical changes from the substantive ones). I can reformat this as separate pull requests if desired. We can decide if it makes sense to squash them after review.
Motivation and Context
ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off.
Description
Add the new compression type, zstd, which defaults to compression level 3.
Available compression levels are zstd-1 through zstd-19 (the extreme levels were excluded due to their memory requirements. They are not useful with ZFS block sizes). In addition, ZStandard supports faster levels of compression, implemented as zstd-fast-. Supported levels are 1-10, 20-100 in steps of 10, 500, and 1000.
Rather than the approached used when gzip was added, of using a separate compression type for each different level, I instead added a new hidden property, compression_level. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function. However, in some cases, it was necessary to know what level a block was compressed with in other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may be additional places where this needs to be added still.
This new property is hidden from the user, and the user simply does
zfs set compress=zstd-10 dataset
and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well.How Has This Been Tested?
This has been tested mostly on FreeBSD, although some Linux users have tested it via a port of an earlier version of this work (Pull Requests #8044 and #8941)
Types of changes
Checklist:
Signed-off-by
.