-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ZSTD compression to ZFS #10278
Conversation
@ahrens @behlendorf next try, but be warned. I won't maintin this another year just to finally give up because the promised review are not coming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worked hard on testing this. I can say we explored many forms of performance and stability testing and have yet to find any instability in this PR.
Additions from @BrainSlayer are merged nice and cleanly.
I'd say this looks pretty good now. I will squash the fixes when all tests finally succeed |
CentOS 7 looks like a unrelated, random crash |
@BrainSlayer have a special look at |
yes, but the modification is not neccessary. l2arc uses lz4 only or its uncompressed. the case for zstd doesnt exist. i discussed with @gamanakis in the past since i stumbled accross the same case, he already made a patch for zstd in his tree which i merged. |
ah, thanks! will fix it |
perfect. all running now. i will keep my own git up to date with compatiblity fixes for upstream. so you dont need to take care much about maintainance. |
8f71e1e
to
4cd7cf5
Compare
I just uploaded a little reworked version (squashing the latests fixes in the previous commits and moving some diffs to the correct commits, that where simply in wrong order).
Great, thank you! I will merge that stuff the from time to time to this PR. You should rename #10277 then to something like "[DO NOT MERGE] maintenance and testing PR for 10278". |
4cd7cf5
to
9107ec0
Compare
will do |
This comment has been minimized.
This comment has been minimized.
Thanks for rewiving this PR, much needed change for vide adoption of ZFS to mass storage.
Testsuite runs on Ubuntu 18.04, logs are located here: https://github.com/andrey42/zfs-debian/runs/639344894?check_suite_focus=true |
You can just clone/download this PR, which is already up-to-date with Master. |
Thansk for the idea, tried that too: removed all patches and cloned from this PR source repo, still the same error: https://github.com/andrey42/zfs-debian/runs/639471894?check_suite_focus=true
|
i can just assume that your tree is broken and not identical with ours. i'm running this version here on a productive system |
but one hint. this problem occures if your zfs module version is different from the userspace tools. for instance you have a version installed in /usr/local and another in /usr |
Found the real reason: kernel module failed to compile during the module install as a dynamic kernel module due to absent source code for zstd.
The source code for zstd itself partially located inside contrib/zstd, a bit non-standard location if you ask me. So, once I've updated zfs-dkms package build rules, now Debian packages build & tests are passing with flying colors: https://github.com/andrey42/zfs-debian/actions/runs/94753609 |
contrib/zstd is the location or the original untouched zstd library. we got asked todo so. the implementation for zstd is located in modules/zstd. the good thing for placing it unmodified in contrib/zstd is, you can replace it with any version in future just by copying the original files from the zstd release into this folder and it should compile. my original version used a modified library direct in modules/zstd |
@ahrens @behlendorf @allanjude can we please get any indication if this is ever going to be reviewed? |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
there have been some upstream fixes regarding lz4 and static arc. consider the 10277 which is a little bit different but conflict free with upstream |
Openzfs2.0 adds ZSTD and ZSTD-fast compression. See: openzfs/zfs/pull/10278 This commit adds the following compression values to be accepted by the pool middleware: - ZSTD - ZSTD-5 - ZSTD-7 - ZSTD-FAST These levels give an acceptable and balanced spread of both compression ratio and performance. (see graphs in openzfs/zfs/pull/10278 ) Notes: - Requires an update to OpenZFS before being able to be used - Might also want to backport to 12U1 or even 12RC1 - Related middleware changes are also required, see: truenas/middleware/pull/5517 Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Openzfs2.0 adds ZSTD and ZSTD-fast compression. See: openzfs/zfs/pull/10278 This commit adds the following compression values to be accepted by the pool middleware: - ZSTD - ZSTD-5 - ZSTD-7 - ZSTD-FAST These levels give an acceptable and balanced spread of both compression ratio and performance. (see graphs in openzfs/zfs/pull/10278 ) Notes: - Requires an update to OpenZFS before being able to be used - Might also want to backport to 12U1 or even 12RC1 - Related middleware changes are also required, see: truenas/middleware/pull/5517 Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Openzfs2.0 adds ZSTD and ZSTD-fast compression. See: openzfs/zfs/pull/10278 This commit adds the following compression values to be accepted by the pool middleware: - ZSTD - ZSTD-5 - ZSTD-7 - ZSTD-FAST These levels give an acceptable and balanced spread of both compression ratio and performance. (see graphs in openzfs/zfs/pull/10278 ) Notes: - Requires an update to OpenZFS before being able to be used - Might also want to backport to 12U1 or even 12RC1 - Related middleware changes are also required, see: truenas/middleware/pull/5517 Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Openzfs2.0 adds ZSTD and ZSTD-fast compression. See: openzfs/zfs/pull/10278 This commit adds the following compression values to be accepted by the pool middleware: - ZSTD - ZSTD-5 - ZSTD-7 - ZSTD-FAST These levels give an acceptable and balanced spread of both compression ratio and performance. (see graphs in openzfs/zfs/pull/10278 ) Notes: - Requires an update to OpenZFS before being able to be used - Might also want to backport to 12U1 or even 12RC1 - Related middleware changes are also required, see: truenas/middleware/pull/5517 Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Openzfs2.0 adds ZSTD and ZSTD-fast compression. See: openzfs/zfs/pull/10278 This commit adds the following compression values to be accepted by the pool middleware: - ZSTD - ZSTD-5 - ZSTD-7 - ZSTD-FAST These levels give an acceptable and balanced spread of both compression ratio and performance. (see graphs in openzfs/zfs/pull/10278 ) Notes: - Requires an update to OpenZFS before being able to be used - Might also want to backport to 12U1 or even 12RC1 - Related middleware changes are also required, see: truenas/middleware/pull/5517 Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> (cherry picked from commit 606c75a)
As noted by all the tags: |
@Ornias1993 For the record, ZoF is pretty much just for iXsystems internal work-in-progress stuff now, openzfs/zfs is source used for the openzfs ports/pkgs on FreeBSD. |
Good point, so: Didn't know for sure what iX based the nightlies on, thanks for the headsup 👍 |
Close enough :) |
I've had trouble building this on FreeBSD 13-CURRENT and I'm not sure why, since it seems to have been ok in the CI:
|
@freqlabs I'm not sure why you hit this issue and the CI doesn't... @c0d3z3r0 @BrainSlayer This is stack frame size is starting to become a pain that is comming back way to often @freqlabs Any issue on 12? |
The only thing I can think of is I'm using a non-debug kernel config, whereas the CI has all the debug options enabled ( |
@freqlabs those warnings should be ignored: lib/libzstd/makefile.am:
Is there any reason above wouldn't work on FreeBSD 13 CURRENT? |
It might be that those warnings are treated as errors in the newer clang on 13 or due to some defaults change to the same effect elsewhere (or maybe it's always been treated as an error but not hit usually). We'll look into it |
@freqlabs No issue on FreeBSD 12 Stable? |
12 stable is fine on the systems I've tested, and that is not a debug kernel. |
@freqlabs Yeah, in that case I think we can indeed conclude it's most likely a compiler thing on 13... |
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes openzfs#6247 Closes openzfs#9024 Closes openzfs#10277 Closes openzfs#10278
This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes openzfs#6247 Closes openzfs#9024 Closes openzfs#10277 Closes openzfs#10278
This PR introduces ZSTD compression to ZFS.
Many thanks to:
The original Proof of Concept by @allanjude see: #9024
The Linux compatibility redesign by @BrainSlayer see: #8941
Spin-off kvmem patch by @c0d3z3r0 see: #9034
Spin-off test refactor by @ornias1993 see: #9626
Spin-off Code-coverage Ignore by @ornias1993 see: #9726
Additional testing, validation and review by @dan-and and @ornias1993 see: Test suite
Restructure and Refactor by @c0d3z3r0 and @BrainSlayer see: This PR
PR description and documentation by @ornias1993 see: This PR
Co-authored-by: Allan Jude allanjude@freebsd.org
Co-authored-by: Sebastian Gottschall s.gottschall@dd-wrt.com
Co-authored-by: Kjeld Schouten-Lebbing kjeld@schouten-lebbing.nl
Co-authored-by: Michael Niewöhner foss@mniewoehner.de
Signed-off-by: Allan Jude allanjude@freebsd.org
Signed-off-by: Sebastian Gottschall s.gottschall@dd-wrt.com
Signed-off-by: Kjeld Schouten-Lebbing kjeld@schouten-lebbing.nl
Signed-off-by: Michael Niewöhner foss@mniewoehner.de
See commits for sign-off-by per code-portion
Motivation and Context
ZStandard is a modern, high performance, general compression algorithm originally developed by Yann Collet (the author of LZ4). It aims to provide similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off.
A Proof of Concept was made by @allanjude for FreeBSD and reworked by @BrainSlayer in his PR, adding the following:
This work was restructured and, somewhat, refactored by @c0d3z3r0 and description/documentation is(re)written by @ornias1993 to ensure it's easy-to and ready-for review.
This PR description might be a long read, but tries to encapsulate multiple previous iterations, much work and many comments.
Description
ZSTD Feature
This PR adds the new compression types: zstd and zstd-fast, which is basically a faster "negative" level of zstd.
Available compression levels for zstd are zstd-1 through zstd-19. In addition, faster levels supported by zstd-fast are 1-10, 20-100 (in increments of 10), 500 and 1000. As zstd-fast is basically a "negative" level of zstd, it gets faster with every level in stark contrast with "normal" zstd which gets slower with every level.
Rather than the approach used when gzip was added, of using a separate compression type for each different level, this PR added a new hidden property,
compression_level
. All of the zstd levels use the single entry in the zio_compress enum, since all levels use the same decompression function.However, in some cases, it was necessary to know what level a block was compressed within other parts of the code, so the compression level is stored on disk inline with the data, after the compression header (uncompressed size). The level is plumbed through arc_buf_hdr_t, objset_t, and zio_prop_t. There may still be additional places where this needs to be added.
This new property is hidden from the user, and the user simply does
zfs set compress=zstd-10 dataset
and zfs_ioctl.c splits this single value into the two values to be stored in the dataset properties. This logic has been extended to Channel Programs as well.How Has This Been Tested?
The following testing has been performed:
Performance testing
Extra care has been taken to guarantee performance and reliability.
Compression performance using a custom development script by @Ornias.
Repeatability of the test is confirmed by @dan-and.
Updated performance graphs: here
Please click the graphs for more detail
Speeds are for comparative reference only and will differ based on the system from which the test is ran
Test Tool: #9756
This test also includes SHA256 checkum verification, which it passes.
Buildbot testing
On top of the normal build-bot procedure, extra manual verification has been done to ensure the test timers of relevant tests stay as close to master as possible. This is done due to early tests indicating situations in which a test would pass, but (due to bugs) with significant performance degradation.
No unexpected slow executions of tests where observed in this PR.
Stress and live testing
Stress testing has been performed on a ssd only pool by @BrainSlayer and on an hdd pool by @dan-and
On top of that, multiple versions of this build have been tested running on a ssd and hdd based live (production) machines with varying load, to ensure stability.
Frequently Asked Questions
A: Most of this code is the ZSTD library, which has not been altered. Therefor it would be safe to skip contrib/zstd during your review.
A: This was a difficult choice, we tried to take into account the problems upgrading lz4 and wanted to keep it as "upgrade friendly" as possible. The performance difference between a cut-down and full zstd library are minimal.
A: A lot can happen in a year. Because of the FreeBSD merger (and to not spend time removing things only to put it back) we wanted to keep it as easy as possible for the BSD developers to port. However: We did not test on FreeBSD.
A: Some tests happen to run/build in user-space, this ensures test compatibility and shouldn't have any adverse effects
A: Copyright headers do not change actual copyright. When we added a header, we just reformatted the rest to ensure it looks nice. We also made extra sure the headers are correct for ZSTD, by going over all changes and adding headers where they where forgotten by previous contributors.
ccflags-y += -Wframe-larger-than=20480
, wouldn't this cause issues down the line?A: This is a consequence of using the unmodified ZSTD library and is only used on said library. However, the functions giving Frame-Size errors are not actually used by us. This is tested by modifying the Library without the troubled code. We can assure you this does not cause issues.
contrib/zstd
ignored from code coverage, don't we want to cover zstd with tests too?A: Most of the code in the unmodified library is not in use by ZFS (and thus: Not covered by tests) and is already tested/supported by the ZSTD project. There are no plans for OpenZFS to maintain our own ZSTD fork. So in short: Calculating coverage on the ZSTD library leads to lots of "uncovered code" which is only a distraction because the ZSTD library isn't supposed to be covered by our tests.
A: While the compression speeds are on par with stock-zstd metrics, indeed on the decompression side there is still room for improvement. Luckily this PR never was meant to be the end of ZSTD-on-ZFS development and we are full of idea's for future incremental performance, quality and feature enhancements...
Types of changes
Checklist:
Signed-off-by
.TODOs left