-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update linux kernel to latest zstd (from 1.4.10) #3275
Comments
Reddit OP here, I have a version of Linux 5.19 with upstream Zstd. If you need help with testing, integration, anything at all - feel free to ask. I can create a preliminary patch on |
We are planning the 1.5.4 zstd release later this year. I will plan on updating the kernel with that release. However, we don't expect too much more development before the release. So I will put up a patch to |
@terrelln Thank you for your confirmation! Is there any way I can assist with pre-release testing when the time comes? |
Did 1.5.3 happen, or do you have the convention that uneven means unstable? |
Sorry to chime in, but I saw it so I figured I would give you a little info in case you did not know. The version notes show quite a few optimizations, however I do not know what will make it into 1.5.4 nor what caused the check fails. I do not know the rest of the history behind it, so I think someone else will correct me or add information. I could speculate but I think you know better in this case. |
We have not released 1.5.3 publicly yet. We were experimenting with a different release process where we release internally first, and publicly second. But other priorities took over and we haven't made the public release yet. I don't know if we will call it 1.5.3 or 1.5.4, but we will make a public release later this year. |
Thanks! I will update here when I've put the patch up. At that point any testing would be greatly appreciated! |
FYI I've started working on this. I expect to have a patch up next week. And I am going to be aiming for the v6.2 merge window. |
Quick update: I'm preparing a patch to update the kernel to v1.5.2 now, aiming for the v6.2 merge window. If the next zstd release is out before then, I will update to that release instead. I've completed the build, boot, and light btrfs/squashfs/zram testing. I've started performance measurements, however I'm seeing a ~10-20% performance regression for compression levels 5-15. I've bisected it to 13cad3a, which was ~neutral in user-space, but seems to not do well in kernel-space. I'm debugging the performance issues, but I'm not sure how long it will take me. I expect to spend a bit more time on it, but the efforts might stall out before I find a satisfactory solution. So quick question for interested parties: Are you still interested in an update patch that would regression performance for levels 5-15? These probably aren't super heavily used in the kernel, since I'd expect most usage to be levels 1-3, and decompression. If I don't find a solution soon, I could still put the patch up & merge into my |
@terrelln Thank you for the update! And thank you for your work thus far. Personally speaking, I do not mind the compression regression but it is hard to say whether that is a shared sentiment. I'll leave that for others. I do use 3+ compression levels quite a bit, but my situation does not care for write speed. Speaking on where to go with it, I think your plan thus far sounds sane, particularly the merge into |
@terrelln Could you maybe switch compression levels to balance the regression or would that drop compression ratio too much? (ie make 15 in latest be more like current 13 or whatever other level with a similar performance and ratio). Or maybe you could revert the bad commit while you figure it out? Personally I use 2-5 but I'd be happy to use higher if performance was better (which I believe was supposed to be the case with newer zstd). |
10-20% is not negligible perf regression, I'd rather not push such update. We got speedups in previous updates so not making it worse should be one of the criteria. I'd understand a minor perf drop (say <3%) in case of some necessary code updates like added safety checks but the linked commit is about improving compilation speed. That's trading time on developer side for all user's run time which is IMHO wrong. Also you're measuring -O3 builds, you're asking the compiler to try its best to produce fast code, that's expected to be slow. A quicker development build can do -O2 without much difference to -O3 in compression speed but with reduced compilation time. The -O3 is more like a production build, that could be tested less frequently, eg. daily or weekly to check if there are regressions but not necessarily for each PR/commit. In the commit there are more indirections done due to separating functions and adding pointer calls, that's always going to hurt in kernel due to all the spectre mitigations. The fix is to avoid indirection if possible and/or replace it by switches, but the patch does the opposite. It may be better from the code organization and design patterns point of view but for a high cost. |
@terrelln I am interested. Share your patch please. |
Yeah, I've verified that the indirect function calls are definitely causing the issue. We went this route to reduce code size. But I can try to re-structure the code to keep the code size wins, but without the indirect function calls. |
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR facebook#2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR facebook#2828 was merged. This PR is necessary for Issue facebook#3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar | Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta | |----------|-------|------------------|-----------------|--------| | gcc | 5 | 102.6 | 113.7 | +10.8% | | gcc | 7 | 66.6 | 74.8 | +12.3% | | gcc | 9 | 51.5 | 58.9 | +14.3% | | gcc | 13 | 14.3 | 14.3 | +0.0% | | clang | 5 | 108.1 | 114.8 | +6.2% | | clang | 7 | 68.5 | 72.3 | +5.5% | | clang | 9 | 53.2 | 56.2 | +5.6% | | clang | 13 | 14.3 | 14.7 | +2.8% | The binary size stays just about the same for clang and gcc, measured using the `size` command: | Compiler | Branch | Text | Data | BSS | Total | |----------|--------|---------|------|-----|---------| | gcc | dev | 1127950 | 3312 | 280 | 1131542 | | gcc | PR | 1123422 | 2512 | 280 | 1126214 | | clang | dev | 1046254 | 3256 | 216 | 1049726 | | clang | PR | 1048198 | 2296 | 216 | 1050710 |
I've been able to reduce the regression with PR #3295. There is still a bit of work to do, but we should be able to at least get pretty close to neutral for the compression levels where we saw a regression. |
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR facebook#2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR facebook#2828 was merged. This PR is necessary for Issue facebook#3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar | Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta | |----------|-------|------------------|-----------------|--------| | gcc | 5 | 102.6 | 113.7 | +10.8% | | gcc | 7 | 66.6 | 74.8 | +12.3% | | gcc | 9 | 51.5 | 58.9 | +14.3% | | gcc | 13 | 14.3 | 14.3 | +0.0% | | clang | 5 | 108.1 | 114.8 | +6.2% | | clang | 7 | 68.5 | 72.3 | +5.5% | | clang | 9 | 53.2 | 56.2 | +5.6% | | clang | 13 | 14.3 | 14.7 | +2.8% | The binary size stays just about the same for clang and gcc, measured using the `size` command: | Compiler | Branch | Text | Data | BSS | Total | |----------|--------|---------|------|-----|---------| | gcc | dev | 1127950 | 3312 | 280 | 1131542 | | gcc | PR | 1123422 | 2512 | 280 | 1126214 | | clang | dev | 1046254 | 3256 | 216 | 1049726 | | clang | PR | 1048198 | 2296 | 216 | 1050710 |
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR facebook#2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR facebook#2828 was merged. This PR is necessary for Issue facebook#3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar | Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta | |----------|-------|------------------|-----------------|--------| | gcc | 5 | 102.6 | 113.7 | +10.8% | | gcc | 7 | 66.6 | 74.8 | +12.3% | | gcc | 9 | 51.5 | 58.9 | +14.3% | | gcc | 13 | 14.3 | 14.3 | +0.0% | | clang | 5 | 108.1 | 114.8 | +6.2% | | clang | 7 | 68.5 | 72.3 | +5.5% | | clang | 9 | 53.2 | 56.2 | +5.6% | | clang | 13 | 14.3 | 14.7 | +2.8% | The binary size stays just about the same for clang and gcc, measured using the `size` command: | Compiler | Branch | Text | Data | BSS | Total | |----------|--------|---------|------|-----|---------| | gcc | dev | 1127950 | 3312 | 280 | 1131542 | | gcc | PR | 1123422 | 2512 | 280 | 1126214 | | clang | dev | 1046254 | 3256 | 216 | 1049726 | | clang | PR | 1048198 | 2296 | 216 | 1050710 |
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR #2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR #2828 was merged. This PR is necessary for Issue #3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar | Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta | |----------|-------|------------------|-----------------|--------| | gcc | 5 | 102.6 | 113.7 | +10.8% | | gcc | 7 | 66.6 | 74.8 | +12.3% | | gcc | 9 | 51.5 | 58.9 | +14.3% | | gcc | 13 | 14.3 | 14.3 | +0.0% | | clang | 5 | 108.1 | 114.8 | +6.2% | | clang | 7 | 68.5 | 72.3 | +5.5% | | clang | 9 | 53.2 | 56.2 | +5.6% | | clang | 13 | 14.3 | 14.7 | +2.8% | The binary size stays just about the same for clang and gcc, measured using the `size` command: | Compiler | Branch | Text | Data | BSS | Total | |----------|--------|---------|------|-----|---------| | gcc | dev | 1127950 | 3312 | 280 | 1131542 | | gcc | PR | 1123422 | 2512 | 280 | 1126214 | | clang | dev | 1046254 | 3256 | 216 | 1049726 | | clang | PR | 1048198 | 2296 | 216 | 1050710 |
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR #2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR #2828 was merged. This PR is necessary for Issue #3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar | Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta | |----------|-------|------------------|-----------------|--------| | gcc | 5 | 102.6 | 113.7 | +10.8% | | gcc | 7 | 66.6 | 74.8 | +12.3% | | gcc | 9 | 51.5 | 58.9 | +14.3% | | gcc | 13 | 14.3 | 14.3 | +0.0% | | clang | 5 | 108.1 | 114.8 | +6.2% | | clang | 7 | 68.5 | 72.3 | +5.5% | | clang | 9 | 53.2 | 56.2 | +5.6% | | clang | 13 | 14.3 | 14.7 | +2.8% | The binary size stays just about the same for clang and gcc, measured using the `size` command: | Compiler | Branch | Text | Data | BSS | Total | |----------|--------|---------|------|-----|---------| | gcc | dev | 1127950 | 3312 | 280 | 1131542 | | gcc | PR | 1123422 | 2512 | 280 | 1126214 | | clang | dev | 1046254 | 3256 | 216 | 1049726 | | clang | PR | 1048198 | 2296 | 216 | 1050710 |
Will the update to .4 be easier to do or is still complicated? |
I'm using this patch from Oleksandr Natalenko for 6.2 kernel: https://codeberg.org/pf-kernel/linux/commit/d8907dff4871c104a5a67e43157cc0d8012408e1 No problems here. |
Using it also in linux-cachyos. Is working fine. |
Closing this issue because Linux has been updated to v1.5.2.
It will be easy, but I'm going to wait until the v6.4 merge window, since I haven't been baking the patch in linux-next yet.
Glad to hear it! |
Hi, I notice the update of xxhash lib (required by zstd and ksm) for linux is missing. zstd v1.5.1 had updated it to latest v0.8.1. But zstd v1.5.2 in linux kernel still using a old version. There are no major update since it was added: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/lib/xxhash.c Could you consider updating it to latest version? This may avoid potential compatibility issues and bring some performance improvements. |
There will be no compatibility issues, because zstd only uses the stable public API, and the hash function is fixed. AFAIK there haven't been any performance improvements to XXH64, but @Cyan4973 can verify. I'd recommend opening an issue on the XXHash project to track this. |
Hi, people of reddit asked (https://old.reddit.com/r/kernel/comments/xp2o53/why_is_the_version_of_zstd_in_the_kernel_outdated/) about updating the linux port of zstd to newer version. This has been promised back then when 1.4.10 was synced a year ago and I'd be glad to see an update (or even a regular update in each kernel release) as btrfs is using zstd and any performance improvement is most welcome.
There's automation support (contrib/linux-kernel) so generating the patch itself should not take too much time. As there are more things to do like review and benchmarking it's not the last step but at least the preliminary version can be added to
linux-next
tree. The final pull request sent to Linus may or may not happen depending on the testing results.IMHO neglecting the regular updates is more "expensive", like it was with the 1.4.10 update that took about a year and a lot of convincing. The linux-next tree is really convenient as it does not pose a huge risk for users and helps to catch bugs early.
The text was updated successfully, but these errors were encountered: