patch-from speed optimization #3545
Conversation
lib/compress/zstd_compress.c (Outdated):

```c
/* If the dict is larger than we can reasonably index in our tables, only load the suffix. */
{   U32 maxDictSize = 8U * (1U << MIN(MAX(params->cParams.hashLog, params->cParams.chainLog), 29));
```
This is a bit too much. `8 * (1<<29)` is equivalent to `1<<32`, so this will not end well on 32-bit values. `1<<31` (2 GB) should be the max; it is (currently) our limit window size anyway. `8 * MIN(MAX(), 28)` should do the trick.
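To make the arithmetic concrete, here is a tiny standalone sketch of the overflow (illustrative only, not zstd code):

```c
#include <stdio.h>

int main(void)
{
    /* 8 * (1 << 29) needs 33 bits: 2^3 * 2^29 == 2^32, which wraps to 0
     * in 32-bit unsigned arithmetic. */
    unsigned const overflows = 8U * (1U << 29);   /* wraps to 0 */

    /* Capping the exponent at 28 keeps the product at 2^31 (2 GB),
     * matching zstd's current maximum window size. */
    unsigned const capped = 8U * (1U << 28);      /* 0x80000000 */

    printf("8 * (1 << 29) as U32: %u\n", overflows);
    printf("8 * (1 << 28) as U32: %u\n", capped);
    return 0;
}
```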
Oops, silly mistake! Thanks for catching this.
Great results @daniellerozenblit! Is it practical to test this patch with compression level …?

The trade-off achieved here is very good, and most users will be glad to trade a little bit of compression ratio for a lot of speed. Furthermore, if they really want more compression, they could still increase the compression level and get a better ratio, for even more speed (this could be shown effectively with a speed/ratio graph). But users of level …
Force-pushed from f086cfe to 7cba253.
Force-pushed from 7cba253 to 8d0a06a.
At this stage, we just want to assess the results, and show that they are globally positive. The code looks good to me; the changes are sufficiently simple to be properly reviewed.
And also explicit single threading if you are using the CLI, with …
The code LGTM once we have the benchmark results for the mentioned scenarios!
Awesome! |
TLDR

This PR is a response to issue #2189, which requests a speedier version of `--patch-from` compression.

This PR offers a solution by loading only the suffix of the dictionary into our normal match finders, rather than the entire dictionary. We continue to load the entire "dictionary" into our LDM match finders.

We load into our normal match finders only the portion of the dictionary that can reasonably be indexed by our tables: 8 * (1 << max(hashLog, chainLog)) bytes. Note that the 8 here is an arbitrary multiplier that shows good results (I also experimented with 4 as a multiplier, which was slightly faster but seemed to perform worse on the zstd regression test). This feature is disabled for strategies >= `ZSTD_btultra`.

This optimization offers great improvements in compression speed, with a very minimal increase in patch size. A sketch of the suffix-loading logic is shown below.
Credit and thanks to @terrelln for the optimization idea.
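For illustration, here is a minimal, self-contained sketch of the suffix-loading idea with the corrected 28-bit cap. The helper name and signature are hypothetical (the actual change lives in lib/compress/zstd_compress.c); it only conveys the clamping logic:

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: given the full dictionary, return the size of the
 * suffix that the normal match finders should index. The LDM match finders
 * still see the entire dictionary. */
static size_t dictSuffixToIndex(const unsigned char** dictStart, size_t dictSize,
                                unsigned hashLog, unsigned chainLog)
{
    /* Cap the exponent at 28 so that 8 * (1 << e) tops out at 1 << 31
     * (2 GB) and cannot overflow 32-bit arithmetic. */
    unsigned const e = MIN(MAX(hashLog, chainLog), 28);
    size_t const maxDictSize = (size_t)8 << e;   /* == 8 * (1 << e) */

    if (dictSize > maxDictSize) {
        /* Only the last maxDictSize bytes get indexed. */
        *dictStart += dictSize - maxDictSize;
        return maxDictSize;
    }
    return dictSize;
}
```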
Benchmarking
I benchmarked on an Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz with core isolation and turbo disabled.
I benchmarked `--patch-from` compression on the linux kernel tree tarball, v6.0 -> v6.2. For speed measurements, I ran each scenario five times interleaved and chose the highest result.
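For reference, this kind of scenario can be reproduced with the zstd CLI roughly as follows (file names are illustrative; `--patch-from` and `--single-thread` are existing flags):

```sh
# Create a patch: compress linux-v6.2.tar using linux-v6.0.tar as the
# dictionary, at level 15, single-threaded for stable speed measurements.
# Depending on file sizes, --long may be needed so the window covers the
# full dictionary.
zstd --patch-from=linux-v6.0.tar --single-thread -15 linux-v6.2.tar -o v6.0-to-v6.2.zst

# Apply the patch: reconstruct v6.2 from v6.0 plus the patch.
zstd -d --patch-from=linux-v6.0.tar v6.0-to-v6.2.zst -o linux-v6.2.tar
```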
Compression Speed

There are significant improvements in compression / patch-creation speed across a range of compression levels. These speed improvements are especially pronounced at higher compression levels (e.g. a ~617.6% increase at compression level 15).
Patch Size
There is some increase in patch size across compression levels. The increase is more pronounced at higher compression levels, but remains fairly minimal (e.g. a ~0.47% increase at compression level 15).