
patch-from speed optimization #3545

Merged

Conversation

daniellerozenblit
Contributor

@daniellerozenblit daniellerozenblit commented Mar 10, 2023

TLDR

This PR is a response to issue #2189, which requests a speedier version of --patch-from compression.

This PR offers a solution: load only a suffix of the dictionary into the normal match finders, rather than the entire dictionary. The entire dictionary is still loaded into the LDM match finders.

We load into the normal match finders only the portion that can reasonably be indexed by our hash tables: 8 * (1 << max(hashLog, chainLog)) bytes. Note that the 8 here is an arbitrary multiplier that shows good results (I also experimented with 4 as a multiplier, which was slightly faster but seemed to perform worse on the zstd regression test). This feature is disabled for strategies >= ZSTD_btultra.
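The suffix-capping idea can be sketched as follows. This is a hypothetical illustration, not the actual zstd code; the function name and signature are invented for clarity:

```c
#include <stddef.h>

/* Hypothetical sketch of the idea above (not the actual zstd code):
 * the regular match finders can usefully index at most 8x the larger
 * of the hash and chain table sizes, so only that many trailing bytes
 * of the dictionary are loaded into them. The LDM match finders still
 * see the whole dictionary. */
static size_t suffix_to_load(size_t dictSize, unsigned hashLog, unsigned chainLog)
{
    unsigned const tableLog = hashLog > chainLog ? hashLog : chainLog;
    size_t const maxDictSize = (size_t)8 << tableLog;  /* 8 * (1 << tableLog) */
    return dictSize < maxDictSize ? dictSize : maxDictSize;
}

/* The normal match finders would then index only the last
 * suffix_to_load(...) bytes:
 *   const char* suffix = dict + dictSize - suffix_to_load(dictSize, hLog, cLog);
 */
```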

This optimization offers great improvements on compression speed, with very minimal increase in patch size.

Credit and thanks to @terrelln for the optimization idea.

Benchmarking

I benchmarked on an Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz with core isolation and turbo disabled.

I benchmarked --patch-from compression on the linux kernel tree tarball v6.0 -> v6.2. For speed measurements, I ran each scenario five times interleaved and chose the highest result.

Compression Speed

There are significant improvements in compression / patch-creation speed across a range of compression levels. These speed improvements are especially pronounced at higher compression levels (e.g. a ~617.6% speedup at compression level 15).

[Screenshot: compression-speed benchmark results across compression levels]

Patch Size

There is some increase in patch size across compression levels. The increase grows with compression level but remains fairly minimal (e.g. a ~0.47% increase at compression level 15).

[Screenshot: patch-size benchmark results across compression levels]

@daniellerozenblit daniellerozenblit marked this pull request as ready for review March 10, 2023 14:32
@daniellerozenblit daniellerozenblit marked this pull request as draft March 10, 2023 14:38
@daniellerozenblit daniellerozenblit marked this pull request as ready for review March 10, 2023 15:02
@daniellerozenblit daniellerozenblit marked this pull request as draft March 10, 2023 15:06
@daniellerozenblit daniellerozenblit marked this pull request as ready for review March 10, 2023 18:09
}

/* If the dict is larger than we can reasonably index in our tables, only load the suffix. */
{ U32 maxDictSize = 8U * (1U << MIN(MAX(params->cParams.hashLog, params->cParams.chainLog), 29));
Contributor

@Cyan4973 Cyan4973 Mar 10, 2023


This is a bit too much.
8 * (1<<29) is equivalent to 1<<32, so this will overflow on 32-bit values.
1<<31 (2 GB) shall be the max; it is (currently) our window size limit anyway.
8 * (1 << MIN(MAX(...), 28)) should do the trick.
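The overflow can be seen with a tiny illustration, assuming 32-bit unsigned arithmetic (the helper name is invented for this example):

```c
#include <stdint.h>

/* Illustration of the overflow: with 32-bit unsigned arithmetic,
 * 8 * (1 << 29) is 2^32, which wraps around to 0. Capping the shift
 * at 28 keeps the product at 2^31 (2 GB), the current window-size
 * limit mentioned above. */
static uint32_t scaled_table_size(unsigned tableLog)
{
    return 8u * (1u << tableLog);
}
/* scaled_table_size(29) wraps to 0; scaled_table_size(28) == 1u << 31 */
```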

Contributor Author


Oops, silly mistake! Thanks for catching this.

@Cyan4973
Contributor

Cyan4973 commented Mar 10, 2023

Great results @daniellerozenblit !

Is it practical to test this patch at compression level --ultra -22?

The trade-off achieved here is very good, and most users will gladly trade a little compression ratio for a lot of speed. Furthermore, if they really want more compression, they could still increase the compression level and get a better ratio while remaining faster than before (this could be shown effectively with a speed/ratio graph).

But users of level 22 are typically ready to sacrifice speed to get the best possible compression ratio. And a 0.5% ratio regression would be a lot for them.
However, I suspect that your PR actually doesn't impact this case. Yet, it would be better to measure it.

@daniellerozenblit daniellerozenblit force-pushed the patch-from-speed-optimization branch 2 times, most recently from f086cfe to 7cba253 Compare March 13, 2023 18:54
@Cyan4973
Contributor

Cyan4973 commented Mar 13, 2023

  1. Could we get some benchmark results from the latest changes for levels 1, 19 and 22, please?
    (Note: not necessarily graphs; just the numbers would be enough.)

  2. Could you test multithreading too (even if it's just -T2)?
    The point is to show that these optimizations remain perfectly valid with multi-threading enabled.

At this stage, we just want to confirm the results and show that they are globally positive.

The code looks good to me, changes are sufficiently simple to be properly reviewed.

@terrelln
Contributor

Could you test multithreading too (even if it's just -T2) ?

And also test explicit single-threading, if you are using the CLI, with --single-thread.

@terrelln
Contributor

The code LGTM once we have the benchmark results for the mentioned scenarios!

@daniellerozenblit
Contributor Author

Additional Benchmarking

The results described earlier are consistent across threading scenarios: --single-thread, -T2, and the default. However, with --single-thread the optimization costs more compression ratio.

Single Thread

For single-threaded compression, speed improvements are consistent with previous results. However, there is a larger loss in compression ratio across compression levels (~4%).
[Screenshot: --single-thread benchmark results]

T2

Results for -T2 are very consistent with the default scenario; the speed improvements even appear slightly larger.
[Screenshot: -T2 benchmark results]

Levels 1, 19, 22

There are no significant changes in either compression ratio or speed for levels 1, 19, and 22 in the default scenario.
[Screenshot: levels 1, 19, 22 benchmark results]

@terrelln
Contributor

Awesome!
