
question: Slowdown after upgrading from 1.0.26 to 1.0.28 #395

Closed
ivpavici opened this issue Jan 10, 2024 · 8 comments
Comments

@ivpavici

ivpavici commented Jan 10, 2024

Hello!

This is more of a question, so apologies in advance if details are missing; I'm not sure which ones would help!

We're seeing a slowdown (more than 2x) while running tests in our project locally after bumping flate2 from 1.0.26 to 1.0.28. Everything else in the project stays the same!

Cargo.toml: flate2 = { version = "1.0.28" }

Cargo.lock:

[[package]]
name = "flate2"
version = "1.0.28"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "46303f565772937ffe1d394a4fac6f411c6013172fadde9dcdb1e147a086940e"
dependencies = [
 "crc32fast",
 "miniz_oxide",
]

The problem is that when we run the same tests on GitHub Actions, there is no slowdown between these versions!

I'm running the tests on Ubuntu under WSL2, and I can reproduce this behavior every time I switch between the two versions.

More information needed

@Byron
Member

Byron commented Jan 10, 2024

This configuration means that miniz_oxide is used, a pure-Rust implementation. The miniz_oxide dependency hasn't changed in a while, so the cause should be either something in flate2 or a difference in the Rust compiler (unless you say that wasn't upgraded either).

In any case, I recommend trying 1.0.27, and once the first good and the first bad tags are found, I'd use a local clone of flate2 and set it up to be used with

[patch."crates-io"]
flate2 = { path = "/path/to/clone" }

Then git bisect can be used in the flate2 clone to find the commit that introduced the issue. With zlib-ng I didn't notice any issue on my side, but miniz-oxide is certainly different enough.

As this issue can't be reproduced here, I recommend closing it while considering the submission of a PR for a fix that works for you.

@ivpavici
Author

ivpavici commented Jan 11, 2024

thank you @Byron !

We have identified why there is a slowdown locally but not on GitHub Actions: locally I ran the project with plain cargo run, and that doesn't seem to affect version 1.0.26.

On GitHub, however, the package is built with the --release flag (and runs inside Docker). That's where the difference shows for 1.0.28: it is 2 to 3 times slower under plain cargo run than under cargo run --release.

Locally, running both versions with --release results in the same execution time:

cargo run --release with 1.0.26
Time: 268.679 s, estimated 944 s

cargo run --release with 1.0.28
Time: 267.24 s, estimated 269 s

@Byron
Member

Byron commented Jan 11, 2024

Thanks for digging in and for solving the puzzle :). I am glad it isn't anything more serious.

@Byron Byron closed this as completed Jan 11, 2024
@jongiddy
Contributor

A good candidate for the cause of the slowdown is #373. That PR explicitly prioritized correctness over performance, so it is reassuring that the times in release mode show no change.

kkovaacs added a commit to eqlabs/pathfinder that referenced this issue Feb 27, 2024
flate2 1.0.28 built unoptimized is significantly slower than it was
before an unsoundness in unsafe code was fixed. This unnecessarily
slows down some of our tests, which use `flate2` during test setup.

For more info please check rust-lang/flate2-rs#395

This change makes sure that `flate2` is optimized even for a `dev` build.
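The workaround that commit describes can be expressed with Cargo's per-package profile overrides in the consuming crate's Cargo.toml. A sketch (opt-level 3 is an assumption; any nonzero level may be enough):

```toml
# Compile flate2 and its backend with optimizations even in dev/test
# builds, while the rest of the workspace stays unoptimized.
[profile.dev.package.flate2]
opt-level = 3

[profile.dev.package.miniz_oxide]
opt-level = 3
```

This keeps incremental rebuilds of your own code fast while avoiding the unoptimized-compression penalty during tests.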
@Hexta

Hexta commented Mar 2, 2024

Hey,
Unfortunately, #373 caused issues in release mode as well.
One related issue is vectordotdev/vector#19981.

@Byron
Member

Byron commented Mar 3, 2024

Thanks for bringing this to my attention!

I took a look at the flamegraph in vectordotdev/vector#19981 and noticed that the runtime is dominated by calls to an unaligned memset function. This is probably caused by the resize(cap, 0) call, which fills the output vec with zeroes. This didn't happen before the change in #373. However, it also means that the output vector isn't being reused as much as it maybe should be, or that its length is repeatedly dropped back below capacity, so zeroes are written again and again.

Something I couldn't make sense of is this claim in the vector issue:

After version v0.34.0, the memory-allocation system call count is very high when using GzEncoder (I also confirmed that this call stack occurs when using zlib compression rather than gzip)

The change in #373 definitely does not allocate (or deallocate). The only real change is that it can fill unused capacity with zeroes the first time.

Thus I think it's more likely an issue in how vector uses output vectors during compression, one that has now become more prominent. This comment seems to indicate something similar as well.

I'd wait and see how vector ends up solving the issue, in order to figure out if anything can be done here to prevent issues of the same kind in future.

@Byron
Member

Byron commented Mar 4, 2024

The current implementation indeed suffers from a performance degradation when exposed to many small writes during compression.

The problem is that the internal buffer is created with 32 KiB of capacity. However, with each small write it's memset to its full capacity, only to be truncated right after to what was actually written.

These calls accumulate into something very costly. The typical solution is to avoid many small writes by buffering them, so in a way, leaving this as-is seems like a net positive: it can reveal usage issues while being easy to fix in the caller's code.

CHr15F0x pushed a commit to eqlabs/pathfinder that referenced this issue Mar 5, 2024
flate2 1.0.28 built unoptimized is significantly slower than it was
before an unsoundness in unsafe code was fixed. This unnecessarily
slows down some of our tests, which use `flate2` during test setup.

For more info please check rust-lang/flate2-rs#395

This change makes sure that `flate2` is optimized even for a `dev` build.
@Shatur

Shatur commented May 20, 2024

I also noticed a performance regression when updating from 1.0.27 to 1.0.28.
In my case release mode doesn't help, and it's ~100 times slower...
