Zstandard decompression performance #19248

Open
dweiller opened this issue Mar 11, 2024 · 0 comments
Labels
contributor friendly, enhancement, optimization, standard library
Comments

@dweiller
Contributor

dweiller commented Mar 11, 2024

The Zstandard decompression implementation has not yet had any performance benchmarking or optimisation work done.

To make sure we have a speedy implementation we should:

  • include a decompression benchmark utility in lib/std/compress/zstandard/, and/or add a benchmark to gotta-go-fast
  • compare performance with the reference implementation. Unless matching it requires more implementation complexity or generated code size than is desired, there is no reason we can't be as fast as or faster than the reference.
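
As a rough illustration of the kind of measurement the benchmark utility would make, here is a minimal sketch of a throughput loop. It is written in C (the language of the reference implementation) rather than Zig, and everything in it is an assumption for illustration: `decode_fn`, `bench_decode` and `copy_decoder` are hypothetical names, and `copy_decoder` is only a memcpy stand-in for a real decoder. It uses `clock()`, which reports CPU time, so a real benchmark would probably want a monotonic wall-clock timer and several input corpora.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical decoder signature: returns bytes written to dst, or 0 on failure. */
typedef size_t (*decode_fn)(void *dst, size_t dst_cap, const void *src, size_t src_len);

/* Stand-in "decoder" that just copies its input; NOT a Zstandard decoder. */
size_t copy_decoder(void *dst, size_t dst_cap, const void *src, size_t src_len)
{
    if (src_len == 0 || src_len > dst_cap) return 0;
    memcpy(dst, src, src_len);
    return src_len;
}

/* Runs `decode` `iterations` times and returns the throughput in MiB of
   output per second of CPU time. Returns 0.0 if decoding fails or the run
   was too short for clock() to measure. */
double bench_decode(decode_fn decode, void *dst, size_t dst_cap,
                    const void *src, size_t src_len, int iterations)
{
    assert(iterations > 0);
    size_t produced = 0;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        produced = decode(dst, dst_cap, src, src_len);
        if (produced == 0) return 0.0; /* decoder reported failure */
    }
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) return 0.0; /* below timer resolution */
    double mib = (double)produced * iterations / (1024.0 * 1024.0);
    return mib / seconds;
}
```

To compare against the reference, the same loop could be pointed at a thin wrapper around libzstd's `ZSTD_decompress` (mapping `ZSTD_isError` results to 0) and at the Zig decoder exported behind the same signature, then run on identical inputs.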

Extracted from #14702.

One thing to think about when optimising is that the current implementation does not decode 4-stream literal sections in parallel. This avoids memory allocations (or the need for an additional 128 KiB of stack memory) in the non-allocating API. It presumably sacrifices decompression performance, but a parallel implementation needs to be written and benchmarked to assess the impact in practice. An implementation that decodes literal sections in parallel would require re-working the block decoding in lib/std/compress/zstandard/decode/block.zig to eagerly decode literals into an extra internal buffer using four threads. Another (probably better) option is to decode the streams 'in parallel' on a single core by exploiting instruction-level parallelism on superscalar processors.
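
To make the single-core option concrete, here is a minimal sketch of an interleaved decode loop, in C rather than Zig. It is a toy under stated assumptions: the three-symbol prefix code, the forward MSB-first bit reader and the names `decode_4_streams`, `bit_reader` and `decode_symbol` are all hypothetical, and none of this is the Zstandard Huffman format or the code in block.zig. What it does take from the format is the split of the regenerated literals: the first three streams each produce (total + 3) / 4 bytes and the fourth produces the rest. The `out` buffer plays the role of the extra internal buffer mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy prefix code (NOT Zstandard's Huffman coding): 'a' = 0, 'b' = 10, 'c' = 11.
   The table is indexed by the next two bits, most significant bit first. */
typedef struct { uint8_t symbol; uint8_t bits; } entry;
static const entry table[4] = { {'a', 1}, {'a', 1}, {'b', 2}, {'c', 2} };

typedef struct { const uint8_t *data; size_t len; size_t bit_pos; } bit_reader;

/* Returns the bit at position `pos`, or 0 past the end of the stream. */
static unsigned bit_at(const bit_reader *r, size_t pos)
{
    size_t byte = pos / 8;
    if (byte >= r->len) return 0;
    return (r->data[byte] >> (7 - pos % 8)) & 1u;
}

/* Decodes one symbol. Each call depends on the previous call on the same
   reader through bit_pos; that is the serial chain within one stream. */
static uint8_t decode_symbol(bit_reader *r)
{
    unsigned peek = (bit_at(r, r->bit_pos) << 1) | bit_at(r, r->bit_pos + 1);
    r->bit_pos += table[peek].bits;
    return table[peek].symbol;
}

/* Decodes `total` symbols from four streams into `out`.
   Returns 0 on success, -1 on a bad size split or if a stream is overrun. */
int decode_4_streams(const uint8_t *streams[4], const size_t lens[4],
                     uint8_t *out, size_t total)
{
    size_t per = (total + 3) / 4;      /* symbols in each of streams 0..2 */
    if (per * 3 > total) return -1;    /* nothing left for stream 3 */
    size_t last = total - 3 * per;     /* symbols in stream 3 */
    assert(last <= per);

    bit_reader r[4];
    for (int s = 0; s < 4; s++) r[s] = (bit_reader){ streams[s], lens[s], 0 };
    uint8_t *o0 = out, *o1 = out + per, *o2 = out + 2 * per, *o3 = out + 3 * per;

    /* Interleaved loop: the four chains are independent of each other, so a
       superscalar core is free to overlap their work. */
    size_t i = 0;
    for (; i < last; i++) {
        o0[i] = decode_symbol(&r[0]);
        o1[i] = decode_symbol(&r[1]);
        o2[i] = decode_symbol(&r[2]);
        o3[i] = decode_symbol(&r[3]);
    }
    /* Streams 0..2 may hold more symbols than stream 3. */
    for (; i < per; i++) {
        o0[i] = decode_symbol(&r[0]);
        o1[i] = decode_symbol(&r[1]);
        o2[i] = decode_symbol(&r[2]);
    }
    for (int s = 0; s < 4; s++)
        if (r[s].bit_pos > lens[s] * 8) return -1; /* read past stream end */
    return 0;
}
```

Whether this shape actually beats decoding the streams one after another is exactly what the benchmark would need to show; the sketch only illustrates the loop structure and the buffer it requires.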

@andrewrk added the contributor friendly, optimization, and standard library labels Mar 28, 2024
@andrewrk added this to the 1.1.0 milestone Mar 28, 2024
@andrewrk added the enhancement label Mar 28, 2024
2 participants