Zstandard decompression performance #19248
Labels: contributor friendly, enhancement, optimization, standard library
The Zstandard decompression implementation has not yet had any performance benchmarking or optimisation work done.
To make sure we have a speedy implementation, we should:
lib/std/compress/zstandard/, and/or add a benchmark to gotta-go-fast.

Extracted from #14702.
One thing to consider when optimising is that the current implementation does not decode 4-stream literal sections in parallel. This avoids memory allocations (or the need for an additional 128KiB of stack memory) in the non-allocating API. It presumably sacrifices decompression performance, but a parallel implementation needs to be written and benchmarked to assess the impact in practice.

An implementation that decodes literal sections in parallel would require reworking the block decoding in lib/std/compress/zstandard/decode/block.zig to eagerly decode literals into an extra internal buffer using four threads. Another (probably better) option is to decode the streams 'in parallel' on a single core by exploiting instruction-level parallelism on superscalar processors.