feat: improve memory usage of zstd encoder by using our own pool management #2375
Currently a single zstd encoder with default concurrency is used. The default concurrency causes EncodeAll to create one encoder state per GOMAXPROCS, i.e. one per core by default.
On high-core machines (32+) and high compression levels (~32MB per state) this leads to roughly 1GB of memory consumption per 32 cores. A 1GB encoder is pretty expensive compared to the ~1MB payloads usually sent to Kafka.
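For context, a minimal sketch of the current pattern, assuming the github.com/klauspost/compress/zstd API (identifiers here are illustrative, not Sarama's actual code):

```go
package kafkaold // hypothetical package name, for illustration

import "github.com/klauspost/compress/zstd"

// A single shared encoder; its default concurrency equals GOMAXPROCS,
// so concurrent EncodeAll calls can each materialize their own
// encoder state (~32MB each at high compression levels).
var encoder, _ = zstd.NewWriter(nil,
	zstd.WithEncoderLevel(zstd.SpeedBestCompression))

// compressOld encodes src with the shared encoder.
func compressOld(src []byte) []byte {
	return encoder.EncodeAll(src, nil)
}
```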
The new approach limits each encoder to a single core but allows additional encoders to be allocated dynamically when none is available. Encoders are returned after use, allowing reuse, with a limit of one spare encoder to cap the memory overhead (sketched below).
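A minimal sketch of the pooling idea, again assuming the github.com/klauspost/compress/zstd API; all identifiers are illustrative, not the PR's actual code:

```go
package zstdpool // hypothetical package name, for illustration

import "github.com/klauspost/compress/zstd"

// spare parks at most one idle encoder for reuse; encoders created
// beyond that under load are simply dropped and garbage collected.
var spare = make(chan *zstd.Encoder, 1)

func getEncoder() *zstd.Encoder {
	select {
	case enc := <-spare:
		return enc // reuse the parked spare
	default:
		// No idle encoder available: allocate a fresh one limited to
		// a single internal state so its footprint stays small.
		enc, _ := zstd.NewWriter(nil, // error ignored for brevity
			zstd.WithEncoderConcurrency(1),
			zstd.WithEncoderLevel(zstd.SpeedBestCompression))
		return enc
	}
}

func releaseEncoder(enc *zstd.Encoder) {
	select {
	case spare <- enc: // keep one spare for the next caller
	default: // a spare is already parked: drop this encoder
	}
}

// compress borrows an encoder, encodes src, and returns the encoder.
func compress(src []byte) []byte {
	enc := getEncoder()
	defer releaseEncoder(enc)
	return enc.EncodeAll(src, nil)
}
```

A buffered channel of capacity one gives the "at most one spare" semantics without explicit locking; the PR's actual bookkeeping may differ.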
A benchmark emulating a 96-core system shows the memory effectiveness of the change.
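The benchmark could take roughly this shape (a hypothetical sketch reusing the compress helper from above; the PR's real benchmark may differ):

```go
package zstdpool

import (
	"runtime"
	"testing"
)

func BenchmarkZstdMemoryConsumption(b *testing.B) {
	// Emulate a 96-core machine: the old encoder's default
	// concurrency tracks GOMAXPROCS, so this reproduces the worst case.
	prev := runtime.GOMAXPROCS(96)
	defer runtime.GOMAXPROCS(prev)

	payload := make([]byte, 1024*1024) // ~1MB, a typical Kafka payload
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		for j := 0; j < 2*96; j++ { // the "first 2x96 messages"
			_ = compress(payload)
		}
	}
}
```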
Previous result:
Current result:
A ~4x improvement in total runtime and a ~96x improvement in memory usage for the first 2x96 messages.
As a downside, this patch increases how often new encoders are created on the fly, and the maximum number of live encoders might even be higher; however, it should track the number of cores actually in use rather than the number of theoretically available cores.