Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve memory usage of zstd encoder by using our own pool management #2375

Merged
merged 1 commit into from
Nov 14, 2022

Commits on Oct 27, 2022

  1. Reduce the zstd encoder state to pool size to one

    Currently a single zstd encoder with default concurrency is used.
    Default concurrency causes EncodeAll to create one encoder state per
    GOMAXPROC, per default per core.
    
    On high core machined (32+) and high compression levels this leads to
    1GB memory consumption per ~32 cores. A 1GB encoder is pretty expensive
    compared to the 1MB payloads usually sent to kafka.
    
    The new approach limits the encoder to a single core but allows dynamic
    allocation of additional encoders if no encoder is available. Encoders
    are returned after use, thus allowing for reuse.
    
    A benchmark emulating a 96 core system shows the effectiveness of the
    change.
    
    Previous result:
    ```
    goos: linux
    goarch: amd64
    pkg: github.com/Shopify/sarama
    cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
    BenchmarkZstdMemoryConsumption-8               2         834830801 ns/op        3664055292 B/op     4710 allocs/op
    PASS
    ok      github.com/Shopify/sarama       2.181s
    ```
    
    Current result:
    ```
    goos: linux
    goarch: amd64
    pkg: github.com/Shopify/sarama
    cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
    BenchmarkZstdMemoryConsumption-8   	       5	 222605954 ns/op	38960185 B/op	     814 allocs/op
    PASS
    ok  	github.com/Shopify/sarama	3.045s
    ```
    
    ```
    BenchmarkZstdMemoryConsumption-8       2        834830801 ns/op        3664055292 B/op        4710 allocs/op
    BenchmarkZstdMemoryConsumption-8       5        222605954 ns/op          38960185 B/op         814 allocs/op
    ```
    A ~4x improvement on total runtime and a 96x improvemenet on memory usage for the first 2x96 messages.
    
    This patch will as a downside increase how often new encoders are
    created on the fly and the maximum number of encoders might be even
    higher.
    rtreffer committed Oct 27, 2022
    Configuration menu
    Copy the full SHA
    f95f6ee View commit details
    Browse the repository at this point in the history