feat: improve memory usage of zstd encoder by using our own pool management #2375
Currently a single zstd encoder with default concurrency is used. The default concurrency causes EncodeAll to create one encoder state per GOMAXPROCS, i.e. one per core by default.
On high-core machines (32+) and high compression levels (~32MB per state) this leads to roughly 1GB of memory consumption per 32 cores. A 1GB encoder is pretty expensive compared to the ~1MB payloads usually sent to Kafka.
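For context, a minimal sketch of the current pattern, assuming the github.com/klauspost/compress/zstd API (identifiers here are illustrative, not Sarama's actual code):

```go
package kafkaold // hypothetical package name, for illustration

import "github.com/klauspost/compress/zstd"

// A single shared encoder; its default concurrency equals GOMAXPROCS,
// so concurrent EncodeAll calls can each materialize their own
// encoder state (~32MB each at high compression levels).
var encoder, _ = zstd.NewWriter(nil,
	zstd.WithEncoderLevel(zstd.SpeedBestCompression))

// compressOld encodes src with the shared encoder.
func compressOld(src []byte) []byte {
	return encoder.EncodeAll(src, nil)
}
```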
The new approach limits each encoder to a single core but allows additional encoders to be allocated dynamically when none is available. Encoders are returned after use, allowing reuse, with a limit of one spare encoder to cap the memory overhead (sketched below).
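A minimal sketch of the pooling idea, again assuming the github.com/klauspost/compress/zstd API; all identifiers are illustrative, not the PR's actual code:

```go
package zstdpool // hypothetical package name, for illustration

import "github.com/klauspost/compress/zstd"

// spare parks at most one idle encoder for reuse; encoders created
// beyond that under load are simply dropped and garbage collected.
var spare = make(chan *zstd.Encoder, 1)

func getEncoder() *zstd.Encoder {
	select {
	case enc := <-spare:
		return enc // reuse the parked spare
	default:
		// No idle encoder available: allocate a fresh one limited to
		// a single internal state so its footprint stays small.
		enc, _ := zstd.NewWriter(nil, // error ignored for brevity
			zstd.WithEncoderConcurrency(1),
			zstd.WithEncoderLevel(zstd.SpeedBestCompression))
		return enc
	}
}

func releaseEncoder(enc *zstd.Encoder) {
	select {
	case spare <- enc: // keep one spare for the next caller
	default: // a spare is already parked: drop this encoder
	}
}

// compress borrows an encoder, encodes src, and returns the encoder.
func compress(src []byte) []byte {
	enc := getEncoder()
	defer releaseEncoder(enc)
	return enc.EncodeAll(src, nil)
}
```

A buffered channel of capacity one gives the "at most one spare" semantics without explicit locking; the PR's actual bookkeeping may differ.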
A benchmark emulating a 96-core system shows the memory effectiveness of the change.
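The benchmark could take roughly this shape (a hypothetical sketch reusing the compress helper from above; the PR's real benchmark may differ):

```go
package zstdpool

import (
	"runtime"
	"testing"
)

func BenchmarkZstdMemoryConsumption(b *testing.B) {
	// Emulate a 96-core machine: the old encoder's default
	// concurrency tracks GOMAXPROCS, so this reproduces the worst case.
	prev := runtime.GOMAXPROCS(96)
	defer runtime.GOMAXPROCS(prev)

	payload := make([]byte, 1024*1024) // ~1MB, a typical Kafka payload
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		for j := 0; j < 2*96; j++ { // the "first 2x96 messages"
			_ = compress(payload)
		}
	}
}
```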
Previous result:
Current result:
A ~4x improvement in total runtime and a ~96x improvement in memory usage for the first 2x96 messages.
As a downside, this patch increases how often new encoders are created on the fly, and the maximum number of live encoders might even be higher; however, it should track the number of cores actually in use rather than the number of theoretically available cores.