Question - Help with setting the correct options for dgraph-io/badger #196
Comments
Yes, definitely. I would even say very small. At 4KB, the frame header (6-16 bytes) and the CRC will be a rather significant part of the output size. If you already have something dealing with bit rot, disabling the CRC may be an idea here. Test if using…
I think the fastest mode will bring that. If your storage backend has no problem storing zero-length blobs, keeping the defaults with 'fastest' mode set should be fine; use EncodeAll/DecodeAll. Remember that if you provide an existing slice for output, it should have length zero but the capacity for 4KB of output. Oh, and when you benchmark, be sure to use real blocks of data, and have a bunch of different ones.
Hey @klauspost, I tried running some benchmarks on two kinds of data. Here's the script I used: https://gist.github.com/jarifibrahim/91920e93d1ecac3006b269e0c05d6a24. I have a couple of questions.
But your benchmark code looks solid, so apart from maybe testing more different types of blocks, it should give a good picture.
Added some experimental code for small blocks: #199. Not a huge improvement, but worth taking.
Found a much bigger improvement. Now about 15% faster on the fastest setting.
This is amazing, @klauspost. I'll benchmark the new code. Thank you so much for helping out with this :)
Hey @klauspost, thank you for writing this amazing library in Go. I work on https://github.com/dgraph-io/badger and we'd like to use this library instead of the CGO based ZSTD implementation.
We had a small chat about this a while ago https://discuss.dgraph.io/t/badger-compression-feedback/5478
Here's what we're compressing in badger
Badger stores key-value pairs in tables called SSTs. Each SST is divided into blocks of 4KB by default, and we'd like to compress these blocks.
Compressing a block is a one-time operation, but decompression happens every time a block is accessed, which is a frequent operation.
I understand that https://github.com/klauspost/compress/tree/master/zstd#blocks describes compressing small blocks, but is 4KB considered a small block?
We'd like to have a fair tradeoff between the decompression speed and compression ratio.
I see that there are a bunch of options for encoding and decoding, but because of my limited knowledge of how ZSTD works, I can't figure out which ones should be tweaked.
I'd really appreciate it if you can help me pick the appropriate options for encoding/decoding :)