
Question - Help with setting the correct options for dgraph-io/badger #196

Closed
jarifibrahim opened this issue Dec 25, 2019 · 6 comments

@jarifibrahim
Contributor

Hey @klauspost, thank you for writing this amazing library in Go. I work on https://github.com/dgraph-io/badger and we'd like to use this library instead of the CGO-based ZSTD implementation.
We chatted about this a while ago: https://discuss.dgraph.io/t/badger-compression-feedback/5478

Here's what we're compressing in badger:
Badger stores key-value pairs in tables called SSTs. Each SST is divided into blocks of 4KB by default. We'd like to compress these blocks.

Compression of the blocks is a one-time thing but decompression happens every time a new block is accessed (this is a frequent operation).

I understand that https://github.com/klauspost/compress/tree/master/zstd#blocks describes how to compress small blocks, but is 4KB considered a small block?

We'd like to have a fair tradeoff between the decompression speed and compression ratio.

I see that there are a number of options for encoding and decoding, but with my limited knowledge of how ZSTD works, I can't figure out which ones should be tweaked.
I'd really appreciate it if you could help me pick the appropriate options for encoding/decoding :)

@klauspost
Owner

klauspost commented Dec 25, 2019

is 4KB considered a small block?

Yes, definitely; I would even say very small. The frame header (6-16 bytes) and the CRC will be a rather significant part of the size. If you already have something that deals with bit rot, disabling the CRC may be worthwhile here.

Test whether WithSingleSegment() is a benefit to you: it will make blocks a tiny bit bigger, but possibly a bit faster to decode. WithWindowSize is set automatically, so the default should be fine. Since blocks are so small, WithEncoderLevel(zstd.SpeedFastest) is probably the way to go, since 'default' will bring little benefit and has a bigger startup cost.

We'd like to have a fair tradeoff between the decompression speed and compression ratio.

I think the fastest mode will give you that. WithNoEntropyCompression will speed up both encoding and decoding, but it will compress considerably worse.

If your storage backend has no problem storing zero-length blobs, keep WithZeroFrames off (the default). This means 0 bytes of input produce 0 bytes of output; otherwise a useless frame header, and possibly a CRC, is added.

The defaults with the 'fastest' level set should be fine; use EncodeAll/DecodeAll. Remember that if you provide an existing slice for output, it should have length zero but the capacity for the 4KB of output. WithSingleSegment(false) will probably save you 2 bytes per block.

Oh, and when you benchmark, be sure to use real blocks of data, and test a variety of different ones.

@jarifibrahim
Contributor Author

Hey @klauspost, I tried running some benchmarks on two kinds of data.

  1. Table data (contains some randomly generated data).

Compression ratios:
Snappy                 1.753
LZ4                    1.786
Datadog ZSTD level 1   3.199
Datadog ZSTD level 3   3.100
Go ZSTD level 1        3.217
Go ZSTD level 3        3.147
name                                        time/op
Comp/Compression/Snappy-16                   4.09µs ± 1%
Comp/Compression/LZ4-16                      5.06µs ± 1%
Comp/Compression/ZSTD_-_Datadog-level1-16    17.6µs ± 3%
Comp/Compression/ZSTD_-_Datadog-level3-16    20.7µs ± 3%
Comp/Compression/ZSTD_-_Go_-_level1-16       27.8µs ± 2%
Comp/Compression/ZSTD_-_Go_-_Default-16      39.1µs ± 1%
Comp/Decompression/Snappy-16                 1.13µs ± 1%
Comp/Decompression/LZ4-16                     642ns ± 1%
Comp/Decompression/ZSTD_-_Datadog-16         7.12µs ± 2%
Comp/Decompression/ZSTD_-_Go-16              13.7µs ± 2%

name                                       speed
Comp/Compression/Snappy-16                 1.00GB/s ± 1%
Comp/Compression/LZ4-16                     806MB/s ± 1%
Comp/Compression/ZSTD_-_Datadog-level1-16   231MB/s ± 3%
Comp/Compression/ZSTD_-_Datadog-level3-16   197MB/s ± 3%
Comp/Compression/ZSTD_-_Go_-_level1-16      147MB/s ± 2%
Comp/Compression/ZSTD_-_Go_-_Default-16     104MB/s ± 1%
Comp/Decompression/Snappy-16               3.60GB/s ± 1%
Comp/Decompression/LZ4-16                  6.34GB/s ± 1%
Comp/Decompression/ZSTD_-_Datadog-16        573MB/s ± 2%
Comp/Decompression/ZSTD_-_Go-16             298MB/s ± 2%
  2. 4KB of text taken from https://gist.github.com/StevenClontz/4445774

Compression ratios:
Snappy                 1.305
LZ4                    1.171
ZSTD level 1           1.929
ZSTD level 3           1.932
Go ZSTD level 1        1.895
Go ZSTD level 3        1.928
name                                       time/op
Comp/Compression/Snappy-16                   6.88µs ± 2%
Comp/Compression/LZ4-16                      5.87µs ± 1%
Comp/Compression/ZSTD_-_Datadog-level1-16    22.7µs ± 4%
Comp/Compression/ZSTD_-_Datadog-level3-16    29.6µs ± 4%
Comp/Compression/ZSTD_-_Go_-_level1-16       35.7µs ± 1%
Comp/Compression/ZSTD_-_Go_-_Default-16      97.9µs ± 1%
Comp/Decompression/Snappy-16                 1.53µs ± 2%
Comp/Decompression/LZ4-16                     623ns ± 1%
Comp/Decompression/ZSTD_-_Datadog-16         8.36µs ± 0%
Comp/Decompression/ZSTD_-_Go-16              16.0µs ± 0%

name                                       speed
Comp/Compression/Snappy-16                  597MB/s ± 2%
Comp/Compression/LZ4-16                     699MB/s ± 1%
Comp/Compression/ZSTD_-_Datadog-level1-16   181MB/s ± 4%
Comp/Compression/ZSTD_-_Datadog-level3-16   139MB/s ± 4%
Comp/Compression/ZSTD_-_Go_-_level1-16      115MB/s ± 1%
Comp/Compression/ZSTD_-_Go_-_Default-16    41.9MB/s ± 1%
Comp/Decompression/Snappy-16               2.69GB/s ± 2%
Comp/Decompression/LZ4-16                  6.58GB/s ± 0%
Comp/Decompression/ZSTD_-_Datadog-16        489MB/s ± 2%
Comp/Decompression/ZSTD_-_Go-16             256MB/s ± 0%

Here's the script I used: https://gist.github.com/jarifibrahim/91920e93d1ecac3006b269e0c05d6a24

I have a couple of questions:

  1. Why does the compression ratio worsen when I use actual text instead of random data?
  2. I see a considerable speed difference between the Go implementation and the CGO-based ZSTD. Is this expected?

@klauspost
Owner

  1. You need a wider variety of inputs to really judge that. Different data has different characteristics, and with a single input you are not getting the full picture. With a really small data set you are also training the CPU's branch predictor for that specific input. Using different types of data will give a more realistic picture, which will probably show some losses, some wins, etc.

  2. The C implementation has had many, many hours poured into it, so it is pretty much as good as things can get, while Go has a natural disadvantage from a less advanced compiler and certain forced checks and zeroing. That said, I have not focused much on very small blocks yet, so there are likely still some gains to be had.

But your benchmark code looks solid, so apart from perhaps testing more different types of blocks, it should give a fair picture.

@klauspost
Owner

Added some experimental code for small blocks: #199

Not a huge improvement, but worth taking.

@klauspost
Owner

Found a much bigger improvement. Now about 15% faster on the fastest setting.

@jarifibrahim
Contributor Author

Found a much bigger improvement. Now about 15% faster on the fastest setting.

This is amazing, @klauspost. I'll benchmark the new code.

Thank you so much for helping out with this :)
