Update zstd to version 1.5.2 #116

Merged
Viq111 merged 2 commits into 1.x from viq111/1.5.2
Apr 22, 2022

@Viq111 Viq111 commented Apr 14, 2022

A bit different this time as zstd added some asm code (huf_decompress_amd64.S)

New workflow:

  • Run the commands below:
git clone https://github.com/facebook/zstd.git c
cd c
git checkout v1.5.2
cd ..
git clone git@github.com:DataDog/zstd.git go
find c/lib \( -name '*.h' -o -name '*.c' -o -name '*.S' \) -exec cp -v {} go/ \;
  • Run python tools/flatten_imports.py
  • Run python tools/insert_libzstd_ifdefs.py
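
For reference, the copy step could also be written in Go. This is only a minimal sketch, assuming the `c/lib` and `go/` layout from the commands above; `wanted`, `copyFile` and `flattenCopy` are hypothetical helper names, not part of this repo's tooling. Unlike `cp`, it stops when two source files share a base name instead of silently overwriting one with the other.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// wanted reports whether a file name matches the patterns used by the
// find command: *.h, *.c or *.S (case-sensitive, like find -name).
func wanted(name string) bool {
	switch filepath.Ext(name) {
	case ".h", ".c", ".S":
		return true
	}
	return false
}

// copyFile copies src to dst, creating or truncating dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// flattenCopy walks srcRoot and copies every wanted file directly into
// dstDir, dropping the sub-directory structure. It returns the number of
// files copied and fails on a base-name collision between two sources.
func flattenCopy(srcRoot, dstDir string) (int, error) {
	seen := map[string]string{} // base name -> first source path
	count := 0
	err := filepath.WalkDir(srcRoot, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !wanted(d.Name()) {
			return nil
		}
		if prev, dup := seen[d.Name()]; dup {
			return fmt.Errorf("name collision: %s and %s", prev, path)
		}
		seen[d.Name()] = path
		if err := copyFile(path, filepath.Join(dstDir, d.Name())); err != nil {
			return err
		}
		count++
		return nil
	})
	return count, err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: flatten <src-root> <dst-dir>")
		return
	}
	n, err := flattenCopy(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "copy failed:", err)
		os.Exit(1)
	}
	fmt.Printf("copied %d files\n", n)
}
```

It would be invoked with the two directories from the workflow, e.g. `go run flatten.go c/lib go` (the file name is an assumption).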

This is the code as of https://github.com/facebook/zstd/releases/tag/v1.5.2 (facebook/zstd@e47e674)

Benchmarks will come in the next comment

Viq111 commented Apr 15, 2022

The benchmarks with the mr payload are interesting, but not a slam dunk:

goos: darwin
goarch: amd64
pkg: github.com/DataDog/zstd
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
name                    old time/op    new time/op    delta
BulkCompress-12           1.39µs ± 1%    1.47µs ± 1%   +5.81%  (p=0.000 n=18+20)
BulkDecompress-12          922ns ± 1%     921ns ± 1%     ~     (p=0.174 n=20+18)
CtxCompression-12         71.7ms ± 2%    77.4ms ± 1%   +7.83%  (p=0.000 n=17+18)
CtxDecompression-12       10.6ms ± 2%    10.3ms ± 1%   -2.41%  (p=0.000 n=18+20)
StreamCompression-12      80.2ms ± 1%    89.2ms ± 1%  +11.12%  (p=0.000 n=19+19)
StreamDecompression-12    12.8ms ± 1%    12.6ms ± 1%   -1.50%  (p=0.000 n=17+20)
Compression-12            72.0ms ± 1%    77.8ms ± 0%   +8.16%  (p=0.000 n=17+17)
Decompression-12          10.6ms ± 3%    10.3ms ± 1%   -3.01%  (p=0.000 n=19+19)

name                    old speed      new speed      delta
BulkCompress-12         66.3MB/s ± 1%  62.7MB/s ± 1%   -5.49%  (p=0.000 n=18+20)
BulkDecompress-12       61.8MB/s ± 1%  61.9MB/s ± 1%     ~     (p=0.169 n=20+18)
CtxCompression-12        139MB/s ± 2%   129MB/s ± 1%   -7.27%  (p=0.000 n=17+18)
CtxDecompression-12      942MB/s ± 2%   965MB/s ± 1%   +2.47%  (p=0.000 n=18+20)
StreamCompression-12     124MB/s ± 1%   112MB/s ± 1%  -10.01%  (p=0.000 n=19+19)
StreamDecompression-12   780MB/s ± 1%   792MB/s ± 1%   +1.52%  (p=0.000 n=17+20)
Compression-12           139MB/s ± 1%   128MB/s ± 0%   -7.55%  (p=0.000 n=17+17)
Decompression-12         937MB/s ± 3%   966MB/s ± 1%   +3.10%  (p=0.000 n=19+19)

So we need to run benchmarks on other hardware and with other payloads.

Viq111 commented Apr 22, 2022

Interpreted results

1.5.2 seems to have roughly similar performance overall. Depending on the payload, you can expect an impact anywhere between -10% and +10%, as the raw data below shows.

Data points

Benchmark on AWS instances: c5.2xlarge for x86, c6g.2xlarge for ARM:

mr payload:

### AWS c5.2xl ###
name                   old time/op    new time/op    delta
BulkCompress-8           3.31µs ± 1%    3.43µs ± 1%   +3.70%  (p=0.000 n=20+20)
BulkDecompress-8         2.77µs ± 0%    2.78µs ± 1%   +0.52%  (p=0.001 n=20+20)
CtxCompression-8         89.3ms ± 1%    95.9ms ± 2%   +7.41%  (p=0.000 n=20+19)
CtxDecompression-8       12.9ms ± 1%    12.2ms ± 2%   -4.92%  (p=0.000 n=20+20)
StreamCompression-8       102ms ± 1%     113ms ± 2%  +10.47%  (p=0.000 n=20+20)
StreamDecompression-8    15.4ms ± 1%    14.9ms ± 2%   -3.19%  (p=0.000 n=20+20)
Compression-8            91.2ms ± 1%    95.5ms ± 1%   +4.75%  (p=0.000 n=19+20)
Decompression-8          12.8ms ± 1%    12.4ms ± 3%   -2.84%  (p=0.000 n=19+18)

name                   old speed      new speed      delta
BulkCompress-8         27.8MB/s ± 1%  26.8MB/s ± 1%   -3.57%  (p=0.000 n=20+20)
BulkDecompress-8       20.6MB/s ± 0%  20.5MB/s ± 1%   -0.51%  (p=0.001 n=20+20)
CtxCompression-8        112MB/s ± 1%   104MB/s ± 2%   -6.89%  (p=0.000 n=20+19)
CtxDecompression-8      775MB/s ± 1%   815MB/s ± 2%   +5.18%  (p=0.000 n=20+20)
StreamCompression-8    97.7MB/s ± 1%  88.5MB/s ± 2%   -9.47%  (p=0.000 n=20+20)
StreamDecompression-8   646MB/s ± 1%   667MB/s ± 2%   +3.30%  (p=0.000 n=20+20)
Compression-8           109MB/s ± 1%   104MB/s ± 1%   -4.54%  (p=0.000 n=19+20)
Decompression-8         781MB/s ± 1%   804MB/s ± 3%   +3.01%  (p=0.000 n=20+18)


### AWS c6g.2xl ###
name                   old time/op    new time/op    delta
BulkCompress-8           2.13µs ± 3%    2.18µs ± 1%  +2.79%  (p=0.000 n=20+20)
BulkDecompress-8         1.43µs ± 2%    1.43µs ± 2%    ~     (p=0.490 n=20+19)
CtxCompression-8          147ms ± 1%     151ms ± 2%  +2.79%  (p=0.000 n=20+20)
CtxDecompression-8       21.1ms ± 3%    20.3ms ± 2%  -4.03%  (p=0.000 n=20+20)
StreamCompression-8       165ms ± 3%     171ms ± 3%  +4.11%  (p=0.000 n=20+20)
StreamDecompression-8    25.0ms ± 2%    24.3ms ± 2%  -2.65%  (p=0.000 n=19+20)
Compression-8             145ms ± 3%     151ms ± 3%  +4.52%  (p=0.000 n=20+20)
Decompression-8          20.9ms ± 4%    20.1ms ± 2%  -3.90%  (p=0.000 n=20+20)

name                   old speed      new speed      delta
BulkCompress-8         43.3MB/s ± 3%  42.1MB/s ± 1%  -2.73%  (p=0.000 n=20+20)
BulkDecompress-8       39.9MB/s ± 2%  40.0MB/s ± 2%    ~     (p=0.465 n=20+19)
CtxCompression-8       67.8MB/s ± 1%  66.0MB/s ± 2%  -2.71%  (p=0.000 n=20+20)
CtxDecompression-8      472MB/s ± 3%   492MB/s ± 2%  +4.19%  (p=0.000 n=20+20)
StreamCompression-8    60.6MB/s ± 3%  58.2MB/s ± 3%  -3.96%  (p=0.000 n=20+20)
StreamDecompression-8   399MB/s ± 2%   410MB/s ± 2%  +2.73%  (p=0.000 n=19+20)
Compression-8          68.9MB/s ± 3%  65.9MB/s ± 3%  -4.32%  (p=0.000 n=20+20)
Decompression-8         476MB/s ± 3%   496MB/s ± 2%  +4.06%  (p=0.000 n=20+20)

On mr:

  • On x86, compression is 4-9% slower
  • On x86, decompression is 0-5% faster
  • On ARM, compression is 3-4% slower
  • On ARM, decompression is 0-4% faster
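
The percentages in this summary follow the throughput ("speed") tables, which are not simply the time/op deltas with the sign flipped. A small sketch of the relation, assuming each operation processes the same number of bytes before and after; `speedDeltaPercent` is a hypothetical helper, not something benchstat exposes:

```go
package main

import "fmt"

// speedDeltaPercent converts a relative change in time/op (in percent)
// into the matching relative change in throughput (in percent), assuming
// the bytes processed per operation are unchanged.
func speedDeltaPercent(timeDeltaPercent float64) float64 {
	ratio := 1 + timeDeltaPercent/100 // new time divided by old time
	return (1/ratio - 1) * 100
}

func main() {
	// time/op deltas taken from the c5.2xl mr table above
	for _, d := range []float64{+3.70, +7.41, +10.47, -4.92} {
		fmt.Printf("time %+6.2f%% -> speed %+6.2f%%\n", d, speedDeltaPercent(d))
	}
}
```

The computed values should agree with the speed columns to within rounding, which is why a time/op increase of about 10% shows up as a throughput drop of roughly 9%.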

Datadog batch_series result JSON payload:

### AWS c5.2xl ###
name                   old time/op    new time/op    delta
BulkCompress-8           3.30µs ± 0%    3.45µs ± 1%  +4.46%  (p=0.000 n=20+20)
BulkDecompress-8         2.77µs ± 1%    2.77µs ± 0%    ~     (p=0.794 n=20+20)
CtxCompression-8          173µs ± 1%     170µs ± 1%  -1.35%  (p=0.000 n=19+20)
CtxDecompression-8       27.9µs ± 0%    28.2µs ± 0%  +0.94%  (p=0.000 n=19+20)
StreamCompression-8      3.35µs ± 0%    3.37µs ± 0%  +0.43%  (p=0.000 n=20+19)
StreamDecompression-8    30.2µs ± 0%    30.4µs ± 0%  +0.82%  (p=0.000 n=18+20)
Compression-8             180µs ± 1%     176µs ± 0%  -2.08%  (p=0.000 n=19+19)
Decompression-8          29.8µs ± 1%    30.0µs ± 0%  +0.48%  (p=0.000 n=20+19)

name                   old speed      new speed      delta
BulkCompress-8         27.8MB/s ± 0%  26.7MB/s ± 1%  -4.27%  (p=0.000 n=20+20)
BulkDecompress-8       20.5MB/s ± 1%  20.6MB/s ± 0%    ~     (p=0.732 n=20+20)
CtxCompression-8        116MB/s ± 1%   118MB/s ± 1%  +1.37%  (p=0.000 n=19+20)
CtxDecompression-8      718MB/s ± 0%   712MB/s ± 0%  -0.94%  (p=0.000 n=19+20)
StreamCompression-8    5.98GB/s ± 0%  5.95GB/s ± 0%  -0.42%  (p=0.000 n=20+19)
StreamDecompression-8   664MB/s ± 0%   659MB/s ± 0%  -0.81%  (p=0.000 n=18+20)
Compression-8           112MB/s ± 1%   114MB/s ± 0%  +2.13%  (p=0.000 n=19+19)
Decompression-8         672MB/s ± 1%   669MB/s ± 0%  -0.48%  (p=0.000 n=20+19)

### AWS c6g.2xl ###
name                   old time/op    new time/op    delta
BulkCompress-8           2.08µs ± 1%    2.17µs ± 1%   +4.27%  (p=0.000 n=19+20)
BulkDecompress-8         1.44µs ± 2%    1.44µs ± 1%   -0.39%  (p=0.034 n=20+20)
CtxCompression-8          247µs ± 0%     224µs ± 1%   -9.46%  (p=0.000 n=20+20)
CtxDecompression-8       42.4µs ± 0%    42.4µs ± 0%     ~     (p=0.658 n=19+18)
StreamCompression-8      3.61µs ± 1%    3.58µs ± 1%   -1.06%  (p=0.000 n=20+20)
StreamDecompression-8    43.0µs ± 0%    42.9µs ± 0%     ~     (p=0.525 n=20+20)
Compression-8             248µs ± 0%     228µs ± 1%   -8.05%  (p=0.000 n=20+20)
Decompression-8          42.5µs ± 0%    41.8µs ± 0%   -1.50%  (p=0.000 n=18+19)

name                   old speed      new speed      delta
BulkCompress-8         44.1MB/s ± 1%  42.3MB/s ± 1%   -4.09%  (p=0.000 n=19+20)
BulkDecompress-8       39.4MB/s ± 2%  39.6MB/s ± 1%   +0.51%  (p=0.013 n=19+20)
CtxCompression-8       81.1MB/s ± 0%  89.6MB/s ± 1%  +10.44%  (p=0.000 n=20+20)
CtxDecompression-8      473MB/s ± 0%   473MB/s ± 0%     ~     (p=0.647 n=19+18)
StreamCompression-8    5.55GB/s ± 1%  5.61GB/s ± 1%   +1.07%  (p=0.000 n=20+20)
StreamDecompression-8   467MB/s ± 0%   467MB/s ± 0%     ~     (p=0.542 n=20+20)
Compression-8          81.0MB/s ± 0%  88.1MB/s ± 1%   +8.76%  (p=0.000 n=20+20)
Decompression-8         472MB/s ± 0%   479MB/s ± 0%   +1.52%  (p=0.000 n=18+19)

On the JSON payload:
Roughly the same profile before and after. Compression on ARM seems to be 8-10% faster.

Since we want to stay on the upstream version, we should merge this PR, but I will note the performance impact in the release notes.

@Viq111 Viq111 marked this pull request as ready for review April 22, 2022 19:28

@anatolebeuzon anatolebeuzon left a comment

Thanks for the benchmarks 👌

@Viq111 Viq111 merged commit e5990c1 into 1.x Apr 22, 2022
@Viq111 Viq111 deleted the viq111/1.5.2 branch April 22, 2022 19:53
kodiakhq bot referenced this pull request in cloudquery/cloudquery Jan 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/DataDog/zstd](https://github.com/DataDog/zstd) | indirect | patch | `v1.5.0` -> `v1.5.2` |

---

### Release Notes

<details>
<summary>DataDog/zstd</summary>

### [`v1.5.2`](https://github.com/DataDog/zstd/releases/tag/v1.5.2): zstd 1.5.2

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.2...v1.5.2)

This release updates the upstream zstd version to [1.5.2](https://github.com/facebook/zstd/releases/tag/v1.5.2) ([https://github.com/DataDog/zstd/pull/116](https://github.com/DataDog/zstd/pull/116))

The update `1.5.0` -> `1.5.2` overall has a similar performance profile. Please note that depending on the workload, performance could vary by -10% / +10%

### [`v1.5.2+patch1`](https://github.com/DataDog/zstd/releases/tag/v1.5.2%2Bpatch1): zstd 1.5.2 - wrapper patches 1

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.0...v1.5.2)

#### What's Changed

-   Fix unnecessarily allocated large decompression buffer by [@XiaochenCui](https://github.com/XiaochenCui) ([#118](https://github.com/DataDog/zstd/issues/118)) & [@Viq111](https://github.com/Viq111) in [https://github.com/DataDog/zstd/pull/120](https://github.com/DataDog/zstd/pull/120)
-   Add SetNbWorkers api to the writer code (see [#108](https://github.com/DataDog/zstd/issues/108)) by [@bsergean](https://github.com/bsergean) in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
    -   For large workloads, the performance can be improved by 3-6x (see [https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767](https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767))
    -   `Write()` becomes asynchronous with more than one worker, so make sure you read the method documentation before using it

#### New Contributors

-   [@bsergean](https://github.com/bsergean) made their first contribution in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
-   [@XiaochenCui](https://github.com/XiaochenCui) for his work on [https://github.com/DataDog/zstd/pull/118](https://github.com/DataDog/zstd/pull/118) that led to [#120](https://github.com/DataDog/zstd/issues/120)

**Full Changelog**: DataDog/zstd@v1.5.2...v1.5.2+patch1

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
@Viq111 Viq111 mentioned this pull request Apr 20, 2023