Add zlib-ng library, optimized for ARM64 #61883

Closed
wants to merge 3 commits into from

carlossanlop commented Nov 21, 2021

I'd like us to explore the possibility of consuming zlib-ng, either as one of our zlib implementations, or as the sole one.

https://github.com/zlib-ng/zlib-ng

From the readme:

The motivation for this fork came after seeing several 3rd party contributions containing new optimizations not getting implemented into the official zlib repository.
Mark Adler has been maintaining zlib for a very long time, and he has done a great job and hopefully he will continue for a long time yet. The idea of zlib-ng is not to replace zlib, but to co-exist as a drop-in replacement with a lower threshold for code change.

Features:

  • Zlib compatible API with support for dual-linking
  • Modernized native API based on zlib API for ease of porting
  • Modern C11 syntax and a clean code layout
  • Deflate medium and quick algorithms based on Intels zlib fork
  • Support for CPU intrinsics when available
  • Adler32 implementation using SSSE3, AVX2, Neon, VMX & VSX
  • CRC32-B implementation using PCLMULQDQ & ACLE
  • Hash table implementation using CRC32-C intrinsics on x86 and ARM
  • Slide hash implementations using SSE2, AVX2, Neon, VMX & VSX
  • Compare256/258 implementations using SSE4.2 & AVX2
  • Inflate chunk copying using SSE2, AVX2, Neon & VSX
  • CRC32 implementation using IBM Z vector instructions
  • Support for hardware-accelerated deflate using IBM Z DFLTCC
  • Unaligned memory read/writes and large bit buffer improvements
  • Includes improvements from Cloudflare and Intel forks
  • Configure, CMake, and NMake build system support

Note that Mark Adler is one of the top contributors to that repo: https://github.com/zlib-ng/zlib-ng/graphs/contributors

Motivation

We currently consume two zlib versions:

  • zlib-intel: optimized for x86 and x64.
  • zlib: the original flavor, which is the fallback for all other architectures, like arm and arm64.

The zlib-ng library is optimized for arm and arm64, so we can take advantage of it there and keep the original zlib library as the fallback for all other architectures.
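The proposed split can be pictured as a lookup from target architecture to zlib flavor. The sketch below is only an illustration of that idea: the dictionary, the function name `select_zlib_flavor`, and the architecture strings are hypothetical and are not taken from the runtime build scripts, where the real selection would happen at native build time.

```python
# Hypothetical mapping from target architecture to the zlib flavor
# described in the Motivation section. Names are illustrative only.
ZLIB_FLAVOR_BY_ARCH = {
    "x86": "zlib-intel",
    "x64": "zlib-intel",
    "arm": "zlib-ng",
    "arm64": "zlib-ng",
}


def select_zlib_flavor(arch: str) -> str:
    """Return the flavor for `arch`, falling back to the original zlib."""
    return ZLIB_FLAVOR_BY_ARCH.get(arch.lower(), "zlib")


if __name__ == "__main__":
    for arch in ("x64", "arm64", "s390x"):
        print(arch, "->", select_zlib_flavor(arch))
```

Any architecture missing from the table (for example `s390x` here) ends up on the original zlib, which matches the fallback role described above.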

License

The zlib-ng library is provided with the zlib License: https://github.com/zlib-ng/zlib-ng/blob/develop/LICENSE.md
It's similar to the Boost licenses from madler/zlib (original zlib flavor): https://github.com/madler/zlib/contrib/dotzlib/LICENSE_1_0.txt
And from jtkukunas/zlib (intel): https://github.com/jtkukunas/zlib/contrib/dotzlib/LICENSE_1_0.txt

Testing

I tested my changes on both my x64 and arm64 machines.
All unit tests passed.

Performance

In the perf comparison on my arm64 machine, zlib-ng is faster than zlib in most cases:

DeflateStream

| Method | Job | Toolchain | level | file | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 4,516.9 μs | 45.25 μs | 42.33 μs | 4,497.4 μs | 4,479.4 μs | 4,609.3 μs | 1.48 | 0.02 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 3,051.0 μs | 9.84 μs | 8.72 μs | 3,047.3 μs | 3,042.8 μs | 3,071.6 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 350.3 μs | 0.96 μs | 0.75 μs | 350.1 μs | 349.6 μs | 352.2 μs | 0.78 | 0.00 | 2.8409 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 446.4 μs | 0.86 μs | 0.76 μs | 446.4 μs | 445.0 μs | 448.0 μs | 1.00 | 0.00 | 3.5714 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 4,599.0 μs | 11.17 μs | 9.90 μs | 4,599.3 μs | 4,586.3 μs | 4,616.1 μs | 0.53 | 0.00 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 8,647.3 μs | 17.93 μs | 15.90 μs | 8,649.4 μs | 8,617.3 μs | 8,676.2 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 583.3 μs | 3.23 μs | 3.02 μs | 582.4 μs | 578.9 μs | 586.9 μs | 0.98 | 0.01 | 2.3148 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 597.5 μs | 4.07 μs | 3.81 μs | 595.8 μs | 593.1 μs | 605.2 μs | 1.00 | 0.00 | 2.4038 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 870.6 μs | 9.99 μs | 9.35 μs | 873.6 μs | 856.2 μs | 889.9 μs | 0.44 | 0.01 | 3.4722 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 1,979.4 μs | 7.90 μs | 7.39 μs | 1,980.1 μs | 1,966.2 μs | 1,993.3 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 134.4 μs | 1.22 μs | 1.02 μs | 134.0 μs | 133.2 μs | 136.3 μs | 0.91 | 0.01 | 3.8043 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 147.8 μs | 0.92 μs | 0.82 μs | 147.7 μs | 146.1 μs | 149.3 μs | 1.00 | 0.00 | 3.5377 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,628.7 μs | 11.08 μs | 10.36 μs | 2,630.5 μs | 2,607.2 μs | 2,645.6 μs | 0.91 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,901.1 μs | 20.49 μs | 19.17 μs | 2,898.1 μs | 2,874.2 μs | 2,940.5 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 388.8 μs | 0.79 μs | 0.74 μs | 388.8 μs | 387.2 μs | 389.9 μs | 0.86 | 0.00 | 3.1250 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 452.2 μs | 1.37 μs | 1.28 μs | 451.9 μs | 449.3 μs | 454.7 μs | 1.00 | 0.00 | 3.6765 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,641.2 μs | 12.02 μs | 11.25 μs | 2,643.9 μs | 2,620.4 μs | 2,654.6 μs | 1.14 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,308.9 μs | 10.24 μs | 9.07 μs | 2,307.3 μs | 2,297.7 μs | 2,325.3 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 625.9 μs | 4.90 μs | 4.58 μs | 624.6 μs | 619.8 μs | 632.4 μs | 0.99 | 0.01 | 2.5000 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 634.9 μs | 3.48 μs | 3.09 μs | 633.7 μs | 631.1 μs | 641.6 μs | 1.00 | 0.00 | 2.5000 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 606.2 μs | 3.34 μs | 2.96 μs | 606.8 μs | 598.0 μs | 609.4 μs | 1.29 | 0.01 | 2.4038 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 471.3 μs | 3.67 μs | 3.26 μs | 471.1 μs | 466.1 μs | 477.1 μs | 1.00 | 0.00 | 3.7879 | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 136.8 μs | 0.46 μs | 0.43 μs | 136.8 μs | 135.7 μs | 137.3 μs | 0.85 | 0.01 | 3.8043 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 160.4 μs | 0.92 μs | 0.82 μs | 160.3 μs | 158.9 μs | 162.1 μs | 1.00 | 0.00 | 3.9063 | 8 KB |

GZipStream

| Method | Job | Toolchain | level | file | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 4,598.6 μs | 11.21 μs | 9.93 μs | 4,595.6 μs | 4,586.4 μs | 4,616.6 μs | 1.46 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 3,144.2 μs | 6.69 μs | 5.93 μs | 3,144.8 μs | 3,130.6 μs | 3,150.9 μs | 1.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 452.5 μs | 1.45 μs | 1.28 μs | 452.3 μs | 450.7 μs | 454.9 μs | 0.82 | 3.5714 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 549.6 μs | 1.08 μs | 0.96 μs | 549.5 μs | 548.1 μs | 551.9 μs | 1.00 | 2.2321 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 4,711.0 μs | 4.58 μs | 3.82 μs | 4,710.8 μs | 4,705.3 μs | 4,718.9 μs | 0.54 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 8,778.4 μs | 18.25 μs | 17.07 μs | 8,781.9 μs | 8,754.0 μs | 8,806.8 μs | 1.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 708.1 μs | 4.59 μs | 4.30 μs | 707.4 μs | 703.0 μs | 716.3 μs | 0.98 | 2.8409 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 721.1 μs | 4.34 μs | 4.06 μs | 719.5 μs | 716.8 μs | 730.7 μs | 1.00 | 2.8409 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 901.9 μs | 10.42 μs | 9.24 μs | 904.1 μs | 888.6 μs | 915.7 μs | 0.45 | 3.4722 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 2,009.7 μs | 6.21 μs | 5.81 μs | 2,009.5 μs | 1,998.0 μs | 2,018.5 μs | 1.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 168.8 μs | 1.35 μs | 1.26 μs | 168.5 μs | 167.4 μs | 171.3 μs | 0.94 | 4.0323 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 180.5 μs | 1.67 μs | 1.56 μs | 180.4 μs | 178.6 μs | 183.8 μs | 1.00 | 3.5920 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,727.5 μs | 10.01 μs | 8.87 μs | 2,728.6 μs | 2,712.8 μs | 2,743.3 μs | 0.90 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 3,027.0 μs | 15.30 μs | 13.56 μs | 3,029.0 μs | 3,000.0 μs | 3,048.8 μs | 1.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 499.1 μs | 3.58 μs | 3.35 μs | 499.6 μs | 495.5 μs | 505.4 μs | 0.90 | 4.0323 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 554.7 μs | 1.61 μs | 1.43 μs | 554.5 μs | 552.4 μs | 557.4 μs | 1.00 | 2.2321 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,761.1 μs | 6.86 μs | 5.73 μs | 2,759.5 μs | 2,754.1 μs | 2,771.4 μs | 1.13 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,448.7 μs | 17.88 μs | 15.85 μs | 2,444.4 μs | 2,431.1 μs | 2,488.6 μs | 1.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 752.8 μs | 4.91 μs | 4.36 μs | 755.1 μs | 746.5 μs | 758.2 μs | 0.99 | 2.9762 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 759.1 μs | 2.41 μs | 2.14 μs | 759.2 μs | 755.6 μs | 762.8 μs | 1.00 | 2.9762 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 637.9 μs | 6.66 μs | 5.90 μs | 638.7 μs | 630.8 μs | 652.5 μs | 1.27 | 2.5000 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 501.3 μs | 3.89 μs | 3.64 μs | 502.5 μs | 494.6 μs | 506.8 μs | 1.00 | 3.9063 | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 169.2 μs | 0.43 μs | 0.38 μs | 169.1 μs | 168.5 μs | 169.9 μs | 0.88 | 3.3967 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 192.7 μs | 1.92 μs | 1.80 μs | 192.7 μs | 190.5 μs | 196.6 μs | 1.00 | 3.8110 | 8 KB |

ZLibStream

| Method | Job | Toolchain | level | file | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 4,538.8 μs | 8.48 μs | 7.52 μs | 4,538.3 μs | 4,528.5 μs | 4,555.5 μs | 1.46 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 3,114.4 μs | 14.89 μs | 13.20 μs | 3,115.4 μs | 3,085.3 μs | 3,136.3 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 396.1 μs | 0.91 μs | 0.76 μs | 396.1 μs | 394.9 μs | 397.4 μs | 0.80 | 0.00 | 3.2051 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 494.2 μs | 1.34 μs | 1.25 μs | 494.0 μs | 492.5 μs | 496.1 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 4,640.9 μs | 7.07 μs | 6.27 μs | 4,642.3 μs | 4,627.6 μs | 4,647.9 μs | 0.53 | 0.00 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 8,705.4 μs | 11.68 μs | 10.93 μs | 8,706.9 μs | 8,685.1 μs | 8,723.6 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 639.9 μs | 2.75 μs | 2.57 μs | 640.8 μs | 635.7 μs | 643.6 μs | 0.98 | 0.01 | 2.5000 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 653.2 μs | 3.92 μs | 3.67 μs | 653.0 μs | 647.7 μs | 658.6 μs | 1.00 | 0.00 | 2.6042 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 887.6 μs | 14.27 μs | 12.65 μs | 889.3 μs | 868.6 μs | 907.7 μs | 0.45 | 0.01 | 3.4722 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 1,991.8 μs | 7.30 μs | 6.47 μs | 1,992.8 μs | 1,979.6 μs | 2,000.2 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 149.2 μs | 0.97 μs | 0.91 μs | 149.1 μs | 147.9 μs | 151.0 μs | 0.92 | 0.01 | 3.5714 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 162.3 μs | 1.42 μs | 1.33 μs | 161.6 μs | 160.5 μs | 165.0 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,678.2 μs | 11.88 μs | 11.11 μs | 2,676.0 μs | 2,655.7 μs | 2,701.0 μs | 0.90 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,961.9 μs | 34.88 μs | 30.92 μs | 2,957.6 μs | 2,923.1 μs | 3,006.7 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 436.6 μs | 0.88 μs | 0.83 μs | 436.4 μs | 435.2 μs | 438.1 μs | 0.87 | 0.00 | 3.4722 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 501.5 μs | 0.72 μs | 0.60 μs | 501.6 μs | 500.3 μs | 502.5 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,705.5 μs | 12.04 μs | 11.27 μs | 2,707.4 μs | 2,689.8 μs | 2,727.7 μs | 1.09 | 0.04 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,470.7 μs | 83.19 μs | 89.01 μs | 2,423.3 μs | 2,383.9 μs | 2,707.8 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 684.7 μs | 5.68 μs | 5.32 μs | 688.1 μs | 677.8 μs | 690.6 μs | 0.98 | 0.02 | 2.7174 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 699.6 μs | 8.10 μs | 7.58 μs | 696.9 μs | 690.0 μs | 711.5 μs | 1.00 | 0.00 | 2.8409 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 620.1 μs | 6.18 μs | 5.48 μs | 620.6 μs | 612.0 μs | 631.8 μs | 1.28 | 0.02 | 2.4038 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 485.9 μs | 4.65 μs | 4.35 μs | 484.4 μs | 480.6 μs | 496.3 μs | 1.00 | 0.00 | 3.7879 | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 151.6 μs | 1.22 μs | 1.14 μs | 151.2 μs | 150.0 μs | 153.6 μs | 0.86 | 0.01 | 3.6765 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 176.5 μs | 0.79 μs | 0.66 μs | 176.6 μs | 175.2 μs | 177.5 μs | 1.00 | 0.00 | 3.5112 | 8 KB |

We didn't have a ZLibStream benchmark, so I created one for this change: dotnet/performance#2152
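For readers checking the Ratio column, the sketch below recomputes a few DeflateStream ratios by dividing the mean of the `\runtime` job by the mean of the `\runtime_base` job. It assumes that `\runtime` is the build with zlib-ng and `\runtime_base` is the baseline, which the rows with Ratio 1.00 suggest but the description does not state outright. The helper name `ratio` is hypothetical, and the reported Ratio may be computed slightly differently by the benchmark tool, so this is only an approximation that happens to match to two decimals for these rows.

```python
def ratio(candidate_mean_us: float, baseline_mean_us: float) -> float:
    """Approximate the Ratio column as candidate mean / baseline mean."""
    return round(candidate_mean_us / baseline_mean_us, 2)


# (method, level, file) -> (mean of \runtime job, mean of \runtime_base job),
# copied from the DeflateStream table, in microseconds.
deflate_samples = {
    ("Compress", "Optimal", "TestDocument.pdf"): (4516.9, 3051.0),
    ("Decompress", "Optimal", "TestDocument.pdf"): (350.3, 446.4),
    ("Compress", "Optimal", "alice29.txt"): (4599.0, 8647.3),
    ("Compress", "Fastest", "sum"): (606.2, 471.3),
}

if __name__ == "__main__":
    for key, (candidate, baseline) in deflate_samples.items():
        r = ratio(candidate, baseline)
        verdict = "faster" if r < 1.0 else "slower"
        print(key, r, verdict)
```

A ratio below 1.00 means the `\runtime` job was faster than the baseline for that row, and a ratio above 1.00 means it was slower.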

ghost commented Nov 21, 2021

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

https://github.com/zlib-ng/zlib-ng

From the readme:

The motivation for this fork came after seeing several 3rd party contributions containing new optimizations not getting implemented into the official zlib repository.
Mark Adler has been maintaining zlib for a very long time, and he has done a great job and hopefully he will continue for a long time yet. The idea of zlib-ng is not to replace zlib, but to co-exist as a drop-in replacement with a lower threshold for code change.

Features:

  • Zlib compatible API with support for dual-linking
  • Modernized native API based on zlib API for ease of porting
  • Modern C11 syntax and a clean code layout
  • Deflate medium and quick algorithms based on Intels zlib fork
  • Support for CPU intrinsics when available
  • Adler32 implementation using SSSE3, AVX2, Neon, VMX & VSX
  • CRC32-B implementation using PCLMULQDQ & ACLE
  • Hash table implementation using CRC32-C intrinsics on x86 and ARM
  • Slide hash implementations using SSE2, AVX2, Neon, VMX & VSX
  • Compare256/258 implementations using SSE4.2 & AVX2
  • Inflate chunk copying using SSE2, AVX2, Neon & VSX
  • CRC32 implementation using IBM Z vector instructions
  • Support for hardware-accelerated deflate using IBM Z DFLTCC
  • Unaligned memory read/writes and large bit buffer improvements
  • Includes improvements from Cloudflare and Intel forks
  • Configure, CMake, and NMake build system support

Note that Mark Adler is one of the top contributors to that repo: https://github.com/zlib-ng/zlib-ng/graphs/contributors

Motivation

We currently consume two zlib versions:

  • zlib-intel: optimized for x86 and x64.
  • zlib: the original flavor, which is the fallback for all other architectures, like arm and arm64.

License

The zlib-ng library is provided with the zlib License: https://github.com/zlib-ng/zlib-ng/blob/develop/LICENSE.md
It's similar to the Booster licenses from madler/zlib (original zlib flavor): https://github.com/madler/zlib/contrib/dotzlib/LICENSE_1_0.txt
And from jtkukunas/zlib (intel): https://github.com/jtkukunas/zlib/contrib/dotzlib/LICENSE_1_0.txt

Testing

I tested my changes in both my x64 and arm64 machines.
All unit test passed.

Performance

The perf comparison in my arm64 machine zlib-ng is faster than zlib for most cases:

DeflateStream
Method Job Toolchain level file Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Allocated
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 4,516.9 μs 45.25 μs 42.33 μs 4,497.4 μs 4,479.4 μs 4,609.3 μs 1.48 0.02 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 3,051.0 μs 9.84 μs 8.72 μs 3,047.3 μs 3,042.8 μs 3,071.6 μs 1.00 0.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 350.3 μs 0.96 μs 0.75 μs 350.1 μs 349.6 μs 352.2 μs 0.78 0.00 2.8409 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 446.4 μs 0.86 μs 0.76 μs 446.4 μs 445.0 μs 448.0 μs 1.00 0.00 3.5714 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 4,599.0 μs 11.17 μs 9.90 μs 4,599.3 μs 4,586.3 μs 4,616.1 μs 0.53 0.00 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 8,647.3 μs 17.93 μs 15.90 μs 8,649.4 μs 8,617.3 μs 8,676.2 μs 1.00 0.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 583.3 μs 3.23 μs 3.02 μs 582.4 μs 578.9 μs 586.9 μs 0.98 0.01 2.3148 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 597.5 μs 4.07 μs 3.81 μs 595.8 μs 593.1 μs 605.2 μs 1.00 0.00 2.4038 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 870.6 μs 9.99 μs 9.35 μs 873.6 μs 856.2 μs 889.9 μs 0.44 0.01 3.4722 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 1,979.4 μs 7.90 μs 7.39 μs 1,980.1 μs 1,966.2 μs 1,993.3 μs 1.00 0.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 134.4 μs 1.22 μs 1.02 μs 134.0 μs 133.2 μs 136.3 μs 0.91 0.01 3.8043 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 147.8 μs 0.92 μs 0.82 μs 147.7 μs 146.1 μs 149.3 μs 1.00 0.00 3.5377 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 2,628.7 μs 11.08 μs 10.36 μs 2,630.5 μs 2,607.2 μs 2,645.6 μs 0.91 0.01 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 2,901.1 μs 20.49 μs 19.17 μs 2,898.1 μs 2,874.2 μs 2,940.5 μs 1.00 0.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 388.8 μs 0.79 μs 0.74 μs 388.8 μs 387.2 μs 389.9 μs 0.86 0.00 3.1250 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 452.2 μs 1.37 μs 1.28 μs 451.9 μs 449.3 μs 454.7 μs 1.00 0.00 3.6765 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 2,641.2 μs 12.02 μs 11.25 μs 2,643.9 μs 2,620.4 μs 2,654.6 μs 1.14 0.01 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 2,308.9 μs 10.24 μs 9.07 μs 2,307.3 μs 2,297.7 μs 2,325.3 μs 1.00 0.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 625.9 μs 4.90 μs 4.58 μs 624.6 μs 619.8 μs 632.4 μs 0.99 0.01 2.5000 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 634.9 μs 3.48 μs 3.09 μs 633.7 μs 631.1 μs 641.6 μs 1.00 0.00 2.5000 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 606.2 μs 3.34 μs 2.96 μs 606.8 μs 598.0 μs 609.4 μs 1.29 0.01 2.4038 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 471.3 μs 3.67 μs 3.26 μs 471.1 μs 466.1 μs 477.1 μs 1.00 0.00 3.7879 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 136.8 μs 0.46 μs 0.43 μs 136.8 μs 135.7 μs 137.3 μs 0.85 0.01 3.8043 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 160.4 μs 0.92 μs 0.82 μs 160.3 μs 158.9 μs 162.1 μs 1.00 0.00 3.9063 8 KB
GZipStream
Method Job Toolchain level file Mean Error StdDev Median Min Max Ratio Gen 0 Allocated
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 4,598.6 μs 11.21 μs 9.93 μs 4,595.6 μs 4,586.4 μs 4,616.6 μs 1.46 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 3,144.2 μs 6.69 μs 5.93 μs 3,144.8 μs 3,130.6 μs 3,150.9 μs 1.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 452.5 μs 1.45 μs 1.28 μs 452.3 μs 450.7 μs 454.9 μs 0.82 3.5714 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal TestDocument.pdf 549.6 μs 1.08 μs 0.96 μs 549.5 μs 548.1 μs 551.9 μs 1.00 2.2321 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 4,711.0 μs 4.58 μs 3.82 μs 4,710.8 μs 4,705.3 μs 4,718.9 μs 0.54 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 8,778.4 μs 18.25 μs 17.07 μs 8,781.9 μs 8,754.0 μs 8,806.8 μs 1.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 708.1 μs 4.59 μs 4.30 μs 707.4 μs 703.0 μs 716.3 μs 0.98 2.8409 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal alice29.txt 721.1 μs 4.34 μs 4.06 μs 719.5 μs 716.8 μs 730.7 μs 1.00 2.8409 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 901.9 μs 10.42 μs 9.24 μs 904.1 μs 888.6 μs 915.7 μs 0.45 3.4722 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 2,009.7 μs 6.21 μs 5.81 μs 2,009.5 μs 1,998.0 μs 2,018.5 μs 1.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 168.8 μs 1.35 μs 1.26 μs 168.5 μs 167.4 μs 171.3 μs 0.94 4.0323 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Optimal sum 180.5 μs 1.67 μs 1.56 μs 180.4 μs 178.6 μs 183.8 μs 1.00 3.5920 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 2,727.5 μs 10.01 μs 8.87 μs 2,728.6 μs 2,712.8 μs 2,743.3 μs 0.90 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 3,027.0 μs 15.30 μs 13.56 μs 3,029.0 μs 3,000.0 μs 3,048.8 μs 1.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 499.1 μs 3.58 μs 3.35 μs 499.6 μs 495.5 μs 505.4 μs 0.90 4.0323 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest TestDocument.pdf 554.7 μs 1.61 μs 1.43 μs 554.5 μs 552.4 μs 557.4 μs 1.00 2.2321 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 2,761.1 μs 6.86 μs 5.73 μs 2,759.5 μs 2,754.1 μs 2,771.4 μs 1.13 - 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 2,448.7 μs 17.88 μs 15.85 μs 2,444.4 μs 2,431.1 μs 2,488.6 μs 1.00 - 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 752.8 μs 4.91 μs 4.36 μs 755.1 μs 746.5 μs 758.2 μs 0.99 2.9762 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest alice29.txt 759.1 μs 2.41 μs 2.14 μs 759.2 μs 755.6 μs 762.8 μs 1.00 2.9762 8 KB
Compress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 637.9 μs 6.66 μs 5.90 μs 638.7 μs 630.8 μs 652.5 μs 1.27 2.5000 8 KB
Compress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 501.3 μs 3.89 μs 3.64 μs 502.5 μs 494.6 μs 506.8 μs 1.00 3.9063 8 KB
Decompress Job-VVXQFG \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 169.2 μs 0.43 μs 0.38 μs 169.1 μs 168.5 μs 169.9 μs 0.88 3.3967 8 KB
Decompress Job-AZGVVU \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe Fastest sum 192.7 μs 1.92 μs 1.80 μs 192.7 μs 190.5 μs 196.6 μs 1.00 3.8110 8 KB
ZLibStream
| Method | Job | Toolchain | level | file | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 4,538.8 μs | 8.48 μs | 7.52 μs | 4,538.3 μs | 4,528.5 μs | 4,555.5 μs | 1.46 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 3,114.4 μs | 14.89 μs | 13.20 μs | 3,115.4 μs | 3,085.3 μs | 3,136.3 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 396.1 μs | 0.91 μs | 0.76 μs | 396.1 μs | 394.9 μs | 397.4 μs | 0.80 | 0.00 | 3.2051 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | TestDocument.pdf | 494.2 μs | 1.34 μs | 1.25 μs | 494.0 μs | 492.5 μs | 496.1 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 4,640.9 μs | 7.07 μs | 6.27 μs | 4,642.3 μs | 4,627.6 μs | 4,647.9 μs | 0.53 | 0.00 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 8,705.4 μs | 11.68 μs | 10.93 μs | 8,706.9 μs | 8,685.1 μs | 8,723.6 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 639.9 μs | 2.75 μs | 2.57 μs | 640.8 μs | 635.7 μs | 643.6 μs | 0.98 | 0.01 | 2.5000 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | alice29.txt | 653.2 μs | 3.92 μs | 3.67 μs | 653.0 μs | 647.7 μs | 658.6 μs | 1.00 | 0.00 | 2.6042 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 887.6 μs | 14.27 μs | 12.65 μs | 889.3 μs | 868.6 μs | 907.7 μs | 0.45 | 0.01 | 3.4722 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 1,991.8 μs | 7.30 μs | 6.47 μs | 1,992.8 μs | 1,979.6 μs | 2,000.2 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 149.2 μs | 0.97 μs | 0.91 μs | 149.1 μs | 147.9 μs | 151.0 μs | 0.92 | 0.01 | 3.5714 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Optimal | sum | 162.3 μs | 1.42 μs | 1.33 μs | 161.6 μs | 160.5 μs | 165.0 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,678.2 μs | 11.88 μs | 11.11 μs | 2,676.0 μs | 2,655.7 μs | 2,701.0 μs | 0.90 | 0.01 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 2,961.9 μs | 34.88 μs | 30.92 μs | 2,957.6 μs | 2,923.1 μs | 3,006.7 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 436.6 μs | 0.88 μs | 0.83 μs | 436.4 μs | 435.2 μs | 438.1 μs | 0.87 | 0.00 | 3.4722 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | TestDocument.pdf | 501.5 μs | 0.72 μs | 0.60 μs | 501.6 μs | 500.3 μs | 502.5 μs | 1.00 | 0.00 | 3.9063 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,705.5 μs | 12.04 μs | 11.27 μs | 2,707.4 μs | 2,689.8 μs | 2,727.7 μs | 1.09 | 0.04 | - | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 2,470.7 μs | 83.19 μs | 89.01 μs | 2,423.3 μs | 2,383.9 μs | 2,707.8 μs | 1.00 | 0.00 | - | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 684.7 μs | 5.68 μs | 5.32 μs | 688.1 μs | 677.8 μs | 690.6 μs | 0.98 | 0.02 | 2.7174 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | alice29.txt | 699.6 μs | 8.10 μs | 7.58 μs | 696.9 μs | 690.0 μs | 711.5 μs | 1.00 | 0.00 | 2.8409 | 8 KB |
| Compress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 620.1 μs | 6.18 μs | 5.48 μs | 620.6 μs | 612.0 μs | 631.8 μs | 1.28 | 0.02 | 2.4038 | 8 KB |
| Compress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 485.9 μs | 4.65 μs | 4.35 μs | 484.4 μs | 480.6 μs | 496.3 μs | 1.00 | 0.00 | 3.7879 | 8 KB |
| Decompress | Job-VVXQFG | \runtime\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 151.6 μs | 1.22 μs | 1.14 μs | 151.2 μs | 150.0 μs | 153.6 μs | 0.86 | 0.01 | 3.6765 | 8 KB |
| Decompress | Job-AZGVVU | \runtime_base\artifacts\bin\testhost\net7.0-windows-Release-arm64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe | Fastest | sum | 176.5 μs | 0.79 μs | 0.66 μs | 176.6 μs | 175.2 μs | 177.5 μs | 1.00 | 0.00 | 3.5112 | 8 KB |

We didn't have a ZLibStream benchmark. I created one so I could use it for this change. dotnet/performance#2152

Author: carlossanlop
Assignees: carlossanlop
Labels: area-System.IO.Compression

Milestone: .NET 7.0

@carlossanlop
Member Author

I need help getting the 3rd commit, 88b80e5, reviewed closely, since I had to make some changes in cmake, *.c and *.h files to get rid of build failures on arm64.

Also, I am including their license file and the original readme file from the zlib-ng repo. I am imitating what we did for the other two zlibs, where we also included some non-compilable files.

@jkotas
Member

jkotas commented Nov 21, 2021

Need help getting the 3rd commit 88b80e5 reviewed closely, since I had to make some changes in cmake, *.c and *.h files to get rid of build failures in arm64.

Some of these changes look like a good candidate for upstreaming into the zlib-ng repo.

@jkotas
Member

jkotas commented Nov 21, 2021

zlib-intel: optimized for x86 and x64.

Can we delete this copy? The readme for zlib-ng says that it includes all Intel optimizations.

@nmoinvaz

nmoinvaz commented Nov 21, 2021

The zlib-ng library is optimized for arm and arm64, so we can take advantage of it, and leave the zlib original library as fallback for all other architectures.

You shouldn't need zlib as a fallback. zlib-ng has optimizations for several architectures (including x86/x64/arm/aarch64). We have runtime CPU feature detection and use non-optimized functions if the hardware doesn't support the optimized ones. In other words, zlib-ng has its own fallbacks for all other architectures.

@am11
Member

am11 commented Nov 21, 2021

This is Windows-only, right? I.e. on Unix, we still use the zlib development package from package management?

Please add a version file similar to:

and to easily keep track of upstream changes.

@nmoinvaz

It would be better to use zlib-ng on all platforms because then you get the speed benefits on all platforms.

@stephentoub
Member

stephentoub commented Nov 21, 2021

Can we delete this copy? The readme for zlib-ng says that it includes all Intel optimizations.

Taking this a step further, this is now the third set of zlib files we have in libraries. Can we delete the other two and just use this one?

It would be better to use zlib-ng on all platforms

Historically we shied away from this, preferring to use the inbox zlib; we had to ship it for Windows because Windows didn't have one, but the Linux distros we target mostly do. However, at this point, with the improvements from Intel and Cloudflare, covering both x64 and ARM, with more platform targets varying in what they provide, and with the consistency benefits that come from doing the same thing across platforms, we should consider changing course and having the sources built into System.IO.Compression.Native on all relevant platforms.

Some of these changes look like a good candidate for upstreaming into the zlib-ng repo.

+1. In addition to the general goodness of contributing back, for our own maintainability benefits we want to minimize our diff from upstream.

@@ -65,6 +65,13 @@ if (${CLR_CMAKE_HOST_ARCH} STREQUAL "x86")
add_compile_options(/Gz)
endif ()

if(${CLR_CMAKE_HOST_ARCH} STREQUAL arm OR ${CLR_CMAKE_HOST_ARCH} STREQUAL arm64)
add_compile_options(-DZLIB_COMPAT=0) # Must be explicitly set
Member

We don't want a zlib compatible API? What does this actually affect?

@@ -0,0 +1,19 @@
(C) 1995-2013 Jean-loup Gailly and Mark Adler
Member

@nmoinvaz nmoinvaz left a comment

Nice work so far.

@@ -10,7 +10,33 @@ if (GEN_SHARED_LIB)
include (GenerateExportHeader)
endif()

if(${CLR_CMAKE_HOST_ARCH} STREQUAL x86 OR ${CLR_CMAKE_HOST_ARCH} STREQUAL x64)
if(${CLR_CMAKE_HOST_ARCH} STREQUAL arm OR ${CLR_CMAKE_HOST_ARCH} STREQUAL arm64)

This commit should probably be split into two: one that only adds the zlib-ng files, and another that connects the dotnet runtime build system to them. That would make it easier to review, because then we can see what build system changes are needed for zlib-ng. But I realize at this point it would be difficult to split it.

Member Author

Good point. I'll do that.

zlib-ng/gzlib.c
zlib-ng/gzread.c
zlib-ng/gzwrite.c
# Exclude files to fix error LNK2019: unresolved external symbol 'gzFile' open referenced in functions gz_open, gz_close, gz_load, gz_comp

Should find out why it is giving that linker error. If dotnet does not use any of the gz_* functions then you don’t need those sources and can also set -DWITH_GZFILEOP=0.
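
If the gz* file functions do turn out to be unused, one way to act on this suggestion is to switch the option off before zlib-ng's own CMake project is configured. This is a minimal sketch, not the PR's build script: it assumes zlib-ng is vendored in a zlib-ng subdirectory (a hypothetical location) and is consumed through its own CMakeLists.txt rather than through a hand-written source list.

```cmake
# Sketch only. WITH_GZFILEOP is the zlib-ng build option named in the comment
# above; the "zlib-ng" directory name is an assumption for illustration.
# Defining the cache entry before add_subdirectory() means zlib-ng's own
# option() call finds the value already set and does not reset it.
set(WITH_GZFILEOP OFF CACHE BOOL "Build zlib-ng without the gzFile API" FORCE)

add_subdirectory(zlib-ng ${CMAKE_CURRENT_BINARY_DIR}/zlib-ng)
```

With the option off, gzlib.c, gzread.c and gzwrite.c should no longer need to be excluded by hand, though that is worth confirming against the zlib-ng version being vendored.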

Why not just use add_subdirectory to add our CMake project?

Member Author

I'm looking into this again now that we're back from the Thanksgiving holidays.

I'm exploring the use of add_subdirectory and have some questions, but I'd like to first rebase this PR with the latest bits in main, because there was a big refactoring of these folders. I'll ask you the questions when I refresh the PR.

Sounds good.

Member

first rebase this PR with the latest bits in main

Maybe you can put zlib-ng under src/native/external as part of this rebase, as discussed in #61883 (comment), so that we do not have to move it later?


/* Added ZLIB_COMPAT check to choose between the two available zconf files */
#if defined(ZLIB_COMPAT)
# include "zconf.h.in"

I believe this should be zconf.h. The zconf.h.in file is input to CMake and configure.

We typically don’t conditionally include zconf.h. Instead we conditionally include zlib.h/zlib-ng.h in our projects based on ZLIB_COMPAT.

@jeffhandley jeffhandley modified the milestones: .NET 7.0, 7.0.0 Nov 22, 2021
@mtl1979

mtl1979 commented Nov 22, 2021

Bypassing zlib-ng's own build system will disable most of the architecture-specific optimizations. This pull request is also missing most of the essential files.

@danmoseley
Member

It's great to see the improvements in your Arm64 numbers, but some of the regressions are significant.

For Optimal | TestDocument.pdf, is the output size the same? (I'm not sure whether the compression level corresponding to Optimal is part of the spec or not.) If not, can we repro this regression without .NET in the picture? Perhaps that would be worth an issue opened in the upstream repo. If we become a consumer of it, we should certainly try to help improve it.

@nmoinvaz

nmoinvaz commented Nov 22, 2021

It is worth noting that the compression levels used in zlib-ng are not 1:1 with zlib. This is because zlib-ng uses different compression algorithms (including Intel's) that are not available in zlib. In some cases, such as level 1, speed is favored over compression.
https://github.com/zlib-ng/zlib-ng/wiki/Deflate-config-comparison

Here are some zlib-ng benchmarks for anybody who is interested:
zlib-ng/zlib-ng#871
You can also find many benchmarks performed across our PRs.

As @mtl1979 mentioned above, because this PR bypasses zlib-ng's CMake, the library might not have been built with some optimizations configured.

@mtl1979

mtl1979 commented Nov 22, 2021

The preferred way of including external CMake projects is add_subdirectory() with the EXCLUDE_FROM_ALL option. This makes building faster but still includes all targets essential for building. Both the source_dir and binary_dir parameters must be specified, as some files are generated by the build system.

Mixing "static" and "dynamic" libraries is discouraged: on most Unix systems they use different compiler and linker flags, and on Windows the linker can find matching symbols in unexpected places. This also means that the order of the libraries (static vs. dynamic/shared) passed to the linker is significant.

The build system of zlib-ng currently doesn't support compiling the static library as position-independent code (PIC), but the compiler parameters can be passed manually if needed. This works only if no other library in the project (or parent projects) includes or requires zlib-ng.
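
The recommendations above could be sketched roughly as follows. The zlib-ng directory name is an assumption, and CMAKE_POSITION_INDEPENDENT_CODE is the generic CMake switch for requesting PIC, used here as one possible way of passing the flag manually; whether zlib-ng's targets honor it unchanged has not been verified in this thread.

```cmake
# Sketch only, not the PR's actual build script.
# Request position-independent code for targets created after this point.
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# Both directories are given explicitly: source_dir holds the vendored sources
# (assumed to live in "zlib-ng"), binary_dir receives the files that zlib-ng's
# build system generates. EXCLUDE_FROM_ALL keeps zlib-ng's targets out of the
# default build unless another target depends on them.
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/zlib-ng
    ${CMAKE_CURRENT_BINARY_DIR}/zlib-ng
    EXCLUDE_FROM_ALL)
```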

@carlossanlop carlossanlop marked this pull request as draft November 22, 2021 20:12
@carlossanlop
Member Author

Bypassing zlib-ng's own build system will disable most of the architecture-specific optimizations. This pull request is also missing most of the essential files.

Marking this PR as draft while I address the feedback. @mtl1979 @nmoinvaz thank you for your help. I am not very experienced in cmake, so your expertise is extremely valuable here so we can get it to build properly.

@mtl1979

mtl1979 commented Nov 22, 2021

@carlossanlop If you need something added upstream to zlib-ng, you can ask me or @nmoinvaz, as I have been a contributor to zlib-ng since the early days and @nmoinvaz has reviewed a lot of my commits and vice versa.

@carlossanlop
Member Author

In PR #61958, I was made aware of a zlib implementation located in src/mono/mono/zlib.

@lambdageek, @akoeplinger, do you know if it's possible to make mono consume the source code related to this PR?

@mtl1979

mtl1979 commented Nov 24, 2021

@carlossanlop Like I said earlier, when a library is its own project or target, it can be consumed by multiple dependent targets. CMake will automatically add the dependency and link to the correct files when the target name is used instead of a library name or source files.

At runtime, the full path to the shared/dynamic library must be known by the library or program requiring it. If it is not known, the default library search path is traversed to find a matching library. For static libraries, a runtime search is obviously not necessary.
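
A minimal sketch of that target-based consumption, assuming zlib-ng has already been added with add_subdirectory() and exposes a static library target named zlibstatic (an assumption about its build system; the actual target names should be checked in its CMakeLists.txt). The two consumer targets and their source files are hypothetical.

```cmake
# Hypothetical consumers; target names and source files are for illustration only.
add_library(libraries_compression_native SHARED pal_zlib.c)
add_library(mono_zlib_helper STATIC zlib-helper.c)

# Linking by target name (assumed here to be "zlibstatic") lets CMake add the
# build-order dependency and pass the correct library file to the linker for
# each consumer, so neither consumer lists zlib-ng's source files itself.
target_link_libraries(libraries_compression_native PRIVATE zlibstatic)
target_link_libraries(mono_zlib_helper PRIVATE zlibstatic)
```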

@lambdageek
Member

In PR #61958, I was made aware of a zlib implementation located in src/mono/mono/zlib.

@lambdageek, @akoeplinger, do you know if it's possible to make mono consume the source code related to this PR?

@carlossanlop I think it should be fine to replace the one in the mono build with zlib-ng. As far as I know (@akoeplinger please correct me) we just have some old copy of upstream zlib that we update at irregular intervals.

@carlossanlop
Member Author

Thanks for confirming, @lambdageek.

My next question: Is src/libraries/Native a correct location for code that would be shared by both mono and libraries?

@jkotas
Member

jkotas commented Nov 24, 2021

src/libraries/Native a correct location for code that would be shared by both mono and libraries?

It should be src/native (after the other PR merges).

@am11
Member

am11 commented Nov 24, 2021

@carlossanlop, basically it's am11@4cbed64; which will shave off ~11K lines from the repo.

@jkotas
Member

jkotas commented Nov 25, 2021

I am wondering whether it would make sense to put all vendored unmanaged libraries in a dedicated directory, for example src/native/external. It is a common practice in projects out there and it would make reusing zlib in mono a bit more natural.

@am11
Member

am11 commented Nov 25, 2021

Makes sense to put all vendor code under a dedicated directory. We have brotli, libunwind, rapidjson, and zlib in the runtime repo (and perhaps a few others in runtimelab). If it is not urgent, I can take a look once this PR is completed.

@ghost ghost closed this Jan 7, 2022
@ghost

ghost commented Jan 7, 2022

Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 6, 2022
This pull request was closed.
9 participants