[huf] Improve fast C & ASM performance on small data #3827

Merged: 2 commits on Nov 20, 2023

Commits on Nov 20, 2023

  1. [huf] Improve fast C & ASM performance on small data

    * Rename `ilimit` to `ilowest` and set it equal to `src` instead of
      `src + 6 + 8`. This is safe because the fast decoding loops already
      guarantee that they never read below `ilowest`. This allows the fast
      decoder to run for at least two more iterations, because the 14 bytes
      freed up cover two iterations at no more than 7 bytes each.
    * Continue the fast loop all the way until the number of safe iterations
      is 0. Initially, I thought that computing the number of safe
      iterations might become expensive as the loop nears the end of the
      streams. But it ends up being slower to have to decode each of the 4
      streams individually, which makes sense.
    
    This drastically speeds up the Huffman decoder on the `github` dataset
    for the issue raised in facebook#3762, measured with `zstd -b1e1r github/`.
    
    | Decoder  | Speed before | Speed after |
    |----------|--------------|-------------|
    | Fallback | 477 MB/s     | 477 MB/s    |
    | Fast C   | 384 MB/s     | 492 MB/s    |
    | Assembly | 385 MB/s     | 501 MB/s    |
    
    We can also look at the speed delta for different block sizes of silesia
    using `zstd -b1e1r silesia.tar -B#`.
    
    | Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
    |----------|--------|--------|--------|--------|---------|---------|---------|----------|
    | Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
    | Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
    Nick Terrell committed Nov 20, 2023
    Commit: 6385862
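The effect of lowering the input limit in the first commit can be illustrated with a small model. This is a simplified sketch, not the actual zstd code: the helper names (`safe_iters`, `iters_old`, `iters_new`) are hypothetical, only a single stream is modeled, and the only numbers taken from the commit message are the old `6 + 8` byte margin and the bound of 7 bytes consumed per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the commit message. */
#define MAX_BYTES_PER_ITER 7        /* upper bound on bytes consumed per iteration */
#define OLD_MARGIN         (6 + 8)  /* old limit was src + 6 + 8 */

/* Hypothetical helper: number of loop iterations that cannot read below `lowest`,
 * given that each iteration moves `ip` down by at most MAX_BYTES_PER_ITER. */
static size_t safe_iters(const unsigned char* ip, const unsigned char* lowest)
{
    assert(ip >= lowest);
    return (size_t)(ip - lowest) / MAX_BYTES_PER_ITER;
}

/* Old behavior (modeled): the limit sits 14 bytes above the start of the input.
 * Assumes the input buffer holds at least OLD_MARGIN bytes. */
static size_t iters_old(const unsigned char* src, const unsigned char* ip)
{
    const unsigned char* const ilimit = src + OLD_MARGIN;
    return ip >= ilimit ? safe_iters(ip, ilimit) : 0;
}

/* New behavior (modeled): the limit is the start of the input itself. */
static size_t iters_new(const unsigned char* src, const unsigned char* ip)
{
    const unsigned char* const ilowest = src;
    return safe_iters(ip, ilowest);
}
```

In this model, an input position 50 bytes above `src` allows 5 iterations with the old limit and 7 with the new one. On small inputs those two extra iterations are a larger share of the total work, which is consistent with the gains in the tables being largest at small block sizes.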
  2. [huf] Fix null pointer addition

    `HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
    early for empty outputs. This does not change behavior, because the
    function was already returning 0 in this case, just slightly later.
    Nick Terrell committed Nov 20, 2023
    Commit: e9d4fd9
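The pattern behind the second commit can be sketched as follows. In C, pointer arithmetic on a null pointer is undefined behavior even when the offset is 0, and sanitizers report it. The sketch below is an illustration only: `OutCursor` and `out_cursor_init` are hypothetical names, not the real `HUF_DecompressFastArgs` structure or its init function, and the return convention (0 meaning "do not use the fast path") is assumed from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the output-related fields of the decoder arguments. */
typedef struct {
    unsigned char* op;    /* current write position */
    unsigned char* oend;  /* one past the end of the output buffer */
} OutCursor;

/* Returns 1 if the cursor was set up, 0 if the caller should skip the fast path.
 * The empty-output check comes first, so `dst + dstSize` is never evaluated
 * for a NULL `dst` with `dstSize == 0`. */
static int out_cursor_init(OutCursor* c, void* dst, size_t dstSize)
{
    if (dstSize == 0)
        return 0;  /* exit early, before any pointer arithmetic */

    assert(dst != NULL);  /* a non-empty output must have a real buffer */
    c->op   = (unsigned char*)dst;
    c->oend = c->op + dstSize;
    return 1;
}
```

Calling `out_cursor_init(&c, NULL, 0)` returns 0 without touching `dst`, which matches the commit's description: the result is unchanged, only the point at which the function exits moves earlier.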