Pipelined Implementation of ZSTD_fast (~+5% Speed) #2749

felixhandte · 2021-08-17T22:18:15Z

This PR introduces a new implementation of the ZSTD_fast parser for single-segment compressions. This new match-finder achieves up to 5% speed improvements, and improves compression slightly on average.

Description

If you squint hard enough (and ignore repcodes), the search operation at any given position is broken into 4 stages:

Hash (map position to hash value via input read)
Lookup (map hash val to index via hashtable read)
Load (map index to value at that position via input read)
Compare (determine whether a match exists)

Each of these steps involves a memory read at an address which is computed from the previous step. This means that for each position, these steps must be sequenced and their latencies are cumulative.

Originally, ZSTD_fast simply did each step sequentially:

Pos | Time -->
----|-----------------
N   | 1234
N+1 |     1234
N+2 |         1234
N+3 |             1234

In #1562, @terrelln changed the implementation to work on two positions at a time:

Pos | Time -->
----|-----------------
N   | 1 2 3 4
N+1 |  1 2 3 4
N+2 |         1 2 3 4
N+3 |          1 2 3 4

This PR changes to a different strategy of parallelizing work, and does approximately the following:

R = Repcode Read & Compare
H = Hash Position
T = HashTable Lookup
M = Match Read & Compare

Pos | Time -->
----+-------------------
N   | ... M
N+1 | ...   TM
N+2 |    R H   T M
N+3 |         H    TM
N+4 |           R H   T M
N+5 |                H   ...
N+6 |                  R ...

This is very much analogous to the pipelining of execution in a CPU, and has the same benefits and drawbacks. This approach appears to more successfully parallelize read latencies.

However, just like a CPU, we have to dump the pipeline when we find a match (take a branch). When this happens, we throw away our current state, record the match, and then do the following prep to re-enter the loop:

Pos | Time -->
----+-------------------
N   | H T
N+1 |  H

This is also the work we do at the beginning to enter the loop initially.

In addition to this broad rearchitecture, various implementation details are tweaked to coax the best performance possible. This includes:

Rather than recalculate the next step size every time we need it, we keep track of it and simply increment it when we fail to find a match within a certain number of searches. This is equivalent to the previous approach, but slightly less work.
This also gives us a good infrequent checkpoint outside the core loop which ends up being a good opportunity to do some prefetching of the input.
The loop is manually unrolled so we do two positions per iteration. This lets us check our loop condition less frequently and lets us do a repcode search at every other position, like Nick's loop.

Parsing Differences

This PR parses slightly differently than the current strategy.

A big change is that the sensitivity is greatly increased to the acceleration factor derived from negative compression levels. In Nick's implementation, the step was applied only every other advance (each pair of searches in a loop iteration remained 1 byte apart). Whereas here, we return to the pre-#1562 behavior of applying the step between every search.

Benchmarks

As expected, given the above discussion, this is comparatively faster on less compressible inputs, but roughly on more compressible inputs, especially those with short matches. Here are some benchmarks:

As of 69b8ee9 ("Initial Pipelined Implementation for ZSTD_fast"):

Corpus      | gcc-10                    
            | ratio         | speed     
            |  dev  |  exp  | dev | exp 
------------+-------+-------+-----+-----
        -P5 | 1.250 | 1.246 | 789 | 903
       -P25 | 1.863 | 1.841 | 411 | 454
       -P50 | 3.170 | 3.152 | 452 | 464
       -P75 | 5.595 | 5.579 | 550 | 542
       -P95 | 9.662 | 9.660 | 649 | 633
silesia.tar | 2.884 | 2.888 | 355 | 352
------------+-------+-------+-----+-----
Corpus      | clang-11                  
            | ratio         | speed     
            |  dev  |  exp  | dev | exp 
------------+-------+-------+-----+-----
        -P5 | 1.250 | 1.246 | 786 | 882
       -P25 | 1.863 | 1.841 | 425 | 428
       -P50 | 3.170 | 3.152 | 465 | 436
       -P75 | 5.595 | 5.579 | 561 | 526
       -P95 | 9.662 | 9.660 | 650 | 627
silesia.tar | 2.884 | 2.888 | 363 | 336

(Benchmarked on an Intel Xeon E5-1650 v3 @ 3.50GHz.)

As of e2afc28 ("Nit: Only Store 2 Hash Variables"):

Corpus      | gcc-10                    |
            | ratio         | speed     |
            |  dev  |  exp  | dev | exp |
------------+-------+-------+-----+-----+
        -P5 | 1.250 | 1.246 | 789 | 905 |
       -P25 | 1.863 | 1.842 | 411 | 488 |
       -P50 | 3.170 | 3.152 | 452 | 487 |
       -P75 | 5.595 | 5.579 | 550 | 583 |
       -P95 | 9.662 | 9.660 | 649 | 667 |
silesia.tar | 2.884 | 2.887 | 355 | 369 |
     enwik9 | 2.798 | 2.804 | 301 | 301 |
------------+-------+-------+-----+-----+
Corpus      | clang-11                  |
            | ratio         | speed     |
            |  dev  |  exp  | dev | exp |
------------+-------+-------+-----+-----+
        -P5 | 1.250 | 1.246 | 786 | 888 |
       -P25 | 1.863 | 1.842 | 425 | 467 |
       -P50 | 3.170 | 3.152 | 465 | 481 |
       -P75 | 5.595 | 5.579 | 561 | 567 |
       -P95 | 9.662 | 9.660 | 650 | 650 |
silesia.tar | 2.884 | 2.887 | 363 | 356 |
     enwik9 | 2.798 | 2.804 | 299 | 289 |

(Benchmarked on an Intel Xeon E5-1650 v3 @ 3.50GHz.)

Corpus      | gcc-10                    |
            | ratio         | speed     |
            |  dev  |  exp  | dev | exp |
------------+-------+-------+-----+-----+
        -P5 | 1.250 | 1.246 | 222 | 233 |
       -P25 | 1.863 | 1.842 | 117 | 131 |
       -P50 | 3.170 | 3.152 | 130 | 136 |
       -P75 | 5.595 | 5.579 | 163 | 163 |
       -P95 | 9.662 | 9.660 | 201 | 196 |
silesia.tar | 2.884 | 2.887 | 113 | 109 |
     enwik8 | 2.455 | 2.459 |  83 |  80 |
------------+-------+-------+-----+-----+
Corpus      | clang-11                  |
            | ratio         | speed     |
            |  dev  |  exp  | dev | exp |
------------+-------+-------+-----+-----+
        -P5 | 1.250 | 1.246 | 213 | 224 |
       -P25 | 1.863 | 1.842 | 109 | 117 |
       -P50 | 3.170 | 3.152 | 121 | 123 |
       -P75 | 5.595 | 5.579 | 154 | 152 |
       -P95 | 9.662 | 9.660 | 197 | 189 |
silesia.tar | 2.884 | 2.887 | 107 | 102 |
     enwik8 | 2.455 | 2.459 |  78 |  74 |

(Benchmarked on a Raspberry Pi 4 aka "Broadcom BCM2711 SoC with a 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor".)

As of 687c591 ("Tweak Step"):

Corpus      | gcc-10                           |
            | ratio         | speed            |
            |  dev  |  exp  | dev | exp | diff |
------------+-------+-------+-----+-----+------+
        -P5 | 1.250 | 1.246 | 789 | 916 | +16% |
       -P25 | 1.863 | 1.841 | 411 | 503 | +22% |
       -P50 | 3.170 | 3.152 | 452 | 516 | +14% |
       -P75 | 5.595 | 5.579 | 550 | 607 | +10% |
       -P95 | 9.662 | 9.662 | 649 | 685 | + 6% |
silesia.tar | 2.884 | 2.888 | 355 | 377 | + 6% |
     enwik9 | 2.798 | 2.802 | 301 | 315 | + 5% |
------------+-------+-------+-----+-----+------+
Corpus      | clang-11                         |
            | ratio         | speed            |
            |  dev  |  exp  | dev | exp | diff |
------------+-------+-------+-----+-----+------+
        -P5 | 1.250 | 1.246 | 786 | 900 | +15% |
       -P25 | 1.863 | 1.841 | 425 | 495 | +16% |
       -P50 | 3.170 | 3.152 | 465 | 510 | +10% |
       -P75 | 5.595 | 5.579 | 561 | 595 | + 6% |
       -P95 | 9.662 | 9.662 | 650 | 678 | + 4% |
silesia.tar | 2.884 | 2.888 | 363 | 372 | + 2% |
     enwik9 | 2.798 | 2.802 | 299 | 306 | + 2% |
------------+-------+-------+-----+-----+------+

(Benchmarked on an Intel Xeon E5-1650 v3 @ 3.50GHz.)

Corpus      | gcc-10                           |
            | ratio         | speed            |
            |  dev  |  exp  | dev | exp | diff |
------------+-------+-------+-----+-----+------+
        -P5 | 1.250 | 1.246 | 222 | 233 | + 5% |
       -P25 | 1.863 | 1.841 | 117 | 135 | +15% |
       -P50 | 3.170 | 3.152 | 130 | 139 | + 7% |
       -P75 | 5.595 | 5.579 | 163 | 168 | + 3% |
       -P95 | 9.662 | 9.662 | 201 | 199 | - 1% |
silesia.tar | 2.884 | 2.888 | 112 | 109 | - 3% |
     enwik8 | 2.455 | 2.459 |  83 |  80 | - 4% |
------------+-------+-------+-----+-----+------+
Corpus      | clang-11                         |
            | ratio         | speed            |
            |  dev  |  exp  | dev | exp | diff |
------------+-------+-------+-----+-----+------+
        -P5 | 1.250 | 1.246 | 213 | 224 | + 5% |
       -P25 | 1.863 | 1.841 | 109 | 129 | +18% |
       -P50 | 3.170 | 3.152 | 121 | 133 | +10% |
       -P75 | 5.595 | 5.579 | 154 | 161 | + 5% |
       -P95 | 9.662 | 9.662 | 197 | 193 | - 2% |
silesia.tar | 2.884 | 2.888 | 107 | 105 | - 2% |
     enwik8 | 2.455 | 2.459 |  78 |  77 | - 1% |
------------+-------+-------+-----+-----+------+

(Benchmarked on a Raspberry Pi 4 aka "Broadcom BCM2711 SoC with a 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor".)

Extended Benchmark on Multiple Compilers, Levels, and Corpuses:

                         | Compression Speed        | Compression Ratio
Corpus      Compiler Lvl |   dev    exp     diff    |   dev    exp     diff
-------------------------+--------------------------+-------------------------
dickens     gcc-4.8   -1 |  207.8  209.3 ( +0.722%) |  1.763  1.765 ( +0.113%)
dickens     gcc-5     -1 |  201.2  210.3 ( +4.523%) |  1.763  1.765 ( +0.113%)
dickens     gcc-6     -1 |  199.9  213.1 ( +6.603%) |  1.763  1.765 ( +0.113%)
dickens     gcc-7     -1 |  207.6  204.4 ( -1.541%) |  1.763  1.765 ( +0.113%)
dickens     gcc-8     -1 |  206.9  211.2 ( +2.078%) |  1.763  1.765 ( +0.113%)
dickens     gcc-10    -1 |  201.1  208.1 ( +3.481%) |  1.763  1.765 ( +0.113%)
dickens     clang-6.0 -1 |  203.1  210.4 ( +3.594%) |  1.763  1.765 ( +0.113%)
dickens     clang-7   -1 |  206.2  207.0 ( +0.388%) |  1.763  1.765 ( +0.113%)
dickens     clang-8   -1 |  204.7  209.7 ( +2.443%) |  1.763  1.765 ( +0.113%)
dickens     clang-9   -1 |  208.9  214.8 ( +2.824%) |  1.763  1.765 ( +0.113%)
dickens     clang-11  -1 |  204.8  198.3 ( -3.174%) |  1.763  1.765 ( +0.113%)
dickens     clang-12  -1 |  204.6  210.7 ( +2.981%) |  1.763  1.765 ( +0.113%)
dickens     gcc-4.8    1 |  179.9  198.4 (+10.283%) |  2.388  2.392 ( +0.168%)
dickens     gcc-5      1 |  186.1  196.6 ( +5.642%) |  2.388  2.392 ( +0.168%)
dickens     gcc-6      1 |  194.3  182.7 ( -5.970%) |  2.388  2.392 ( +0.168%)
dickens     gcc-7      1 |  189.0  193.5 ( +2.381%) |  2.388  2.392 ( +0.168%)
dickens     gcc-8      1 |  193.4  189.9 ( -1.810%) |  2.388  2.392 ( +0.168%)
dickens     gcc-10     1 |  189.8  195.4 ( +2.950%) |  2.388  2.392 ( +0.168%)
dickens     clang-6.0  1 |  192.1  193.3 ( +0.625%) |  2.388  2.392 ( +0.168%)
dickens     clang-7    1 |  198.7  196.9 ( -0.906%) |  2.388  2.392 ( +0.168%)
dickens     clang-8    1 |  189.9  194.3 ( +2.317%) |  2.388  2.392 ( +0.168%)
dickens     clang-9    1 |  197.4  201.3 ( +1.976%) |  2.388  2.392 ( +0.168%)
dickens     clang-11   1 |  190.6  192.5 ( +0.997%) |  2.388  2.392 ( +0.168%)
dickens     clang-12   1 |  189.7  196.8 ( +3.743%) |  2.388  2.392 ( +0.168%)
dickens     gcc-4.8    2 |  118.7  124.1 ( +4.549%) |  2.630  2.637 ( +0.266%)
dickens     gcc-5      2 |  122.1  123.1 ( +0.819%) |  2.630  2.637 ( +0.266%)
dickens     gcc-6      2 |  122.4  117.0 ( -4.412%) |  2.630  2.637 ( +0.266%)
dickens     gcc-7      2 |  120.2  124.6 ( +3.661%) |  2.630  2.637 ( +0.266%)
dickens     gcc-8      2 |  122.5  118.5 ( -3.265%) |  2.630  2.637 ( +0.266%)
dickens     gcc-10     2 |  121.1  123.0 ( +1.569%) |  2.630  2.637 ( +0.266%)
dickens     clang-6.0  2 |  121.4  119.4 ( -1.647%) |  2.630  2.637 ( +0.266%)
dickens     clang-7    2 |  124.4  121.3 ( -2.492%) |  2.630  2.637 ( +0.266%)
dickens     clang-8    2 |  125.9  122.8 ( -2.462%) |  2.630  2.637 ( +0.266%)
dickens     clang-9    2 |  120.8  122.6 ( +1.490%) |  2.630  2.637 ( +0.266%)
dickens     clang-11   2 |  124.6  125.0 ( +0.321%) |  2.630  2.637 ( +0.266%)
dickens     clang-12   2 |  127.3  123.4 ( -3.064%) |  2.630  2.637 ( +0.266%)
mozilla     gcc-4.8   -1 |  328.9  329.5 ( +0.182%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-5     -1 |  300.4  311.3 ( +3.628%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-6     -1 |  317.3  348.6 ( +9.864%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-7     -1 |  324.9  330.1 ( +1.600%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-8     -1 |  334.4  329.6 ( -1.435%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-10    -1 |  320.9  329.4 ( +2.649%) |  2.325  2.324 ( -0.043%)
mozilla     clang-6.0 -1 |  336.6  336.8 ( +0.059%) |  2.325  2.324 ( -0.043%)
mozilla     clang-7   -1 |  320.9  333.5 ( +3.926%) |  2.325  2.324 ( -0.043%)
mozilla     clang-8   -1 |  328.3  324.0 ( -1.310%) |  2.325  2.324 ( -0.043%)
mozilla     clang-9   -1 |  339.5  333.2 ( -1.856%) |  2.325  2.324 ( -0.043%)
mozilla     clang-11  -1 |  333.1  328.3 ( -1.441%) |  2.325  2.324 ( -0.043%)
mozilla     clang-12  -1 |  335.7  327.3 ( -2.502%) |  2.325  2.324 ( -0.043%)
mozilla     gcc-4.8    1 |  276.9  289.9 ( +4.695%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-5      1 |  272.1  276.6 ( +1.654%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-6      1 |  271.2  273.8 ( +0.959%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-7      1 |  274.5  277.1 ( +0.947%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-8      1 |  281.1  288.3 ( +2.561%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-10     1 |  283.1  283.0 ( -0.035%) |  2.546  2.546 ( +0.000%)
mozilla     clang-6.0  1 |  291.9  286.1 ( -1.987%) |  2.546  2.546 ( +0.000%)
mozilla     clang-7    1 |  271.4  285.8 ( +5.306%) |  2.546  2.546 ( +0.000%)
mozilla     clang-8    1 |  283.2  278.7 ( -1.589%) |  2.546  2.546 ( +0.000%)
mozilla     clang-9    1 |  286.9  272.2 ( -5.124%) |  2.546  2.546 ( +0.000%)
mozilla     clang-11   1 |  286.3  292.3 ( +2.096%) |  2.546  2.546 ( +0.000%)
mozilla     clang-12   1 |  287.1  286.5 ( -0.209%) |  2.546  2.546 ( +0.000%)
mozilla     gcc-4.8    2 |  222.7  225.6 ( +1.302%) |  2.686  2.688 ( +0.074%)
mozilla     gcc-5      2 |  209.6  216.5 ( +3.292%) |  2.686  2.688 ( +0.074%)
mozilla     gcc-6      2 |  214.9  214.9 ( +0.000%) |  2.686  2.688 ( +0.074%)
mozilla     gcc-7      2 |  214.3  221.0 ( +3.126%) |  2.686  2.688 ( +0.074%)
mozilla     gcc-8      2 |  226.4  225.7 ( -0.309%) |  2.686  2.688 ( +0.074%)
mozilla     gcc-10     2 |  221.5  224.2 ( +1.219%) |  2.686  2.688 ( +0.074%)
mozilla     clang-6.0  2 |  230.5  230.2 ( -0.130%) |  2.686  2.688 ( +0.074%)
mozilla     clang-7    2 |  216.4  225.8 ( +4.344%) |  2.686  2.688 ( +0.074%)
mozilla     clang-8    2 |  213.8  218.3 ( +2.105%) |  2.686  2.688 ( +0.074%)
mozilla     clang-9    2 |  227.0  211.8 ( -6.696%) |  2.686  2.688 ( +0.074%)
mozilla     clang-11   2 |  224.4  224.7 ( +0.134%) |  2.686  2.688 ( +0.074%)
mozilla     clang-12   2 |  223.6  223.5 ( -0.045%) |  2.686  2.688 ( +0.074%)
mr          gcc-4.8   -1 |  299.1  320.0 ( +6.988%) |  1.958  1.961 ( +0.153%)
mr          gcc-5     -1 |  292.0  301.4 ( +3.219%) |  1.958  1.961 ( +0.153%)
mr          gcc-6     -1 |  293.4  327.6 (+11.656%) |  1.958  1.961 ( +0.153%)
mr          gcc-7     -1 |  299.1  321.3 ( +7.422%) |  1.958  1.961 ( +0.153%)
mr          gcc-8     -1 |  300.6  315.5 ( +4.957%) |  1.958  1.961 ( +0.153%)
mr          gcc-10    -1 |  285.4  319.0 (+11.773%) |  1.958  1.961 ( +0.153%)
mr          clang-6.0 -1 |  309.5  315.5 ( +1.939%) |  1.958  1.961 ( +0.153%)
mr          clang-7   -1 |  321.6  314.8 ( -2.114%) |  1.958  1.961 ( +0.153%)
mr          clang-8   -1 |  326.5  327.1 ( +0.184%) |  1.958  1.961 ( +0.153%)
mr          clang-9   -1 |  327.4  320.8 ( -2.016%) |  1.958  1.961 ( +0.153%)
mr          clang-11  -1 |  327.9  302.2 ( -7.838%) |  1.958  1.961 ( +0.153%)
mr          clang-12  -1 |  296.6  328.0 (+10.587%) |  1.958  1.961 ( +0.153%)
mr          gcc-4.8    1 |  246.8  264.3 ( +7.091%) |  2.596  2.611 ( +0.578%)
mr          gcc-5      1 |  241.1  252.8 ( +4.853%) |  2.596  2.611 ( +0.578%)
mr          gcc-6      1 |  250.9  266.7 ( +6.297%) |  2.596  2.611 ( +0.578%)
mr          gcc-7      1 |  254.7  258.3 ( +1.413%) |  2.596  2.611 ( +0.578%)
mr          gcc-8      1 |  237.1  259.6 ( +9.490%) |  2.596  2.611 ( +0.578%)
mr          gcc-10     1 |  239.1  265.2 (+10.916%) |  2.596  2.611 ( +0.578%)
mr          clang-6.0  1 |  255.9  255.2 ( -0.274%) |  2.596  2.611 ( +0.578%)
mr          clang-7    1 |  256.1  263.8 ( +3.007%) |  2.596  2.611 ( +0.578%)
mr          clang-8    1 |  257.4  255.1 ( -0.894%) |  2.596  2.611 ( +0.578%)
mr          clang-9    1 |  264.9  259.9 ( -1.888%) |  2.596  2.611 ( +0.578%)
mr          clang-11   1 |  261.8  261.4 ( -0.153%) |  2.596  2.611 ( +0.578%)
mr          clang-12   1 |  251.4  276.8 (+10.103%) |  2.596  2.611 ( +0.578%)
mr          gcc-4.8    2 |  174.2  182.7 ( +4.879%) |  2.769  2.771 ( +0.072%)
mr          gcc-5      2 |  170.2  165.7 ( -2.644%) |  2.769  2.771 ( +0.072%)
mr          gcc-6      2 |  171.5  173.4 ( +1.108%) |  2.769  2.771 ( +0.072%)
mr          gcc-7      2 |  173.8  168.3 ( -3.165%) |  2.769  2.771 ( +0.072%)
mr          gcc-8      2 |  166.8  166.8 ( +0.000%) |  2.769  2.771 ( +0.072%)
mr          gcc-10     2 |  171.7  177.7 ( +3.494%) |  2.769  2.771 ( +0.072%)
mr          clang-6.0  2 |  178.0  177.9 ( -0.056%) |  2.769  2.771 ( +0.072%)
mr          clang-7    2 |  174.3  178.1 ( +2.180%) |  2.769  2.771 ( +0.072%)
mr          clang-8    2 |  180.6  169.5 ( -6.146%) |  2.769  2.771 ( +0.072%)
mr          clang-9    2 |  184.8  177.3 ( -4.058%) |  2.769  2.771 ( +0.072%)
mr          clang-11   2 |  176.2  176.1 ( -0.057%) |  2.769  2.771 ( +0.072%)
mr          clang-12   2 |  176.5  187.2 ( +6.062%) |  2.769  2.771 ( +0.072%)
nci         gcc-4.8   -1 |  564.8  587.0 ( +3.931%) |  9.427  9.435 ( +0.085%)
nci         gcc-5     -1 |  581.3  600.4 ( +3.286%) |  9.427  9.435 ( +0.085%)
nci         gcc-6     -1 |  557.2  599.7 ( +7.627%) |  9.427  9.435 ( +0.085%)
nci         gcc-7     -1 |  583.8  593.5 ( +1.662%) |  9.427  9.435 ( +0.085%)
nci         gcc-8     -1 |  590.5  584.2 ( -1.067%) |  9.427  9.435 ( +0.085%)
nci         gcc-10    -1 |  578.8  584.3 ( +0.950%) |  9.427  9.435 ( +0.085%)
nci         clang-6.0 -1 |  603.4  582.4 ( -3.480%) |  9.427  9.435 ( +0.085%)
nci         clang-7   -1 |  603.1  598.3 ( -0.796%) |  9.427  9.435 ( +0.085%)
nci         clang-8   -1 |  580.0  591.2 ( +1.931%) |  9.427  9.435 ( +0.085%)
nci         clang-9   -1 |  597.8  586.3 ( -1.924%) |  9.427  9.435 ( +0.085%)
nci         clang-11  -1 |  598.7  588.6 ( -1.687%) |  9.427  9.435 ( +0.085%)
nci         clang-12  -1 |  602.8  581.9 ( -3.467%) |  9.427  9.435 ( +0.085%)
nci         gcc-4.8    1 |  513.6  545.2 ( +6.153%) | 11.780 11.780 ( +0.000%)
nci         gcc-5      1 |  508.2  556.9 ( +9.583%) | 11.780 11.780 ( +0.000%)
nci         gcc-6      1 |  549.3  552.2 ( +0.528%) | 11.780 11.780 ( +0.000%)
nci         gcc-7      1 |  551.5  539.6 ( -2.158%) | 11.780 11.780 ( +0.000%)
nci         gcc-8      1 |  544.4  551.1 ( +1.231%) | 11.780 11.780 ( +0.000%)
nci         gcc-10     1 |  551.3  549.0 ( -0.417%) | 11.780 11.780 ( +0.000%)
nci         clang-6.0  1 |  579.4  537.1 ( -7.301%) | 11.780 11.780 ( +0.000%)
nci         clang-7    1 |  538.7  550.1 ( +2.116%) | 11.780 11.780 ( +0.000%)
nci         clang-8    1 |  564.3  538.5 ( -4.572%) | 11.780 11.780 ( +0.000%)
nci         clang-9    1 |  562.0  545.3 ( -2.972%) | 11.780 11.780 ( +0.000%)
nci         clang-11   1 |  568.8  547.6 ( -3.727%) | 11.780 11.780 ( +0.000%)
nci         clang-12   1 |  550.0  536.0 ( -2.545%) | 11.780 11.780 ( +0.000%)
nci         gcc-4.8    2 |  474.5  460.0 ( -3.056%) | 11.610 11.620 ( +0.086%)
nci         gcc-5      2 |  474.8  480.1 ( +1.116%) | 11.610 11.620 ( +0.086%)
nci         gcc-6      2 |  486.0  495.7 ( +1.996%) | 11.610 11.620 ( +0.086%)
nci         gcc-7      2 |  494.4  470.3 ( -4.875%) | 11.610 11.620 ( +0.086%)
nci         gcc-8      2 |  495.7  482.4 ( -2.683%) | 11.610 11.620 ( +0.086%)
nci         gcc-10     2 |  487.6  492.0 ( +0.902%) | 11.610 11.620 ( +0.086%)
nci         clang-6.0  2 |  492.4  470.1 ( -4.529%) | 11.610 11.620 ( +0.086%)
nci         clang-7    2 |  486.6  481.6 ( -1.028%) | 11.610 11.620 ( +0.086%)
nci         clang-8    2 |  481.6  451.9 ( -6.167%) | 11.610 11.620 ( +0.086%)
nci         clang-9    2 |  508.1  486.7 ( -4.212%) | 11.610 11.620 ( +0.086%)
nci         clang-11   2 |  490.6  480.3 ( -2.099%) | 11.610 11.620 ( +0.086%)
nci         clang-12   2 |  493.0  480.9 ( -2.454%) | 11.610 11.620 ( +0.086%)
ooffice     gcc-4.8   -1 |  274.1  275.1 ( +0.365%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-5     -1 |  264.1  285.7 ( +8.179%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-6     -1 |  254.8  284.8 (+11.774%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-7     -1 |  268.1  287.6 ( +7.273%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-8     -1 |  261.9  283.3 ( +8.171%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-10    -1 |  272.7  285.0 ( +4.510%) |  1.515  1.513 ( -0.132%)
ooffice     clang-6.0 -1 |  280.2  283.4 ( +1.142%) |  1.515  1.513 ( -0.132%)
ooffice     clang-7   -1 |  274.2  284.0 ( +3.574%) |  1.515  1.513 ( -0.132%)
ooffice     clang-8   -1 |  276.1  285.7 ( +3.477%) |  1.515  1.513 ( -0.132%)
ooffice     clang-9   -1 |  285.9  283.8 ( -0.735%) |  1.515  1.513 ( -0.132%)
ooffice     clang-11  -1 |  273.9  296.6 ( +8.288%) |  1.515  1.513 ( -0.132%)
ooffice     clang-12  -1 |  280.5  288.7 ( +2.923%) |  1.515  1.513 ( -0.132%)
ooffice     gcc-4.8    1 |  217.9  228.7 ( +4.956%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-5      1 |  211.4  233.6 (+10.501%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-6      1 |  220.8  240.2 ( +8.786%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-7      1 |  220.7  239.6 ( +8.564%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-8      1 |  221.8  230.1 ( +3.742%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-10     1 |  222.8  228.5 ( +2.558%) |  1.713  1.711 ( -0.117%)
ooffice     clang-6.0  1 |  233.2  233.2 ( +0.000%) |  1.713  1.711 ( -0.117%)
ooffice     clang-7    1 |  226.0  230.7 ( +2.080%) |  1.713  1.711 ( -0.117%)
ooffice     clang-8    1 |  218.4  233.6 ( +6.960%) |  1.713  1.711 ( -0.117%)
ooffice     clang-9    1 |  235.6  232.4 ( -1.358%) |  1.713  1.711 ( -0.117%)
ooffice     clang-11   1 |  231.4  221.6 ( -4.235%) |  1.713  1.711 ( -0.117%)
ooffice     clang-12   1 |  224.4  238.7 ( +6.373%) |  1.713  1.711 ( -0.117%)
ooffice     gcc-4.8    2 |  160.0  159.3 ( -0.437%) |  1.851  1.850 ( -0.054%)
ooffice     gcc-5      2 |  159.7  163.2 ( +2.192%) |  1.851  1.850 ( -0.054%)
ooffice     gcc-6      2 |  152.5  166.3 ( +9.049%) |  1.851  1.850 ( -0.054%)
ooffice     gcc-7      2 |  152.7  167.3 ( +9.561%) |  1.851  1.850 ( -0.054%)
ooffice     gcc-8      2 |  163.5  162.8 ( -0.428%) |  1.851  1.850 ( -0.054%)
ooffice     gcc-10     2 |  158.9  155.5 ( -2.140%) |  1.851  1.850 ( -0.054%)
ooffice     clang-6.0  2 |  167.0  155.3 ( -7.006%) |  1.851  1.850 ( -0.054%)
ooffice     clang-7    2 |  158.5  160.2 ( +1.073%) |  1.851  1.850 ( -0.054%)
ooffice     clang-8    2 |  156.6  165.6 ( +5.747%) |  1.851  1.850 ( -0.054%)
ooffice     clang-9    2 |  164.4  165.8 ( +0.852%) |  1.851  1.850 ( -0.054%)
ooffice     clang-11   2 |  157.8  160.9 ( +1.965%) |  1.851  1.850 ( -0.054%)
ooffice     clang-12   2 |  151.6  165.2 ( +8.971%) |  1.851  1.850 ( -0.054%)
osdb        gcc-4.8   -1 |  301.4  298.1 ( -1.095%) |  2.383  2.389 ( +0.252%)
osdb        gcc-5     -1 |  289.1  295.3 ( +2.145%) |  2.383  2.389 ( +0.252%)
osdb        gcc-6     -1 |  282.3  311.3 (+10.273%) |  2.383  2.389 ( +0.252%)
osdb        gcc-7     -1 |  298.4  293.9 ( -1.508%) |  2.383  2.389 ( +0.252%)
osdb        gcc-8     -1 |  304.6  295.6 ( -2.955%) |  2.383  2.389 ( +0.252%)
osdb        gcc-10    -1 |  293.6  295.9 ( +0.783%) |  2.383  2.389 ( +0.252%)
osdb        clang-6.0 -1 |  295.6  291.4 ( -1.421%) |  2.383  2.389 ( +0.252%)
osdb        clang-7   -1 |  300.6  292.6 ( -2.661%) |  2.383  2.389 ( +0.252%)
osdb        clang-8   -1 |  298.3  300.8 ( +0.838%) |  2.383  2.389 ( +0.252%)
osdb        clang-9   -1 |  312.5  303.9 ( -2.752%) |  2.383  2.389 ( +0.252%)
osdb        clang-11  -1 |  303.2  295.5 ( -2.540%) |  2.383  2.389 ( +0.252%)
osdb        clang-12  -1 |  296.1  301.5 ( +1.824%) |  2.383  2.389 ( +0.252%)
osdb        gcc-4.8    1 |  259.6  266.8 ( +2.773%) |  2.697  2.704 ( +0.260%)
osdb        gcc-5      1 |  255.7  264.8 ( +3.559%) |  2.697  2.704 ( +0.260%)
osdb        gcc-6      1 |  256.4  277.4 ( +8.190%) |  2.697  2.704 ( +0.260%)
osdb        gcc-7      1 |  262.4  264.8 ( +0.915%) |  2.697  2.704 ( +0.260%)
osdb        gcc-8      1 |  269.8  268.3 ( -0.556%) |  2.697  2.704 ( +0.260%)
osdb        gcc-10     1 |  268.9  267.9 ( -0.372%) |  2.697  2.704 ( +0.260%)
osdb        clang-6.0  1 |  262.2  262.9 ( +0.267%) |  2.697  2.704 ( +0.260%)
osdb        clang-7    1 |  253.5  259.9 ( +2.525%) |  2.697  2.704 ( +0.260%)
osdb        clang-8    1 |  272.8  252.4 ( -7.478%) |  2.697  2.704 ( +0.260%)
osdb        clang-9    1 |  248.4  266.3 ( +7.206%) |  2.697  2.704 ( +0.260%)
osdb        clang-11   1 |  267.0  268.9 ( +0.712%) |  2.697  2.704 ( +0.260%)
osdb        clang-12   1 |  268.5  270.8 ( +0.857%) |  2.697  2.704 ( +0.260%)
osdb        gcc-4.8    2 |  208.0  205.0 ( -1.442%) |  2.887  2.888 ( +0.035%)
osdb        gcc-5      2 |  201.5  207.9 ( +3.176%) |  2.887  2.888 ( +0.035%)
osdb        gcc-6      2 |  202.4  210.2 ( +3.854%) |  2.887  2.888 ( +0.035%)
osdb        gcc-7      2 |  206.4  207.3 ( +0.436%) |  2.887  2.888 ( +0.035%)
osdb        gcc-8      2 |  208.1  205.2 ( -1.394%) |  2.887  2.888 ( +0.035%)
osdb        gcc-10     2 |  202.7  208.6 ( +2.911%) |  2.887  2.888 ( +0.035%)
osdb        clang-6.0  2 |  205.6  205.7 ( +0.049%) |  2.887  2.888 ( +0.035%)
osdb        clang-7    2 |  194.6  204.9 ( +5.293%) |  2.887  2.888 ( +0.035%)
osdb        clang-8    2 |  205.7  197.2 ( -4.132%) |  2.887  2.888 ( +0.035%)
osdb        clang-9    2 |  193.5  196.1 ( +1.344%) |  2.887  2.888 ( +0.035%)
osdb        clang-11   2 |  210.8  209.5 ( -0.617%) |  2.887  2.888 ( +0.035%)
osdb        clang-12   2 |  203.6  212.5 ( +4.371%) |  2.887  2.888 ( +0.035%)
reymont     gcc-4.8   -1 |  203.4  206.1 ( +1.327%) |  2.702  2.709 ( +0.259%)
reymont     gcc-5     -1 |  195.9  204.1 ( +4.186%) |  2.702  2.709 ( +0.259%)
reymont     gcc-6     -1 |  194.0  205.0 ( +5.670%) |  2.702  2.709 ( +0.259%)
reymont     gcc-7     -1 |  199.1  202.7 ( +1.808%) |  2.702  2.709 ( +0.259%)
reymont     gcc-8     -1 |  196.4  191.9 ( -2.291%) |  2.702  2.709 ( +0.259%)
reymont     gcc-10    -1 |  197.9  198.8 ( +0.455%) |  2.702  2.709 ( +0.259%)
reymont     clang-6.0 -1 |  199.4  201.9 ( +1.254%) |  2.702  2.709 ( +0.259%)
reymont     clang-7   -1 |  203.6  200.7 ( -1.424%) |  2.702  2.709 ( +0.259%)
reymont     clang-8   -1 |  206.6  215.0 ( +4.066%) |  2.702  2.709 ( +0.259%)
reymont     clang-9   -1 |  207.7  201.5 ( -2.985%) |  2.702  2.709 ( +0.259%)
reymont     clang-11  -1 |  214.3  205.5 ( -4.106%) |  2.702  2.709 ( +0.259%)
reymont     clang-12  -1 |  201.6  212.3 ( +5.308%) |  2.702  2.709 ( +0.259%)
reymont     gcc-4.8    1 |  198.8  205.0 ( +3.119%) |  3.078  3.085 ( +0.227%)
reymont     gcc-5      1 |  194.1  201.1 ( +3.606%) |  3.078  3.085 ( +0.227%)
reymont     gcc-6      1 |  202.9  203.9 ( +0.493%) |  3.078  3.085 ( +0.227%)
reymont     gcc-7      1 |  196.6  187.2 ( -4.781%) |  3.078  3.085 ( +0.227%)
reymont     gcc-8      1 |  199.6  193.9 ( -2.856%) |  3.078  3.085 ( +0.227%)
reymont     gcc-10     1 |  200.6  200.1 ( -0.249%) |  3.078  3.085 ( +0.227%)
reymont     clang-6.0  1 |  197.0  202.7 ( +2.893%) |  3.078  3.085 ( +0.227%)
reymont     clang-7    1 |  203.1  193.7 ( -4.628%) |  3.078  3.085 ( +0.227%)
reymont     clang-8    1 |  209.4  211.3 ( +0.907%) |  3.078  3.085 ( +0.227%)
reymont     clang-9    1 |  190.7  203.9 ( +6.922%) |  3.078  3.085 ( +0.227%)
reymont     clang-11   1 |  205.8  206.3 ( +0.243%) |  3.078  3.085 ( +0.227%)
reymont     clang-12   1 |  190.1  207.5 ( +9.153%) |  3.078  3.085 ( +0.227%)
reymont     gcc-4.8    2 |  161.0  158.8 ( -1.366%) |  3.200  3.208 ( +0.250%)
reymont     gcc-5      2 |  158.4  161.0 ( +1.641%) |  3.200  3.208 ( +0.250%)
reymont     gcc-6      2 |  162.2  160.8 ( -0.863%) |  3.200  3.208 ( +0.250%)
reymont     gcc-7      2 |  166.5  154.5 ( -7.207%) |  3.200  3.208 ( +0.250%)
reymont     gcc-8      2 |  158.6  151.3 ( -4.603%) |  3.200  3.208 ( +0.250%)
reymont     gcc-10     2 |  161.1  159.6 ( -0.931%) |  3.200  3.208 ( +0.250%)
reymont     clang-6.0  2 |  164.7  161.4 ( -2.004%) |  3.200  3.208 ( +0.250%)
reymont     clang-7    2 |  163.8  150.9 ( -7.875%) |  3.200  3.208 ( +0.250%)
reymont     clang-8    2 |  165.2  162.3 ( -1.755%) |  3.200  3.208 ( +0.250%)
reymont     clang-9    2 |  153.6  163.0 ( +6.120%) |  3.200  3.208 ( +0.250%)
reymont     clang-11   2 |  166.3  161.8 ( -2.706%) |  3.200  3.208 ( +0.250%)
reymont     clang-12   2 |  156.4  162.5 ( +3.900%) |  3.200  3.208 ( +0.250%)
samba       gcc-4.8   -1 |  370.4  365.1 ( -1.431%) |  3.433  3.438 ( +0.146%)
samba       gcc-5     -1 |  354.2  350.4 ( -1.073%) |  3.433  3.438 ( +0.146%)
samba       gcc-6     -1 |  361.3  366.8 ( +1.522%) |  3.433  3.438 ( +0.146%)
samba       gcc-7     -1 |  353.4  367.9 ( +4.103%) |  3.433  3.438 ( +0.146%)
samba       gcc-8     -1 |  368.7  365.7 ( -0.814%) |  3.433  3.438 ( +0.146%)
samba       gcc-10    -1 |  360.1  360.9 ( +0.222%) |  3.433  3.438 ( +0.146%)
samba       clang-6.0 -1 |  361.5  363.7 ( +0.609%) |  3.433  3.438 ( +0.146%)
samba       clang-7   -1 |  364.2  360.3 ( -1.071%) |  3.433  3.438 ( +0.146%)
samba       clang-8   -1 |  364.0  368.6 ( +1.264%) |  3.433  3.438 ( +0.146%)
samba       clang-9   -1 |  361.7  369.7 ( +2.212%) |  3.433  3.438 ( +0.146%)
samba       clang-11  -1 |  369.4  370.4 ( +0.271%) |  3.433  3.438 ( +0.146%)
samba       clang-12  -1 |  371.5  361.3 ( -2.746%) |  3.433  3.438 ( +0.146%)
samba       gcc-4.8    1 |  329.2  337.8 ( +2.612%) |  3.921  3.927 ( +0.153%)
samba       gcc-5      1 |  317.9  323.0 ( +1.604%) |  3.921  3.927 ( +0.153%)
samba       gcc-6      1 |  322.0  338.0 ( +4.969%) |  3.921  3.927 ( +0.153%)
samba       gcc-7      1 |  319.7  331.5 ( +3.691%) |  3.921  3.927 ( +0.153%)
samba       gcc-8      1 |  330.9  326.9 ( -1.209%) |  3.921  3.927 ( +0.153%)
samba       gcc-10     1 |  352.4  328.2 ( -6.867%) |  3.921  3.927 ( +0.153%)
samba       clang-6.0  1 |  330.3  337.1 ( +2.059%) |  3.921  3.927 ( +0.153%)
samba       clang-7    1 |  351.0  326.3 ( -7.037%) |  3.921  3.927 ( +0.153%)
samba       clang-8    1 |  323.5  335.5 ( +3.709%) |  3.921  3.927 ( +0.153%)
samba       clang-9    1 |  344.5  341.2 ( -0.958%) |  3.921  3.927 ( +0.153%)
samba       clang-11   1 |  326.5  337.1 ( +3.247%) |  3.921  3.927 ( +0.153%)
samba       clang-12   1 |  331.9  342.9 ( +3.314%) |  3.921  3.927 ( +0.153%)
samba       gcc-4.8    2 |  264.6  256.6 ( -3.023%) |  4.123  4.131 ( +0.194%)
samba       gcc-5      2 |  247.4  255.5 ( +3.274%) |  4.123  4.131 ( +0.194%)
samba       gcc-6      2 |  253.8  260.0 ( +2.443%) |  4.123  4.131 ( +0.194%)
samba       gcc-7      2 |  261.2  261.2 ( +0.000%) |  4.123  4.131 ( +0.194%)
samba       gcc-8      2 |  258.3  257.0 ( -0.503%) |  4.123  4.131 ( +0.194%)
samba       gcc-10     2 |  257.8  259.5 ( +0.659%) |  4.123  4.131 ( +0.194%)
samba       clang-6.0  2 |  254.2  253.5 ( -0.275%) |  4.123  4.131 ( +0.194%)
samba       clang-7    2 |  261.1  263.0 ( +0.728%) |  4.123  4.131 ( +0.194%)
samba       clang-8    2 |  258.1  249.0 ( -3.526%) |  4.123  4.131 ( +0.194%)
samba       clang-9    2 |  265.4  262.7 ( -1.017%) |  4.123  4.131 ( +0.194%)
samba       clang-11   2 |  264.6  267.3 ( +1.020%) |  4.123  4.131 ( +0.194%)
samba       clang-12   2 |  250.4  258.5 ( +3.235%) |  4.123  4.131 ( +0.194%)
sao         gcc-4.8   -1 |  261.1  299.5 (+14.707%) |  1.107  1.107 ( +0.000%)
sao         gcc-5     -1 |  250.8  289.7 (+15.510%) |  1.107  1.107 ( +0.000%)
sao         gcc-6     -1 |  249.3  292.1 (+17.168%) |  1.107  1.107 ( +0.000%)
sao         gcc-7     -1 |  259.6  293.3 (+12.982%) |  1.107  1.107 ( +0.000%)
sao         gcc-8     -1 |  268.7  269.8 ( +0.409%) |  1.107  1.107 ( +0.000%)
sao         gcc-10    -1 |  267.8  281.6 ( +5.153%) |  1.107  1.107 ( +0.000%)
sao         clang-6.0 -1 |  283.7  284.9 ( +0.423%) |  1.107  1.107 ( +0.000%)
sao         clang-7   -1 |  264.9  281.5 ( +6.267%) |  1.107  1.107 ( +0.000%)
sao         clang-8   -1 |  268.3  283.8 ( +5.777%) |  1.107  1.107 ( +0.000%)
sao         clang-9   -1 |  275.0  284.1 ( +3.309%) |  1.107  1.107 ( +0.000%)
sao         clang-11  -1 |  279.8  280.0 ( +0.071%) |  1.107  1.107 ( +0.000%)
sao         clang-12  -1 |  265.9  282.0 ( +6.055%) |  1.107  1.107 ( +0.000%)
sao         gcc-4.8    1 |  192.4  213.0 (+10.707%) |  1.159  1.159 ( +0.000%)
sao         gcc-5      1 |  185.1  219.9 (+18.801%) |  1.159  1.159 ( +0.000%)
sao         gcc-6      1 |  189.9  205.7 ( +8.320%) |  1.159  1.159 ( +0.000%)
sao         gcc-7      1 |  191.3  218.6 (+14.271%) |  1.159  1.159 ( +0.000%)
sao         gcc-8      1 |  197.1  212.1 ( +7.610%) |  1.159  1.159 ( +0.000%)
sao         gcc-10     1 |  197.5  206.4 ( +4.506%) |  1.159  1.159 ( +0.000%)
sao         clang-6.0  1 |  219.3  211.7 ( -3.466%) |  1.159  1.159 ( +0.000%)
sao         clang-7    1 |  197.8  212.3 ( +7.331%) |  1.159  1.159 ( +0.000%)
sao         clang-8    1 |  209.3  210.5 ( +0.573%) |  1.159  1.159 ( +0.000%)
sao         clang-9    1 |  208.4  214.5 ( +2.927%) |  1.159  1.159 ( +0.000%)
sao         clang-11   1 |  208.6  219.5 ( +5.225%) |  1.159  1.159 ( +0.000%)
sao         clang-12   1 |  203.1  207.0 ( +1.920%) |  1.159  1.159 ( +0.000%)
sao         gcc-4.8    2 |  123.3  129.5 ( +5.028%) |  1.248  1.249 ( +0.080%)
sao         gcc-5      2 |  122.6  136.0 (+10.930%) |  1.248  1.249 ( +0.080%)
sao         gcc-6      2 |  123.5  127.2 ( +2.996%) |  1.248  1.249 ( +0.080%)
sao         gcc-7      2 |  127.1  135.9 ( +6.924%) |  1.248  1.249 ( +0.080%)
sao         gcc-8      2 |  124.5  135.2 ( +8.594%) |  1.248  1.249 ( +0.080%)
sao         gcc-10     2 |  128.5  134.3 ( +4.514%) |  1.248  1.249 ( +0.080%)
sao         clang-6.0  2 |  140.6  131.7 ( -6.330%) |  1.248  1.249 ( +0.080%)
sao         clang-7    2 |  130.4  125.1 ( -4.064%) |  1.248  1.249 ( +0.080%)
sao         clang-8    2 |  132.3  136.6 ( +3.250%) |  1.248  1.249 ( +0.080%)
sao         clang-9    2 |  134.2  133.9 ( -0.224%) |  1.248  1.249 ( +0.080%)
sao         clang-11   2 |  135.9  133.8 ( -1.545%) |  1.248  1.249 ( +0.080%)
sao         clang-12   2 |  130.0  131.7 ( +1.308%) |  1.248  1.249 ( +0.080%)
webster     gcc-4.8   -1 |  239.2  244.3 ( +2.132%) |  2.346  2.351 ( +0.213%)
webster     gcc-5     -1 |  223.2  235.9 ( +5.690%) |  2.346  2.351 ( +0.213%)
webster     gcc-6     -1 |  225.6  249.5 (+10.594%) |  2.346  2.351 ( +0.213%)
webster     gcc-7     -1 |  234.4  242.7 ( +3.541%) |  2.346  2.351 ( +0.213%)
webster     gcc-8     -1 |  224.5  243.3 ( +8.374%) |  2.346  2.351 ( +0.213%)
webster     gcc-10    -1 |  232.3  236.0 ( +1.593%) |  2.346  2.351 ( +0.213%)
webster     clang-6.0 -1 |  236.8  240.2 ( +1.436%) |  2.346  2.351 ( +0.213%)
webster     clang-7   -1 |  237.9  245.3 ( +3.111%) |  2.346  2.351 ( +0.213%)
webster     clang-8   -1 |  232.8  245.7 ( +5.541%) |  2.346  2.351 ( +0.213%)
webster     clang-9   -1 |  243.9  244.5 ( +0.246%) |  2.346  2.351 ( +0.213%)
webster     clang-11  -1 |  241.8  243.0 ( +0.496%) |  2.346  2.351 ( +0.213%)
webster     clang-12  -1 |  230.7  242.5 ( +5.115%) |  2.346  2.351 ( +0.213%)
webster     gcc-4.8    1 |  220.1  224.8 ( +2.135%) |  3.028  3.035 ( +0.231%)
webster     gcc-5      1 |  211.5  228.3 ( +7.943%) |  3.028  3.035 ( +0.231%)
webster     gcc-6      1 |  216.7  224.8 ( +3.738%) |  3.028  3.035 ( +0.231%)
webster     gcc-7      1 |  219.5  225.7 ( +2.825%) |  3.028  3.035 ( +0.231%)
webster     gcc-8      1 |  218.5  220.5 ( +0.915%) |  3.028  3.035 ( +0.231%)
webster     gcc-10     1 |  221.0  217.1 ( -1.765%) |  3.028  3.035 ( +0.231%)
webster     clang-6.0  1 |  225.8  224.4 ( -0.620%) |  3.028  3.035 ( +0.231%)
webster     clang-7    1 |  227.8  225.1 ( -1.185%) |  3.028  3.035 ( +0.231%)
webster     clang-8    1 |  221.7  223.1 ( +0.631%) |  3.028  3.035 ( +0.231%)
webster     clang-9    1 |  226.3  229.8 ( +1.547%) |  3.028  3.035 ( +0.231%)
webster     clang-11   1 |  228.8  225.8 ( -1.311%) |  3.028  3.035 ( +0.231%)
webster     clang-12   1 |  218.4  226.3 ( +3.617%) |  3.028  3.035 ( +0.231%)
webster     gcc-4.8    2 |  152.1  153.3 ( +0.789%) |  3.228  3.237 ( +0.279%)
webster     gcc-5      2 |  147.9  156.5 ( +5.815%) |  3.228  3.237 ( +0.279%)
webster     gcc-6      2 |  149.3  155.6 ( +4.220%) |  3.228  3.237 ( +0.279%)
webster     gcc-7      2 |  149.8  154.2 ( +2.937%) |  3.228  3.237 ( +0.279%)
webster     gcc-8      2 |  151.4  154.6 ( +2.114%) |  3.228  3.237 ( +0.279%)
webster     gcc-10     2 |  157.0  151.5 ( -3.503%) |  3.228  3.237 ( +0.279%)
webster     clang-6.0  2 |  149.0  150.9 ( +1.275%) |  3.228  3.237 ( +0.279%)
webster     clang-7    2 |  152.7  158.1 ( +3.536%) |  3.228  3.237 ( +0.279%)
webster     clang-8    2 |  149.7  155.6 ( +3.941%) |  3.228  3.237 ( +0.279%)
webster     clang-9    2 |  159.0  160.5 ( +0.943%) |  3.228  3.237 ( +0.279%)
webster     clang-11   2 |  155.4  159.6 ( +2.703%) |  3.228  3.237 ( +0.279%)
webster     clang-12   2 |  154.7  154.1 ( -0.388%) |  3.228  3.237 ( +0.279%)
xml         gcc-4.8   -1 |  472.6  483.4 ( +2.285%) |  6.055  6.048 ( -0.116%)
xml         gcc-5     -1 |  462.5  478.6 ( +3.481%) |  6.055  6.048 ( -0.116%)
xml         gcc-6     -1 |  443.3  481.0 ( +8.504%) |  6.055  6.048 ( -0.116%)
xml         gcc-7     -1 |  482.6  505.4 ( +4.724%) |  6.055  6.048 ( -0.116%)
xml         gcc-8     -1 |  488.7  485.5 ( -0.655%) |  6.055  6.048 ( -0.116%)
xml         gcc-10    -1 |  480.4  484.2 ( +0.791%) |  6.055  6.048 ( -0.116%)
xml         clang-6.0 -1 |  488.2  477.5 ( -2.192%) |  6.055  6.048 ( -0.116%)
xml         clang-7   -1 |  466.0  484.7 ( +4.013%) |  6.055  6.048 ( -0.116%)
xml         clang-8   -1 |  503.4  480.0 ( -4.648%) |  6.055  6.048 ( -0.116%)
xml         clang-9   -1 |  457.3  476.1 ( +4.111%) |  6.055  6.048 ( -0.116%)
xml         clang-11  -1 |  479.6  489.3 ( +2.023%) |  6.055  6.048 ( -0.116%)
xml         clang-12  -1 |  484.4  476.1 ( -1.713%) |  6.055  6.048 ( -0.116%)
xml         gcc-4.8    1 |  463.3  466.2 ( +0.626%) |  7.673  7.668 ( -0.065%)
xml         gcc-5      1 |  450.2  467.9 ( +3.932%) |  7.673  7.668 ( -0.065%)
xml         gcc-6      1 |  452.8  439.0 ( -3.048%) |  7.673  7.668 ( -0.065%)
xml         gcc-7      1 |  470.0  477.6 ( +1.617%) |  7.673  7.668 ( -0.065%)
xml         gcc-8      1 |  484.8  470.0 ( -3.053%) |  7.673  7.668 ( -0.065%)
xml         gcc-10     1 |  467.3  473.4 ( +1.305%) |  7.673  7.668 ( -0.065%)
xml         clang-6.0  1 |  467.9  481.0 ( +2.800%) |  7.673  7.668 ( -0.065%)
xml         clang-7    1 |  465.3  472.8 ( +1.612%) |  7.673  7.668 ( -0.065%)
xml         clang-8    1 |  470.6  470.8 ( +0.042%) |  7.673  7.668 ( -0.065%)
xml         clang-9    1 |  464.8  457.5 ( -1.571%) |  7.673  7.668 ( -0.065%)
xml         clang-11   1 |  468.0  473.0 ( +1.068%) |  7.673  7.668 ( -0.065%)
xml         clang-12   1 |  474.8  463.9 ( -2.296%) |  7.673  7.668 ( -0.065%)
xml         gcc-4.8    2 |  379.5  375.9 ( -0.949%) |  7.843  7.861 ( +0.230%)
xml         gcc-5      2 |  357.5  383.1 ( +7.161%) |  7.843  7.861 ( +0.230%)
xml         gcc-6      2 |  369.2  362.4 ( -1.842%) |  7.843  7.861 ( +0.230%)
xml         gcc-7      2 |  387.1  388.7 ( +0.413%) |  7.843  7.861 ( +0.230%)
xml         gcc-8      2 |  387.7  389.8 ( +0.542%) |  7.843  7.861 ( +0.230%)
xml         gcc-10     2 |  382.4  385.7 ( +0.863%) |  7.843  7.861 ( +0.230%)
xml         clang-6.0  2 |  384.1  389.6 ( +1.432%) |  7.843  7.861 ( +0.230%)
xml         clang-7    2 |  381.7  384.3 ( +0.681%) |  7.843  7.861 ( +0.230%)
xml         clang-8    2 |  397.6  393.6 ( -1.006%) |  7.843  7.861 ( +0.230%)
xml         clang-9    2 |  392.9  389.4 ( -0.891%) |  7.843  7.861 ( +0.230%)
xml         clang-11   2 |  387.1  384.0 ( -0.801%) |  7.843  7.861 ( +0.230%)
xml         clang-12   2 |  385.0  380.6 ( -1.143%) |  7.843  7.861 ( +0.230%)
x-ray       gcc-4.8   -1 | 1882.3 2375.9 (+26.223%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-5     -1 | 1904.6 2396.6 (+25.832%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-6     -1 | 1952.3 2457.6 (+25.882%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-7     -1 | 1937.5 2398.3 (+23.783%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-8     -1 | 2043.5 2402.6 (+17.573%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-10    -1 | 1977.7 2405.1 (+21.611%) |  1.001  1.000 ( -0.100%)
x-ray       clang-6.0 -1 | 2021.1 2281.0 (+12.859%) |  1.001  1.000 ( -0.100%)
x-ray       clang-7   -1 | 1871.8 2459.9 (+31.419%) |  1.001  1.000 ( -0.100%)
x-ray       clang-8   -1 | 2068.2 2424.7 (+17.237%) |  1.001  1.000 ( -0.100%)
x-ray       clang-9   -1 | 2038.7 2481.4 (+21.715%) |  1.001  1.000 ( -0.100%)
x-ray       clang-11  -1 | 2076.8 2417.0 (+16.381%) |  1.001  1.000 ( -0.100%)
x-ray       clang-12  -1 | 1944.6 2382.8 (+22.534%) |  1.001  1.000 ( -0.100%)
x-ray       gcc-4.8    1 |  526.3  537.8 ( +2.185%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-5      1 |  550.0  561.3 ( +2.055%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-6      1 |  563.5  576.4 ( +2.289%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-7      1 |  565.4  562.0 ( -0.601%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-8      1 |  562.1  563.6 ( +0.267%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-10     1 |  528.1  548.2 ( +3.806%) |  1.251  1.252 ( +0.080%)
x-ray       clang-6.0  1 |  550.6  564.6 ( +2.543%) |  1.251  1.252 ( +0.080%)
x-ray       clang-7    1 |  574.3  575.3 ( +0.174%) |  1.251  1.252 ( +0.080%)
x-ray       clang-8    1 |  556.0  593.5 ( +6.745%) |  1.251  1.252 ( +0.080%)
x-ray       clang-9    1 |  538.5  585.4 ( +8.709%) |  1.251  1.252 ( +0.080%)
x-ray       clang-11   1 |  544.1  568.7 ( +4.521%) |  1.251  1.252 ( +0.080%)
x-ray       clang-12   1 |  529.7  529.2 ( -0.094%) |  1.251  1.252 ( +0.080%)
x-ray       gcc-4.8    2 |  229.5  286.4 (+24.793%) |  1.268  1.264 ( -0.315%)
x-ray       gcc-5      2 |  228.0  297.9 (+30.658%) |  1.268  1.264 ( -0.315%)
x-ray       gcc-6      2 |  230.2  301.2 (+30.843%) |  1.268  1.264 ( -0.315%)
x-ray       gcc-7      2 |  224.7  286.9 (+27.681%) |  1.268  1.264 ( -0.315%)
x-ray       gcc-8      2 |  234.7  300.6 (+28.078%) |  1.268  1.264 ( -0.315%)
x-ray       gcc-10     2 |  225.7  302.2 (+33.895%) |  1.268  1.264 ( -0.315%)
x-ray       clang-6.0  2 |  242.7  300.7 (+23.898%) |  1.268  1.264 ( -0.315%)
x-ray       clang-7    2 |  218.0  304.1 (+39.495%) |  1.268  1.264 ( -0.315%)
x-ray       clang-8    2 |  238.3  301.9 (+26.689%) |  1.268  1.264 ( -0.315%)
x-ray       clang-9    2 |  238.3  320.3 (+34.410%) |  1.268  1.264 ( -0.315%)
x-ray       clang-11   2 |  238.2  300.1 (+25.987%) |  1.268  1.264 ( -0.315%)
x-ray       clang-12   2 |  231.3  286.7 (+23.952%) |  1.268  1.264 ( -0.315%)
silesia.tar gcc-4.8   -1 |  307.2  325.6 ( +5.990%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-5     -1 |  307.0  325.8 ( +6.124%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-6     -1 |  308.5  331.0 ( +7.293%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-7     -1 |  318.4  320.9 ( +0.785%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-8     -1 |  300.2  325.1 ( +8.294%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-10    -1 |  316.8  321.8 ( +1.578%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-6.0 -1 |  324.9  326.6 ( +0.523%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-7   -1 |  327.8  315.9 ( -3.630%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-8   -1 |  332.7  321.7 ( -3.306%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-9   -1 |  326.1  323.5 ( -0.797%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-11  -1 |  318.7  323.8 ( +1.600%) |  2.434  2.436 ( +0.082%)
silesia.tar clang-12  -1 |  326.5  325.6 ( -0.276%) |  2.434  2.436 ( +0.082%)
silesia.tar gcc-4.8    1 |  273.5  286.8 ( +4.863%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-5      1 |  277.6  290.2 ( +4.539%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-6      1 |  281.8  285.5 ( +1.313%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-7      1 |  278.9  290.6 ( +4.195%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-8      1 |  273.6  287.5 ( +5.080%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-10     1 |  275.8  289.6 ( +5.004%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-6.0  1 |  283.6  284.8 ( +0.423%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-7    1 |  285.8  284.7 ( -0.385%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-8    1 |  278.0  280.6 ( +0.935%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-9    1 |  296.6  289.7 ( -2.326%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-11   1 |  270.9  302.6 (+11.702%) |  2.884  2.888 ( +0.139%)
silesia.tar clang-12   1 |  288.6  291.6 ( +1.040%) |  2.884  2.888 ( +0.139%)
silesia.tar gcc-4.8    2 |  200.0  211.4 ( +5.700%) |  3.046  3.048 ( +0.066%)
silesia.tar gcc-5      2 |  197.8  211.1 ( +6.724%) |  3.046  3.048 ( +0.066%)
silesia.tar gcc-6      2 |  205.6  205.5 ( -0.049%) |  3.046  3.048 ( +0.066%)
silesia.tar gcc-7      2 |  197.3  209.4 ( +6.133%) |  3.046  3.048 ( +0.066%)
silesia.tar gcc-8      2 |  194.1  217.4 (+12.004%) |  3.046  3.048 ( +0.066%)
silesia.tar gcc-10     2 |  204.9  211.9 ( +3.416%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-6.0  2 |  211.2  209.5 ( -0.805%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-7    2 |  212.4  203.5 ( -4.190%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-8    2 |  200.6  208.0 ( +3.689%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-9    2 |  216.2  206.7 ( -4.394%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-11   2 |  202.8  215.1 ( +6.065%) |  3.046  3.048 ( +0.066%)
silesia.tar clang-12   2 |  207.3  210.6 ( +1.592%) |  3.046  3.048 ( +0.066%)
enwik8      gcc-4.8   -1 |  225.6  234.6 ( +3.989%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-5     -1 |  222.8  231.8 ( +4.039%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-6     -1 |  225.7  235.7 ( +4.431%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-7     -1 |  225.5  235.2 ( +4.302%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-8     -1 |  245.0  235.4 ( -3.918%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-10    -1 |  231.4  237.4 ( +2.593%) |  1.934  1.937 ( +0.155%)
enwik8      clang-6.0 -1 |  232.0  234.5 ( +1.078%) |  1.934  1.937 ( +0.155%)
enwik8      clang-7   -1 |  235.4  233.4 ( -0.850%) |  1.934  1.937 ( +0.155%)
enwik8      clang-8   -1 |  229.4  225.3 ( -1.787%) |  1.934  1.937 ( +0.155%)
enwik8      clang-9   -1 |  229.1  238.6 ( +4.147%) |  1.934  1.937 ( +0.155%)
enwik8      clang-11  -1 |  221.0  231.4 ( +4.706%) |  1.934  1.937 ( +0.155%)
enwik8      clang-12  -1 |  227.1  232.6 ( +2.422%) |  1.934  1.937 ( +0.155%)
enwik8      gcc-4.8    1 |  200.0  216.6 ( +8.300%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-5      1 |  201.6  211.1 ( +4.712%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-6      1 |  210.1  212.8 ( +1.285%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-7      1 |  206.2  215.5 ( +4.510%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-8      1 |  210.5  211.8 ( +0.618%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-10     1 |  211.3  207.9 ( -1.609%) |  2.455  2.459 ( +0.163%)
enwik8      clang-6.0  1 |  206.1  218.3 ( +5.919%) |  2.455  2.459 ( +0.163%)
enwik8      clang-7    1 |  207.6  210.5 ( +1.397%) |  2.455  2.459 ( +0.163%)
enwik8      clang-8    1 |  197.9  214.9 ( +8.590%) |  2.455  2.459 ( +0.163%)
enwik8      clang-9    1 |  215.4  211.6 ( -1.764%) |  2.455  2.459 ( +0.163%)
enwik8      clang-11   1 |  208.0  214.5 ( +3.125%) |  2.455  2.459 ( +0.163%)
enwik8      clang-12   1 |  204.3  213.6 ( +4.552%) |  2.455  2.459 ( +0.163%)
enwik8      gcc-4.8    2 |  135.1  148.1 ( +9.623%) |  2.671  2.679 ( +0.300%)
enwik8      gcc-5      2 |  140.7  144.8 ( +2.914%) |  2.671  2.679 ( +0.300%)
enwik8      gcc-6      2 |  143.5  147.0 ( +2.439%) |  2.671  2.679 ( +0.300%)
enwik8      gcc-7      2 |  138.8  148.4 ( +6.916%) |  2.671  2.679 ( +0.300%)
enwik8      gcc-8      2 |  142.3  143.3 ( +0.703%) |  2.671  2.679 ( +0.300%)
enwik8      gcc-10     2 |  135.9  141.4 ( +4.047%) |  2.671  2.679 ( +0.300%)
enwik8      clang-6.0  2 |  142.7  147.2 ( +3.153%) |  2.671  2.679 ( +0.300%)
enwik8      clang-7    2 |  139.5  140.9 ( +1.004%) |  2.671  2.679 ( +0.300%)
enwik8      clang-8    2 |  133.3  145.2 ( +8.927%) |  2.671  2.679 ( +0.300%)
enwik8      clang-9    2 |  148.1  144.9 ( -2.161%) |  2.671  2.679 ( +0.300%)
enwik8      clang-11   2 |  144.0  148.5 ( +3.125%) |  2.671  2.679 ( +0.300%)
enwik8      clang-12   2 |  140.8  145.5 ( +3.338%) |  2.671  2.679 ( +0.300%)
enwik9      gcc-4.8   -1 |  261.1  253.2 ( -3.026%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-5     -1 |  247.3  251.1 ( +1.537%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-6     -1 |  239.2  261.2 ( +9.197%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-7     -1 |  249.9  265.7 ( +6.323%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-8     -1 |  257.5  267.5 ( +3.883%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-10    -1 |  259.1  265.0 ( +2.277%) |  2.204  2.207 ( +0.136%)
enwik9      clang-6.0 -1 |  252.5  253.5 ( +0.396%) |  2.204  2.207 ( +0.136%)
enwik9      clang-7   -1 |  253.6  262.3 ( +3.431%) |  2.204  2.207 ( +0.136%)
enwik9      clang-8   -1 |  269.3  265.3 ( -1.485%) |  2.204  2.207 ( +0.136%)
enwik9      clang-9   -1 |  247.1  265.2 ( +7.325%) |  2.204  2.207 ( +0.136%)
enwik9      clang-11  -1 |  252.0  264.7 ( +5.040%) |  2.204  2.207 ( +0.136%)
enwik9      clang-12  -1 |  252.8  260.2 ( +2.927%) |  2.204  2.207 ( +0.136%)
enwik9      gcc-4.8    1 |  233.1  235.9 ( +1.201%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-5      1 |  226.2  233.2 ( +3.095%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-6      1 |  222.0  236.1 ( +6.351%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-7      1 |  228.4  233.6 ( +2.277%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-8      1 |  235.2  226.8 ( -3.571%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-10     1 |  218.8  241.2 (+10.238%) |  2.798  2.802 ( +0.143%)
enwik9      clang-6.0  1 |  227.1  234.2 ( +3.126%) |  2.798  2.802 ( +0.143%)
enwik9      clang-7    1 |  223.4  243.0 ( +8.774%) |  2.798  2.802 ( +0.143%)
enwik9      clang-8    1 |  238.7  227.7 ( -4.608%) |  2.798  2.802 ( +0.143%)
enwik9      clang-9    1 |  231.4  238.8 ( +3.198%) |  2.798  2.802 ( +0.143%)
enwik9      clang-11   1 |  235.5  234.8 ( -0.297%) |  2.798  2.802 ( +0.143%)
enwik9      clang-12   1 |  234.8  237.9 ( +1.320%) |  2.798  2.802 ( +0.143%)
enwik9      gcc-4.8    2 |  161.0  161.2 ( +0.124%) |  3.039  3.047 ( +0.263%)
enwik9      gcc-5      2 |  159.0  161.6 ( +1.635%) |  3.039  3.047 ( +0.263%)
enwik9      gcc-6      2 |  160.1  163.5 ( +2.124%) |  3.039  3.047 ( +0.263%)
enwik9      gcc-7      2 |  160.8  157.0 ( -2.363%) |  3.039  3.047 ( +0.263%)
enwik9      gcc-8      2 |  162.8  161.0 ( -1.106%) |  3.039  3.047 ( +0.263%)
enwik9      gcc-10     2 |  156.2  165.6 ( +6.018%) |  3.039  3.047 ( +0.263%)
enwik9      clang-6.0  2 |  162.9  165.7 ( +1.719%) |  3.039  3.047 ( +0.263%)
enwik9      clang-7    2 |  163.0  165.2 ( +1.350%) |  3.039  3.047 ( +0.263%)
enwik9      clang-8    2 |  158.3  158.0 ( -0.190%) |  3.039  3.047 ( +0.263%)
enwik9      clang-9    2 |  163.6  162.9 ( -0.428%) |  3.039  3.047 ( +0.263%)
enwik9      clang-11   2 |  164.7  160.6 ( -2.489%) |  3.039  3.047 ( +0.263%)
enwik9      clang-12   2 |  161.3  163.7 ( +1.488%) |  3.039  3.047 ( +0.263%)

(Benchmarked on an Intel Xeon E5-2680 v4 @ 2.40GHz.)

Status

I am satisfied with this PR and feel that it is ready to merge.

To-Do:

Achieve correctness.
Achieve ratio parity (approximately).
Achieve speed parity (approximately).
~~Search at the end of the block.~~ (Decided not to.)
Consistently improve speed over existing implementation.
Benchmark on other CPUs and architectures.

lib/compress/zstd_fast.c

Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.

… Speed) Unrolling the loop to handle 2 positions in each iteration allows us to reduce the frequency of some operations that don't need to happen at every position. One such operation is the step calculation, which is a very rough heuristic anyways. It's fine if we do this a position later. The other operation is the repcode check. But since the repcode check already tries expanding back one position, we're really not missing much of importance by only trying it every other position. This commit also slightly reorders some operations.

This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.

It's a bit strange, because this is hitting the dictionary special case where the dictionary is contiguous with the input and still runs in the single- segment path. We should probably change that to hit the `extDict` path instead?

Cyan4973 · 2021-09-07T23:52:15Z

The maintenance complexity of the PR is pretty good, and a tractable level,
which is a good sign.

For additional control, I've been benchmarking this PR on a stable desktop system, using a variety of compilers.

With gcc-9.3 (default compiler), I see a generally consistent speed win at level 1,
by variable amounts, sometimes as high as +8%, sometimes at just +1%.
In one specific instance, calgary/geo, the new variant features an impressive speed gain (+30%) albeit for a slightly worse compression ratio (1.450 -> 1.435).
There is only a single instance of speed regression (silesia/nci) and even then it's tiny.

However, clang-10.0 was much less friendlier, with performances of this new PR being mostly slower,
by a small yet consistent and measurable amount, in the range -2% (with variations depending on exact file), calgary/geo being the only remaining case with a clear speed win.

clang-10.0 is the default version of my linux distro, but I can test other versions, to compensate from a biais specific to a version (or a specific build). So I tested with clang-11.0. And the good news is : it's better with this version. Though it's not better enough to make it clearly positive. It's more in the -0% territory, meaning both dev and this PR get almost the same performance, and differences are insignificant (except calgary/geo, as usual).
clang-12.0 shows a slightly better picture, with changes generally positive, although mostly by very little. But overall rather positive.

So that makes clang relatively "neutral" to this change across 3 versions.
What about gcc ? v9.3.0 is clearly positive. Let's control with a different version, like gcc-10.3.
Yep, it's still clearly positive. Maybe a bit less positive, but still clearly, with everything being faster by some amount (except silesia/nci, as previously).

So, these results make this PR a clear gain for gcc, and somewhat neutral for clang.
Not sure if something can be done to improve clang, but in the meantime, we can probably live by accepting some amount of variation (sometimes negative, sometimes positive) for this compiler.

facebook-github-bot added the CLA Signed label Aug 17, 2021

felixhandte added optimization CLA Signed and removed CLA Signed labels Aug 17, 2021

Cyan4973 reviewed Aug 19, 2021

View reviewed changes

lib/compress/zstd_fast.c Outdated Show resolved Hide resolved

Cyan4973 reviewed Aug 19, 2021

View reviewed changes

lib/compress/zstd_fast.c Show resolved Hide resolved

felixhandte changed the title ~~[WIP] Pipelined Implementation of ZSTD_fast~~ Pipelined Implementation of ZSTD_fast (~+5% Speed) Aug 20, 2021

Cyan4973 reviewed Aug 21, 2021

View reviewed changes

lib/compress/zstd_fast.c Show resolved Hide resolved

Cyan4973 reviewed Aug 23, 2021

View reviewed changes

lib/compress/zstd_fast.c Show resolved Hide resolved

felixhandte added 16 commits September 1, 2021 14:15

Fix Benchmark Corruption Display

ab8aa49

Initial Pipelined Implementation for ZSTD_fast

80bc12b

Track Step Size Statefully, Rather than Recalculating Every Time

bc768bc

Re-Order Operations for Slightly Better Performance

387840a

Shrink Pipeline from 4 Positions to 3

b092dd7

Prefetch Input in Incompressible Sections (+0.25% Speed)

35932ab

Give Up on Searching End of Block

7c24c3e

Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.

Nit: Dedup idx0 and idx1

8706bc1

Nit: Only Store 2 Hash Variables

991d660

Add ip1 + 128 Prefetch; Tiny Cleanup

57a100f

Tweak Step

64054de

Deduplicate Implementations

15e67bf

This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.

Change Target Size in Fuzzer

98d3df3

It's a bit strange, because this is hitting the dictionary special case where the dictionary is contiguous with the input and still runs in the single- segment path. We should probably change that to hit the `extDict` path instead?

Fix VS Build: Explicitly Cast to Narrow Ints

d6fd776

Update results.csv

b0977e4

felixhandte force-pushed the zstd-fast-pipelined branch from 9d0f7e9 to b0977e4 Compare September 1, 2021 18:45

Cyan4973 approved these changes Sep 7, 2021

View reviewed changes

felixhandte merged commit d68aa19 into facebook:dev Sep 9, 2021

felixhandte mentioned this pull request Sep 9, 2021

Pipelined Implementation of ZSTD_dfast #2774

Merged

5 tasks

danlark1 mentioned this pull request Oct 21, 2021

Compression ratio for --fast=2 and higher became significantly worse. Expected? #2827

Closed

felixhandte mentioned this pull request Dec 10, 2021

Stagger Stepping in Negative Levels #2921

Merged

embg mentioned this pull request Mar 4, 2022

Software pipeline for ZSTD_compressBlock_fast_dictMatchState (+5-6% compression speed) #3086

Merged

embg mentioned this pull request Apr 19, 2022

Software pipeline for ZSTD_compressBlock_fast_extDict (+4-9% compression speed) #3114

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pipelined Implementation of ZSTD_fast (~+5% Speed) #2749

Pipelined Implementation of ZSTD_fast (~+5% Speed) #2749

felixhandte commented Aug 17, 2021 •

edited

Loading

Cyan4973 commented Sep 7, 2021 •

edited

Loading

Pipelined Implementation of ZSTD_fast (~+5% Speed) #2749

Pipelined Implementation of ZSTD_fast (~+5% Speed) #2749

Conversation

felixhandte commented Aug 17, 2021 • edited Loading

Description

Parsing Differences

Benchmarks

Status

To-Do:

Cyan4973 commented Sep 7, 2021 • edited Loading

felixhandte commented Aug 17, 2021 •

edited

Loading

Cyan4973 commented Sep 7, 2021 •

edited

Loading