Reduce size of dctx by reutilizing dst buffer #2751

Merged 16 commits on Oct 25, 2021

Contributor

@binhdvo binhdvo commented Aug 20, 2021

WIP. This round of optimizations brings performance much closer to parity, though it has introduced a checksum error in the 270MB file test that I'm still tracking down. The error hasn't affected the smaller tests. Benchmarks indicate that in some cases we now see performance improvements on top of the memory reduction, due to the improved cache behavior. However, there are other cases, at small file sizes and high compressibility, where we are still about 1% behind parity.

Benchmark

old performance

./tests/fullbench -b2 -B1000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 4987.5 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 612.0 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 585.8 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 2597.8 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 2635.7 MB/s ( 1000)
./tests/fullbench -b2 -B10000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 36167.5 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 1292.4 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 1671.9 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 3205.2 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 6179.7 MB/s ( 10000)
./tests/fullbench -b2 -B100000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 51880.0 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 1237.1 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 2151.4 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 3193.0 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 7095.5 MB/s ( 100000)
./tests/fullbench -b2 -B1000000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 34106.0 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 1309.3 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 1973.5 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 2637.8 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 14852.5 MB/s ( 1000000)

new performance

./tests/fullbench -b2 -B1000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 4999.4 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 609.1 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 583.5 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 2402.1 MB/s ( 1000)
./tests/fullbench -b2 -B1000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000 bytes :
2#decompress : 2587.4 MB/s ( 1000)
./tests/fullbench -b2 -B10000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 37441.8 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 1297.5 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 1656.7 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 3081.0 MB/s ( 10000)
./tests/fullbench -b2 -B10000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 10000 bytes :
2#decompress : 6127.2 MB/s ( 10000)
./tests/fullbench -b2 -B100000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 52215.9 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 1252.2 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 2146.6 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 3614.6 MB/s ( 100000)
./tests/fullbench -b2 -B100000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 100000 bytes :
2#decompress : 7084.7 MB/s ( 100000)
./tests/fullbench -b2 -B1000000 -P0
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 33857.1 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P10
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 1288.9 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P50
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 2095.4 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P90
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 2786.2 MB/s ( 1000000)
./tests/fullbench -b2 -B1000000 -P100
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Aug 19 2021) ***
Sample 1000000 bytes :
2#decompress : 15258.4 MB/s ( 1000000)

@binhdvo binhdvo marked this pull request as draft August 20, 2021 16:09
@binhdvo
Contributor Author

binhdvo commented Aug 24, 2021

Still fixing some fuzzer errors that are blocking CI. Updated performance numbers are below.

(All figures in MB/s; each group is labeled by sample size in bytes.)

1000 tests:

        orig          new
-P0     5107.3        5264.5 
-P10    628.7         627.9  
-P50    603.3         608.0  
-P90    2665.0        2505.6 
-P100   2688.1        2652.5 

10000 tests:

        orig          new
-P0     37406.7       39007.2
-P10    1328.4        1335.2 
-P50    1715.6        1699.0 
-P90    3259.1        3204.5 
-P100   6408.1        6419.9 

100000 tests:

        orig          new
-P0     52696.0       53962.9
-P10    1263.9        1291.1 
-P50    1801.9        2208.2 
-P90    3255.7        3329.3 
-P100   7338.0        7422.0 

1000000 tests:

        orig          new
-P0     34991.3       34964.2
-P10    1353.2        1337.7 
-P50    2122.0        2158.6 
-P90    2785.3        2777.5 
-P100   15813.2       15834.1

@Cyan4973
Contributor

I presume you are using the synthetic content generator.
While it's a good way to start investigating this topic,
consider also measuring some "real life" content.

enwik is a good example of "real" content that is different from the generator, as it's full of small overlapping matches, which is something the synthetic generator doesn't emulate well.

Also, look at individual files (not the whole package) in calgary.tar or even silesia.tar, to see if some of them "stand out".

@senhuang42
Contributor

senhuang42 commented Aug 24, 2021

The memory reductions are nice!

I took a quick look at this on enwik7 with perf, using the command perf stat -e "cycles,branches,branch-misses,instructions,page-faults,L1-dcache-load-misses,L1-dcache-loads,L1-dcache-stores" -- ./zstd -b1 ../data/enwik7, and got the following results:

dev

 1#enwik7            : 9.54 MiB -> 3.92 MiB (2.432),  262.2 MB/s, 1039.5 MB/s 

 Performance counter stats for './zstd -b1 ../data/enwik7':

    22,422,771,549      cycles                                                        (71.43%)
     5,897,438,746      branches                                                      (57.15%)
       159,972,762      branch-misses             #    2.71% of all branches          (57.15%)
    59,148,636,696      instructions              #    2.64  insn per cycle           (71.43%)
             7,523      page-faults                                                 
       925,264,639      L1-dcache-load-misses     #    5.24% of all L1-dcache accesses  (71.43%)
    17,655,116,114      L1-dcache-loads                                               (71.42%)
     7,476,309,120      L1-dcache-stores                                              (71.42%)

       6.237735345 seconds time elapsed

       6.214009000 seconds user
       0.009980000 seconds sys

binhdctx2

 1#enwik7            : 9.54 MiB -> 3.92 MiB (2.432),  262.1 MB/s,  901.9 MB/s 

 Performance counter stats for './zstd -b1 ../data/enwik7':

    22,577,928,991      cycles                                                        (71.44%)
     5,655,449,556      branches                                                      (57.15%)
       152,349,238      branch-misses             #    2.69% of all branches          (57.15%)
    55,533,277,104      instructions              #    2.46  insn per cycle           (71.44%)
             7,508      page-faults                                                 
       915,748,699      L1-dcache-load-misses     #    5.54% of all L1-dcache accesses  (71.44%)
    16,533,161,474      L1-dcache-loads                                               (71.41%)
     6,713,694,511      L1-dcache-stores                                              (71.41%)

       6.280837396 seconds time elapsed

       6.257051000 seconds user
       0.009977000 seconds sys

Unfortunately the stats are not that helpful: instructions per cycle decreases (2.64 to 2.46) without a very clear culprit. The benchmark line does indicate a 13-14% decompression speed regression for me (1039.5 to 901.9 MB/s).
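The two derived figures in this comment can be reproduced from the raw numbers in the perf output. This is a small sketch; the helper names `insn_per_cycle` and `regression_percent` are made up for illustration and are not part of zstd or perf.

```c
#include <stdint.h>

/* Instructions retired per cycle, computed from two raw perf counters. */
static double insn_per_cycle(uint64_t instructions, uint64_t cycles)
{
    return (double)instructions / (double)cycles;
}

/* Relative slowdown of `candidate` versus `baseline`, in percent.
 * A positive result means the candidate is slower. */
static double regression_percent(double baseline_mbps, double candidate_mbps)
{
    return 100.0 * (baseline_mbps - candidate_mbps) / baseline_mbps;
}
```

Feeding in the counters above gives roughly 2.64 and 2.46 instructions per cycle, and `regression_percent(1039.5, 901.9)` lands between 13 and 14.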

for ( ; ; ) {
seq_t sequence = ZSTD_decodeSequence(&seqState, isLongOffset);
size_t oneSeqSize;
if (litPtr + sequence.litLength > dctx->litBufferEnd)
Contributor

@senhuang42 senhuang42 Aug 24, 2021

One thing that might be helpful is looking at the perf annotations, with perf record --call-graph=dwarf -e cycles:ppp -- ./zstd [...]. I see the following here:
[Screenshot: perf annotation output]

Which means this branch is maybe the culprit for some extra cycles? If I annotate this condition with LIKELY then for some reason I get a 5-6% speedup in decompression (900 -> 950 MB/s on enwik7). So maybe it's worth investigating this branch/loop in particular?

This loop is the hot loop in decompression, and is pretty sensitive to any changes.

EDIT: Given that the LIKELY condition doesn't seem to make much sense, it might have something to do with alignment?
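The LIKELY annotation mentioned above is a branch-prediction hint. Here is a minimal sketch of how such a macro is commonly built on the GCC/Clang builtin `__builtin_expect`. The names `MY_LIKELY`, `MY_UNLIKELY`, and `count_slow_path` are hypothetical and chosen for illustration; they are not zstd's definitions, and the loop is only a stand-in for the branch under discussion.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define MY_LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define MY_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define MY_LIKELY(x)   (x)
#  define MY_UNLIKELY(x) (x)
#endif

/* Walks a list of literal run lengths and counts how often a run would
 * cross the end of a buffer of `bufferSize` bytes (the "slow path").
 * After a crossing, the position restarts at the beginning of the buffer. */
static size_t count_slow_path(const size_t* runLengths, size_t nbRuns, size_t bufferSize)
{
    size_t pos = 0;
    size_t crossings = 0;
    size_t n;
    for (n = 0; n < nbRuns; n++) {
        if (MY_UNLIKELY(pos + runLengths[n] > bufferSize)) {
            crossings++;   /* rare path, hinted as not taken */
            pos = 0;
        } else {
            pos += runLengths[n];
        }
    }
    return crossings;
}
```

The hint changes only how the compiler arranges the two paths, never the result, so a speedup from an implausible hint may well point to code placement rather than prediction.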

@binhdvo
Contributor Author

binhdvo commented Aug 27, 2021

Updated: fixed all the fuzzer errors and rebased onto the current version. There is still one more error in the stream fuzzer (zstreamtests), which should not affect these performance results. Shifted the alignment and annotations; performance on enwik at compression level 1 now looks good, as well as across the rest of the silesia corpus. However, performance on high compression levels seems to be lagging.

@binhdvo
Contributor Author

binhdvo commented Aug 27, 2021

Silesia+enwik corpus tests level 1

            orig     new
enwik7.txt  1162.9   1180.2
dickens     1181.8   1179.0
mozilla     1144.9   1186.0
mr          1398.0   1414.0
nci         2056.1   2067.2
ooffice     965.2    991.2
osdb        1432.3   1438.6
reymont     1179.4   1202.9
samba       1636.5   1672.5
sao         1081.4   1061.0
webster     1262.7   1278.6
xml         2054.5   2044.9
x-ray       1071.0   1055.7


Silesia+enwik corpus tests level 10

            orig     new
enwik7.txt  1043.4   1056.5
dickens     1046.4   1008.9
mozilla     1171.6   1220.2
mr          1067.1   1045.9
nci         2792.4   2813.2
ooffice     900.1    913.2
osdb        1400.8   1356.0
reymont     1287.7   1278.7
samba       1812.0   1797.0
sao         938.4    922.8
webster     1227.6   1207.6
xml         2538.8   2515.5
x-ray       648.7    621.2


Silesia+enwik corpus tests level 22

            orig     new
enwik7.txt  1057.6   1050.4
dickens     1072.5   1006.8
mozilla     868.7    830.1
mr          1007.6   972.4
nci         2854.5   2751.8
ooffice     689.4    699.3
osdb        1443.4   1470.7
reymont     1326.7   1300.9
samba       1570.4   1535.8
sao         844.2    835.8
webster     1159.2   1119.9
xml         2641.7   2472.7
x-ray       612.8    596.4

@Cyan4973
Contributor

Cyan4973 commented Aug 27, 2021

performance on high compression levels seems to be lagging.

In general, high compression levels tend to generate more and smaller sequences, on top of far fewer literals, meaning that the resulting literals buffer will be pretty small, and the nb of literal bytes read per sequence will likely be very small too (0 or 1 most of the time). This of course doesn't apply to non-compressible sources.

A way to check that would be to print the nb of sequences and nb of literals decoded.

Anyway, this could be a hint to analyze performance of your modification.
If the nb of literals is small, I would have expected your modification to mostly push those decoded literals into the "private" small buffer in the DCtx. Which means, the behavior should be identical to original. This assumes that there is no additional branch taking decisions about which buffer to read into.
If there is a difference, that might be due to the presence of an additional branch in or around the execSequence logic, which is also executed more often since there are now more sequences.
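One way to gather the counts suggested above is a small statistics struct updated once per decoded sequence. This is a sketch only: `SeqStats`, `SeqStats_record`, and `SeqStats_avgLiterals` are hypothetical names, and where to call them inside the decoder is left as an assumption.

```c
#include <stddef.h>

/* Hypothetical counters, meant to be updated once per decoded sequence. */
typedef struct {
    size_t nbSequences;    /* sequences decoded so far */
    size_t nbLiterals;     /* literal bytes consumed by those sequences */
    size_t nbZeroLitSeqs;  /* sequences with litLength == 0 */
} SeqStats;

static void SeqStats_record(SeqStats* s, size_t litLength)
{
    s->nbSequences++;
    s->nbLiterals += litLength;
    if (litLength == 0) s->nbZeroLitSeqs++;
}

/* Average literal bytes per sequence; 0.0 when nothing was recorded. */
static double SeqStats_avgLiterals(const SeqStats* s)
{
    if (s->nbSequences == 0) return 0.0;
    return (double)s->nbLiterals / (double)s->nbSequences;
}
```

Printing these three values per file at levels 1 and 22 would show whether the high-level regressions coincide with many sequences carrying 0 or 1 literals.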

const BYTE* const prefixStart = (const BYTE*) (dctx->prefixStart);
const BYTE* const vBase = (const BYTE*) (dctx->virtualStart);
const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
unsigned litInDst = litPtr >= ostart && litPtr < oend;
Contributor

There is an opportunity here to reduce how often litInDst is set.

What you want to avoid is overwriting the literals stored in dst.
Knowing that you decode a block, you also know that this block can't be larger than maxBlockSize.
You may not know formally maxBlockSize at this point, since it depends on windowSize,
but you could simply assume that it's necessarily <= 128 KB.

Here oend is the end of dst buffer.
When this buffer is large, it's much farther away than 128 KB.
So you could instead calculate a maxBlockEnd pointer, and compare litPtr to that.

This should reduce the nb of times litInDst == 1,
thus using the simpler & faster decoding loop more often.
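The suggestion above can be sketched as follows. `lit_in_block_range` and `BLOCK_SIZE_MAX` are hypothetical names; the 128 KB bound is the assumption stated in the comment, and the sketch assumes all three pointers refer to the same allocation so that comparing them is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on the size of one decoded block (128 KB). */
#define BLOCK_SIZE_MAX ((size_t)128 * 1024)

/* Returns 1 if litPtr falls inside the region that decoding the current
 * block could write to, i.e. [ostart, maxBlockEnd), and 0 otherwise.
 * maxBlockEnd is the end of dst clamped to one maximum-size block. */
static int lit_in_block_range(const uint8_t* litPtr,
                              const uint8_t* ostart,
                              const uint8_t* oend)
{
    size_t const remaining = (size_t)(oend - ostart);
    size_t const span = remaining < BLOCK_SIZE_MAX ? remaining : BLOCK_SIZE_MAX;
    const uint8_t* const maxBlockEnd = ostart + span;
    return (litPtr >= ostart) && (litPtr < maxBlockEnd);
}
```

With a 300 KB destination, literals stored 200 KB in are flagged by a comparison against oend but not by this one, so the simpler loop would be selected.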

@binhdvo
Contributor Author

binhdvo commented Aug 31, 2021

Some further performance improvements: the new version fairly consistently wins at lower compression levels, and now has some wins and losses at higher compression levels across enwik and silesia.

Silesia+enwik corpus tests level 1

            orig     new
enwik7.txt  1116.6   1131.1
dickens     1142.3   1141.8
mozilla     1120.0   1143.5
mr          1347.9   1351.2
nci         2000.1   2020.3
ooffice     927.3    956.0
osdb        1389.3   1398.3
reymont     1156.4   1164.5
samba       1548.0   1583.2
sao         1017.6   1002.6
webster     1178.4   1199.2
xml         1908.8   1949.5
x-ray       1026.4   1039.0


Silesia+enwik corpus tests level 10

            orig     new
enwik7.txt  912.2    919.1
dickens     908.9    867.8
mozilla     1009.2   1044.7
mr          921.9    899.8
nci         2387.6   2411.5
ooffice     783.8    788.9
osdb        1232.5   1173.6
reymont     1105.9   1090.7
samba       1558.1   1556.2
sao         800.9    785.8
webster     1040.8   1027.8
xml         2224.9   2214.4
x-ray       553.5    527.2


Silesia+enwik corpus tests level 22

            orig     new
enwik7.txt  1028.8   1037.5
dickens     1047.3   994.8
mozilla     861.1    846.1
mr          1029.0   969.4
nci         2781.5   2831.2
ooffice     684.0    684.9
osdb        1412.9   1443.7
reymont     1332.0   1278.4
samba       1602.1   1559.6
sao         831.1    837.5
webster     1133.6   1126.7
xml         2513.6   2499.7
x-ray       607.7    599.3

@@ -99,6 +100,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
U32 const lhlCode = (istart[0] >> 2) & 3;
U32 const lhc = MEM_readLE32(istart);
size_t hufSuccess;
size_t expectedWriteSize = MIN(ZSTD_BLOCKSIZE_MAX, dstCapacity);
Contributor Author

This is the cause of the last remaining streaming fuzzer error. It appears that in some cases, placing the lit buffer ZSTD_BLOCKSIZE_MAX beyond dst is enough for it to stomp the extended dictionary in streaming mode, so if a match has a fairly long offset that reaches into this area, it can copy incorrectly from the litBuffer. I had thought that ZSTD_BLOCKSIZE_MAX was guaranteed not to be in the range where we needed the extDict from previous offsets; still investigating what the correct safe offset should be here.

Contributor

So is this fixed?

Contributor Author

yes, the fuzzer error was fixed by the increased streaming buffer size

@binhdvo
Contributor Author

binhdvo commented Sep 7, 2021

All fuzzing test issues have been fixed, and CI tests now pass. Performance looks generally good on Mac, with consistent performance improvements on top of the memory savings across enwik7 and silesia at default compression levels. At maximum compression levels there are some wins and some losses. On the devserver there were initially some stark losses. Some adjustments were made to alignment across the two loops and to the scratch buffer size; these have reduced the losses, but they are still present.

Silesia+enwik corpus tests level 1, Mac

            orig     new
enwik7.txt  1085.9   1095.2
dickens     1159.2   1196.7
mozilla     1135.1   1151.6
mr          1371.6   1384.0
nci         2030.5   2049.8
ooffice     950.8    966.7
osdb        1389.1   1429.2
reymont     1096.6   1236.1
samba       1625.5   1670.4
sao         1062.4   1008.4
webster     1245.3   1285.2
xml         2019.7   2059.3
x-ray       1049.0   1049.2


Silesia+enwik corpus tests level 22, Mac

            orig     new
enwik7.txt  962.9    985.2
dickens     1041.6   1000.9
mozilla     875.5    885.1
mr          988.0    963.1
nci         2763.7   2928.6
ooffice     683.8    695.7
osdb        1418.9   1404.4
reymont     1299.4   1281.7
samba       1564.4   1567.3
sao         813.3    815.7
webster     1134.3   1091.5
xml         2544.2   2522.8
x-ray       606.3    611.9


Silesia+enwik corpus tests level 1, devvm (57GB)

            orig     new
enwik7.txt  730.5    705.4
dickens     744.6    719.7
mozilla     715.1    714.3
mr          892.7    872.1
nci         1143.0   1120.1
ooffice     663.0    652.2
osdb        848.2    827.1
reymont     707.0    721.5
samba       972.0    947.4
sao         633.7    599.1
webster     746.2    718.6
xml         1346.2   1344.0
x-ray       739.7    738.0


Silesia+enwik corpus tests level 22, devvm (57GB)

            orig     new
enwik7.txt  538.7    535.6
dickens     414.9    494.1
mozilla     459.2    420.6
mr          441.8    344.6
nci         1291.3   1212.6
ooffice     402.1    385.3
osdb        619.4    466.1
reymont     661.9    709.5
samba       851.0    791.4
sao         421.6    400.3
webster     372.4    347.8
xml         1737.8   1675.8
x-ray       266.5    318.7

@@ -106,6 +106,8 @@ typedef struct {
size_t ddictPtrCount;
} ZSTD_DDictHashSet;

#define ZSTD_LITBUFFEREXTRASIZE 8192 /* extra buffer reduces amount of dst required to store litBuffer */
Contributor

@Cyan4973 Cyan4973 Sep 8, 2021

8 KB is a significant budget.
Ideally, I would have preferred this permanent internal buffer to be reduced to something like 512 bytes (or less).
But hey, if you state that this large size is necessary otherwise it impacts performance, there is room for discussion.

Contributor Author

I don't know that it is necessary. Bumping it up was enough to let several cases run entirely within the buffer that otherwise wouldn't have, but it's not 100% clear that this is required to get adequate performance.
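To make the size trade-off in this thread concrete, here is a hypothetical placement rule showing how the extra buffer size decides which blocks stay entirely in the private buffer. `LitPlacement`, `choose_lit_placement`, and the exact thresholds are assumptions for illustration and are not taken from the PR's code.

```c
#include <stddef.h>

/* Assumed upper bound on the size of one decoded block (128 KB). */
#define BLOCK_SIZE_MAX ((size_t)128 * 1024)

/* Hypothetical placement outcomes for a block's literals. */
typedef enum {
    LIT_IN_EXTRA,  /* fits entirely in the private extra buffer */
    LIT_IN_DST,    /* stored in dst, beyond the area the block writes to */
    LIT_SPLIT      /* shared between the end of dst and the extra buffer */
} LitPlacement;

static LitPlacement choose_lit_placement(size_t litSize,
                                         size_t dstCapacity,
                                         size_t extraSize)
{
    if (litSize <= extraSize) return LIT_IN_EXTRA;
    if (dstCapacity >= BLOCK_SIZE_MAX + litSize) return LIT_IN_DST;
    return LIT_SPLIT;
}
```

Under this rule a block with 4000 literal bytes stays in the private buffer when extraSize is 8192, but is pushed into dst (or split) when extraSize is 512, which is the kind of case the reply above describes.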

@Cyan4973
Contributor

Cyan4973 commented Sep 8, 2021

I've been testing this PR on my (stable) desktop,
and immediately, I've witnessed some significant decompression speed regression,
not exactly small (more than -10%, though it varies depending on file).

Corpus tested : individual components of calgary
Parameter : level 1
CPU : i7-9700k (turbo off)
Compiler : gcc v9.3.0
OS : Ubuntu 20.04

I haven't tested more compilers nor more corpuses, but these regressions are large enough by themselves to warrant an investigation.

#if defined(__GNUC__) && defined(__x86_64__)
/* Align the decompression loop to 32 + 16 bytes.
Contributor

This was a relatively useful and informative comment.
Unless you have reason to believe that the comment is misleading or useless,
please preserve this knowledge in the source code.

Contributor Author

@binhdvo binhdvo Sep 8, 2021

Ah thanks, I had intended to move it slightly, not remove it.
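For readers unfamiliar with the technique that comment documents, here is a minimal sketch of aligning a hot loop with assembler directives under GCC on x86-64. The function `sum_bytes` is hypothetical, and the directive sequence is an assumption chosen to land the following code at a 32 + 16 byte offset; it is not copied from this PR, and the compiler remains free to place the loop elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of src; the loop body stands in for a hot decoding loop. */
static uint64_t sum_bytes(const uint8_t* src, size_t len)
{
    uint64_t total = 0;
    size_t i;
#if defined(__GNUC__) && defined(__x86_64__)
    /* Pad to a 32-byte boundary, emit one nop, then pad to the next
     * 16-byte boundary: the next instruction sits at 32*k + 16. */
    __asm__(".p2align 5");
    __asm__("nop");
    __asm__(".p2align 4");
#endif
    for (i = 0; i < len; i++) {
        total += src[i];
    }
    return total;
}
```

Because the best offset depends on compiler version and CPU, a comment recording why a given alignment was chosen is what keeps such a block maintainable.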

@binhdvo
Contributor Author

binhdvo commented Sep 8, 2021

I've unwrapped one of the loops in a manner that should help the compiler optimize better for the dev server; initial performance tests look significantly better than before but overall performance on devvm still isn't quite where I'd like it. MacOS performance still remains good after the changes.

@Cyan4973
Contributor

Cyan4973 commented Sep 8, 2021

Good news !

Just a note :
sometimes, it's better to squash several commits into a single one,
because the presence of intermediate commits is actually not helpful,
for example when one is faulty, did not make sense, or featured some trivial typo, etc.

On other occasions, keeping commits separated is actually better for review.
This is typically the case for larger topics and PRs, which are necessarily achieved through progressive modifications.
The advantage is that it makes it easier to follow each modification (assuming it's fully formed, meaning it brings an improvement on its own, and still compiles fine), thus providing a better understanding for review.

Unfortunately, this is not a trivial clear cut rule. One has to use one's own judgment.

In this case, for example, I'm unable to tell what "unwrapped one of the loops" means, as this design choice is swamped inside the large PR. So if it's a point worth reviewing, it's better to make it stand out, by giving it its own commit.

@binhdvo
Contributor Author

binhdvo commented Sep 13, 2021

Median of 5 runs each (there was a bit of variation). As before, performance looks favorable on MacOS. On devvm, it looks generally comparable across gcc, gcc.par, and clang.par, with some tradeoffs at higher compression levels. On devbig there are still regressions, mainly on gcc and gcc.par.

Silesia+enwik corpus tests, MacOS

            orig     new
level 1
enwik7.txt  904.3    950.2
silesia.tar 1104.0   1125.8 

level 19
enwik7.txt  927.4    928.4 
silesia.tar 1063.3   1140.4 

level 22
enwik7.txt  956.3    1005.1 
silesia.tar 928.3    938.5 


Silesia+enwik corpus tests, devvm gcc

            orig     new
level 1
enwik7.txt  501.9    508.6 
silesia.tar 541.3    536.6  

level 19
enwik7.txt  473.8    499.1  
silesia.tar 432.4    437.3 

level 22
enwik7.txt  492.9    488.8  
silesia.tar 434.6    436.1 


Silesia+enwik corpus tests, devvm gcc.par

            orig     new
level 1
enwik7.txt  481.5    506.5 
silesia.tar 552.9    546.4  

level 19
enwik7.txt  480.0    512.8  
silesia.tar 454.6    447.8 

level 22
enwik7.txt  516.7    491.5  
silesia.tar 430.0    433.2 


Silesia+enwik corpus tests, devvm clang.par

            orig     new
level 1
enwik7.txt  512.1    505.4 
silesia.tar 544.0    540.9  

level 19
enwik7.txt  520.0    503.9  
silesia.tar 445.2    443.4 

level 22
enwik7.txt  521.1    517.2  
silesia.tar 441.0    441.4 


Silesia+enwik corpus tests, devbig gcc

            orig     new
level 1
enwik7.txt  809.3    787.2 
silesia.tar 893.4    880.6  

level 19
enwik7.txt  757.6    743.1  
silesia.tar 753.9    746.1

level 22
enwik7.txt  757.4    725.8 
silesia.tar 710.7    713.9  


Silesia+enwik corpus tests, devbig gcc.par

            orig     new
level 1
enwik7.txt  833.1    828.0 
silesia.tar 862.8    851.7  

level 19
enwik7.txt  787.1    753.7  
silesia.tar 772.9    742.8

level 22
enwik7.txt  752.1    743.2 
silesia.tar 697.4    712.8  


Silesia+enwik corpus tests, devbig clang.par

            orig     new
level 1
enwik7.txt  860.5    860.5 
silesia.tar 899.8    900.9  

level 19
enwik7.txt  794.2    790.6  
silesia.tar 754.8    751.3

level 22
enwik7.txt  796.4    790.7 
silesia.tar 764.0    741.3  
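The tables above report the median of 5 runs per configuration. Here is a minimal sketch of that reduction; `median` and `cmp_double` are hypothetical helper names, and how the individual runs were collected is not stated in the thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void* a, const void* b)
{
    double const x = *(const double*)a;
    double const y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Median of n samples. Sorts `runs` in place. Returns 0.0 when n == 0.
 * For an even count, returns the mean of the two middle values. */
static double median(double* runs, size_t n)
{
    if (n == 0) return 0.0;
    qsort(runs, n, sizeof(runs[0]), cmp_double);
    if (n % 2 == 1) return runs[n / 2];
    return (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
}
```

A median is less sensitive than a mean to a single slow run caused by machine noise, which suits benchmarks with visible run-to-run variation.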

@@ -121,8 +123,28 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
litCSize = (lhc >> 22) + ((size_t)istart[4] << 10);
break;
}
RETURN_ERROR_IF(litSize > 0 && dst == NULL, dstSize_tooSmall, "NULL not handled");
Contributor

Question :
is this scenario possible,
for example, could it happen if the user would provide a NULL pointer as destination,

or should it normally never happen,
for example because the code calling this function will ensure it never provides NULL as dst ?

If this is the second case, this condition is rather an assert(),
and it should also be clearly documented, so that the caller knows that this scenario is not allowed.

It seems like a small detail (is it an error ? or is it forbidden ?), but this kind of decision trickles down throughout the whole code, so it's important to be familiar with the logic.
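The distinction drawn here, an error for input a user can actually supply versus an assert for a contract the caller must respect, can be sketched with two variants of the same copy routine. `Status`, `copy_checked`, and `copy_trusted` are hypothetical names and do not correspond to zstd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the sketch. */
typedef enum { STATUS_OK = 0, STATUS_DST_TOO_SMALL = 1 } Status;

/* Variant 1: a NULL or undersized destination can come from the user,
 * so it is reported as a recoverable error. */
static Status copy_checked(void* dst, size_t dstCapacity,
                           const void* src, size_t srcSize)
{
    if (srcSize > 0 && dst == NULL) return STATUS_DST_TOO_SMALL;
    if (srcSize > dstCapacity) return STATUS_DST_TOO_SMALL;
    if (srcSize > 0) memcpy(dst, src, srcSize);
    return STATUS_OK;
}

/* Variant 2: the caller is documented to never pass NULL or an
 * undersized dst, so a violation is a programming bug caught by assert. */
static void copy_trusted(void* dst, size_t dstCapacity,
                         const void* src, size_t srcSize)
{
    assert(dst != NULL || srcSize == 0);
    assert(srcSize <= dstCapacity);
    if (srcSize > 0) memcpy(dst, src, srcSize);
}
```

The first variant keeps a check in release builds; the second costs nothing with NDEBUG defined but relies on every caller honoring the documented contract.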

Contributor Author

@binhdvo binhdvo Sep 16, 2021

There's a fuzzer test that passes NULL and expects this error response; this check was added to maintain that behavior.

https://github.com/facebook/zstd/blob/dev/tests/fuzzer.c#L700

Contributor

nit: I would re-write this check as litSize > dstCapacity, which is more general.

Edit: The check below expectedWriteSize < litSize should cover this case.
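The reasoning in the edit above can be checked with a small sketch, under the assumption that a NULL dst is always passed together with dstCapacity == 0. `literals_exceed_capacity`, `SKETCH_MIN`, and `BLOCK_SIZE_MAX` are hypothetical names standing in for the expressions in the diff.

```c
#include <stddef.h>

/* Assumed upper bound on the size of one decoded block (128 KB). */
#define BLOCK_SIZE_MAX ((size_t)128 * 1024)
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Returns 1 when litSize cannot fit in what the block is allowed to write,
 * mirroring a comparison of the form expectedWriteSize < litSize. */
static int literals_exceed_capacity(size_t litSize, size_t dstCapacity)
{
    size_t const expectedWriteSize = SKETCH_MIN(BLOCK_SIZE_MAX, dstCapacity);
    return expectedWriteSize < litSize;
}
```

With dstCapacity == 0, any litSize > 0 trips the comparison, so the general check would subsume a separate NULL test as long as the assumption about NULL and zero capacity holds.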

@binhdvo
Contributor Author

binhdvo commented Oct 18, 2021

Benchmarks on Yann's private server after separating the execSequence functions into split buffer versions and recalculating alignments across the three loops. The differences between the 16k and max-size lit buffers are not that large; however, the overall gains from the changes do not reach total parity, as was hoped for. I also attempted some benchmarks after equalizing the wildcopy, safecopy, and copy16 behavior, and they did not make much difference.

16k buffer, gcc-7
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1032.6   1041.6   +0.9
mozilla.1.zst                       977.5    991.9    +1.5
mr.1.zst                            1201.9   1203.6   +0.1
nci.1.zst                           1593.6   1645.4   +3.3
ooffice.1.zst                       918.6    921.2    +0.3
osdb.1.zst                          1161.8   1167.2   +0.5
reymont.1.zst                       999.5    1006.8   +0.7
samba.1.zst                         1332.9   1348.1   +1.1
sao.1.zst                           951.1    947.5    -0.4
webster.1.zst                       1054.8   1064.0   +0.9
xml.1.zst                           1577.6   1605.1   +1.7
x-ray.1.zst                         1170.6   1169.2   -0.1

bib.1.zst                           1020.6   1039.1   +1.8
book1.1.zst                         1047.7   1050.7   +0.3
book2.1.zst                         1032.8   1037.4   +0.4
geo.1.zst                           1374.8   1332.1   -3.1
news.1.zst                          1081.7   1075.7   -0.6
obj1.1.zst                          1075.0   1071.7   -0.3
obj2.1.zst                          903.8    910.1    +0.7
paper1.1.zst                        946.7    953.1    +0.7
paper2.1.zst                        964.1    977.1    +1.3
pic.1.zst                           1874.2   1885.1   +0.6
progc.1.zst                         945.1    950.9    +0.6
progl.1.zst                         1118.2   1158.4   +3.6
progp.1.zst                         1122.7   1158.8   +3.2
trans.1.zst                         1166.1   1238.2   +6.2

alice29.txt.1.zst                   914.4    919.1    +0.5
asyoulik.txt.1.zst                  991.3    971.5    -2.0
cp.html.1.zst                       1067.8   1071.1   +0.3
fields.c.1.zst                      785.0    786.2    +0.2
grammar.lsp.1.zst                   598.3    593.7    -0.8
kennedy.xls.1.zst                   939.2    947.0    +0.8
lcet10.txt.1.zst                    1050.6   1053.3   +0.3
lrabn12.txt.1.zst                   1034.5   1039.5   +0.5
ptt5.1.zst                          1872.0   1884.6   +0.7
sum.1.zst                           997.4    1021.6   +2.4
xargs.1.1.zst                       623.3    621.4    -0.3


128k buffer, gcc-7
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1032.6   1044.5   +1.2
mozilla.1.zst                       977.0    993.1    +1.6
mr.1.zst                            1204.7   1206.8   +0.2
nci.1.zst                           1593.6   1646.6   +3.3
ooffice.1.zst                       919.1    923.8    +0.5
osdb.1.zst                          1162.1   1170.2   +0.7
reymont.1.zst                       997.8    1006.9   +0.9
samba.1.zst                         1334.6   1346.9   +0.9
sao.1.zst                           952.6    948.0    -0.5
webster.1.zst                       1057.3   1066.3   +0.9
xml.1.zst                           1579.6   1607.7   +1.8
x-ray.1.zst                         1166.2   1170.9   +0.4

bib.1.zst                           1018.6   1046.4   +2.7
book1.1.zst                         1047.5   1057.7   +1.0
book2.1.zst                         1031.9   1040.8   +0.9
geo.1.zst                           1379.1   1374.6   -0.3
news.1.zst                          1080.9   1091.4   +1.0
obj1.1.zst                          1072.1   1076.2   +0.4
obj2.1.zst                          904.3    923.7    +2.1
paper1.1.zst                        944.3    958.0    +1.5
paper2.1.zst                        962.3    971.7    +1.0
pic.1.zst                           1873.0   1883.2   +0.5
progc.1.zst                         944.7    955.4    +1.1
progl.1.zst                         1116.8   1159.9   +3.9
progp.1.zst                         1124.7   1157.7   +2.9
trans.1.zst                         1166.8   1245.2   +6.7

alice29.txt.1.zst                   911.1    922.4    +1.2
syoulik.txt.1.zst                   992.1    1003.4   +1.1
cp.html.1.zst                       1069.1   1077.9   +0.8
fields.c.1.zst                      785.0    786.9    +0.2
grammar.lsp.1.zst                   598.8    599.1    +0.1
kennedy.xls.1.zst                   943.4    945.2    +0.2
lcet10.txt.1.zst                    1053.1   1061.3   +0.8
lrabn12.txt.1.zst                   1035.1   1045.8   +1.0
ptt5.1.zst                          1871.6   1884.2   +0.7
sum.1.zst                           993.5    1028.6   +3.5
xargs.1.1.zst                       623.4    616.4    -1.1


16k buffer, gcc-8
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1066.9   1064.7   -0.2
mozilla.1.zst                       995.4    1010.9   +1.6
mr.1.zst                            1232.8   1230.3   -0.2
nci.1.zst                           1647.8   1695.0   +2.9
ooffice.1.zst                       937.2    947.1    +1.1
osdb.1.zst                          1214.0   1216.2   +0.2
reymont.1.zst                       1033.4   1048.6   +1.5
samba.1.zst                         1391.3   1406.5   +1.1
sao.1.zst                           976.2    975.1    -0.1
webster.1.zst                       1102.8   1113.7   +1.0
xml.1.zst                           1655.5   1686.1   +1.8
x-ray.1.zst                         1176.3   1170.4   -0.5

bib.1.zst                           1073.9   1045.2   -2.7
book1.1.zst                         1088.6   1071.0   -1.6
book2.1.zst                         1065.3   1064.8   -0.0
geo.1.zst                           1378.0   1346.8   -2.3
news.1.zst                          1125.2   1096.9   -2.5
obj1.1.zst                          1083.8   1091.5   +0.7
obj2.1.zst                          922.5    909.8    -1.4
paper1.1.zst                        983.0    998.1    +1.5
paper2.1.zst                        1001.9   986.7    -1.5
pic.1.zst                           1907.5   1954.7   +2.5
progc.1.zst                         987.7    1003.8   +1.6
progl.1.zst                         1226.0   1237.8   +1.0
progp.1.zst                         1224.4   1226.2   +0.1
trans.1.zst                         1287.5   1314.6   +2.1

alice29.txt.1.zst                   949.0    920.7    -3.0
syoulik.txt.1.zst                   1029.9   973.6    -5.5
cp.html.1.zst                       1128.4   1135.3   +0.6
fields.c.1.zst                      820.3    833.0    +1.5
grammar.lsp.1.zst                   601.9    616.7    +2.5
kennedy.xls.1.zst                   965.9    949.5    -1.7
lcet10.txt.1.zst                    1092.9   1077.3   -1.4
lrabn12.txt.1.zst                   1073.9   1060.5   -1.2
ptt5.1.zst                          1907.9   1958.9   +2.7
sum.1.zst                           1046.6   1066.8   +1.9
xargs.1.1.zst                       633.3    654.0    +3.3


128k buffer, gcc-8
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1071.4   1070.4   -0.1
mozilla.1.zst                       996.0    1011.2   +1.5
mr.1.zst                            1235.1   1232.8   -0.2
nci.1.zst                           1647.3   1692.3   +2.7
ooffice.1.zst                       937.1    945.9    +0.9
osdb.1.zst                          1217.0   1218.3   +0.1
reymont.1.zst                       1035.4   1042.7   +0.7
samba.1.zst                         1388.6   1409.4   +1.5
sao.1.zst                           975.0    976.5    +0.2
webster.1.zst                       1102.0   1113.7   +1.1
xml.1.zst                           1648.0   1688.5   +2.5
x-ray.1.zst                         1173.5   1172.0   -0.1

bib.1.zst                           1072.3   1092.4   +1.9
book1.1.zst                         1086.0   1084.7   -0.1
book2.1.zst                         1061.0   1071.1   +1.0
geo.1.zst                           1380.5   1375.9   -0.3
news.1.zst                          1127.8   1118.9   -0.8
obj1.1.zst                          1086.4   1093.3   +0.6
obj2.1.zst                          922.5    932.8    +1.1
paper1.1.zst                        985.5    1001.2   +1.6
paper2.1.zst                        999.9    1010.3   +1.0
pic.1.zst                           1911.5   1955.4   +2.3
progc.1.zst                         994.2    1004.9   +1.1
progl.1.zst                         1235.3   1243.1   +0.6
progp.1.zst                         1218.0   1235.3   +1.4
trans.1.zst                         1291.9   1310.6   +1.4

alice29.txt.1.zst                   951.7    955.8    +0.4
syoulik.txt.1.zst                   1031.0   1041.2   +1.0
cp.html.1.zst                       1131.8   1129.1   -0.2
fields.c.1.zst                      819.0    838.2    +2.3
grammar.lsp.1.zst                   601.6    619.4    +3.0
kennedy.xls.1.zst                   968.6    950.3    -1.9
lcet10.txt.1.zst                    1091.9   1100.7   +0.8
lrabn12.txt.1.zst                   1075.2   1074.9   -0.0
ptt5.1.zst                          1907.9   1957.5   +2.6
sum.1.zst                           1047.7   1065.7   +1.7
xargs.1.1.zst                       635.5    652.2    +2.6


16k buffer, gcc-9
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1068.2   1056.1   -1.1
mozilla.1.zst                       1019.7   983.0    -3.6
mr.1.zst                            1228.8   1212.4   -1.3
nci.1.zst                           1691.9   1599.7   -5.4
ooffice.1.zst                       945.1    926.9    -1.9
osdb.1.zst                          1215.8   1194.5   -1.8
reymont.1.zst                       1046.1   1028.2   -1.7
samba.1.zst                         1404.3   1367.9   -2.6
sao.1.zst                           973.3    966.9    -0.7
webster.1.zst                       1112.7   1092.0   -1.9
xml.1.zst                           1676.7   1617.2   -3.5
x-ray.1.zst                         1171.7   1170.5   -0.1

bib.1.zst                           1095.5   1068.6   -2.5
book1.1.zst                         1076.9   1068.8   -0.8
book2.1.zst                         1071.4   1052.3   -1.8
geo.1.zst                           1381.7   1339.1   -3.1
news.1.zst                          1116.0   1091.5   -2.2
obj1.1.zst                          1096.7   1082.0   -1.3
obj2.1.zst                          938.1    913.7    -2.6
paper1.1.zst                        995.7    981.0    -1.5
paper2.1.zst                        1010.1   993.9    -1.6
pic.1.zst                           1947.9   1873.9   -3.8
progc.1.zst                         1002.6   987.5    -1.5
progl.1.zst                         1234.2   1202.9   -2.5
progp.1.zst                         1224.1   1201.9   -1.8
trans.1.zst                         1324.1   1274.8   -3.7

alice29.txt.1.zst                   954.4    933.7    -2.2
syoulik.txt.1.zst                   1035.4   999.5    -3.5
cp.html.1.zst                       1151.0   1120.5   -2.6
fields.c.1.zst                      838.5    815.0    -2.8
grammar.lsp.1.zst                   612.3    598.1    -2.3
kennedy.xls.1.zst                   1014.1   948.3    -6.5
lcet10.txt.1.zst                    1096.7   1071.0   -2.3
lrabn12.txt.1.zst                   1069.5   1050.3   -1.8
ptt5.1.zst                          1947.0   1871.0   -3.9
sum.1.zst                           1055.2   1042.9   -1.2
xargs.1.1.zst                       645.3    632.9    -1.9

128k buffer, gcc-9
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1070.1   1055.9   -1.3
mozilla.1.zst                       1021.8   982.6    -3.8
mr.1.zst                            1230.0   1211.9   -1.5
nci.1.zst                           1691.0   1599.3   -5.4
ooffice.1.zst                       945.5    926.7    -2.0
osdb.1.zst                          1218.0   1194.6   -1.9
reymont.1.zst                       1050.7   1027.3   -2.2
samba.1.zst                         1406.7   1367.0   -2.8
sao.1.zst                           975.1    967.6    -0.8
webster.1.zst                       1113.1   1091.3   -2.0
xml.1.zst                           1679.9   1619.7   -3.6
x-ray.1.zst                         1168.4   1173.0   +0.4

bib.1.zst                           1093.6   1070.4   -2.1
book1.1.zst                         1079.8   1072.2   -0.7
book2.1.zst                         1069.6   1053.8   -1.5
geo.1.zst                           1379.6   1372.0   -0.6
news.1.zst                          1116.2   1102.7   -1.2
obj1.1.zst                          1098.5   1081.0   -1.6
obj2.1.zst                          937.0    906.1    -3.3
paper1.1.zst                        994.2    984.2    -1.0
paper2.1.zst                        1008.3   990.4    -1.8
pic.1.zst                           1946.4   1873.6   -3.7
progc.1.zst                         1004.3   986.4    -1.8
progl.1.zst                         1232.5   1207.8   -2.0
progp.1.zst                         1223.9   1201.2   -1.9
trans.1.zst                         1323.5   1275.1   -3.7

alice29.txt.1.zst                   956.4    940.4    -1.7
syoulik.txt.1.zst                   1038.3   1022.0   -1.6
cp.html.1.zst                       1149.6   1127.5   -1.9
fields.c.1.zst                      837.2    815.9    -2.5
grammar.lsp.1.zst                   612.8    602.2    -1.7
kennedy.xls.1.zst                   1014.8   948.0    -6.6
lcet10.txt.1.zst                    1096.9   1081.9   -1.4
lrabn12.txt.1.zst                   1069.8   1053.9   -1.5
ptt5.1.zst                          1946.7   1867.8   -4.1
sum.1.zst                           1058.6   1037.2   -2.0
xargs.1.1.zst                       642.4    631.5    -1.7


16k buffer, gcc-10
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1087.7   1068.7   -1.7
mozilla.1.zst                       1035.8   998.2    -3.6
mr.1.zst                            1242.7   1221.5   -1.7
nci.1.zst                           1714.2   1622.8   -5.3
ooffice.1.zst                       964.7    933.8    -3.2
osdb.1.zst                          1230.3   1206.3   -2.0
reymont.1.zst                       1054.0   1040.1   -1.3
samba.1.zst                         1420.4   1378.1   -3.0
sao.1.zst                           987.2    966.9    -2.1
webster.1.zst                       1124.2   1099.0   -2.2
xml.1.zst                           1689.0   1632.0   -3.4
x-ray.1.zst                         1174.0   1171.1   -0.2

bib.1.zst                           1110.8   1038.3   -6.5
book1.1.zst                         1105.3   1078.6   -2.4
book2.1.zst                         1084.0   1057.6   -2.4
geo.1.zst                           1383.1   1345.4   -2.7
news.1.zst                          1135.3   1097.4   -3.3
obj1.1.zst                          1101.5   1095.3   -0.6
obj2.1.zst                          951.7    899.4    -5.5
paper1.1.zst                        1013.5   992.8    -2.0
paper2.1.zst                        1025.3   952.1    -7.1
pic.1.zst                           1971.1   1904.7   -3.4
progc.1.zst                         1010.0   1007.1   -0.3
progl.1.zst                         1261.4   1216.6   -3.6
progp.1.zst                         1242.0   1222.2   -1.6
trans.1.zst                         1354.1   1294.0   -4.4

alice29.txt.1.zst                   957.1    905.3    -5.4
syoulik.txt.1.zst                   1051.1   986.4    -6.2
cp.html.1.zst                       1146.4   1147.1   +0.1
fields.c.1.zst                      836.4    828.3    -1.0
grammar.lsp.1.zst                   611.8    612.2    +0.1
kennedy.xls.1.zst                   1035.3   1005.3   -2.9
lcet10.txt.1.zst                    1105.3   1078.5   -2.4
lrabn12.txt.1.zst                   1090.0   1066.1   -2.2
ptt5.1.zst                          1977.6   1895.8   -4.1
sum.1.zst                           1077.0   1054.6   -2.1
xargs.1.1.zst                       642.8    643.7    +0.1


128k buffer, gcc-10
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1082.4   1075.8   -0.6
mozilla.1.zst                       1036.7   999.0    -3.6
mr.1.zst                            1245.4   1224.0   -1.7
nci.1.zst                           1716.4   1624.0   -5.4
ooffice.1.zst                       964.6    935.4    -3.0
osdb.1.zst                          1234.9   1206.5   -2.3
reymont.1.zst                       1058.0   1041.1   -1.6
samba.1.zst                         1422.5   1382.7   -2.8
sao.1.zst                           985.3    967.8    -1.8
webster.1.zst                       1126.0   1103.4   -2.0
xml.1.zst                           1688.2   1634.5   -3.2
x-ray.1.zst                         1173.5   1175.0   +0.1

bib.1.zst                           1111.6   1090.6   -1.9
book1.1.zst                         1104.1   1094.9   -0.8
book2.1.zst                         1084.9   1070.4   -1.3
geo.1.zst                           1381.9   1381.5   -0.0
news.1.zst                          1136.4   1121.3   -1.3
obj1.1.zst                          1101.8   1098.8   -0.3
obj2.1.zst                          951.1    917.1    -3.6
paper1.1.zst                        1013.9   996.9    -1.7
paper2.1.zst                        1024.1   1011.8   -1.2
pic.1.zst                           1974.9   1902.4   -3.7
progc.1.zst                         1016.4   1008.5   -0.8
progl.1.zst                         1261.4   1221.1   -3.2
progp.1.zst                         1246.1   1221.9   -1.9
trans.1.zst                         1345.3   1302.5   -3.2

alice29.txt.1.zst                   964.4    954.2    -1.1
syoulik.txt.1.zst                   1050.7   1042.2   -0.8
cp.html.1.zst                       1144.0   1147.5   +0.3
fields.c.1.zst                      835.7    833.5    -0.3
grammar.lsp.1.zst                   611.2    612.9    +0.3
kennedy.xls.1.zst                   1034.1   1007.8   -2.5
lcet10.txt.1.zst                    1109.4   1100.2   -0.8
lrabn12.txt.1.zst                   1087.7   1083.7   -0.4
ptt5.1.zst                          1974.4   1901.0   -3.7
sum.1.zst                           1079.6   1056.4   -2.1
xargs.1.1.zst                       643.1    643.2    +0.0


16k buffer, clang-12
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1077.2   1081.2   +0.4
mozilla.1.zst                       1022.7   1019.5   -0.3
mr.1.zst                            1243.9   1236.4   -0.6
nci.1.zst                           1686.7   1657.5   -1.7
ooffice.1.zst                       940.2    939.8    -0.0
osdb.1.zst                          1207.0   1205.8   -0.1
reymont.1.zst                       1066.7   1056.2   -1.0
samba.1.zst                         1409.8   1395.1   -1.0
sao.1.zst                           958.7    962.5    +0.4
webster.1.zst                       1115.6   1104.7   -1.0
xml.1.zst                           1674.5   1639.3   -2.1
x-ray.1.zst                         1172.5   1170.3   -0.2

bib.1.zst                           1111.1   1070.0   -3.7
book1.1.zst                         1090.8   1085.8   -0.5
book2.1.zst                         1092.6   1071.6   -1.9
geo.1.zst                           1382.6   1345.5   -2.7
news.1.zst                          1118.1   1105.9   -1.1
obj1.1.zst                          1100.7   1098.3   -0.2
obj2.1.zst                          948.2    917.7    -3.2
paper1.1.zst                        1026.6   1005.1   -2.1
paper2.1.zst                        1032.1   987.6    -4.3
pic.1.zst                           1926.6   1912.3   -0.7
progc.1.zst                         1022.3   1010.4   -1.2
progl.1.zst                         1259.4   1224.8   -2.7
progp.1.zst                         1261.5   1230.6   -2.4
trans.1.zst                         1342.2   1297.7   -3.3

alice29.txt.1.zst                   980.1    941.5    -3.9
syoulik.txt.1.zst                   1048.1   1018.4   -2.8
cp.html.1.zst                       1165.3   1156.2   -0.8
fields.c.1.zst                      853.2    846.1    -0.8
grammar.lsp.1.zst                   630.8    622.9    -1.3
kennedy.xls.1.zst                   1043.4   1029.3   -1.4
lcet10.txt.1.zst                    1113.1   1086.9   -2.4
lrabn12.txt.1.zst                   1081.3   1073.2   -0.7
ptt5.1.zst                          1928.8   1911.8   -0.9
sum.1.zst                           1086.1   1063.8   -2.1
xargs.1.1.zst                       661.8    661.3    -0.1

128k buffer, clang-12
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1084.5   1078.4   -0.6
mozilla.1.zst                       1027.0   1019.9   -0.7
mr.1.zst                            1243.9   1236.9   -0.6
nci.1.zst                           1694.9   1656.1   -2.3
ooffice.1.zst                       939.4    939.8    +0.0
osdb.1.zst                          1207.4   1207.1   -0.0
reymont.1.zst                       1068.2   1056.5   -1.1
samba.1.zst                         1410.6   1398.3   -0.9
sao.1.zst                           958.1    962.9    +0.5
webster.1.zst                       1115.6   1106.2   -0.8
xml.1.zst                           1671.4   1648.7   -1.4
x-ray.1.zst                         1177.0   1173.8   -0.3

bib.1.zst                           1112.8   1092.2   -1.9
book1.1.zst                         1089.7   1096.4   +0.6
book2.1.zst                         1091.6   1082.5   -0.8
geo.1.zst                           1385.9   1383.3   -0.2
news.1.zst                          1120.4   1122.0   +0.1
obj1.1.zst                          1104.7   1099.1   -0.5
obj2.1.zst                          948.8    939.6    -1.0
paper1.1.zst                        1020.1   1011.9   -0.8
paper2.1.zst                        1031.4   1023.6   -0.8
pic.1.zst                           1929.1   1911.0   -0.9
progc.1.zst                         1023.4   1013.4   -1.0
progl.1.zst                         1253.9   1228.8   -2.0
progp.1.zst                         1255.8   1228.1   -2.2
trans.1.zst                         1333.2   1285.1   -3.6

alice29.txt.1.zst                   976.3    970.6    -0.6
syoulik.txt.1.zst                   1053.8   1051.1   -0.3
cp.html.1.zst                       1163.5   1145.0   -1.6
fields.c.1.zst                      849.7    844.0    -0.7
grammar.lsp.1.zst                   631.4    623.3    -1.3
kennedy.xls.1.zst                   1044.2   1030.8   -1.3
lcet10.txt.1.zst                    1120.5   1108.2   -1.1
lrabn12.txt.1.zst                   1078.9   1078.9   +0.0
ptt5.1.zst                          1926.3   1910.9   -0.8
sum.1.zst                           1085.5   1071.7   -1.3
xargs.1.1.zst                       664.7    657.8    -1.0
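
For anyone re-deriving the tables above: the diff% column appears to be the relative change of `new` over `orig`. Below is a minimal sketch under that assumption. The helper names `diff_percent` and `summarize` are hypothetical, and the geometric-mean summary is an addition for illustration, not a figure reported in this thread. The three sample rows are copied from the 16k buffer, gcc-7 table.

```python
from statistics import geometric_mean


def diff_percent(orig: float, new: float) -> float:
    """Relative change of `new` over `orig`, in percent (assumed definition of diff%)."""
    return (new - orig) / orig * 100.0


def summarize(rows):
    """Geometric mean of the new/orig speed ratios, expressed as a percent change.

    `rows` is an iterable of (name, orig_mb_s, new_mb_s) tuples.
    """
    ratios = [new / orig for _, orig, new in rows]
    return (geometric_mean(ratios) - 1.0) * 100.0


# Sample rows taken from the 16k buffer, gcc-7 table.
rows = [
    ("dickens.1.zst", 1032.6, 1041.6),
    ("mozilla.1.zst", 977.5, 991.9),
    ("geo.1.zst", 1374.8, 1332.1),
]

for name, orig, new in rows:
    print(f"{name:<36}{orig:<9.1f}{new:<9.1f}{diff_percent(orig, new):+.1f}")
print(f"geomean diff%: {summarize(rows):+.1f}")
```

A geometric mean is used for the summary because it treats a speedup and the matching slowdown symmetrically, which a plain average of percentages does not.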

@binhdvo
Contributor Author

binhdvo commented Oct 19, 2021

On Nick's suggestion, moved the split branch up from where it would result in inlined paths; performance now looks much better across both the 16k and 128k cases on Yann's server. There is still a failure in the fuzz and regression tests that I am trying to debug, and I am also working on getting benchmarks on Nick's server.

16k buffer, gcc-7
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1031.9   1063.1   +3.0    
mozilla.1.zst                       977.4    979.8    +0.2    
mr.1.zst                            1204.3   1225.4   +1.8    
nci.1.zst                           1596.5   1620.9   +1.5    
ooffice.1.zst                       918.5    921.9    +0.4    
osdb.1.zst                          1159.9   1154.3   -0.5    
reymont.1.zst                       1000.3   1029.7   +2.9    
samba.1.zst                         1334.4   1354.4   +1.5    
sao.1.zst                           952.3    954.0    +0.2    
webster.1.zst                       1052.8   1079.3   +2.5    
xml.1.zst                           1564.0   1595.5   +2.0    
x-ray.1.zst                         1168.2   1170.4   +0.2    

bib.1.zst                           1002.8   1037.5   +3.5    
book1.1.zst                         1044.3   1074.1   +2.9    
book2.1.zst                         1025.3   1059.4   +3.3    
geo.1.zst                           1300.6   1344.2   +3.4    
news.1.zst                          1073.7   1088.8   +1.4    
obj1.1.zst                          1041.4   1071.6   +2.9    
obj2.1.zst                          896.2    907.9    +1.3    
paper1.1.zst                        895.8    982.0    +9.6    
paper2.1.zst                        940.5    962.6    +2.3    
pic.1.zst                           1850.3   1875.0   +1.3    
progc.1.zst                         937.6    978.0    +4.3    
progl.1.zst                         1114.9   1142.3   +2.5    
progp.1.zst                         1123.2   1164.9   +3.7    
trans.1.zst                         1115.6   1205.0   +8.0    

alice29.txt.1.zst                   897.2    916.6    +2.2    
syoulik.txt.1.zst                   975.3    982.9    +0.8    
cp.html.1.zst                       913.9    1106.4   +21.1   
fields.c.1.zst                      726.5    813.2    +11.9   
grammar.lsp.1.zst                   525.8    603.8    +14.8   
kennedy.xls.1.zst                   934.3    926.0    -0.9    
lcet10.txt.1.zst                    1044.2   1067.6   +2.2    
lrabn12.txt.1.zst                   1028.8   1064.4   +3.5    
ptt5.1.zst                          1850.5   1876.2   +1.4    
sum.1.zst                           883.0    1021.7   +15.7   
xargs.1.1.zst                       622.6    630.4    +1.3    


128k buffer, gcc-7
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1035.1   1066.3   +3.0    
mozilla.1.zst                       976.4    981.9    +0.6    
mr.1.zst                            1201.5   1223.0   +1.8    
nci.1.zst                           1592.1   1581.7   -0.7    
ooffice.1.zst                       918.8    922.7    +0.4    
osdb.1.zst                          1160.1   1155.4   -0.4    
reymont.1.zst                       995.9    1027.9   +3.2    
samba.1.zst                         1333.8   1352.4   +1.4    
sao.1.zst                           951.9    955.4    +0.4    
webster.1.zst                       1054.6   1078.9   +2.3    
xml.1.zst                           1578.5   1594.9   +1.0    
x-ray.1.zst                         1169.8   1169.6   -0.0    

bib.1.zst                           1018.5   1032.2   +1.3    
book1.1.zst                         1047.2   1079.5   +3.1    
book2.1.zst                         1023.4   1064.7   +4.0    
geo.1.zst                           1301.5   1378.7   +5.9    
news.1.zst                          1074.3   1099.8   +2.4    
obj1.1.zst                          920.7    1072.8   +16.5   
obj2.1.zst                          896.4    907.2    +1.2    
paper1.1.zst                        946.4    922.4    -2.5    
paper2.1.zst                        965.2    975.1    +1.0    
pic.1.zst                           1873.7   1873.1   -0.0    
progc.1.zst                         937.7    976.8    +4.2    
progl.1.zst                         1045.5   1165.3   +11.5   
progp.1.zst                         1010.0   1165.1   +15.4   
trans.1.zst                         1111.0   1206.7   +8.6    

alice29.txt.1.zst                   895.4    957.0    +6.9    
syoulik.txt.1.zst                   991.5    1037.6   +4.6    
cp.html.1.zst                       1001.9   1098.5   +9.6    
fields.c.1.zst                      721.7    808.2    +12.0   
grammar.lsp.1.zst                   522.9    606.4    +16.0   
kennedy.xls.1.zst                   936.3    926.0    -1.1    
lcet10.txt.1.zst                    1054.9   1094.8   +3.8    
lrabn12.txt.1.zst                   1028.7   1067.1   +3.7    
ptt5.1.zst                          1850.9   1874.9   +1.3    
sum.1.zst                           997.0    890.9    -10.6   
xargs.1.1.zst                       624.0    567.8    -9.0    


16k buffer, gcc-8
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1066.5   1063.5   -0.3    
mozilla.1.zst                       995.7    1001.7   +0.6    
mr.1.zst                            1232.8   1236.4   +0.3    
nci.1.zst                           1643.7   1662.1   +1.1    
ooffice.1.zst                       936.5    941.2    +0.5    
osdb.1.zst                          1216.5   1216.3   -0.0    
reymont.1.zst                       1031.7   1032.6   +0.1    
samba.1.zst                         1386.0   1393.2   +0.5    
sao.1.zst                           974.3    976.1    +0.2    
webster.1.zst                       1103.2   1104.7   +0.1    
xml.1.zst                           1646.7   1667.9   +1.3    
x-ray.1.zst                         1171.0   1171.0   +0.0    

bib.1.zst                           1069.5   1025.1   -4.2    
book1.1.zst                         1088.0   1068.6   -1.8    
book2.1.zst                         1067.6   1053.1   -1.4    
geo.1.zst                           1384.6   1347.9   -2.7    
news.1.zst                          1126.5   1090.2   -3.2    
obj1.1.zst                          1084.6   1085.4   +0.1    
obj2.1.zst                          921.8    898.4    -2.5    
paper1.1.zst                        979.1    989.2    +1.0    
paper2.1.zst                        998.9    951.6    -4.7    
pic.1.zst                           1905.2   1930.0   +1.3    
progc.1.zst                         991.5    995.9    +0.4    
progl.1.zst                         1223.0   1223.6   +0.0    
progp.1.zst                         1228.1   1220.2   -0.6    
trans.1.zst                         1281.8   1305.5   +1.8    

alice29.txt.1.zst                   948.1    897.0    -5.4    
syoulik.txt.1.zst                   1027.5   942.8    -8.2    
cp.html.1.zst                       1133.1   1128.1   -0.4    
fields.c.1.zst                      821.2    830.2    +1.1    
grammar.lsp.1.zst                   603.4    609.4    +1.0    
kennedy.xls.1.zst                   968.4    968.5    +0.0    
lcet10.txt.1.zst                    1091.8   1062.8   -2.7    
lrabn12.txt.1.zst                   1075.2   1052.9   -2.1    
ptt5.1.zst                          1907.1   1929.9   +1.2    
sum.1.zst                           1050.3   1058.2   +0.8    
xargs.1.1.zst                       634.8    650.4    +2.5    


128k buffer, gcc-8
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1066.5   1062.7   -0.4    
mozilla.1.zst                       995.1    1000.6   +0.6    
mr.1.zst                            1234.6   1234.6   +0.0    
nci.1.zst                           1645.5   1648.9   +0.2    
ooffice.1.zst                       938.7    942.0    +0.4    
osdb.1.zst                          1218.8   1194.6   -2.0    
reymont.1.zst                       1033.6   1030.9   -0.3    
samba.1.zst                         1385.1   1388.5   +0.2    
sao.1.zst                           975.7    976.0    +0.0    
webster.1.zst                       1103.7   1104.8   +0.1    
xml.1.zst                           1645.3   1668.4   +1.4    
x-ray.1.zst                         1164.6   1172.1   +0.6    

bib.1.zst                           1042.0   1082.2   +3.9    
book1.1.zst                         1081.8   1077.3   -0.4    
book2.1.zst                         1061.1   1065.7   +0.4    
geo.1.zst                           1325.1   1325.7   +0.0    
news.1.zst                          1127.4   1117.9   -0.8    
obj1.1.zst                          1085.0   1088.1   +0.3    
obj2.1.zst                          924.2    926.5    +0.2    
paper1.1.zst                        987.3    988.0    +0.1    
paper2.1.zst                        1003.8   1001.4   -0.2    
pic.1.zst                           1908.8   1928.4   +1.0    
progc.1.zst                         993.2    997.4    +0.4    
progl.1.zst                         1227.3   1220.5   -0.6    
progp.1.zst                         1225.6   1216.6   -0.7    
trans.1.zst                         1287.6   1303.5   +1.2    

alice29.txt.1.zst                   945.7    940.2    -0.6    
syoulik.txt.1.zst                   1027.2   1027.5   +0.0    
cp.html.1.zst                       1132.3   1128.0   -0.4    
fields.c.1.zst                      818.7    828.1    +1.1    
grammar.lsp.1.zst                   602.8    616.5    +2.3    
kennedy.xls.1.zst                   968.1    968.4    +0.0    
lcet10.txt.1.zst                    1090.6   1092.4   +0.2    
lrabn12.txt.1.zst                   1073.2   1066.4   -0.6    
ptt5.1.zst                          1904.6   1930.4   +1.4    
sum.1.zst                           1048.8   1059.8   +1.0    
xargs.1.1.zst                       633.6    647.9    +2.3    


16k buffer, gcc-9
Decompression MB/s
                                    orig     new      diff%

dickens.1.zst                       1069.4   1071.8   +0.2    
mozilla.1.zst                       1018.8   1022.7   +0.4    
mr.1.zst                            1224.5   1230.0   +0.4    
nci.1.zst                           1689.1   1691.3   +0.1    
ooffice.1.zst                       945.2    946.6    +0.1    
osdb.1.zst                          1217.6   1220.5   +0.2    
reymont.1.zst                       1042.9   1049.9   +0.7    
samba.1.zst                         1404.0   1409.7   +0.4    
sao.1.zst                           973.8    973.8    +0.0    
webster.1.zst                       1110.6   1114.4   +0.3    
xml.1.zst                           1674.3   1679.0   +0.3    
x-ray.1.zst                         1170.2   1170.5   +0.0    

bib.1.zst                           1098.8   1054.0   -4.1    
book1.1.zst                         1076.9   1076.7   -0.0    
book2.1.zst                         1073.7   1070.3   -0.3    
geo.1.zst                           1380.1   1335.2   -3.3    
news.1.zst                          1117.2   1097.8   -1.7    
obj1.1.zst                          1096.5   1093.9   -0.2    
obj2.1.zst                          938.0    916.4    -2.3    
paper1.1.zst                        991.5    1004.6   +1.3    
paper2.1.zst                        1008.7   989.7    -1.9    
pic.1.zst                           1944.5   1945.0   +0.0    
progc.1.zst                         1003.0   1008.2   +0.5    
progl.1.zst                         1227.7   1241.0   +1.1    
progp.1.zst                         1223.8   1228.4   +0.4    
trans.1.zst                         1319.8   1321.0   +0.1    

alice29.txt.1.zst                   955.7    930.7    -2.6    
syoulik.txt.1.zst                   1035.9   990.8    -4.4    
cp.html.1.zst                       1150.1   1140.7   -0.8    
fields.c.1.zst                      836.7    833.5    -0.4    
grammar.lsp.1.zst                   612.6    604.3    -1.4    
kennedy.xls.1.zst                   1013.3   1018.2   +0.5    
lcet10.txt.1.zst                    1096.5   1084.9   -1.1    
lrabn12.txt.1.zst                   1065.6   1062.6   -0.3    
ptt5.1.zst                          1947.3   1951.1   +0.2    
sum.1.zst                           1054.9   1067.4   +1.2    
xargs.1.1.zst                       642.3    641.6    -0.1    

128k buffer, gcc-9
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1071.0   1071.8   +0.1    
mozilla.1.zst                       1020.3   1024.1   +0.4    
mr.1.zst                            1228.7   1231.4   +0.2    
nci.1.zst                           1692.1   1690.6   -0.1    
ooffice.1.zst                       946.3    945.5    -0.1    
osdb.1.zst                          1214.5   1222.1   +0.6    
reymont.1.zst                       1045.4   1049.6   +0.4    
samba.1.zst                         1402.7   1410.9   +0.6    
sao.1.zst                           974.6    975.6    +0.1    
webster.1.zst                       1108.7   1114.6   +0.5    
xml.1.zst                           1673.7   1673.3   -0.0    
x-ray.1.zst                         1169.0   1169.8   +0.1    

bib.1.zst                           1097.1   1099.7   +0.2    
book1.1.zst                         1078.0   1086.0   +0.7    
book2.1.zst                         1072.0   1078.4   +0.6    
geo.1.zst                           1381.3   1380.4   -0.1    
news.1.zst                          1115.2   1121.1   +0.5    
obj1.1.zst                          1097.2   1094.3   -0.3    
obj2.1.zst                          934.5    937.2    +0.3    
paper1.1.zst                        994.9    998.4    +0.4    
paper2.1.zst                        1009.4   1016.0   +0.7    
pic.1.zst                           1945.9   1947.6   +0.1    
progc.1.zst                         1001.5   1007.7   +0.6    
progl.1.zst                         1233.3   1247.9   +1.2    
progp.1.zst                         1228.2   1236.4   +0.7    
trans.1.zst                         1324.9   1328.9   +0.3    

alice29.txt.1.zst                   954.8    959.8    +0.5    
syoulik.txt.1.zst                   1038.3   1041.6   +0.3    
cp.html.1.zst                       1147.8   1139.8   -0.7    
fields.c.1.zst                      836.4    829.2    -0.9    
grammar.lsp.1.zst                   613.7    604.4    -1.5    
kennedy.xls.1.zst                   1006.3   1017.9   +1.2    
lcet10.txt.1.zst                    1097.5   1095.8   -0.2    
lrabn12.txt.1.zst                   1065.8   1073.4   +0.7    
ptt5.1.zst                          1946.8   1948.9   +0.1    
sum.1.zst                           1058.5   1069.4   +1.0    
xargs.1.1.zst                       642.9    638.1    -0.7    


16k buffer, gcc-10
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1086.7   1077.7   -0.8    
mozilla.1.zst                       1034.5   1031.7   -0.3    
mr.1.zst                            1240.5   1238.1   -0.2    
nci.1.zst                           1712.4   1700.1   -0.7    
ooffice.1.zst                       964.1    959.8    -0.4    
osdb.1.zst                          1229.2   1227.6   -0.1    
reymont.1.zst                       1051.7   1053.6   +0.2    
samba.1.zst                         1418.4   1411.5   -0.5    
sao.1.zst                           986.4    983.3    -0.3    
webster.1.zst                       1123.8   1118.9   -0.4    
xml.1.zst                           1690.4   1685.0   -0.3    
x-ray.1.zst                         1169.9   1170.7   +0.1    

bib.1.zst                           1111.6   1041.8   -6.3    
book1.1.zst                         1103.9   1089.6   -1.3    
book2.1.zst                         1081.8   1067.8   -1.3    
geo.1.zst                           1381.7   1354.4   -2.0    
news.1.zst                          1135.6   1101.7   -3.0    
obj1.1.zst                          1097.9   1106.9   +0.8    
obj2.1.zst                          951.7    919.0    -3.4    
paper1.1.zst                        1011.8   1000.9   -1.1    
paper2.1.zst                        1023.4   969.8    -5.2    
pic.1.zst                           1977.5   1967.6   -0.5    
progc.1.zst                         1016.0   1016.3   +0.0    
progl.1.zst                         1257.3   1248.2   -0.7    
progp.1.zst                         1247.3   1239.3   -0.6    
trans.1.zst                         1347.4   1343.2   -0.3    

alice29.txt.1.zst                   963.7    920.8    -4.5    
syoulik.txt.1.zst                   1050.1   994.9    -5.3    
cp.html.1.zst                       1145.4   1161.8   +1.4    
fields.c.1.zst                      836.8    839.9    +0.4    
grammar.lsp.1.zst                   611.0    613.9    +0.5    
kennedy.xls.1.zst                   1034.6   1032.1   -0.2    
lcet10.txt.1.zst                    1106.9   1079.2   -2.5    
lrabn12.txt.1.zst                   1087.9   1066.1   -2.0    
ptt5.1.zst                          1974.2   1964.0   -0.5    
sum.1.zst                           1075.1   1069.3   -0.5    
xargs.1.1.zst                       643.0    649.2    +1.0    


128k buffer, gcc-10
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1085.9   1080.6   -0.5    
mozilla.1.zst                       1035.9   1033.9   -0.2    
mr.1.zst                            1243.9   1238.2   -0.5    
nci.1.zst                           1710.4   1684.0   -1.5    
ooffice.1.zst                       963.4    960.3    -0.3    
osdb.1.zst                          1232.0   1230.3   -0.1    
reymont.1.zst                       1055.5   1052.6   -0.3    
samba.1.zst                         1417.9   1413.3   -0.3    
sao.1.zst                           986.5    983.3    -0.3    
webster.1.zst                       1127.5   1122.6   -0.4    
xml.1.zst                           1695.4   1684.2   -0.7    
x-ray.1.zst                         1165.7   1171.2   +0.5    

bib.1.zst                           1111.5   1112.7   +0.1    
book1.1.zst                         1103.4   1105.1   +0.2    
book2.1.zst                         1084.9   1080.1   -0.4    
geo.1.zst                           1381.2   1385.3   +0.3    
news.1.zst                          1134.4   1133.6   -0.1    
obj1.1.zst                          1100.8   1099.4   -0.1    
obj2.1.zst                          950.1    951.1    +0.1    
paper1.1.zst                        1009.4   999.9    -0.9    
paper2.1.zst                        1022.0   988.4    -3.3    
pic.1.zst                           1969.1   1974.3   +0.3    
progc.1.zst                         1017.8   1017.5   -0.0    
progl.1.zst                         1258.9   1247.0   -0.9    
progp.1.zst                         1247.0   1236.2   -0.9    
trans.1.zst                         1347.3   1340.9   -0.5    

alice29.txt.1.zst                   954.4    953.2    -0.1    
syoulik.txt.1.zst                   1048.1   1048.2   +0.0    
cp.html.1.zst                       1143.6   1159.1   +1.4    
fields.c.1.zst                      837.4    837.8    +0.0    
grammar.lsp.1.zst                   611.6    617.1    +0.9    
kennedy.xls.1.zst                   1035.2   1031.6   -0.3    
lcet10.txt.1.zst                    1108.2   1103.5   -0.4    
lrabn12.txt.1.zst                   1086.7   1085.4   -0.1    
ptt5.1.zst                          1972.0   1969.0   -0.2    
sum.1.zst                           1076.7   1066.9   -0.9    
xargs.1.1.zst                       642.7    645.9    +0.5    


16k buffer, clang-12
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1084.1   1080.6   -0.3    
mozilla.1.zst                       1024.3   1025.9   +0.2    
mr.1.zst                            1240.5   1235.2   -0.4    
nci.1.zst                           1695.6   1691.8   -0.2    
ooffice.1.zst                       940.5    938.3    -0.2    
osdb.1.zst                          1208.4   1205.4   -0.2    
reymont.1.zst                       1066.6   1068.2   +0.2    
samba.1.zst                         1411.9   1409.6   -0.2    
sao.1.zst                           958.1    955.9    -0.2    
webster.1.zst                       1115.2   1113.7   -0.1    
xml.1.zst                           1676.0   1671.9   -0.2    
x-ray.1.zst                         1173.1   1172.0   -0.1    

bib.1.zst                           1111.5   1060.4   -4.6    
book1.1.zst                         1086.9   1077.1   -0.9    
book2.1.zst                         1091.0   1083.3   -0.7    
geo.1.zst                           1381.2   1346.7   -2.5    
news.1.zst                          1120.7   1108.7   -1.1    
obj1.1.zst                          1100.3   1103.4   +0.3    
obj2.1.zst                          948.2    923.0    -2.7    
paper1.1.zst                        1021.7   1016.2   -0.5    
paper2.1.zst                        1028.1   988.7    -3.8    
pic.1.zst                           1929.9   1933.3   +0.2    
progc.1.zst                         1026.0   1023.0   -0.3    
progl.1.zst                         1260.2   1257.4   -0.2    
progp.1.zst                         1258.0   1259.9   +0.2    
trans.1.zst                         1332.3   1335.7   +0.3    

alice29.txt.1.zst                   980.4    948.5    -3.3    
syoulik.txt.1.zst                   1052.2   1028.6   -2.2    
cp.html.1.zst                       1166.2   1170.8   +0.4   
fields.c.1.zst                      854.9    847.2    -0.9    
grammar.lsp.1.zst                   631.7    628.2    -0.6    
kennedy.xls.1.zst                   1044.9   1044.2   -0.1    
lcet10.txt.1.zst                    1117.2   1091.7   -2.3    
lrabn12.txt.1.zst                   1079.9   1070.8   -0.8    
ptt5.1.zst                          1927.8   1929.5   +0.1    
sum.1.zst                           1088.8   1082.9   -0.5    
xargs.1.1.zst                       663.0    666.8    +0.6    

128k buffer, clang-12
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1083.8   1077.7   -0.6    
mozilla.1.zst                       1026.7   1025.4   -0.1    
mr.1.zst                            1240.0   1238.3   -0.1    
nci.1.zst                           1692.9   1675.5   -1.0    
ooffice.1.zst                       940.4    938.5    -0.2    
osdb.1.zst                          1204.9   1203.6   -0.1    
reymont.1.zst                       1067.0   1064.8   -0.2    
samba.1.zst                         1409.3   1408.2   -0.1    
sao.1.zst                           958.3    957.8    -0.1    
webster.1.zst                       1114.7   1110.9   -0.3    
xml.1.zst                           1673.6   1668.9   -0.3    
x-ray.1.zst                         1172.0   1172.9   +0.1    

bib.1.zst                           1107.9   1112.4   +0.4    
book1.1.zst                         1083.0   1084.8   +0.2    
book2.1.zst                         1086.2   1082.5   -0.3    
geo.1.zst                           1382.5   1363.4   -1.4    
news.1.zst                          1121.5   1115.3   -0.6    
obj1.1.zst                          1106.8   1044.7   -5.6    
obj2.1.zst                          946.8    942.8    -0.4    
paper1.1.zst                        1023.9   947.3    -7.5    
paper2.1.zst                        1033.9   997.1    -3.6    
pic.1.zst                           1923.0   1934.7   +0.6    
progc.1.zst                         928.2    1021.5   +10.1   
progl.1.zst                         1126.9   1255.7   +11.4   
progp.1.zst                         1088.9   1256.6   +15.4   
trans.1.zst                         1338.2   1182.8   -11.6   

alice29.txt.1.zst                   981.7    976.6    -0.5    
syoulik.txt.1.zst                   1054.2   1024.1   -2.9    
cp.html.1.zst                       1166.1   1169.5   +0.3    
fields.c.1.zst                      843.1    856.5    +1.6    
grammar.lsp.1.zst                   629.6    629.9    +0.0    
kennedy.xls.1.zst                   1043.6   1035.4   -0.8    
lcet10.txt.1.zst                    1116.4   1097.0   -1.7    
lrabn12.txt.1.zst                   1078.3   1072.4   -0.5    
ptt5.1.zst                          1929.1   1924.4   -0.2    
sum.1.zst                           1085.9   947.8    -12.7   
xargs.1.1.zst                       664.2    657.2    -1.1    
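
For reference, the diff% column appears to be the relative change of new over orig, rounded to one decimal. The sketch below is an assumption about how the column was derived (the helper name `diff_percent` is hypothetical), checked against three of the larger swings in the 128k buffer, clang-12 table above:

```python
def diff_percent(orig: float, new: float) -> float:
    """Relative change of `new` over `orig`, in percent, rounded to one decimal."""
    return round((new - orig) / orig * 100.0, 1)


# Rows copied from the 128k buffer, clang-12 table: (file, orig, new)
rows = [
    ("progp.1.zst", 1088.9, 1256.6),
    ("trans.1.zst", 1338.2, 1182.8),
    ("sum.1.zst", 1085.9, 947.8),
]

for name, orig, new in rows:
    # Same column layout as the tables: name, orig, new, signed diff%
    print(f"{name:<36}{orig:<9}{new:<9}{diff_percent(orig, new):+.1f}")
```

The recomputed values match the table for these rows, so the large positive and negative swings there come from the measurements themselves rather than from the diff% arithmetic.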

@binhdvo
Contributor Author

binhdvo commented Oct 21, 2021

After fixes and some testing, the 32k buffer case performs well for the most part and still gets a factor-of-4 memory improvement. The exception is level 22 enwik8/9 in the full benchmark on the dev server, where gcc and clang.par show significant regressions (this is the case that exercises the long match loop). Notably, the same case shows significant gains on gcc.par. Increasing the lit extra buffer does not address this; I am continuing to investigate.

16k buffer, gcc-7 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1028.7   1033.7   +0.5    
mozilla.1.zst                       974.6    970.5    -0.4    
mr.1.zst                            1201.8   1197.7   -0.3    
nci.1.zst                           1588.6   1591.4   +0.2    
ooffice.1.zst                       918.3    913.1    -0.6    
osdb.1.zst                          1159.4   1157.5   -0.2    
reymont.1.zst                       995.2    1008.3   +1.3    
samba.1.zst                         1329.6   1332.4   +0.2    
sao.1.zst                           951.7    947.6    -0.4    
webster.1.zst                       1054.9   1057.6   +0.3    
xml.1.zst                           1577.0   1576.9   -0.0    
x-ray.1.zst                         1170.4   1168.0   -0.2    

bib.1.zst                           1020.1   1024.3   +0.4    
book1.1.zst                         1049.2   1049.6   +0.0    
book2.1.zst                         1031.5   1028.4   -0.3    
geo.1.zst                           1373.7   1340.2   -2.4    
news.1.zst                          1079.4   1079.2   -0.0    
obj1.1.zst                          1058.9   1067.2   +0.8    
obj2.1.zst                          904.2    903.0    -0.1    
paper1.1.zst                        941.1    950.7    +1.0    
paper2.1.zst                        964.6    960.9    -0.4    
pic.1.zst                           1874.2   1849.5   -1.3    
progc.1.zst                         933.4    944.2    +1.2    
progl.1.zst                         1115.5   1055.6   -5.4    
progp.1.zst                         1115.1   1154.2   +3.5    
trans.1.zst                         1170.7   1224.0   +4.6    

alice29.txt.1.zst                   912.8    907.8    -0.5    
syoulik.txt.1.zst                   988.9    977.4    -1.2    
cp.html.1.zst                       1054.5   1075.3   +2.0    
fields.c.1.zst                      780.8    787.5    +0.9    
grammar.lsp.1.zst                   598.4    596.2    -0.4    
kennedy.xls.1.zst                   942.7    912.0    -3.3    
lcet10.txt.1.zst                    1050.7   1047.6   -0.3    
lrabn12.txt.1.zst                   1035.8   1036.4   +0.1    
ptt5.1.zst                          1876.1   1849.3   -1.4    
sum.1.zst                           990.4    1006.4   +1.6    
xargs.1.1.zst                       618.8    621.6    +0.5    

32k buffer, gcc-7 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1031.3   1037.4   +0.6    
mozilla.1.zst                       975.1    971.5    -0.4    
mr.1.zst                            1202.3   1200.0   -0.2    
nci.1.zst                           1589.1   1588.5   -0.0    
ooffice.1.zst                       917.7    912.6    -0.6    
osdb.1.zst                          1160.3   1156.3   -0.3    
reymont.1.zst                       994.9    1008.4   +1.4    
samba.1.zst                         1330.1   1329.1   -0.1    
sao.1.zst                           952.8    946.1    -0.7    
webster.1.zst                       1055.9   1059.7   +0.4    
xml.1.zst                           1572.6   1578.0   +0.3    
x-ray.1.zst                         1167.9   1169.8   +0.2    

bib.1.zst                           1015.7   1035.8   +2.0    
book1.1.zst                         1046.4   1050.6   +0.4    
book2.1.zst                         1031.7   1032.2   +0.0    
geo.1.zst                           1379.4   1337.8   -3.0    
news.1.zst                          1082.8   1076.4   -0.6    
obj1.1.zst                          1059.2   1066.8   +0.7    
obj2.1.zst                          904.0    903.4    -0.1    
paper1.1.zst                        943.0    951.2    +0.9    
paper2.1.zst                        963.8    962.1    -0.2    
pic.1.zst                           1875.6   1845.8   -1.6    
progc.1.zst                         935.8    943.4    +0.8    
progl.1.zst                         1113.7   1153.4   +3.6    
progp.1.zst                         1117.8   1153.0   +3.1    
trans.1.zst                         1170.6   1220.2   +4.2    

alice29.txt.1.zst                   911.9    914.5    +0.3    
syoulik.txt.1.zst                   991.0    989.2    -0.2    
cp.html.1.zst                       1053.6   1073.7   +1.9    
fields.c.1.zst                      781.4    786.5    +0.7    
grammar.lsp.1.zst                   599.5    598.4    -0.2    
kennedy.xls.1.zst                   942.2    914.2    -3.0    
lcet10.txt.1.zst                    1052.5   1052.5   +0.0    
lrabn12.txt.1.zst                   1036.1   1035.9   -0.0    
ptt5.1.zst                          1871.9   1846.0   -1.4    
sum.1.zst                           991.6    1007.3   +1.6    
xargs.1.1.zst                       619.0    621.9    +0.5    
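
A quick way to scan a table like the one above for regressions is to parse the rows and filter on diff%. This is only a sketch: the function names and the -2.0 cutoff are hypothetical choices, not part of the benchmark tooling, and the sample rows are copied from the 32k buffer, gcc-7 table.

```python
def parse_rows(text: str):
    """Parse 'name orig new diff%' rows, skipping blank and header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly four fields and a file name ending in .zst
        if len(parts) != 4 or not parts[0].endswith(".zst"):
            continue
        name, orig, new, diff = parts
        rows.append((name, float(orig), float(new), float(diff)))
    return rows


def regressions(rows, threshold=-2.0):
    """Return rows whose diff% is at or below `threshold` (hypothetical cutoff)."""
    return [r for r in rows if r[3] <= threshold]


SAMPLE = """\
test                                orig     new      diff%
geo.1.zst                           1379.4   1337.8   -3.0
pic.1.zst                           1875.6   1845.8   -1.6
progl.1.zst                         1113.7   1153.4   +3.6
kennedy.xls.1.zst                   942.2    914.2    -3.0
"""

for name, orig, new, diff in regressions(parse_rows(SAMPLE)):
    print(f"{name}: {diff:+.1f}%")
```

With these sample rows, only geo and kennedy.xls fall at or below the cutoff.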

128k buffer, gcc-7 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1030.7   1028.2   -0.2    
mozilla.1.zst                       974.7    971.0    -0.4    
mr.1.zst                            1201.6   1198.7   -0.2    
nci.1.zst                           1587.9   1592.0   +0.3    
ooffice.1.zst                       917.9    912.0    -0.6    
osdb.1.zst                          1160.3   1157.9   -0.2    
reymont.1.zst                       991.8    1005.8   +1.4    
samba.1.zst                         1328.9   1330.8   +0.1    
sao.1.zst                           951.4    946.7    -0.5    
webster.1.zst                       1054.2   1061.9   +0.7    
xml.1.zst                           1574.2   1578.4   +0.3    
x-ray.1.zst                         1167.2   1169.2   +0.2    

bib.1.zst                           1015.8   1042.2   +2.6    
book1.1.zst                         1047.6   1047.0   -0.1    
book2.1.zst                         1030.1   1031.6   +0.1    
geo.1.zst                           1377.3   1375.8   -0.1    
news.1.zst                          1083.4   1083.4   +0.0    
obj1.1.zst                          1059.2   1064.4   +0.5    
obj2.1.zst                          903.0    904.8    +0.2    
paper1.1.zst                        943.4    950.0    +0.7    
paper2.1.zst                        963.9    972.0    +0.8    
pic.1.zst                           1871.5   1850.2   -1.1    
progc.1.zst                         935.6    944.0    +0.9    
progl.1.zst                         1114.0   1150.6   +3.3    
progp.1.zst                         1118.4   1153.0   +3.1    
trans.1.zst                         1172.1   1224.7   +4.5    

alice29.txt.1.zst                   908.9    913.9    +0.6    
syoulik.txt.1.zst                   990.8    995.8    +0.5    
cp.html.1.zst                       1052.1   1073.4   +2.0   
fields.c.1.zst                      783.4    784.2    +0.1    
grammar.lsp.1.zst                   599.2    599.1    -0.0    
kennedy.xls.1.zst                   935.5    914.2    -2.3    
lcet10.txt.1.zst                    1052.7   1053.2   +0.0    
lrabn12.txt.1.zst                   1028.3   1035.8   +0.7    
ptt5.1.zst                          1848.0   1849.7   +0.1    
sum.1.zst                           990.7    1006.9   +1.6    
xargs.1.1.zst                       619.1    618.6    -0.1    

16k buffer, gcc-8 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1064.2   1061.6   -0.2    
mozilla.1.zst                       994.4    1001.2   +0.7    
mr.1.zst                            1238.8   1233.6   -0.4    
nci.1.zst                           1644.8   1662.1   +1.1    
ooffice.1.zst                       938.6    941.2    +0.3    
osdb.1.zst                          1218.7   1212.0   -0.5    
reymont.1.zst                       1033.0   1035.2   +0.2    
samba.1.zst                         1385.7   1393.7   +0.6    
sao.1.zst                           974.4    974.2    -0.0    
webster.1.zst                       1103.3   1103.8   +0.0    
xml.1.zst                           1650.1   1670.5   +1.2    
x-ray.1.zst                         1167.0   1173.1   +0.5    

bib.1.zst                           1072.6   1035.7   -3.4    
book1.1.zst                         1088.9   1065.3   -2.2    
book2.1.zst                         1067.9   1054.8   -1.2    
geo.1.zst                           1385.8   1341.1   -3.2    
news.1.zst                          1128.3   1090.4   -3.4    
obj1.1.zst                          1082.6   1076.3   -0.6    
obj2.1.zst                          922.3    900.1    -2.4    
paper1.1.zst                        990.3    983.4    -0.7    
paper2.1.zst                        1000.6   950.8    -5.0    
pic.1.zst                           1907.0   1930.2   +1.2    
progc.1.zst                         996.2    999.5    +0.3    
progl.1.zst                         1221.0   1226.8   +0.5    
progp.1.zst                         1214.3   1212.9   -0.1    
trans.1.zst                         1286.5   1302.4   +1.2    

alice29.txt.1.zst                   949.6    897.0    -5.5    
syoulik.txt.1.zst                   1028.8   950.8    -7.6    
cp.html.1.zst                       1130.3   1130.2   -0.0    
fields.c.1.zst                      820.9    827.0    +0.7    
grammar.lsp.1.zst                   609.5    604.9    -0.8    
kennedy.xls.1.zst                   968.3    965.2    -0.3    
lcet10.txt.1.zst                    1091.8   1060.0   -2.9    
lrabn12.txt.1.zst                   1076.0   1049.7   -2.4    
ptt5.1.zst                          1904.7   1927.4   +1.2    
sum.1.zst                           1045.6   1052.1   +0.6    
xargs.1.1.zst                       640.3    640.4    +0.0    

32k buffer, gcc-8 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1065.7   1061.2   -0.4    
mozilla.1.zst                       993.0    1001.4   +0.8    
mr.1.zst                            1231.6   1235.4   +0.3    
nci.1.zst                           1645.0   1664.6   +1.2    
ooffice.1.zst                       937.4    941.2    +0.4    
osdb.1.zst                          1217.7   1214.6   -0.3    
reymont.1.zst                       1032.0   1037.8   +0.6    
samba.1.zst                         1387.7   1393.9   +0.4    
sao.1.zst                           974.7    974.4    -0.0    
webster.1.zst                       1102.9   1105.3   +0.2    
xml.1.zst                           1649.4   1672.0   +1.4    
x-ray.1.zst                         1171.1   1171.0   -0.0    

bib.1.zst                           1038.4   1082.5   +4.2    
book1.1.zst                         1088.4   1067.9   -1.9    
book2.1.zst                         1067.9   1065.0   -0.3    
geo.1.zst                           1382.8   1343.3   -2.9    
news.1.zst                          1127.3   1096.3   -2.7    
obj1.1.zst                          1075.2   1076.1   +0.1    
obj2.1.zst                          922.1    924.9    +0.3    
paper1.1.zst                        988.5    984.7    -0.4    
paper2.1.zst                        1001.8   997.6    -0.4    
pic.1.zst                           1907.8   1928.4   +1.1    
progc.1.zst                         994.8    995.9    +0.1    
progl.1.zst                         1227.3   1226.9   -0.0    
progp.1.zst                         1223.9   1213.1   -0.9    
trans.1.zst                         1290.8   1307.3   +1.3    

alice29.txt.1.zst                   944.1    923.1    -2.2    
syoulik.txt.1.zst                   1029.8   972.7    -5.5    
cp.html.1.zst                       1127.4   1131.4   +0.4    
fields.c.1.zst                      823.0    825.2    +0.3    
grammar.lsp.1.zst                   601.3    604.9    +0.6    
kennedy.xls.1.zst                   968.0    966.4    -0.2    
lcet10.txt.1.zst                    1087.4   1072.1   -1.4    
lrabn12.txt.1.zst                   1077.9   1059.1   -1.7    
ptt5.1.zst                          1903.8   1928.2   +1.3    
sum.1.zst                           1043.1   1052.1   +0.9    
xargs.1.1.zst                       641.0    640.5    -0.1    

128k buffer, gcc-8 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1066.0   1062.4   -0.3    
mozilla.1.zst                       992.9    998.6    +0.6    
mr.1.zst                            1231.4   1236.8   +0.4    
nci.1.zst                           1645.4   1649.8   +0.3    
ooffice.1.zst                       936.3    941.2    +0.5    
osdb.1.zst                          1217.2   1213.7   -0.3    
reymont.1.zst                       1032.5   1035.1   +0.3    
samba.1.zst                         1386.5   1394.8   +0.6    
sao.1.zst                           973.4    974.5    +0.1    
webster.1.zst                       1101.3   1101.7   +0.0    
xml.1.zst                           1644.2   1667.0   +1.4    
x-ray.1.zst                         1169.7   1172.6   +0.2    

bib.1.zst                           1073.4   1083.9   +1.0    
book1.1.zst                         1087.2   1079.4   -0.7    
book2.1.zst                         1065.0   1065.9   +0.1    
geo.1.zst                           1377.9   1379.1   +0.1    
news.1.zst                          1120.9   1112.9   -0.7    
obj1.1.zst                          1081.1   1078.5   -0.2    
obj2.1.zst                          920.2    924.3    +0.4    
paper1.1.zst                        989.3    989.7    +0.0    
paper2.1.zst                        1003.0   999.7    -0.3    
pic.1.zst                           1902.9   1930.3   +1.4    
progc.1.zst                         991.4    1000.7   +0.9    
progl.1.zst                         1223.7   1226.8   +0.3    
progp.1.zst                         1215.7   1218.5   +0.2    
trans.1.zst                         1287.3   1303.4   +1.3    

alice29.txt.1.zst                   925.7    943.0    +1.9    
syoulik.txt.1.zst                   1020.8   1029.8   +0.9    
cp.html.1.zst                       1129.4   1129.2   -0.0    
fields.c.1.zst                      821.2    823.8    +0.3    
grammar.lsp.1.zst                   608.6    607.6    -0.2    
kennedy.xls.1.zst                   969.2    968.0    -0.1    
lcet10.txt.1.zst                    1091.1   1084.9   -0.6    
lrabn12.txt.1.zst                   1071.4   1064.9   -0.6    
ptt5.1.zst                          1908.7   1928.3   +1.0    
sum.1.zst                           1043.9   1050.1   +0.6    
xargs.1.1.zst                       640.5    637.3    -0.5    

16k buffer, gcc-9 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1066.7   1071.6   +0.5    
mozilla.1.zst                       1018.9   1021.3   +0.2    
mr.1.zst                            1225.2   1228.7   +0.3    
nci.1.zst                           1688.0   1699.6   +0.7    
ooffice.1.zst                       944.2    947.2    +0.3    
osdb.1.zst                          1216.3   1219.4   +0.3    
reymont.1.zst                       1043.6   1053.5   +0.9    
samba.1.zst                         1401.7   1404.1   +0.2    
sao.1.zst                           972.9    974.2    +0.1    
webster.1.zst                       1107.6   1112.4   +0.4    
xml.1.zst                           1671.9   1680.7   +0.5    
x-ray.1.zst                         1167.3   1168.5   +0.1    

bib.1.zst                           1095.7   1061.6   -3.1    
book1.1.zst                         1080.0   1077.0   -0.3    
book2.1.zst                         1071.7   1070.0   -0.2    
geo.1.zst                           1379.6   1343.0   -2.7    
news.1.zst                          1114.2   1098.2   -1.4    
obj1.1.zst                          1098.5   1093.2   -0.5    
obj2.1.zst                          936.3    920.9    -1.6    
paper1.1.zst                        992.3    1002.7   +1.0    
paper2.1.zst                        1010.7   991.6    -1.9    
pic.1.zst                           1942.9   1938.8   -0.2    
progc.1.zst                         1001.4   1007.5   +0.6    
progl.1.zst                         1234.7   1235.9   +0.1    
progp.1.zst                         1218.7   1232.9   +1.2    
trans.1.zst                         1331.6   1328.0   -0.3    

alice29.txt.1.zst                   952.2    916.4    -3.8    
syoulik.txt.1.zst                   1037.2   988.1    -4.7    
cp.html.1.zst                       1147.8   1138.0   -0.9    
fields.c.1.zst                      840.1    831.4    -1.0    
grammar.lsp.1.zst                   613.8    605.5    -1.4    
kennedy.xls.1.zst                   1011.5   1018.5   +0.7    
lcet10.txt.1.zst                    1096.3   1083.2   -1.2    
lrabn12.txt.1.zst                   1065.3   1064.2   -0.1    
ptt5.1.zst                          1945.6   1943.7   -0.1    
sum.1.zst                           1058.5   1061.1   +0.2    
xargs.1.1.zst                       645.8    641.3    -0.7    

32k buffer, gcc-9 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1070.8   1070.6   -0.0    
mozilla.1.zst                       1020.5   1022.0   +0.1    
mr.1.zst                            1224.8   1226.6   +0.1    
nci.1.zst                           1687.4   1688.0   +0.0    
ooffice.1.zst                       945.1    946.1    +0.1    
osdb.1.zst                          1217.2   1218.9   +0.1    
reymont.1.zst                       1045.2   1050.5   +0.5    
samba.1.zst                         1401.1   1409.1   +0.6    
sao.1.zst                           973.6    974.2    +0.1    
webster.1.zst                       1109.5   1110.2   +0.1    
xml.1.zst                           1671.6   1679.8   +0.5    
x-ray.1.zst                         1170.1   1169.8   -0.0    

bib.1.zst                           1096.9   1098.0   +0.1    
book1.1.zst                         1083.7   1077.7   -0.6    
book2.1.zst                         1071.6   1071.4   -0.0    
geo.1.zst                           1372.8   1344.1   -2.1    
news.1.zst                          1115.0   1104.6   -0.9    
obj1.1.zst                          1099.8   1092.9   -0.6    
obj2.1.zst                          938.3    939.0    +0.1    
paper1.1.zst                        992.8    998.0    +0.5    
paper2.1.zst                        1008.8   1015.6   +0.7    
pic.1.zst                           1941.6   1946.2   +0.2    
progc.1.zst                         1000.7   1003.1   +0.2    
progl.1.zst                         1236.4   1232.0   -0.4    
progp.1.zst                         1219.4   1233.3   +1.1    
trans.1.zst                         1332.9   1326.7   -0.5    

alice29.txt.1.zst                   952.0    944.3    -0.8    
syoulik.txt.1.zst                   1039.7   1011.9   -2.7    
cp.html.1.zst                       1148.0   1135.7   -1.1    
fields.c.1.zst                      837.0    831.9    -0.6    
grammar.lsp.1.zst                   610.0    603.3    -1.1    
kennedy.xls.1.zst                   1012.3   1019.0   +0.7    
lcet10.txt.1.zst                    1094.0   1094.2   +0.0    
lrabn12.txt.1.zst                   1069.6   1071.8   +0.2    
ptt5.1.zst                          1944.4   1944.4   +0.0    
sum.1.zst                           1057.9   1063.6   +0.5    
xargs.1.1.zst                       645.1    639.4    -0.9    

128k buffer, gcc-9 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1067.8   1069.9   +0.2    
mozilla.1.zst                       1018.4   1021.1   +0.3    
mr.1.zst                            1227.4   1228.9   +0.1    
nci.1.zst                           1691.1   1697.5   +0.4    
ooffice.1.zst                       944.3    947.0    +0.3    
osdb.1.zst                          1215.2   1215.0   -0.0    
reymont.1.zst                       1044.9   1053.5   +0.8    
samba.1.zst                         1401.8   1405.3   +0.2    
sao.1.zst                           974.5    974.7    +0.0    
webster.1.zst                       1106.4   1112.1   +0.5    
xml.1.zst                           1675.3   1679.9   +0.3    
x-ray.1.zst                         1170.6   1171.1   +0.0    

bib.1.zst                           1095.6   1101.7   +0.6    
book1.1.zst                         1080.7   1084.4   +0.3    
book2.1.zst                         1074.1   1073.8   -0.0    
geo.1.zst                           1378.2   1374.1   -0.3    
news.1.zst                          1116.6   1119.8   +0.3    
obj1.1.zst                          1098.2   1088.2   -0.9    
obj2.1.zst                          938.5    939.4    +0.1    
paper1.1.zst                        994.4    1000.9   +0.7    
paper2.1.zst                        1010.9   1014.2   +0.3    
pic.1.zst                           1946.0   1942.3   -0.2    
progc.1.zst                         1001.2   1004.6   +0.3    
progl.1.zst                         1235.5   1241.3   +0.5    
progp.1.zst                         1222.3   1231.6   +0.8    
trans.1.zst                         1330.4   1327.5   -0.2    

alice29.txt.1.zst                   952.6    952.5    -0.0    
syoulik.txt.1.zst                   1037.6   1038.3   +0.1    
cp.html.1.zst                       1145.6   1141.4   -0.4    
fields.c.1.zst                      837.0    831.7    -0.6    
grammar.lsp.1.zst                   613.6    605.2    -1.4    
kennedy.xls.1.zst                   1013.8   1018.3   +0.4    
lcet10.txt.1.zst                    1096.6   1105.0   +0.8    
lrabn12.txt.1.zst                   1063.2   1072.4   +0.9    
ptt5.1.zst                          1943.9   1940.3   -0.2    
sum.1.zst                           1057.7   1060.9   +0.3    
xargs.1.1.zst                       643.9    638.5    -0.8    

16k buffer, gcc-10 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1088.2   1082.0   -0.6    
mozilla.1.zst                       1033.6   1031.6   -0.2    
mr.1.zst                            1241.1   1237.3   -0.3    
nci.1.zst                           1697.2   1699.7   +0.1    
ooffice.1.zst                       962.5    959.3    -0.3    
osdb.1.zst                          1232.8   1225.7   -0.6    
reymont.1.zst                       1054.4   1054.1   -0.0    
samba.1.zst                         1415.1   1415.1   +0.0    
sao.1.zst                           989.1    983.3    -0.6    
webster.1.zst                       1122.1   1116.2   -0.5    
xml.1.zst                           1690.7   1687.6   -0.2    
x-ray.1.zst                         1169.0   1174.9   +0.5    

bib.1.zst                           1106.9   1037.4   -6.3    
book1.1.zst                         1102.1   1090.1   -1.1    
book2.1.zst                         1084.4   1069.9   -1.3    
geo.1.zst                           1383.0   1344.9   -2.8    
news.1.zst                          1136.7   1112.1   -2.2    
obj1.1.zst                          1099.5   1104.1   +0.4    
obj2.1.zst                          951.2    916.7    -3.6    
paper1.1.zst                        1010.4   1000.5   -1.0    
paper2.1.zst                        1025.1   958.0    -6.5    
pic.1.zst                           1970.6   1962.8   -0.4    
progc.1.zst                         1017.7   1015.7   -0.2    
progl.1.zst                         1254.3   1241.4   -1.0    
progp.1.zst                         1246.5   1242.9   -0.3    
trans.1.zst                         1342.6   1337.9   -0.4    

alice29.txt.1.zst                   954.3    912.1    -4.4    
syoulik.txt.1.zst                   1048.6   994.2    -5.2    
cp.html.1.zst                       1144.5   1156.4   +1.0    
fields.c.1.zst                      833.6    846.8    +1.6    
grammar.lsp.1.zst                   612.1    614.2    +0.3    
kennedy.xls.1.zst                   1034.1   1030.9   -0.3    
lcet10.txt.1.zst                    1103.9   1087.8   -1.5    
lrabn12.txt.1.zst                   1088.8   1075.7   -1.2    
ptt5.1.zst                          1973.7   1964.8   -0.5    
sum.1.zst                           1073.2   1065.1   -0.8    
xargs.1.1.zst                       642.2    650.0    +1.2    

32k buffer, gcc-10 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1088.1   1081.4   -0.6    
mozilla.1.zst                       1033.7   1032.1   -0.2    
mr.1.zst                            1241.5   1230.2   -0.9    
nci.1.zst                           1693.3   1698.9   +0.3    
ooffice.1.zst                       963.6    960.4    -0.3    
osdb.1.zst                          1232.5   1212.3   -1.6    
reymont.1.zst                       1057.1   1046.4   -1.0    
samba.1.zst                         1420.9   1411.3   -0.7    
sao.1.zst                           989.1    983.7    -0.5    
webster.1.zst                       1125.8   1121.8   -0.4    
xml.1.zst                           1682.5   1688.3   +0.3    
x-ray.1.zst                         1168.7   1173.9   +0.4    

bib.1.zst                           1063.0   1109.1   +4.3    
book1.1.zst                         1094.5   1094.5   +0.0    
book2.1.zst                         1078.3   1081.4   +0.3    
geo.1.zst                           1377.4   1348.1   -2.1    
news.1.zst                          1137.7   1107.2   -2.7    
obj1.1.zst                          1102.0   1100.8   -0.1    
obj2.1.zst                          952.4    943.5    -0.9    
paper1.1.zst                        1006.7   999.6    -0.7    
paper2.1.zst                        1025.3   1019.5   -0.6    
pic.1.zst                           1974.4   1957.3   -0.9    
progc.1.zst                         1020.8   1014.6   -0.6    
progl.1.zst                         1258.6   1238.9   -1.6    
progp.1.zst                         1251.7   1240.0   -0.9    
trans.1.zst                         1342.3   1343.5   +0.1    

alice29.txt.1.zst                   945.1    904.6    -4.3    
syoulik.txt.1.zst                   1048.3   983.8    -6.2    
cp.html.1.zst                       1146.0   1150.7   +0.4    
fields.c.1.zst                      833.8    845.8    +1.4    
grammar.lsp.1.zst                   612.3    611.2    -0.2    
kennedy.xls.1.zst                   1034.0   1022.5   -1.1    
lcet10.txt.1.zst                    1107.7   1079.3   -2.6    
lrabn12.txt.1.zst                   1087.0   1070.9   -1.5    
ptt5.1.zst                          1973.9   1962.6   -0.6    
sum.1.zst                           1073.5   1063.3   -1.0    
xargs.1.1.zst                       644.1    648.7    +0.7    

128k buffer, gcc-10 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1083.9   1079.8   -0.4    
mozilla.1.zst                       1031.2   1032.1   +0.1    
mr.1.zst                            1240.0   1240.4   +0.0    
nci.1.zst                           1694.2   1699.5   +0.3    
ooffice.1.zst                       963.1    959.5    -0.4    
osdb.1.zst                          1234.1   1230.8   -0.3    
reymont.1.zst                       1055.1   1048.8   -0.6    
samba.1.zst                         1416.9   1410.2   -0.5    
sao.1.zst                           986.2    985.8    -0.0    
webster.1.zst                       1124.6   1120.3   -0.4    
xml.1.zst                           1693.8   1684.2   -0.6    
x-ray.1.zst                         1172.8   1174.2   +0.1    

bib.1.zst                           1109.0   1104.5   -0.4    
book1.1.zst                         1102.4   1097.5   -0.4    
book2.1.zst                         1080.7   1077.3   -0.3    
geo.1.zst                           1381.9   1383.0   +0.1    
news.1.zst                          1137.7   1132.5   -0.5    
obj1.1.zst                          1099.4   1101.0   +0.1    
obj2.1.zst                          951.2    948.6    -0.3    
paper1.1.zst                        1009.7   942.1    -6.7    
paper2.1.zst                        1021.1   1017.6   -0.3    
pic.1.zst                           1971.0   1967.7   -0.2    
progc.1.zst                         1018.7   1014.8   -0.4    
progl.1.zst                         1257.6   1248.0   -0.8    
progp.1.zst                         1247.7   1238.6   -0.7    
trans.1.zst                         1343.0   1337.8   -0.4    

alice29.txt.1.zst                   961.7    954.3    -0.8    
syoulik.txt.1.zst                   1048.7   1044.7   -0.4    
cp.html.1.zst                       1148.4   1156.6   +0.7    
fields.c.1.zst                      834.4    831.0    -0.4    
grammar.lsp.1.zst                   612.2    615.3    +0.5    
kennedy.xls.1.zst                   1032.9   1031.8   -0.1    
lcet10.txt.1.zst                    1106.7   1102.3   -0.4    
lrabn12.txt.1.zst                   1080.7   1083.1   +0.2    
ptt5.1.zst                          1970.4   1965.2   -0.3    
sum.1.zst                           1070.6   1061.3   -0.9    
xargs.1.1.zst                       643.6    646.9    +0.5    

16k buffer, clang-12 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1082.9   1076.0   -0.6    
mozilla.1.zst                       1025.6   1011.5   -1.4    
mr.1.zst                            1238.6   1231.0   -0.6    
nci.1.zst                           1663.1   1668.3   +0.3    
ooffice.1.zst                       939.0    929.3    -1.0    
osdb.1.zst                          1209.2   1202.5   -0.6    
reymont.1.zst                       1063.1   1061.1   -0.2    
samba.1.zst                         1413.4   1403.7   -0.7    
sao.1.zst                           957.7    955.9    -0.2    
webster.1.zst                       1118.1   1109.3   -0.8    
xml.1.zst                           1671.0   1657.7   -0.8    
x-ray.1.zst                         1174.5   1171.9   -0.2    

bib.1.zst                           1115.9   1052.0   -5.7    
book1.1.zst                         1088.7   1074.7   -1.3    
book2.1.zst                         1092.3   1070.1   -2.0    
geo.1.zst                           1385.2   1348.3   -2.7    
news.1.zst                          1121.1   1098.9   -2.0    
obj1.1.zst                          1104.2   1101.8   -0.2    
obj2.1.zst                          949.6    920.5    -3.1    
paper1.1.zst                        1020.4   1021.1   +0.1    
paper2.1.zst                        1028.3   982.8    -4.4    
pic.1.zst                           1930.9   1914.8   -0.8    
progc.1.zst                         1023.1   1020.2   -0.3    
progl.1.zst                         1257.8   1260.8   +0.2    
progp.1.zst                         1246.1   1255.1   +0.7    
trans.1.zst                         1324.4   1330.9   +0.5    

alice29.txt.1.zst                   973.9    947.3    -2.7    
syoulik.txt.1.zst                   1051.2   1022.6   -2.7    
cp.html.1.zst                       1162.8   1167.3   +0.4    
fields.c.1.zst                      852.9    851.7    -0.1    
grammar.lsp.1.zst                   629.5    627.7    -0.3    
kennedy.xls.1.zst                   1044.9   1018.9   -2.5    
lcet10.txt.1.zst                    1112.6   1083.6   -2.6    
lrabn12.txt.1.zst                   1072.7   1067.3   -0.5    
ptt5.1.zst                          1930.6   1917.0   -0.7    
sum.1.zst                           1088.4   1071.3   -1.6    
xargs.1.1.zst                       666.8    665.5    -0.2    

32k buffer, clang-12 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1079.3   1077.2   -0.2    
mozilla.1.zst                       1024.1   1009.2   -1.5    
mr.1.zst                            1234.4   1235.3   +0.1    
nci.1.zst                           1687.6   1668.0   -1.2    
ooffice.1.zst                       940.2    930.2    -1.1    
osdb.1.zst                          1198.4   1201.7   +0.3    
reymont.1.zst                       1060.4   1064.1   +0.3    
samba.1.zst                         1411.5   1402.0   -0.7    
sao.1.zst                           958.2    953.6    -0.5    
webster.1.zst                       1115.9   1106.6   -0.8    
xml.1.zst                           1664.8   1657.1   -0.5    
x-ray.1.zst                         1167.0   1170.6   +0.3    

bib.1.zst                           1112.4   1110.3   -0.2    
book1.1.zst                         1083.9   1076.7   -0.7    
book2.1.zst                         1087.3   1078.7   -0.8    
geo.1.zst                           1383.6   1350.4   -2.4    
news.1.zst                          1119.1   1092.7   -2.4    
obj1.1.zst                          1104.9   1103.1   -0.2    
obj2.1.zst                          946.9    934.9    -1.3    
paper1.1.zst                        1021.8   1020.6   -0.1    
paper2.1.zst                        1026.8   1033.0   +0.6    
pic.1.zst                           1929.0   1911.9   -0.9    
progc.1.zst                         1018.0   1015.8   -0.2    
progl.1.zst                         1262.4   1263.2   +0.1    
progp.1.zst                         1248.2   1257.1   +0.7    
trans.1.zst                         1328.4   1326.9   -0.1    

alice29.txt.1.zst                   971.5    933.4    -3.9    
syoulik.txt.1.zst                   1054.0   1004.4   -4.7    
cp.html.1.zst                       1170.3   1164.2   -0.5    
fields.c.1.zst                      855.3    855.5    +0.0    
grammar.lsp.1.zst                   629.1    626.4    -0.4    
kennedy.xls.1.zst                   1043.7   1021.4   -2.1    
lcet10.txt.1.zst                    1114.0   1089.6   -2.2    
lrabn12.txt.1.zst                   1076.6   1061.7   -1.4    
ptt5.1.zst                          1929.4   1913.2   -0.8    
sum.1.zst                           1090.7   1074.4   -1.5    
xargs.1.1.zst                       665.8    665.1    -0.1    

128k buffer, clang-12 nondev server
Decompression MB/s
test                                orig     new      diff%

dickens.1.zst                       1075.0   1074.3   -0.1    
mozilla.1.zst                       1025.5   1014.5   -1.1    
mr.1.zst                            1236.2   1237.1   +0.1    
nci.1.zst                           1689.3   1670.3   -1.1    
ooffice.1.zst                       939.8    931.1    -0.9    
osdb.1.zst                          1209.1   1202.7   -0.5    
reymont.1.zst                       1067.3   1064.1   -0.3    
samba.1.zst                         1414.4   1399.2   -1.1    
sao.1.zst                           958.6    956.4    -0.2    
webster.1.zst                       1115.4   1108.5   -0.6    
xml.1.zst                           1673.4   1665.7   -0.5    
x-ray.1.zst                         1172.2   1172.1   -0.0    

bib.1.zst                           1112.2   1104.4   -0.7    
book1.1.zst                         1090.3   1079.7   -1.0    
book2.1.zst                         1090.0   1084.7   -0.5    
geo.1.zst                           1380.2   1382.4   +0.2    
news.1.zst                          1122.2   1116.5   -0.5    
obj1.1.zst                          1108.4   1107.4   -0.1    
obj2.1.zst                          948.1    935.2    -1.4    
paper1.1.zst                        1019.1   1018.8   -0.0    
paper2.1.zst                        1023.8   1029.8   +0.6    
pic.1.zst                           1931.2   1914.9   -0.8    
progc.1.zst                         1020.1   1021.2   +0.1    
progl.1.zst                         1256.0   1258.3   +0.2    
progp.1.zst                         1251.2   1259.8   +0.7    
trans.1.zst                         1331.0   1328.5   -0.2    

alice29.txt.1.zst                   980.7    974.6    -0.6    
syoulik.txt.1.zst                   1055.6   1051.9   -0.4    
cp.html.1.zst                       1163.1   1167.0   +0.3    
fields.c.1.zst                      858.6    851.0    -0.9    
grammar.lsp.1.zst                   629.9    629.9    +0.0    
kennedy.xls.1.zst                   1044.2   1021.1   -2.2    
lcet10.txt.1.zst                    1114.6   1108.2   -0.6    
lrabn12.txt.1.zst                   1078.6   1070.2   -0.8    
ptt5.1.zst                          1931.7   1914.1   -0.9    
sum.1.zst                           1086.2   1086.1   -0.0    
xargs.1.1.zst                       665.6    663.4    -0.3    

32k buffer, gcc.par devbig server
Decompression MB/s
test                                orig     new      diff%

silesia.tar.3.zst                   924.9    897.0    -3.0    
dickens.3.zst                       795.4    784.1    -1.4    
mozilla.3.zst                       933.9    912.3    -2.3    
mr.3.zst                            887.8    886.4    -0.2    
nci.3.zst                           1510.6   1505.3   -0.4    
ooffice.3.zst                       804.9    787.7    -2.1    
osdb.3.zst                          1142.5   1128.1   -1.3    
reymont.3.zst                       893.0    884.2    -1.0    
samba.3.zst                         1259.2   1265.1   +0.5  
sao.3.zst                           808.2    802.7    -0.7    
webster.3.zst                       812.7    804.7    -1.0    
xml.3.zst                           1645.0   1640.7   -0.3    
x-ray.3.zst                         748.3    739.1    -1.2    

xargs.1.3.zst                       589.4    594.7    +0.9  
grammar.lsp.3.zst                   591.6    591.7    +0.0  
kennedy.xls.3.zst                   972.8    961.1    -1.2    
sum.3.zst                           975.5    973.3    -0.2    
syoulik.txt.3.zst                   837.2    843.2    +0.7  
ptt5.3.zst                          1687.2   1694.9   +0.5  
lcet10.txt.3.zst                    881.7    903.7    +2.5  
lrabn12.txt.3.zst                   799.0    802.4    +0.4  
alice29.txt.3.zst                   731.4    748.2    +2.3  
cp.html.3.zst                       1021.6   993.0    -2.8    
fields.c.3.zst                      828.0    835.1    +0.9  

bib.3.zst                           1009.3   1001.4   -0.8    
book1.3.zst                         823.9    830.6    +0.8  
book2.3.zst                         920.0    915.2    -0.5    
geo.3.zst                           1049.6   1044.7   -0.5    
news.3.zst                          975.2    965.0    -1.0    
obj1.3.zst                          1032.9   1031.9   -0.1    
obj2.3.zst                          847.8    843.9    -0.5    
paper1.3.zst                        896.1    913.3    +1.9  
paper2.3.zst                        881.7    899.6    +2.0  
pic.3.zst                           1759.8   1774.8   +0.9  
progc.3.zst                         919.6    923.0    +0.4  
progl.3.zst                         1169.9   1170.8   +0.1  
progp.3.zst                         1187.5   1176.8   -0.9    
trans.3.zst                         1308.7   1321.7   +1.0  

enwik7.txt.22.zst                   782.4    775.0    -0.9    
enwik8.txt.22.zst                   493.9    537.2    +8.8  
enwik9.txt.22.zst                   533.7    589.6    +10.5 

bib.dict.zst                        871.4    866.6    -0.6    
book1.dict.zst                      784.2    787.8    +0.5  
book2.dict.zst                      871.6    878.1    +0.7  
geo.dict.zst                        1124.6   1122.7   -0.2    
news.dict.zst                       947.1    924.5    -2.4    
obj1.dict.zst                       1149.9   1154.7   +0.4  
obj2.dict.zst                       918.7    919.2    +0.1  
paper1.dict.zst                     728.9    731.9    +0.4  
paper2.dict.zst                     745.0    746.6    +0.2  
pic.dict.zst                        1733.7   1731.0   -0.2    
progc.dict.zst                      832.5    818.0    -1.7    
progl.dict.zst                      1073.0   1075.8   +0.3  
progp.dict.zst                      1162.3   1155.7   -0.6    
trans.dict.zst                      1281.4   1285.8   +0.3  

silesia.tar.decompressStream.zst    4995.6   4998.3   +0.1  
calgary.tar.decompressStream.zst    18769.7  18941.2  +0.9  
canterbury.tar.decompressStream.zst 12244.7  12236.7  -0.1    
enwik7.txt.decompressStream.zst     12957.1  12949.7  -0.1    

32k buffer, gcc devbig server
Decompression MB/s
test                                orig     new      diff%

silesia.tar.3.zst                   910.8    899.5    -1.2    
dickens.3.zst                       798.3    790.0    -1.0    
mozilla.3.zst                       943.2    938.8    -0.5    
mr.3.zst                            903.9    906.7    +0.3  
nci.3.zst                           1553.2   1534.5   -1.2    
ooffice.3.zst                       800.4    799.7    -0.1    
osdb.3.zst                          1133.9   1155.4   +1.9  
reymont.3.zst                       889.4    890.0    +0.1  
samba.3.zst                         1290.6   1271.9   -1.4    
sao.3.zst                           811.2    812.9    +0.2  
webster.3.zst                       792.9    786.9    -0.8    
xml.3.zst                           1612.8   1617.9   +0.3  
x-ray.3.zst                         732.6    729.1    -0.5    

xargs.1.3.zst                       588.2    592.0    +0.6  
grammar.lsp.3.zst                   591.3    594.6    +0.6  
kennedy.xls.3.zst                   973.5    967.5    -0.6    
sum.3.zst                           993.0    1001.8   +0.9  
syoulik.txt.3.zst                   848.0    843.3    -0.6    
ptt5.3.zst                          1735.6   1708.1   -1.6    
lcet10.txt.3.zst                    919.9    909.3    -1.2    
lrabn12.txt.3.zst                   795.1    795.0    -0.0    
alice29.txt.3.zst                   744.5    744.3    -0.0    
cp.html.3.zst                       1014.3   1012.3   -0.2    
fields.c.3.zst                      825.6    828.3    +0.3  

bib.3.zst                           986.0    975.1    -1.1    
book1.3.zst                         829.5    824.8    -0.6    
book2.3.zst                         922.0    925.1    +0.3  
geo.3.zst                           1039.4   1031.2   -0.8    
news.3.zst                          967.1    971.6    +0.5  
obj1.3.zst                          1006.3   1020.1   +1.4  
obj2.3.zst                          849.7    846.2    -0.4    
paper1.3.zst                        887.6    893.1    +0.6  
paper2.3.zst                        845.4    857.0    +1.4  
pic.3.zst                           1742.7   1741.6   -0.1    
progc.3.zst                         869.6    892.5    +2.6  
progl.3.zst                         1127.7   1138.3   +0.9  
progp.3.zst                         1139.4   1134.6   -0.4    
trans.3.zst                         1280.5   1310.5   +2.3  

enwik7.txt.22.zst                   757.5    753.6    -0.5    
enwik8.txt.22.zst                   546.7    481.4    -11.9   
enwik9.txt.22.zst                   579.5    501.8    -13.4   

bib.dict.zst                        864.0    887.1    +2.7  
book1.dict.zst                      780.6    795.5    +1.9  
book2.dict.zst                      872.5    888.6    +1.8  
geo.dict.zst                        1136.5   1111.9   -2.2    
news.dict.zst                       966.3    964.5    -0.2    
obj1.dict.zst                       1145.3   1158.2   +1.1  
obj2.dict.zst                       906.4    910.0    +0.4  
paper1.dict.zst                     715.8    713.7    -0.3    
paper2.dict.zst                     749.1    754.1    +0.7  
pic.dict.zst                        1759.7   1746.1   -0.8    
progc.dict.zst                      834.4    843.5    +1.1  
progl.dict.zst                      1055.0   1072.9   +1.7  
progp.dict.zst                      1156.8   1168.5   +1.0  
trans.dict.zst                      1245.7   1270.0   +2.0  

silesia.tar.decompressStream.zst    4994.8   5008.6   +0.3  
calgary.tar.decompressStream.zst    19601.6  19895.0  +1.5  
canterbury.tar.decompressStream.zst 12354.0  12493.2  +1.1  
enwik7.txt.decompressStream.zst     13118.6  13125.5  +0.1  

32k buffer, clang.par devbig server
Decompression MB/s
test                                orig     new      diff%

silesia.tar.3.zst                   933.2    929.1    -0.4    
dickens.3.zst                       815.1    817.1    +0.2  
mozilla.3.zst                       977.0    969.1    -0.8    
mr.3.zst                            929.1    930.7    +0.2  
nci.3.zst                           1547.9   1531.1   -1.1    
ooffice.3.zst                       806.1    809.7    +0.4  
osdb.3.zst                          1162.2   1149.1   -1.1    
reymont.3.zst                       900.5    904.9    +0.5  
samba.3.zst                         1276.9   1246.6   -2.4    
sao.3.zst                           796.4    796.2    -0.0    
webster.3.zst                       798.4    781.1    -2.2    
xml.3.zst                           1604.4   1629.6   +1.6  
x-ray.3.zst                         766.7    766.6    -0.0    

xargs.1.3.zst                       626.3    620.2    -1.0    
grammar.lsp.3.zst                   614.3    601.1    -2.1    
kennedy.xls.3.zst                   1013.1   1014.2   +0.1  
sum.3.zst                           1033.6   1029.6   -0.4    
syoulik.txt.3.zst                   897.0    877.3    -2.2    
ptt5.3.zst                          1749.1   1769.2   +1.1  
lcet10.txt.3.zst                    976.6    974.7    -0.2    
lrabn12.txt.3.zst                   846.8    850.1    +0.4  
alice29.txt.3.zst                   785.6    786.5    +0.1  
cp.html.3.zst                       1052.5   1046.8   -0.5    
fields.c.3.zst                      855.6    843.8    -1.4    

bib.3.zst                           1021.0   1029.5   +0.8  
book1.3.zst                         854.1    855.9    +0.2  
book2.3.zst                         954.6    952.2    -0.3    
geo.3.zst                           1033.3   1023.6   -0.9    
news.3.zst                          982.5    966.1    -1.7    
obj1.3.zst                          1035.0   1036.1   +0.1  
obj2.3.zst                          866.8    858.0    -1.0    
paper1.3.zst                        926.6    928.7    +0.2  
paper2.3.zst                        899.1    915.2    +1.8  
pic.3.zst                           1715.1   1671.2   -2.6    
progc.3.zst                         887.9    886.0    -0.2    
progl.3.zst                         1166.2   1134.6   -2.7    
progp.3.zst                         1147.6   1154.7   +0.6  
trans.3.zst                         1290.9   1292.9   +0.2  

enwik7.txt.22.zst                   785.5    783.3    -0.3    
enwik8.txt.22.zst                   574.8    497.4    -13.5   
enwik9.txt.22.zst                   601.8    530.6    -11.8   

bib.dict.zst                        864.6    875.1    +1.2  
book1.dict.zst                      805.1    800.1    -0.6    
book2.dict.zst                      872.3    881.2    +1.0  
geo.dict.zst                        1089.9   1071.8   -1.7    
news.dict.zst                       958.8    948.3    -1.1    
obj1.dict.zst                       1145.5   1158.9   +1.2  
obj2.dict.zst                       911.1    874.8    -4.0    
paper1.dict.zst                     731.6    709.3    -3.0    
paper2.dict.zst                     766.9    770.1    +0.4  
pic.dict.zst                        1715.1   1723.4   +0.5  
progc.dict.zst                      863.0    847.5    -1.8    
progl.dict.zst                      1056.7   1056.4   -0.0    
progp.dict.zst                      1199.9   1165.4   -2.9    
trans.dict.zst                      1269.9   1299.8   +2.4  

silesia.tar.decompressStream.zst    4998.1   4998.6   +0.0  
calgary.tar.decompressStream.zst    19534.1  19535.8  +0.0  
canterbury.tar.decompressStream.zst 12656.2  12617.5  -0.3    
enwik7.txt.decompressStream.zst     13041.6  13055.9  +0.1  

@binhdvo
Contributor Author

binhdvo commented Oct 21, 2021

Fixed the regressions on enwik8/9 by improving the long-matching function so that it avoids the split exec functions where possible.

32k buffer, gcc.par devbig server
Decompression MB/s
test                                orig     new      diff%
enwik7.txt.22.zst                   663.8    758.9    +14.3    
enwik8.txt.22.zst                   488.3    525.9    +7.7  
enwik9.txt.22.zst                   519.2    563.4    +8.5 

32k buffer, gcc devbig server
Decompression MB/s
test                                orig     new      diff%
enwik7.txt.22.zst                   746.7    745.3    -0.2    
enwik8.txt.22.zst                   542.8    542.4    -0.1  
enwik9.txt.22.zst                   574.5    577.4    +0.5 

32k buffer, clang.par devbig server
Decompression MB/s
test                                orig     new      diff%
enwik7.txt.22.zst                   773.8    763.3    -1.4    
enwik8.txt.22.zst                   553.8    528.0    -4.6  
enwik9.txt.22.zst                   564.0    569.8    +1.0 
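
For anyone re-checking these tables, the diff% column appears to be the relative change of the new speed against the original one. The sketch below assumes that formula; `diff_percent` is a hypothetical helper, not part of zstd or its benchmark scripts. Because the printed speeds are already rounded, the last digit can differ slightly from what the table shows.

```c
#include <assert.h>

/* Hypothetical helper: relative change of new_speed versus orig_speed,
 * in percent. A positive result means the new build decompresses faster.
 * Assumes diff% = (new - orig) / orig * 100, inferred from the rows above. */
double diff_percent(double orig_speed, double new_speed)
{
    assert(orig_speed > 0.0); /* a zero baseline has no meaningful ratio */
    return (new_speed - orig_speed) / orig_speed * 100.0;
}
```

For example, `diff_percent(663.8, 758.9)` is roughly 14.3, which matches the enwik7 row in the gcc.par table.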

@binhdvo
Contributor Author

binhdvo commented Oct 22, 2021

Added alignment for gcc11 and increased the buffer size to 64k, which for our corpora seems to perform as well as the full 128k buffer while still saving some memory.

64k buffer, gcc7 yann server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1027.2   1035.7   +0.8    
mozilla.1.zst                       974.6    970.8    -0.4    
mr.1.zst                            1201.7   1201.3   -0.0    
nci.1.zst                           1590.5   1592.4   +0.1    
ooffice.1.zst                       917.6    913.4    -0.5    
osdb.1.zst                          1143.3   1157.9   +1.3    
reymont.1.zst                       1000.5   1009.8   +0.9    
samba.1.zst                         1330.6   1332.8   +0.2    
sao.1.zst                           951.6    947.2    -0.5    
webster.1.zst                       1057.0   1061.8   +0.5    
xml.1.zst                           1575.1   1578.0   +0.2    
x-ray.1.zst                         1171.4   1172.3   +0.1    

bib.1.zst                           1016.7   1042.3   +2.5    
book1.1.zst                         1049.2   1049.7   +0.0    
book2.1.zst                         1032.7   1036.2   +0.3    
geo.1.zst                           1379.8   1336.4   -3.1    
news.1.zst                          1083.5   1082.9   -0.1    
obj1.1.zst                          1059.4   1064.5   +0.5    
obj2.1.zst                          903.9    904.8    +0.1    
paper1.1.zst                        944.6    951.6    +0.7    
paper2.1.zst                        961.0    971.9    +1.1    
pic.1.zst                           1873.3   1852.9   -1.1    
progc.1.zst                         934.5    948.5    +1.5    
progl.1.zst                         1113.9   1155.2   +3.7    
progp.1.zst                         1115.4   1152.1   +3.3    
trans.1.zst                         1172.9   1227.8   +4.7    

alice29.txt.1.zst                   914.8    919.0    +0.5    
syoulik.txt.1.zst                   992.4    997.1    +0.5    
cp.html.1.zst                       1046.8   1076.2   +2.8    
fields.c.1.zst                      781.7    786.1    +0.6    
grammar.lsp.1.zst                   598.8    597.3    -0.3    
kennedy.xls.1.zst                   943.3    914.6    -3.0    
lcet10.txt.1.zst                    1054.3   1052.6   -0.2    
lrabn12.txt.1.zst                   1036.9   1029.7   -0.7    
ptt5.1.zst                          1874.6   1847.2   -1.5    
sum.1.zst                           989.2    1006.5   +1.7    
xargs.1.1.zst                       619.5    620.6    +0.2    

64k buffer, gcc8 yann server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1066.0   1064.2   -0.2    
mozilla.1.zst                       993.9    1001.9   +0.8    
mr.1.zst                            1232.9   1238.0   +0.4    
nci.1.zst                           1645.8   1664.6   +1.1    
ooffice.1.zst                       938.2    940.3    +0.2    
osdb.1.zst                          1218.0   1211.3   -0.6    
reymont.1.zst                       1034.0   1038.4   +0.4    
samba.1.zst                         1387.9   1393.6   +0.4    
sao.1.zst                           975.3    973.8    -0.2    
webster.1.zst                       1104.8   1103.9   -0.1    
xml.1.zst                           1653.7   1672.1   +1.1    
x-ray.1.zst                         1169.3   1173.9   +0.4    

bib.1.zst                           1073.6   1084.1   +1.0    
book1.1.zst                         1087.9   1081.9   -0.6    
book2.1.zst                         1068.2   1068.1   -0.0    
geo.1.zst                           1384.4   1335.0   -3.6    
news.1.zst                          1126.4   1118.5   -0.7    
obj1.1.zst                          1080.1   1074.6   -0.5    
obj2.1.zst                          921.3    928.6    +0.8    
paper1.1.zst                        989.3    988.7    -0.1    
paper2.1.zst                        1001.1   996.9    -0.4    
pic.1.zst                           1908.0   1930.1   +1.2    
progc.1.zst                         997.1    995.6    -0.2    
progl.1.zst                         1220.1   1225.4   +0.4    
progp.1.zst                         1213.8   1215.0   +0.1    
trans.1.zst                         1290.0   1305.0   +1.2    

alice29.txt.1.zst                   953.1    943.0    -1.1    
syoulik.txt.1.zst                   1032.5   1028.2   -0.4    
cp.html.1.zst                       1130.8   1127.2   -0.3    
fields.c.1.zst                      822.3    823.0    +0.1    
grammar.lsp.1.zst                   610.1    607.2    -0.5    
kennedy.xls.1.zst                   960.0    964.8    +0.5    
lcet10.txt.1.zst                    1094.7   1091.1   -0.3    
lrabn12.txt.1.zst                   1076.3   1059.7   -1.5    
ptt5.1.zst                          1899.0   1926.0   +1.4    
sum.1.zst                           1045.8   1053.9   +0.8    
xargs.1.1.zst                       640.6    641.7    +0.2    

64k buffer, gcc9 yann server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1065.4   1071.9   +0.6    
mozilla.1.zst                       1019.6   1021.3   +0.2    
mr.1.zst                            1228.9   1228.4   -0.0    
nci.1.zst                           1690.6   1696.8   +0.4    
ooffice.1.zst                       945.9    944.9    -0.1    
osdb.1.zst                          1218.1   1219.9   +0.1    
reymont.1.zst                       1048.9   1053.4   +0.4    
samba.1.zst                         1403.6   1412.5   +0.6    
sao.1.zst                           974.8    975.5    +0.1    
webster.1.zst                       1108.2   1114.1   +0.5    
xml.1.zst                           1676.1   1683.8   +0.5    
x-ray.1.zst                         1173.8   1172.4   -0.1    

bib.1.zst                           1099.4   1100.3   +0.1    
book1.1.zst                         1082.1   1085.8   +0.3    
book2.1.zst                         1075.2   1072.6   -0.2    
geo.1.zst                           1380.6   1335.0   -3.3    
news.1.zst                          1114.0   1119.5   +0.5    
obj1.1.zst                          1094.3   1087.2   -0.6    
obj2.1.zst                          935.2    938.1    +0.3    
paper1.1.zst                        993.8    1001.9   +0.8    
paper2.1.zst                        1010.1   1016.5   +0.6    
pic.1.zst                           1946.5   1942.4   -0.2    
progc.1.zst                         1004.3   1008.2   +0.4    
progl.1.zst                         1239.5   1232.8   -0.5    
progp.1.zst                         1217.7   1224.7   +0.6    
trans.1.zst                         1334.0   1327.1   -0.5    

alice29.txt.1.zst                   956.5    960.0    +0.4    
syoulik.txt.1.zst                   1037.3   1039.4   +0.2    
cp.html.1.zst                       1145.6   1135.0   -0.9    
fields.c.1.zst                      834.6    834.9    +0.0    
grammar.lsp.1.zst                   611.4    602.2    -1.5    
kennedy.xls.1.zst                   1014.7   1018.6   +0.4    
lcet10.txt.1.zst                    1097.5   1101.0   +0.3    
lrabn12.txt.1.zst                   1072.1   1074.5   +0.2    
ptt5.1.zst                          1947.7   1943.9   -0.2    
sum.1.zst                           1058.6   1067.3   +0.8    
xargs.1.1.zst                       644.1    641.0    -0.5    

64k buffer, gcc10 yann server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1080.6   1084.3   +0.3    
mozilla.1.zst                       1034.0   1032.4   -0.2    
mr.1.zst                            1243.8   1240.4   -0.3    
nci.1.zst                           1693.5   1702.0   +0.5    
ooffice.1.zst                       964.1    961.3    -0.3    
osdb.1.zst                          1232.5   1230.4   -0.2    
reymont.1.zst                       1053.0   1055.8   +0.3    
samba.1.zst                         1418.7   1416.0   -0.2    
sao.1.zst                           988.7    984.5    -0.4    
webster.1.zst                       1123.2   1117.1   -0.5    
xml.1.zst                           1694.1   1677.8   -1.0    
x-ray.1.zst                         1172.3   1173.7   +0.1    

bib.1.zst                           1108.0   1106.2   -0.2    
book1.1.zst                         1106.3   1100.8   -0.5    
book2.1.zst                         1085.2   1083.4   -0.2    
geo.1.zst                           1384.6   1335.6   -3.5    
news.1.zst                          1138.4   1134.3   -0.4    
obj1.1.zst                          1095.5   1102.8   +0.7    
obj2.1.zst                          952.0    948.8    -0.3    
paper1.1.zst                        1009.5   999.2    -1.0    
paper2.1.zst                        1020.0   1012.8   -0.7    
pic.1.zst                           1971.6   1972.0   +0.0    
progc.1.zst                         1017.7   1017.8   +0.0    
progl.1.zst                         1256.9   1248.0   -0.7    
progp.1.zst                         1246.6   1239.5   -0.6    
trans.1.zst                         1347.0   1338.7   -0.6    

alice29.txt.1.zst                   961.7    961.3    -0.0    
syoulik.txt.1.zst                   1049.1   1046.4   -0.3    
cp.html.1.zst                       1147.6   1153.5   +0.5    
fields.c.1.zst                      832.0    840.6    +1.0    
grammar.lsp.1.zst                   609.8    614.5    +0.8    
kennedy.xls.1.zst                   1034.3   1032.2   -0.2    
lcet10.txt.1.zst                    1106.6   1103.9   -0.2    
lrabn12.txt.1.zst                   1088.4   1084.1   -0.4    
ptt5.1.zst                          1971.4   1972.1   +0.0    
sum.1.zst                           1073.7   1063.6   -0.9    
xargs.1.1.zst                       641.5    649.3    +1.2    

64k buffer, clang12 yann server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1081.0   1076.1   -0.5    
mozilla.1.zst                       1026.6   1014.9   -1.1    
mr.1.zst                            1234.4   1235.9   +0.1    
nci.1.zst                           1689.7   1670.7   -1.1    
ooffice.1.zst                       942.3    931.7    -1.1    
osdb.1.zst                          1209.9   1203.3   -0.5    
reymont.1.zst                       1065.3   1067.2   +0.2    
samba.1.zst                         1411.9   1406.9   -0.4    
sao.1.zst                           958.2    957.6    -0.1    
webster.1.zst                       1112.2   1110.9   -0.1    
xml.1.zst                           1672.6   1663.5   -0.5    
x-ray.1.zst                         1169.7   1174.7   +0.4    

bib.1.zst                           1111.3   1107.7   -0.3    
book1.1.zst                         1086.7   1084.4   -0.2    
book2.1.zst                         1088.0   1084.4   -0.3    
geo.1.zst                           1382.8   1334.2   -3.5    
news.1.zst                          1119.8   1115.7   -0.4    
obj1.1.zst                          1062.4   1104.2   +3.9    
obj2.1.zst                          946.8    932.6    -1.5    
paper1.1.zst                        1023.1   1017.6   -0.5    
paper2.1.zst                        1031.2   1029.1   -0.2    
pic.1.zst                           1934.2   1916.5   -0.9    
progc.1.zst                         1023.6   1025.1   +0.1    
progl.1.zst                         1256.1   1266.7   +0.8    
progp.1.zst                         1253.8   1258.1   +0.3    
trans.1.zst                         1331.1   1329.8   -0.1    

alice29.txt.1.zst                   976.5    980.7    +0.4    
syoulik.txt.1.zst                   1053.6   1050.9   -0.3    
cp.html.1.zst                       1167.1   1166.2   -0.1    
fields.c.1.zst                      853.3    855.3    +0.2    
grammar.lsp.1.zst                   630.7    627.2    -0.6    
kennedy.xls.1.zst                   1043.6   1020.8   -2.2    
lcet10.txt.1.zst                    1116.0   1105.5   -0.9    
lrabn12.txt.1.zst                   1075.5   1075.0   -0.0    
ptt5.1.zst                          1934.6   1915.2   -1.0    
sum.1.zst                           1088.2   1081.0   -0.7    
xargs.1.1.zst                       662.7    664.1    +0.2    

64k buffer, gcc11 nick server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1070.3   1068.2   -0.2    
mozilla.1.zst                       989.2    995.8    +0.7    
mr.1.zst                            1220.1   1226.6   +0.5    
nci.1.zst                           1630.5   1638.0   +0.5    
ooffice.1.zst                       921.4    930.9    +1.0    
osdb.1.zst                          1189.1   1203.3   +1.2    
reymont.1.zst                       1037.2   1045.6   +0.8    
samba.1.zst                         1373.3   1388.5   +1.1    
sao.1.zst                           945.4    953.9    +0.9    
webster.1.zst                       1094.6   1107.0   +1.1    
xml.1.zst                           1628.2   1653.3   +1.5    
x-ray.1.zst                         1167.9   1166.0   -0.2    

bib.1.zst                           1095.6   1090.9   -0.4    
book1.1.zst                         1070.0   1075.0   +0.5    
book2.1.zst                         1060.4   1067.5   +0.7    
geo.1.zst                           1358.5   1321.6   -2.7    
news.1.zst                          1093.1   1102.9   +0.9    
obj1.1.zst                          1074.1   1085.9   +1.1    
obj2.1.zst                          919.4    920.4    +0.1    
paper1.1.zst                        990.7    1005.0   +1.4    
paper2.1.zst                        1011.5   1017.2   +0.6    
pic.1.zst                           1899.3   1911.8   +0.7    
progc.1.zst                         1008.9   1009.6   +0.1    
progl.1.zst                         1237.4   1254.7   +1.4    
progp.1.zst                         1228.3   1243.6   +1.2    
trans.1.zst                         1314.9   1320.5   +0.4    

alice29.txt.1.zst                   956.8    959.7    +0.3    
syoulik.txt.1.zst                   1034.5   1035.9   +0.1    
cp.html.1.zst                       1143.1   1148.4   +0.5    
fields.c.1.zst                      833.7    836.7    +0.4    
grammar.lsp.1.zst                   614.5    610.5    -0.7    
kennedy.xls.1.zst                   946.7    984.5    +4.0    
lcet10.txt.1.zst                    1086.1   1093.4   +0.7    
lrabn12.txt.1.zst                   1060.9   1063.7   +0.3    
ptt5.1.zst                          1900.1   1914.2   +0.7    
sum.1.zst                           1047.3   1049.4   +0.2    
xargs.1.1.zst                       643.5    646.6    +0.5    

64k buffer, clang12 nick server
Decompression MB/s
test                                orig     new      diff%
dickens.1.zst                       1080.6   1076.0   -0.4    
mozilla.1.zst                       997.5    999.1    +0.2    
mr.1.zst                            1230.9   1231.5   +0.0    
nci.1.zst                           1612.2   1610.3   -0.1    
ooffice.1.zst                       928.2    933.7    +0.6    
osdb.1.zst                          1189.5   1206.7   +1.4    
reymont.1.zst                       1049.0   1044.2   -0.5    
samba.1.zst                         1375.7   1385.7   +0.7    
sao.1.zst                           947.9    954.6    +0.7    
webster.1.zst                       1100.2   1103.2   +0.3    
xml.1.zst                           1616.4   1635.4   +1.2    
x-ray.1.zst                         1169.3   1163.3   -0.5    

bib.1.zst                           1091.3   1076.8   -1.3    
book1.1.zst                         1085.4   1085.7   +0.0    
book2.1.zst                         1071.0   1067.0   -0.4    
geo.1.zst                           1353.9   1302.7   -3.8    
news.1.zst                          1107.9   1116.9   +0.8    
obj1.1.zst                          1089.4   1093.4   +0.4    
obj2.1.zst                          920.6    920.0    -0.1    
paper1.1.zst                        1006.0   1006.7   +0.1    
paper2.1.zst                        1022.3   1017.8   -0.4    
pic.1.zst                           1871.5   1892.5   +1.1    
progc.1.zst                         1013.7   1004.4   -0.9    
progl.1.zst                         1243.5   1230.1   -1.1    
progp.1.zst                         1244.1   1231.4   -1.0    
trans.1.zst                         1285.5   1304.0   +1.4    

alice29.txt.1.zst                   964.5    959.3    -0.5    
syoulik.txt.1.zst                   1049.1   1039.6   -0.9    
cp.html.1.zst                       1149.1   1145.5   -0.3    
fields.c.1.zst                      849.3    845.1    -0.5    
grammar.lsp.1.zst                   626.6    621.0    -0.9    
kennedy.xls.1.zst                   990.2    1012.4   +2.2    
lcet10.txt.1.zst                    1102.2   1093.9   -0.8    
lrabn12.txt.1.zst                   1071.8   1074.3   +0.2    
ptt5.1.zst                          1868.1   1893.9   +1.4    
sum.1.zst                           1063.0   1064.1   +0.1    
xargs.1.1.zst                       659.0    659.1    +0.0    
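
One way to scan tables like these for outliers is to parse the whitespace-separated rows and flag any file whose absolute diff% exceeds a noise threshold. A rough Python sketch, assuming a 2% threshold (an arbitrary choice for illustration, not a figure from this thread) and a hypothetical helper `flag_outliers`; the sample rows are copied from the clang12 nick server table above:

```python
def flag_outliers(table: str, threshold: float = 2.0):
    """Return (name, diff%) for data rows whose |diff%| exceeds `threshold`."""
    flagged = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank lines or titles, not a 4-column data row
        name, _orig, _new, diff = fields
        try:
            diff_value = float(diff)  # accepts "+2.2" and "-3.8"
        except ValueError:
            continue  # the "test orig new diff%" header row lands here
        if abs(diff_value) > threshold:
            flagged.append((name, diff_value))
    return flagged


sample = """\
test                                orig     new      diff%
bib.1.zst                           1091.3   1076.8   -1.3
geo.1.zst                           1353.9   1302.7   -3.8
kennedy.xls.1.zst                   990.2    1012.4   +2.2
xargs.1.1.zst                       659.0    659.1    +0.0
"""

print(flag_outliers(sample))
```

On this sample the helper flags geo.1.zst and kennedy.xls.1.zst; the remaining rows fall inside the assumed noise band.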

@Cyan4973
Contributor

These results look good to me.

@binhdvo
Contributor Author

binhdvo commented Oct 22, 2021

Updated the devbig server tests with the 64k buffer and alignment changes (the alignment change should make no difference there, since gcc11 is not used on devbig).

64k buffer, gcc.par devbig server
Decompression MB/s
test                                orig     new      diff%
silesia.tar.3.zst                   894.3    888.8    -0.6    
dickens.3.zst                       772.5    767.6    -0.6    
mozilla.3.zst                       896.0    904.2    +0.9  
mr.3.zst                            883.1    891.0    +0.9  
nci.3.zst                           1535.3   1497.8   -2.4    
ooffice.3.zst                       791.7    788.8    -0.4    
osdb.3.zst                          1133.8   1122.8   -1.0    
reymont.3.zst                       880.0    874.5    -0.6    
samba.3.zst                         1210.5   1233.3   +1.9  
sao.3.zst                           785.4    786.7    +0.2  
webster.3.zst                       786.5    785.7    -0.1    
xml.3.zst                           1589.0   1545.5   -2.7    
x-ray.3.zst                         727.8    720.9    -0.9    

xargs.1.3.zst                       580.5    571.1    -1.6    
grammar.lsp.3.zst                   588.7    589.1    +0.1  
kennedy.xls.3.zst                   982.3    985.4    +0.3  
sum.3.zst                           1005.7   997.7    -0.8    
syoulik.txt.3.zst                   860.5    857.2    -0.4    
ptt5.3.zst                          1720.4   1712.2   -0.5    
lcet10.txt.3.zst                    893.7    904.7    +1.2  
lrabn12.txt.3.zst                   791.9    778.2    -1.7    
alice29.txt.3.zst                   733.1    729.2    -0.5    
cp.html.3.zst                       1008.0   1000.2   -0.8    
fields.c.3.zst                      796.5    795.1    -0.2    

bib.3.zst                           974.4    964.0    -1.1    
book1.3.zst                         813.5    806.4    -0.9    
book2.3.zst                         894.7    895.8    +0.1  
geo.3.zst                           1018.9   992.1    -2.6    
news.3.zst                          929.9    943.4    +1.5  
obj1.3.zst                          1000.4   1009.6   +0.9  
obj2.3.zst                          832.2    819.6    -1.5    
paper1.3.zst                        884.6    881.8    -0.3    
paper2.3.zst                        864.8    855.7    -1.1    
pic.3.zst                           1702.8   1719.5   +1.0  
progc.3.zst                         892.5    882.5    -1.1    
progl.3.zst                         1141.1   1132.8   -0.7    
progp.3.zst                         1164.2   1147.5   -1.4    
trans.3.zst                         1274.5   1263.7   -0.8    

enwik7.txt.22.zst                   755.9    766.3    +1.4  
enwik8.txt.22.zst                   490.9    546.9    +11.4 
enwik9.txt.22.zst                   532.6    579.9    +8.9  

bib.dict.zst                        861.1    877.1    +1.9  
book1.dict.zst                      780.5    782.6    +0.3  
book2.dict.zst                      868.4    868.0    -0.0    
geo.dict.zst                        1078.9   1074.6   -0.4    
news.dict.zst                       940.2    938.8    -0.1    
obj1.dict.zst                       1137.2   1140.5   +0.3  
obj2.dict.zst                       920.2    923.5    +0.4  
paper1.dict.zst                     720.0    730.8    +1.5  
paper2.dict.zst                     753.7    748.9    -0.6    
pic.dict.zst                        1696.2   1696.3   +0.0  
progc.dict.zst                      810.5    810.6    +0.0  
progl.dict.zst                      1072.5   1063.1   -0.9    
progp.dict.zst                      1169.3   1156.3   -1.1    
trans.dict.zst                      1289.0   1286.7   -0.2    

silesia.tar.decompressStream.zst    5003.4   4995.1   -0.2    
calgary.tar.decompressStream.zst    19265.9  19100.2  -0.9    
canterbury.tar.decompressStream.zst 12470.9  12415.3  -0.4    
enwik7.txt.decompressStream.zst     13042.4  13034.9  -0.1    

64k buffer, gcc devbig server
Decompression MB/s
test                                orig     new      diff%
silesia.tar.3.zst                   892.0    891.4    -0.1    
dickens.3.zst                       774.0    777.0    +0.4  
mozilla.3.zst                       922.5    913.6    -1.0    
mr.3.zst                            877.0    886.6    +1.1  
nci.3.zst                           1517.2   1496.9   -1.3    
ooffice.3.zst                       801.8    797.7    -0.5    
osdb.3.zst                          1126.6   1147.0   +1.8  
reymont.3.zst                       884.5    864.5    -2.3    
samba.3.zst                         1282.0   1282.8   +0.1  
sao.3.zst                           803.0    810.7    +1.0  
webster.3.zst                       798.8    805.7    +0.9  
xml.3.zst                           1626.1   1614.9   -0.7    
x-ray.3.zst                         746.4    746.9    +0.1  

xargs.1.3.zst                       594.7    595.8    +0.2  
grammar.lsp.3.zst                   601.8    594.7    -1.2    
kennedy.xls.3.zst                   974.9    981.4    +0.7  
sum.3.zst                           1013.9   1014.0   +0.0  
syoulik.txt.3.zst                   859.0    854.3    -0.5    
ptt5.3.zst                          1739.9   1725.9   -0.8    
lcet10.txt.3.zst                    932.1    921.4    -1.1    
lrabn12.txt.3.zst                   789.9    782.7    -0.9    
alice29.txt.3.zst                   743.9    738.8    -0.7    
cp.html.3.zst                       1008.7   1024.1   +1.5  
fields.c.3.zst                      833.0    827.6    -0.6    

bib.3.zst                           969.3    978.7    +1.0  
book1.3.zst                         796.1    791.1    -0.6    
book2.3.zst                         900.7    898.4    -0.3    
geo.3.zst                           1039.3   1012.3   -2.6    
news.3.zst                          956.3    955.2    -0.1    
obj1.3.zst                          1007.2   1026.5   +1.9  
obj2.3.zst                          824.5    837.3    +1.6  
paper1.3.zst                        898.1    888.5    -1.1    
paper2.3.zst                        880.0    884.5    +0.5  
pic.3.zst                           1704.0   1703.5   -0.0    
progc.3.zst                         907.7    902.6    -0.6    
progl.3.zst                         1141.2   1152.4   +1.0  
progp.3.zst                         1141.5   1173.7   +2.8  
trans.3.zst                         1265.6   1299.8   +2.7  

enwik7.txt.22.zst                   768.8    765.4    -0.4    
enwik8.txt.22.zst                   551.7    551.1    -0.1    
enwik9.txt.22.zst                   584.2    589.3    +0.9  

bib.dict.zst                        873.3    887.4    +1.6  
book1.dict.zst                      788.4    796.9    +1.1  
book2.dict.zst                      853.4    859.9    +0.8  
geo.dict.zst                        1125.6   1101.0   -2.2    
news.dict.zst                       961.1    958.1    -0.3    
obj1.dict.zst                       1156.6   1154.1   -0.2    
obj2.dict.zst                       926.7    921.0    -0.6    
paper1.dict.zst                     721.0    719.3    -0.2    
paper2.dict.zst                     741.0    738.2    -0.4    
pic.dict.zst                        1715.4   1704.5   -0.6    
progc.dict.zst                      817.5    826.2    +1.1  
progl.dict.zst                      1037.2   1049.6   +1.2  
progp.dict.zst                      1121.4   1145.9   +2.2  
trans.dict.zst                      1237.4   1263.1   +2.1  

silesia.tar.decompressStream.zst    4996.9   4995.8   -0.0    
calgary.tar.decompressStream.zst    20256.2  20100.1  -0.8    
canterbury.tar.decompressStream.zst 12294.5  12450.6  +1.3  
enwik7.txt.decompressStream.zst     12970.0  12985.8  +0.1  

64k buffer, clang.par devbig server
Decompression MB/s
test                                orig     new      diff%
silesia.tar.3.zst                   914.0    912.0    -0.2    
dickens.3.zst                       811.8    814.2    +0.3  
mozilla.3.zst                       947.5    941.3    -0.7    
mr.3.zst                            919.7    922.7    +0.3  
nci.3.zst                           1519.6   1526.5   +0.5  
ooffice.3.zst                       829.5    825.2    -0.5    
osdb.3.zst                          1172.2   1172.6   +0.0  
reymont.3.zst                       921.5    927.8    +0.7  
samba.3.zst                         1280.2   1275.7   -0.4    
sao.3.zst                           830.9    829.9    -0.1    
webster.3.zst                       811.5    814.5    +0.4  
xml.3.zst                           1611.9   1617.9   +0.4  
x-ray.3.zst                         724.7    748.8    +3.3  

xargs.1.3.zst                       612.5    610.9    -0.3    
grammar.lsp.3.zst                   614.5    610.7    -0.6    
kennedy.xls.3.zst                   1017.5   1019.9   +0.2  
sum.3.zst                           1033.2   1031.5   -0.2    
syoulik.txt.3.zst                   887.7    879.1    -1.0    
ptt5.3.zst                          1765.6   1749.4   -0.9    
lcet10.txt.3.zst                    947.0    945.7    -0.1    
lrabn12.txt.3.zst                   822.6    846.9    +3.0  
alice29.txt.3.zst                   783.1    779.8    -0.4    
cp.html.3.zst                       1038.7   1038.9   +0.0  
fields.c.3.zst                      846.1    836.2    -1.2    

bib.3.zst                           999.8    1014.7   +1.5  
book1.3.zst                         848.8    840.1    -1.0    
book2.3.zst                         928.8    945.0    +1.7  
geo.3.zst                           1028.3   1025.2   -0.3    
news.3.zst                          975.1    973.5    -0.2    
obj1.3.zst                          1026.9   1018.5   -0.8    
obj2.3.zst                          846.7    841.2    -0.6    
paper1.3.zst                        907.5    924.0    +1.8  
paper2.3.zst                        897.0    897.0    +0.0  
pic.3.zst                           1728.8   1730.7   +0.1  
progc.3.zst                         905.0    908.6    +0.4  
progl.3.zst                         1151.5   1164.4   +1.1  
progp.3.zst                         1165.1   1174.4   +0.8  
trans.3.zst                         1290.2   1279.7   -0.8    

enwik7.txt.22.zst                   803.3    802.7    -0.1    
enwik8.txt.22.zst                   561.3    552.2    -1.6    
enwik9.txt.22.zst                   608.5    589.6    -3.1    

bib.dict.zst                        889.1    894.5    +0.6  
book1.dict.zst                      808.0    795.7    -1.5    
book2.dict.zst                      894.3    894.9    +0.1  
geo.dict.zst                        1091.2   1089.9   -0.1    
news.dict.zst                       965.5    964.3    -0.1    
obj1.dict.zst                       1158.1   1152.2   -0.5    
obj2.dict.zst                       909.0    914.3    +0.6  
paper1.dict.zst                     747.4    746.4    -0.1    
paper2.dict.zst                     757.8    772.6    +2.0  
pic.dict.zst                        1690.3   1718.3   +1.7  
progc.dict.zst                      867.7    845.6    -2.5    
progl.dict.zst                      1053.7   1048.7   -0.5    
progp.dict.zst                      1165.4   1139.5   -2.2    
trans.dict.zst                      1266.0   1300.1   +2.7  

silesia.tar.decompressStream.zst    4985.9   4996.0   +0.2  
calgary.tar.decompressStream.zst    19949.1  19877.2  -0.4    
canterbury.tar.decompressStream.zst 12515.0  12503.2  -0.1    
enwik7.txt.decompressStream.zst     13041.1  13067.5  +0.2  

@terrelln
Contributor

Awesome! The performance looks good to me!
