Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)

Unrolling the loop to handle 2 positions per iteration lets us reduce the
frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyway; it's fine if it happens a position later. The other operation is the
repcode check. Since the repcode check already tries expanding the match back
one position, we lose very little of importance by only attempting it at
every other position.

This commit also slightly reorders some operations.
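To make the shape of the change concrete, here is a minimal, self-contained C
sketch of the same transformation. This is not zstd's code: the buffer scan,
the "match" test, and the constants (initial step of 1, a step bump every 64
positions) are invented for illustration; only the unrolled-by-2 loop shape,
with the step recalculation running once per pair of positions, mirrors the
commit.

#include <stdio.h>
#include <string.h>

/* Toy match-finder: return the offset of the first byte equal to `target`,
 * advancing through `buf` with an accelerating step in the style of
 * zstd_fast. Two positions are handled per loop iteration, so the step
 * recalculation runs half as often. */
static size_t scan_unrolled(const unsigned char* buf, size_t len,
                            unsigned char target)
{
    size_t step = 1;
    size_t nextStep = 64;      /* bump `step` once we pass this point */
    size_t ip0 = 0;            /* pipeline of positions, like ip0..ip3 */
    size_t ip1 = ip0 + step;
    size_t ip2 = ip1 + step;
    size_t ip3 = ip2 + step;

    while (ip3 < len) {
        /* position 0 */
        if (buf[ip0] == target) return ip0;

        /* position 1 */
        if (buf[ip1] == target) return ip1;

        /* step calc: a rough heuristic, so evaluating it once per pair
         * (one position "late") costs essentially nothing */
        if (ip2 >= nextStep) {
            step++;
            nextStep += 64;
        }

        /* advance the pipeline by two slots */
        ip0 = ip2;
        ip1 = ip3;
        ip2 = ip1 + step;
        ip3 = ip2 + step;
    }

    for (; ip0 < len; ip0++) {  /* search the remaining tail linearly */
        if (buf[ip0] == target) return ip0;
    }
    return len;                 /* not found */
}

int main(void)
{
    const char* data = "aaaaaaaaaaaaaaaaaaaaaaab";
    size_t n = strlen(data);
    printf("found at %zu of %zu\n",
           scan_unrolled((const unsigned char*)data, n, 'b'), n);
    return 0;
}

The tail loop here plays the same role as the function's _cleanup path, which
likewise sweeps up the last few positions the pipelined core can't reach.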
felixhandte committed Aug 20, 2021
1 parent c7bc971 commit 1ca6e16
lib/compress/zstd_fast.c: 39 additions & 8 deletions
@@ -247,6 +247,7 @@ ZSTD_compressBlock_fast_generic_pipelined(
     const BYTE* ip0 = istart;
     const BYTE* ip1;
     const BYTE* ip2;
+    const BYTE* ip3;
     U32 current0;

     U32 rep_offset1 = rep[0];
@@ -284,8 +285,9 @@ ZSTD_compressBlock_fast_generic_pipelined(
     /* calculate positions, ip0 - anchor == 0, so we skip step calc */
     ip1 = ip0 + stepSize;
     ip2 = ip1 + stepSize;
+    ip3 = ip2 + stepSize;

-    if (ip2 >= ilimit) {
+    if (ip3 >= ilimit) {
         goto _cleanup;
     }
@@ -298,9 +300,8 @@ ZSTD_compressBlock_fast_generic_pipelined(
         /* load repcode match for ip[2]*/
         const U32 rval = MEM_read32(ip2 - rep_offset1);

-        current0 = ip0 - base;
-
         /* write back hash table entry */
+        current0 = ip0 - base;
         hashTable[hash0] = current0;

         /* check repcode at ip[2] */
@@ -328,16 +329,45 @@ ZSTD_compressBlock_fast_generic_pipelined(
             goto _offset;
         }

-        hash0 = hash1;
+        /* lookup ip[1] */
+        idx = hashTable[hash1];
+
+        /* hash ip[2] */
+        hash0 = hash1;
+        hash1 = ZSTD_hashPtr(ip2, hlog, mls);
+
+        /* advance to next positions */
+        ip0 = ip1;
+        ip1 = ip2;
+        ip2 = ip3;
+        ip3 += step;
+
+        /* write back hash table entry */
+        current0 = ip0 - base;
+        hashTable[hash0] = current0;
+
+        /* load match for ip[0] */
+        if (idx >= prefixStartIndex) {
+            mval = MEM_read32(base + idx);
+        } else {
+            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
+        }
+
+        /* check match at ip[0] */
+        if (MEM_read32(ip0) == mval) {
+            /* found a match! */
+            goto _offset;
+        }

         /* lookup ip[1] */
-        idx = hashTable[hash0];
+        idx = hashTable[hash1];

         /* hash ip[2] */
+        hash0 = hash1;
         hash1 = ZSTD_hashPtr(ip2, hlog, mls);

         /* calculate step */
-        if (ip1 >= nextStep) {
+        if (ip2 >= nextStep) {
             PREFETCH_L1(ip1 + 64);
             PREFETCH_L1(ip1 + 128);
             step++;
@@ -347,8 +377,9 @@ ZSTD_compressBlock_fast_generic_pipelined(
         /* advance to next positions */
         ip0 = ip1;
         ip1 = ip2;
-        ip2 += step;
-    } while (ip2 < ilimit);
+        ip2 = ip3;
+        ip3 += step;
+    } while (ip3 < ilimit);

 _cleanup:
     /* Note that there are probably still a couple positions we could search.
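One detail of the new core deserves a note: on the out-of-window path, the
code loads mval = MEM_read32(ip0) ^ 1 rather than skipping the comparison.
Since x ^ 1 never equals x, this manufactures a value that is guaranteed not
to match, letting both arms of the idx >= prefixStartIndex test feed the same
unconditional match check. Below is a small standalone demonstration; the
read32 helper is a stand-in for zstd's MEM_read32, not the real thing.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* stand-in for zstd's MEM_read32 */
static uint32_t read32(const void* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

int main(void)
{
    const unsigned char ip0[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
    uint32_t mval = read32(ip0) ^ 1;  /* flip the lowest bit */

    /* x ^ 1 != x for every x, so the match check that follows in the
     * real code (MEM_read32(ip0) == mval) is a guaranteed miss here,
     * with no separate "invalid index" branch at the comparison site. */
    assert(read32(ip0) != mval);
    printf("0x%08x vs sentinel 0x%08x: never equal\n", read32(ip0), mval);
    return 0;
}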
