Optimize a scan of non-state-changing bytes with SSSE3 instructions #58

This commit optimizes the scan of non-state-changing bytes using SSE2 instructions.

A [_mm_cmpestri](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpestri) operation appears to be quite slow compared to an alternative approach that applies [_mm_shuffle_epi8](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8) to the low and high nibbles of the input and combines the two results with a bitwise AND to get 16 bytes of LUT in one go (it also involves a bunch of other SSE2 operations, which all have nice latency/throughput properties). The resulting 16 LUT bytes can then be analyzed (also vectorized) to get the index of the first byte, if any, that changes the state. That is done by finding the first byte that the LUT maps to zero.

The tricky part here is the following:

```
Find A, B arrays (uint8_t[16]) such that
* `(A[i] & B[j]) == 0` if `LUT[i | (j << 4)] == 0`
* `(A[i] & B[j]) != 0` if `LUT[i | (j << 4)] != 0` // Note we don't need any specific non-zero value
for all i,j = 0..15.
```

To find `A` and `B` satisfying the above conditions, the [Z3](https://github.com/Z3Prover/z3) library is used. The npm package that wraps Z3 for use in TypeScript is not particularly friendly to the author of this change, so another package (synckit) was required to handle the async API of the z3 wrapper.

Using llhttp as a benchmark framework, this change yields the following improvements:

```
Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

http: "seanmonstar/httparse" (C)
BEFORE: 8192.00 mb | 1456.72 mb/s | 2172811.81 ops/sec | 5.62 s
AFTER:  8192.00 mb | 1752.90 mb/s | 2614577.82 ops/sec | 4.67 s
~20% improvement

http: "nodejs/http-parser" (C)
BEFORE: 8192.00 mb | 1050.60 mb/s | 2118535.14 ops/sec | 7.80 s
AFTER:  8192.00 mb | 1167.42 mb/s | 2354101.76 ops/sec | 7.02 s
~11% improvement
```

For more header-field-heavy messages the numbers might be even more convincing.
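The nibble-lookup scan described above can be sketched in C as follows. This is an illustrative sketch, not the code generated by this change: the byte class (ASCII letters and digits), the hand-derived `A`/`B` tables, and the names `lut`, `tables_match_lut` and `first_state_changing` are all assumptions made for the example. The tables here are simple enough to write by hand, whereas the change derives them with Z3. The SSSE3 path is compiled only when the compiler defines `__SSSE3__` (for example with `-mssse3`), and a scalar loop with the same result is used otherwise.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h> /* _mm_shuffle_epi8 (SSSE3) plus the SSE2 intrinsics */
#endif

/* Hypothetical byte class: ASCII letters and digits do NOT change the state. */
static int lut(unsigned c) {
  return (c >= 0x30u && c <= 0x39u) ||  /* '0'..'9' */
         (c >= 0x41u && c <= 0x5Au) ||  /* 'A'..'Z' */
         (c >= 0x61u && c <= 0x7Au);    /* 'a'..'z' */
}

/* Hand-derived tables for this class; A is indexed by the low nibble,
 * B by the high nibble.
 *   bit 0: high nibble 3,      low nibble 0..9   ('0'..'9')
 *   bit 1: high nibble 4 or 6, low nibble 1..15  ('A'..'O', 'a'..'o')
 *   bit 2: high nibble 5 or 7, low nibble 0..10  ('P'..'Z', 'p'..'z') */
static const uint8_t A[16] = {5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 2, 2, 2, 2, 2};
static const uint8_t B[16] = {0, 0, 0, 1, 2, 4, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0};

/* Returns 1 if (A[i] & B[j]) is non-zero exactly when LUT[i | (j << 4)] is. */
static int tables_match_lut(void) {
  for (unsigned j = 0; j < 16; j++)
    for (unsigned i = 0; i < 16; i++)
      if (((A[i] & B[j]) != 0) != (lut(i | (j << 4)) != 0)) return 0;
  return 1;
}

/* Index (0..15) of the first state-changing byte among the 16 bytes at p,
 * or 16 if none of them changes the state. p must point to 16 readable bytes. */
static int first_state_changing(const uint8_t *p) {
  assert(p != 0);
#if defined(__SSSE3__)
  const __m128i lo_tbl = _mm_loadu_si128((const __m128i *)A);
  const __m128i hi_tbl = _mm_loadu_si128((const __m128i *)B);
  const __m128i nib = _mm_set1_epi8(0x0F);
  const __m128i in = _mm_loadu_si128((const __m128i *)p);
  /* Split every byte into its low and high nibble. */
  const __m128i lo = _mm_and_si128(in, nib);
  const __m128i hi = _mm_and_si128(_mm_srli_epi16(in, 4), nib);
  /* Two 16-entry lookups and an AND: one LUT result per input byte. */
  const __m128i m = _mm_and_si128(_mm_shuffle_epi8(lo_tbl, lo),
                                  _mm_shuffle_epi8(hi_tbl, hi));
  /* Bit k of zero_mask is set when byte k maps to zero. */
  const int zero_mask =
      _mm_movemask_epi8(_mm_cmpeq_epi8(m, _mm_setzero_si128()));
  return zero_mask ? __builtin_ctz((unsigned)zero_mask) : 16;
#else
  /* Scalar model of the same computation, one byte at a time. */
  for (int k = 0; k < 16; k++)
    if ((A[p[k] & 0x0F] & B[p[k] >> 4]) == 0) return k;
  return 16;
#endif
}
```

With these tables, calling `first_state_changing` on the first 16 bytes of `"Host: example.com"` stops at the colon. A caller would advance by 16 bytes whenever the function returns 16 and hand the remaining tail to a byte-wise loop.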

The previous commit actually uses SSSE3 instructions, not SSE2: `_mm_shuffle_epi8` is an SSSE3 intrinsic.

Commits on Oct 10, 2023

Commits on Oct 11, 2023
