Skip to content

Latest commit

 

History

History
141 lines (115 loc) · 6.24 KB

README.md

File metadata and controls

141 lines (115 loc) · 6.24 KB

crcspeed

CRC be slow.

This make CRC be fast.

No original ideas, but original adaptations. Lots of shoulder standing.

This started out as a modified version of comment at http://stackoverflow.com/questions/20562546 then was made more extensible.

NOTE: You should not be using any CRC variant for new code anywhere. All new fast hashing code should use well-designed multi-platform simd-aware libraries like xxh3 and xxh128. Only use CRC in your code if you need to hack together adapters for existing poorly designed systems or if you find yourself time travelling back to the 1970s.

Feature

  • CRC processing in 8-byte steps for CRC-64 (Jones) and CRC-16 (CCITT).
  • Generates CRCs with overhead of 1.5 CPU cycles per byte
  • Little endian and big endian support
    • big endian support hasn't been tested yet (because qemu-system-sparc hates me).
  • Test suite generates comparison for: bit-by-bit calculation, byte-by-byte calcuation (Sarwate / lookup table), and 8-bytes-at-once calculation. Results are reported with resulting CRCs, throughput, and CPU cycles per byte comparisons.

Usage

  • Use little endian CRCs:
    • crc64speed_init();
    • crc64speed(old_crc, new_data, new_data_length);
    • crc16speed_init();
    • crc16speed(old_crc, new_data, new_data_length);
  • Use native architecture CRCs:
    • crc64speed_init_native();
    • crc64speed_native(old_crc, new_data, new_data_length);
    • crc16speed_init_native();
    • crc16speed_native(old_crc, new_data, new_data_length);
  • Use custom CRC64 variant:
    • crcspeed64native_init(crc_calculation_function, uint64_t populate[8][256]);
      • crc calculation function takes (0, data, data_len) and returns crc64 as uint64_t.
      • populate is a lookup table _init populates for future lookups.
    • crcspeed64native(populated_lookup_table, old_crc, new_data, new_data_length);
  • Use custom CRC16 parameters:
    • crcspeed16native_init(crc_calculation_function, uint16_t populate[8][256]);
      • crc calculation function takes (0, data, data_len) and returns crc16 as uint16_t.
    • crcspeed16native(populated_lookup_table, old_crc, new_data, new_data_length);

Additionally, there are specific functions for forcing little or big endian calculations: crcspeed64little_init(), crcspeed64little(), crc64big_init(), crcspeed64big(), crcspeed16little_init(), crcspeed16little(), crc16big_init(), crcspeed16big().

Architecture

  • crcspeed.c is a framework for bootstrapping a fast lookup table using an existing function used to return the CRC for byte values 0 to 255. Lookups then use fast lookup table to calculate CRCs 8 bytes per loop iteration.
  • crc64speed.c is a ready-to-use fast, self-contained CRC-64-Jones implementation.
  • crc16speed.c is a ready-to-use fast, self-contained CRC16-CCITT implementation.
  • when in a multithreaded environment, do not run initialization function(s) in parallel.
  • for fastest CRC calculations, you can force the entire CRC lookup table into CPU caches by running crc64speed_cache_table() or crc16speed_cache_table(). Those functions just iterate over the lookup table to bring everything into local caches out from main memory (or worse, paged out to disk).
  • The CRC-16 lookup table is 4 KB (8x256 16 bit entries = 8 * 256 * 2 bytes = 4096 bytes).
  • The CRC-64 lookup table is 16 KB (8x256 64 bit entires = 8 * 256 * 8 bytes = 16384 bytes).

Benchmark

The Makefile builds three test excutables:

  • crc64speed just returns check values for two input types across all three internal CRC process methods (bit-by-bit, byte-by-byte, 8-bytes-at-once).
  • crc16speed returns check values for the same data, except limited to CRC16 results.
  • crcspeed has two options:
    • no arguments: return check values for crc64 and crc16 at the same time.
    • one argument: filename of file to read into memory then run CRC tests against.
      • If CRC results do not match (for each CRC variant), the return value of crcspeed is 1, otherwise 0 on success.
> mkdir build
> cd build
> cmake ..
> make -j
[ 18%] Building C object CMakeFiles/crcspeed.dir/crc16speed.c.o
[ 27%] Building C object CMakeFiles/crcspeed.dir/crcspeed.c.o
[ 36%] Building C object CMakeFiles/crcspeed.dir/crc64speed.c.o
[ 54%] Building C object CMakeFiles/crc64speed.dir/crc64speed.c.o
[ 54%] Building C object CMakeFiles/crc64speed.dir/crcspeed.c.o
[ 54%] Building C object CMakeFiles/crc16speed.dir/crcspeed.c.o
[ 63%] Building C object CMakeFiles/crc16speed.dir/crc16speed.c.o
[ 72%] Building C object CMakeFiles/crcspeed.dir/main.c.o
[ 81%] Linking C executable crc64speed
[ 90%] Linking C executable crc16speed
[100%] Linking C executable crcspeed
[100%] Built target crc16speed
[100%] Built target crc64speed
[100%] Built target crcspeed

> ./crcspeed ~/Downloads/John\ Mayer\ -\ Live\ At\ Austin\ City\ Limits\ PBS\ -\ Full\ Concert-gcdUz12FkdQ.mp4
Comparing CRCs against 730.72 MB file...

crc64 (no table)
CRC = ee43263b0a2b6c60
7.142642 seconds at 102.30 MB/s (24.18 CPU cycles per byte)

crc64 (lookup table)
CRC = ee43263b0a2b6c60
1.777920 seconds at 411.00 MB/s (6.02 CPU cycles per byte)

crc64speed
CRC = ee43263b0a2b6c60
0.448819 seconds at 1628.09 MB/s (1.52 CPU cycles per byte)

crc16 (no table)
CRC = 000000000000490f
7.413062 seconds at 98.57 MB/s (25.10 CPU cycles per byte)

crc16 (lookup table)
CRC = 000000000000490f
1.951917 seconds at 374.36 MB/s (6.61 CPU cycles per byte)

crc16speed
CRC = 000000000000490f
0.441418 seconds at 1655.38 MB/s (1.49 CPU cycles per byte)

License

All work here is released under BSD or Apache 2.0 License or equivalent.

Thanks

Thanks to Mark Adler for providing a readable implementation of slicing-by-8 in a stackoverflow comment.

Thanks for pycrc for saving me another month figuring out how to write CRC-64-Jones by hand.

Thanks to A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS for providing so many details it was clear I should give up and not try to re-create everything myself.