Skip to content

Fast-CRC is a C library providing efficient, application-selectable CRC algorithms

License

Notifications You must be signed in to change notification settings

leeson-consulting/fast-crc

Repository files navigation

Fast-CRC: Off the shelf and on your way

Fast-CRC provides tooling for generating efficient Cyclic Redundancy Check (CRC) algorithms from pick and choose templates. All algorithms are C99 compliant and tested, which allows for easy integration with your C or C++ source.

Features:

  • Every algorithm up to CRC-64 from Greg Cook’s "CRC Catalog"

  • Choice of 4-bit and 8-bit polynomial table-kernels

  • Unit tested against published check values

  • Dr. Nguyen’s HD4 and HD6 "Fast CRC" algorithms

  • Hamming profile database for every polynomial listed in Prof. Koopman’s "CRC Zoo"

Setup:

  • Get a copy of the Fast-CRC source.

  • Copy and modify one of the crc_algorithm.inc templates to suit your application. There’s a few examples under the ./templates folder to get you started.

  • Ensure the Fast-CRC folder and modified crc_algorithm.inc are both on the compiler’s header include path.

Example:

cd ./examples/hello_crc/

cmake -G "Unix Makefiles" -B build

cmake --build ./build/ && ./build/hello_crc

CRC-16/CCITT "Hello, CRC!" == 0x991c
./examples/hello_crc/crc_algorithms.inc
#include "crc_algorithms/standard/crc16_ccitt.inc"
./examples/hello_crc/hello_crc.c
#include "crc.h"

#include <stdio.h>
#include <string.h>

static char const HELLO_CRC[] = "Hello, CRC!";

int main(void)
{
  uint32_t const crc = crc16_CCITT((uint8_t const *) HELLO_CRC, strlen(HELLO_CRC));

  printf("\n\nCRC-16/CCITT \"%s\" == 0x%04x\n\n", HELLO_CRC, crc);

  return 0;
}

Interfaces:

Fast-CRC interfaces support two use cases:

  1. Single-Pass Calculation

  2. Multi-Pass Calculation

The Single-Pass interfaces calculate the CRC across the provided data. The Multi-Pass interfaces _start, _continue, or _finish a CRC calculation incrementally as data becomes available.

These interfaces are standardised across four functions per algorithm, eg.

crc32_Fast6 Interfaces
uint32_t crc32_Fast6_start(uint8_t const *data, size_t const data_len);
uint32_t crc32_Fast6_continue(uint32_t const crc, uint8_t const *data, size_t const data_len);
uint32_t crc32_Fast6_finish(uint32_t const crc, uint8_t const *data, size_t const data_len);
uint32_t crc32_Fast6(uint8_t const *data, size_t const data_len);

Single-Pass interfaces would typically be used to compute CRCs across a large blocks of readily accessible data, such as FLASH storage. Multi-Pass interfaces are best used for communication protocols or for calculating CRCs across large blocks in a cooperative multitasking / task-loop scenario. See ./examples/multipass_interface for usage.

Asynchronous interfaces are not currently supported, but could be implemented to support things like DMA driven CRC peripherals. See ./documentation/road_map.adoc.

Standard Table Algorithms:

Most of the algorithms bundled with Fast-CRC use polynomial table-kernels to speed up processing. These should be familiar to seasoned engineers — they’re what you’d write yourself but standardised and tested.

Table based algorithms process data either 8-bits at a time or 4-bits at a time. The 8-bit table-kernels are around twice as fast as the 4-bit table-kernels, but use significantly more memory.

For example a CRC-32 8-bit table-kernel will use 1 kB of memory, but the equivalent 4-bit table-kernel only uses 64 bytes!

This speed/size trade-off is of most interest in embedded applications, which don’t typically have memory to burn.

The choice of kernel is controlled by USE_CRC_KERNEL_TABLE8, which can be optionally defined for each applications’s crc_algorithms.inc template.

In the unlikely case that two algorithms use the same polynomial under different kernels, the 8-bit kernel wins. In other words both benefit from the speed of the 8-bit table with out needless duplication.

.examples/kernel_selection/crc_algorithms.inc
#define USE_CRC_KERNEL_TABLE8

// These algorithms use 8-bit kernels

#include "crc_algorithms/standard/crc16_ccitt.inc"

#undef USE_CRC_KERNEL_TABLE8

// These algorithms use 4-bit kernels ...

#include "crc_algorithms/standard/crc32_iso_hdlc.inc"

// ... except for this one,
// which implicitly uses the same 8-bit kernel-table
// that was brought in with crc16_ccitt above

#include "crc_algorithms/standard/crc16_ibm_sdlc.inc"

Sharded Table Algorithms:

Some CRC polynomial tables can be stored in types that are narrower than the CRC polynomial degree. Koopman identifies a few HD6 "Sub8" polynomials here, https://users.ece.cmu.edu/~koopman/crc/6sub8.html

Fast-CRC provides Sharded Tables for Dr. Nyugen’s HD4 Fast Polynomials. These tables allow a single table to masquerade as a whole family of wider CRCs, which provides a significant space saving over standard tables.

Sharded tables should be considered experimental, but interested parties can review ./tools/make_crc_tables and ./test/sharded/t{4,8}

Dr. Nyugen’s Fast CRC Algorithms:

Fast-CRC provides the first tested library of Dr. Nyugen’s "Fast CRC" algorithms. These CRC algorithms offer the best speed-size trade off available for pure software implementations. In some instances they can outperform the large table based implementations.

The Fast algorithms as cross-checked against table based implementations to ensure consistency. However, since this is the first library to implement them there are no independent checks of correctness.

Custom CRC Algorithms

Fast-CRC uses the C preprocessor under the hood to take an algorithm template and use it to compose functions and polynomial lookup tables. There’s a wide selection of algorithms under ./crc_algorithms, but you can also create your own algorithms by copying and modifying a template to suit.

This is a straight-forward process, but one which has a lot of finicky steps.

Let’s take a look at an example:

./crc_algorithms/crc16_riello.inc
#if crc_algorithms_inc == INCLUDE_INTERFACE

#include "crc_algorithms/interface.h"

make_crc16_interface(
  /* .name     =  */    RIELLO,
  /* .poly     =  */    Rx1021,
  /* .init     =  */    0xb2aa,
  /* .refin    =  */    true,
  /* .refout   =  */    true,
  /* .xorout   =  */    0x0000,
  /* .check    =  */    0x63d0,
  /* .residue  =  */    0x0000)

#elif crc_algorithms_inc == INCLUDE_IMPLEMENTATION

#include "crc_tables/standard/crc16_Rx1021.h"

make_crc16_implementation(
  /* .name     =  */    RIELLO,
  /* .poly     =  */    Rx1021,
  /* .init     =  */    0x554d,
  /* .xorout   =  */    0x0000)

#endif

Creating a Custom CRC Algorithm Template

The algorithm template calls two macros that are invoked at different phases of construction.

During the INCLUDE_INTERFACE phase, make_crc16_interface() generates the function protoypes for the CRC-16/RIELLO interfaces. It is one of a family of make_crcXX_interface() macros ranging from crc3 to crc64.

The parameters passed to the macro match those popularised by Ross Williams in his 1993 paper "A Painless Guide to CRC Error Detection Algorithms". A copy of the paper is included in ./documentation/crc_v3.txt under license.

The name field is used to distinguish your algorithm from other algorithms. You can choose any unique name you like, but it must be useable as a C identifier.

The poly(nomial) field has a special encoding. It represents the polynomial in "Normal Notation" with a single letter prefix. "Fx" polynomials refer to lookup tables encoded with Non-Reflected (Forward) polynomials. "Rx" polynomials refer to lookup tables encoded with Reflected (Reverse) polynomials.

The kernels treat these differently. The Reverse polynomials need one less shift operation than the equivalent Forward polynomials. This means that Reversed Polynomial algorithms are slightly faster than Forward algorithms.

The init value is used to initialise the CRC. It is not used in interface construction per se, but is used as a means to document the interface.

The refin and refout values exist only for documentation purposes. They relate to the need to reflect data coming into the algorithm and CRC values coming out.

When refin is set to false, the algorithm must use the Fx Forward Polynomial already discussed. Conversely when refin is set to true, the algorithm must use the Rx Reversed Polynomial.

Algorithms should set refin and refout to the same value, ie. both true or both false.

However, if you do chose to set refin and refout to different values you can still use Fast-CRC, but some manual coding will be required to reflect the CRC out. See ./crc_algorithms/standard/crc12_umts.inc for an example.

The xorout field is used to xor the final CRC value prior to return. This should generally be set to zero, since the compiler will optimize this away in the kernel. You can, however, choose any value you wish.

The check field is the CRC value returned after processing the data "123456789". This is published for all the cataloged algorithms and used by unit tests to verify operation.

The residue field is the result of running the algorithm across the complete codeword, which consists of the dataword concatenated with the crc value. This is only used for documentary purposes and is not currently used by the kernel constructors.

During the INCLUDE_IMPLEMENTATION phase, the desired CRC polynomial lookup table is first included from ./crc_tables.

This is followed by a call to the make_crc16_implementation() macro, which is a one-for-one counterpart to the corresponding interface macro.

Since the generated implementation needs to exactly match the interface, care must be taken to call the macro of the same CRC degree (width) with a subset of the same values used in the interface.

The one exception is the init value. This must be bit-reflected for reflected polynomials when calling the make_crcXX_implementation() macro. Often this reflection will produce the same value, eg. 0x000, 0xffff, which can cause some confusion. Neverthless, for reflected polynomials the init value must be passed in bit-reflected.

Creating a Custom CRC Interface Header

A manually constructed interface header should be provided for custom algorithms. The CRC algorithms and their respective interfaces bundled with Fast-CRC are the output of code generators under ./tools.

Have a look at these generators to see how things work, but in practice it’s easiest to copy and modify one of the interface blocks for an existing algorithm in ./crc.h when creating your custom header.

Including a Customised CRC Algorithm

After completing the custom_crc.inc and custom_crc.h files, you simply need to add #include "custom_crc.inc" to your crc_algorithms.inc template and #include "custom_crc.h" to your application code to use the new algorithm.

Polynomial Survey Data

Prof. Koopman was kind enough to share a large amount of data with the project. Some of this data has been mangled into a Javascript database.

It’s well known that the standard CRC algorithms do not provide the best bang for buck.

For instance consider the CRC-32 0x04c11db7 polynomial used by CRC-32/ISO_HDLC. This shows that the algorithm does provide HD6 protection, but only for the first 33 bytes of data:

CRC-32 0x04c11db7 Profile Data
{
  "id" : {
    "polynomial" : "x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1",
    "degree"     : 32,
    "explicit"   : "0x104c11db7",
    "koopman"    : "0x82608edb",
    "normal"     : "0x4c11db7"
  },
  "hd" :     [null, null, null,
    /* 3 */ { "bits"    : 4294967263, "bytes"   : 536870907 },
    /* 4 */ { "bits"    : 91607, "bytes"   : 11450 },
    /* 5 */ { "bits"    : 2974, "bytes"   : 371 },
    /* 6 */ { "bits"    : 268, "bytes"   : 33 },
    /* 7 */ { "bits"    : 171, "bytes"   : 21 },
    /* 8 */ { "bits"    : 91, "bytes"   : 11 },
    /* 9 */ { "bits"    : 57, "bytes"   : 7 },
    /* 10 */ { "bits"    : 34, "bytes"   : 4 },
    /* 11 */ { "bits"    : 21, "bytes"   : 2 },
    /* 12 */ { "bits"    : 12, "bytes"   : 1 },
    /* 13 */ { "bits"    : 10, "bytes"   : 1 },
    /* 14 */ { "bits"    : 10, "bytes"   : 1 },
    /* 15 */ { "bits"    : 10, "bytes"   : 1 }
  ],
}

Compare this, with CRC-32 0x32c00699, which provides HD6 for up to 4092 bytes of data, ie. this polynomial provides HD6 protection over 100 times greater range than the polynomial used by the standard algorithm

Simarly, CRC-24 0x65b provides HD6 protection for up to 59 bytes of data but with with one byte less overhead.

Orthodox Engineering Practice

The Polynomial Survey Data shows that there is considerable room for improvement in the selection of CRC polynomials for new applications.

Follow Polynomial Design Guides

The best practices for polynomial selection has been clearly described by Prof. Koopman et al.

These start with questions of constraint. Can you make a change to the design to improve error resilience?

If the design is open to change, what do you know and what do you care about?

It’s all very well saying that a polynomial provides HD6 performance at the design length. But if that polynomial comes with an additional 10% overhead do you care? Is the channel or media modelled well enough to accept an HD4 polynomial instead?

If there’s still a rationale for change, then choose the lowest degree polynomial that fits both constraints and the known (or assumed) error characteristics of the channel or media.

Consider Black Swans

In my experience it’s not popular to raise "what if" questions to the engineering team. Being the sole objector to "…​ a simple solution that already works …​" can be isolating.

But my experience has also taught me that Black Swan Events are far from uncommon.

Shot noise conducted or radiated from power electronics and switchgear can radically affect communications — even if for a short time.

Passive thermal cycling, changes in humidity, dirty connectors, non-static mechanical loads, etc. can all reduce SNR, which can present as both bit desynchronisation and corruptions.

Off-by-one errors or other weak coding can systematically change or augment large quantities of buffered data, which presents as overwhelming levels of corruption to any CRC.

When (not if) data corruptions hit the proverbial fan, how does your design cope? How Complex Systems Fail says that complex systems are always running under degraded conditions. Whether the degradation is sufficient to cause failure is difficult to determine because the system is complex. Nevertheless, every effort should be made to mitigate degradation.

Mitigating against high BERs or systemic corruption starts with CRC selection.

CRC performance has been analysed at BERs around 50% and towards 100%. The ideal CRC polynomial degrades to HD2 performance at any level of corruption at any data length. These polynomials are said to be "proper".

Some polynomials are proper at BERs around 50% as shown in Koopman’s CRC Zoo. But no known polynomials are proper at every length and every level of data corruption.

Given the choice of polynomials that satisfy HD requirements and other constraints, choose the least improper polynomial.

However, since the data is scant in this area, a more orthodox approach is to assume some errors will escape detection by CRC alone. These errors are not necessarily undetectable. Simple measures like sequence numbers and other protocol double-checks will weed out some errors.

At the end of the day you can’t guarantee that data is incorrupt, but you may be able to put a bounds on the corruption. Always check that your application is fit for purpose in terms of those bounds.

Recommendations

Select a CRC polynomial in the following order:

  1. The highest HD for the lengths of interest,

  2. The highest secondary HDs at shorter lengths,

  3. The highest degree given engineering constraints,

  4. The least improper at lengths of interest at errors approaching 50% corruption.

For the sky is falling (> 50% corruption) scenario, the best practical advice is use a higher degree polynomial. The extra bits in the CRC check value might help ward off disaster even if it’s no better than a good hash of the same width.

Beyond that you must address Black Swans in your protocol. If detected error rate creeps up above design criteria, it might be best to apply radio silence for a while. Resumption can be driven by timers, side-channel coordination, intrinsic measurements (RSSI etc.).

Other adaptations to protocol can also be applied to continue operation under adverse conditions, but I’ll address these in another project.

A useful (but strictly untrue) analogy is just as "you can’t beat cubic inches", "you can’t beat CRC degree". Do not prematurely optimise CRC degree. Look at the additional overhead. If the additional transmission / storage overhead is less than 1%, just use the higher degree CRC.

Fast-CRC eliminates concerns about implementation overhead in embedded targets. It offers the choice of both T4 and Sharded tables with the possibility of Fast CRC algorithms. It’s up to you to make the best choices for your application.

Acknowledgements:

Fast-CRC at it’s heart is an effort to streamline and standardise CRC libraries for engineers. However, the really hard work has been done by others. I mention them here to acknowledge their work and express my gratitude for what they have shared.

Fast-CRC makes extensive use of Philip Koopman’s CRC data. This data is used under terms of Creative Commons 4.0 Attribution International License.

Greg Cook’s CRC Catalog has proved an invaluable resource for disambiguating the often subtle differences between "standard" CRC algorithms.

Thomas Pircher’s pycrc generates the CRC tables used by Fast-CRC. pycrc is an excellent and simple to use product that can generate standalone C implementations at the drop of a hat. pycrc is used under terms of the MIT license.

Finally, Dr. Gam Nguyen’s research into fast algorithms for calculating certain classes of CRC polynomials has been eye-opening. These provide practical software based alternatives to lookup table for HD4 CRC calculations and should be considered for adoption into communication protocols targeting resource constrained platforms.