The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Optimize the processing of contiguous int16 arrays with AVX (~2.3x speedup compared to 0.3.0)
- Distribute source
- Add support for ARM (without NEON optimizations for now) on Linux and macOS
- Update supported numpy version range to >=1.21,<2
- Add support for AVX512. It is only used if the CPU reports support for it.
- Compile builds for Linux with clang instead of gcc, as this seems to yield slightly better performance
- Add support for Python 3.12
- Significantly speed up the processing of 1-dimensional strided arrays
- Slightly speed up the processing of ndarrays with at least 16 items
- Slightly speed up the processing of 2D arrays
- Speed up the processing of arrays with ndim > 2
- Speed up the processing of F-contiguous ndarrays
Initial release