- Critical memory error for some combinations of source/destination sizes is fixed.
- Many optimizations in resampling, including 16-bit intermediate color representation and heavy unrolling.
- Maintenance release.
- Fixed error in RGBa -> RGBA conversion.
- SSE4 and AVX2 fixed-point full loading implementation. Up to 4.6x faster.
- SSE4 and AVX2 fixed-point full loading horizontal pass.
- SSE4 and AVX2 fixed-point full loading vertical pass.
- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations. Up to 2.6x faster.
- RGBa -> RGBA AVX2 implementation using gather instructions. Up to 5x faster.
- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float full loading vertical pass.
- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.
- SSE4 and AVX2 float per-pixel loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.
- SSE4: Up to 2x faster for downscaling. Up to 3.5x faster for upscaling.
- AVX2: Up to 2.7x faster for downscaling. Up to 3.5x faster for upscaling.
- Simple SSE4 fixed-point implementations with per-pixel loading.
- Up to 2.1x faster.