This release adds support for NEON SIMD CPU instructions and improves performance on x86_64.
- Added support for NEON SIMD instructions on ARM64 CPUs (ARMv8 or newer).
- Improved the performance of spherical harmonic analysis and synthesis of point data values (by up to ~20 %, depending on the processor). In spherical harmonic analysis, fewer expensive horizontal sums of SIMD vectors (`SUM_R`) are now computed (~10 % improvement). In the SIMD macros computing Legendre functions, some unnecessary blends were removed by suitable initializations (~20 % improvement in both analysis and synthesis).
- The internal parameter `SIMD_BLOCK` was split into `SIMD_BLOCK_A` and `SIMD_BLOCK_S`, which are used in spherical harmonic analysis and synthesis, respectively. After the improvements from the previous bullet point, the optimal value of `SIMD_BLOCK_A` seems to be about twice that of `SIMD_BLOCK_S`. This further improves the performance of spherical harmonic analysis by about 10 %.
- Added tests of the `MASK_TRUE_ALL`, `MASK_TRUE_ANY`, `SUM_R` and `BLEND_R` macros.
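Why reducing the number of horizontal sums (`SUM_R`) pays off can be sketched in plain C. The sketch below is an illustration only, not the code from this release: it emulates a SIMD vector as an array of `NLANES` doubles, and the names `hsum`, `dot_hsum_each` and `dot_hsum_once` are hypothetical. The first variant performs a horizontal sum in every loop iteration; the second accumulates lane-wise (cheap vertical adds) and performs a single horizontal sum at the end.

```c
#include <stddef.h>

#define NLANES 4 /* hypothetical vector width, e.g. 4 doubles in a 256-bit AVX register */

/* Horizontal sum of one emulated vector (the operation SUM_R stands for).
 * On real hardware this takes several shuffle/add instructions. */
static double hsum(const double *v)
{
    double s = 0.0;
    for (int i = 0; i < NLANES; i++)
        s += v[i];
    return s;
}

/* Dot product of two arrays holding nvec * NLANES doubles.
 * Variant 1: one horizontal sum in every loop iteration. */
double dot_hsum_each(const double *a, const double *b, size_t nvec)
{
    double total = 0.0;
    for (size_t k = 0; k < nvec; k++) {
        double prod[NLANES];
        for (int i = 0; i < NLANES; i++)
            prod[i] = a[k * NLANES + i] * b[k * NLANES + i];
        total += hsum(prod); /* expensive horizontal sum per iteration */
    }
    return total;
}

/* Variant 2: lane-wise (vertical) accumulation, one horizontal sum at the end. */
double dot_hsum_once(const double *a, const double *b, size_t nvec)
{
    double acc[NLANES] = {0.0}; /* all lanes start at zero */
    for (size_t k = 0; k < nvec; k++)
        for (int i = 0; i < NLANES; i++)
            acc[i] += a[k * NLANES + i] * b[k * NLANES + i]; /* cheap vertical add */
    return hsum(acc); /* single horizontal sum */
}
```

Both variants compute the same mathematical sum, but the summation order differs, so for general floating-point data the results may differ in the last bits.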
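The removal of blends "by suitable initializations" can be illustrated with a generic pattern; the actual Legendre-function macros of this release are not shown here, so the sketch below is an assumption-based example. It evaluates, per emulated SIMD lane, the sum of `c[n] * P_n(t)` for n = 0..nmax using the Legendre polynomial recurrence P_n(t) = ((2n - 1) t P_{n-1}(t) - (n - 1) P_{n-2}(t)) / n, where masked-off lanes must yield zero. The first variant zeroes those lanes with a blend in every iteration; the second seeds them with P_0 = P_1 = 0, so the recurrence keeps them at zero (for finite `t`) and no blend is needed inside the loop. All names (`NLANES`, `blend_lane`, `legsum_blend`, `legsum_init`) are hypothetical.

```c
#include <stddef.h>

#define NLANES 4 /* hypothetical vector width */

/* Emulated blend of one lane: returns a where the mask is set, otherwise b. */
static double blend_lane(int m, double a, double b) { return m ? a : b; }

/* Variant 1: sum_{n=0}^{nmax} c[n] * P_n(t[i]) per lane; inactive lanes are
 * forced to zero by a blend in every iteration. c has nmax + 1 entries. */
void legsum_blend(const double *t, const int *active, const double *c,
                  int nmax, double *out)
{
    double p0[NLANES], p1[NLANES], acc[NLANES];
    for (int i = 0; i < NLANES; i++) {
        p0[i] = 1.0;  /* P_0 */
        p1[i] = t[i]; /* P_1 */
        acc[i] = blend_lane(active[i], c[0] * p0[i], 0.0);
        if (nmax >= 1)
            acc[i] += blend_lane(active[i], c[1] * p1[i], 0.0);
    }
    for (int n = 2; n <= nmax; n++)
        for (int i = 0; i < NLANES; i++) {
            double p2 = ((2 * n - 1) * t[i] * p1[i] - (n - 1) * p0[i]) / n;
            acc[i] += blend_lane(active[i], c[n] * p2, 0.0); /* blend per degree */
            p0[i] = p1[i];
            p1[i] = p2;
        }
    for (int i = 0; i < NLANES; i++)
        out[i] = acc[i];
}

/* Variant 2: same sum, but inactive lanes start from P_0 = P_1 = 0.
 * The three-term recurrence then keeps them at zero, so the loop needs no blend. */
void legsum_init(const double *t, const int *active, const double *c,
                 int nmax, double *out)
{
    double p0[NLANES], p1[NLANES], acc[NLANES];
    for (int i = 0; i < NLANES; i++) {
        p0[i] = blend_lane(active[i], 1.0, 0.0);  /* one select, at initialization only */
        p1[i] = blend_lane(active[i], t[i], 0.0);
        acc[i] = c[0] * p0[i] + (nmax >= 1 ? c[1] * p1[i] : 0.0);
    }
    for (int n = 2; n <= nmax; n++)
        for (int i = 0; i < NLANES; i++) {
            double p2 = ((2 * n - 1) * t[i] * p1[i] - (n - 1) * p0[i]) / n;
            acc[i] += c[n] * p2; /* no blend needed */
            p0[i] = p1[i];
            p1[i] = p2;
        }
    for (int i = 0; i < NLANES; i++)
        out[i] = acc[i];
}
```

For finite inputs both variants return the same values; the second trades one blend per degree for a single select when the recurrence is seeded.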
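Tests of the `MASK_TRUE_ALL`, `MASK_TRUE_ANY`, `SUM_R` and `BLEND_R` macros typically compare the vectorized result with a lane-by-lane scalar reference. The sketch below shows such reference semantics in plain C; the function names, the emulated vector width and the argument order of the blend (`mask[i] ? b[i] : a[i]`) are assumptions made for illustration and need not match the actual macros.

```c
#include <stddef.h>

#define NLANES 4 /* hypothetical vector width */

/* Returns 1 if every lane of the mask is set (cf. MASK_TRUE_ALL). */
int ref_mask_true_all(const int *mask)
{
    for (int i = 0; i < NLANES; i++)
        if (!mask[i])
            return 0;
    return 1;
}

/* Returns 1 if at least one lane of the mask is set (cf. MASK_TRUE_ANY). */
int ref_mask_true_any(const int *mask)
{
    for (int i = 0; i < NLANES; i++)
        if (mask[i])
            return 1;
    return 0;
}

/* Horizontal sum of all lanes (cf. SUM_R). */
double ref_sum_r(const double *v)
{
    double s = 0.0;
    for (int i = 0; i < NLANES; i++)
        s += v[i];
    return s;
}

/* Lane-wise select (cf. BLEND_R): out[i] = mask[i] ? b[i] : a[i]. */
void ref_blend_r(const double *a, const double *b, const int *mask, double *out)
{
    for (int i = 0; i < NLANES; i++)
        out[i] = mask[i] ? b[i] : a[i];
}
```

A useful test set covers the edge cases of the mask reductions: a mask with all lanes set, with none set, and with exactly one set.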