CHarm 0.4.2

@blazej-bucha blazej-bucha released this 05 Aug 12:56
v0.4.2
8b4e1d2

This release adds support for NEON SIMD CPU instructions and improves performance on x86_64.

  • Added support for NEON SIMD instructions on ARM64 CPUs (v8 or newer).

  • Improved performance of spherical harmonic analysis and synthesis of point data values (up to ~20 %, depending on the processor). In spherical harmonic analysis, the number of expensive horizontal sums of SIMD vectors (SUM_R) was reduced (~10 % improvement). In the SIMD macros computing Legendre functions, some unnecessary blends were removed by suitable initializations (~20 % improvement in analysis and synthesis).

  • The internal parameter SIMD_BLOCK was split into SIMD_BLOCK_A and SIMD_BLOCK_S, which are used for spherical harmonic analysis and synthesis, respectively. After the improvements from the previous bullet point, the optimal value of SIMD_BLOCK_A seems to be about twice that of SIMD_BLOCK_S. This further improves the performance of spherical harmonic analysis by about 10 %.

  • Added tests of MASK_TRUE_ALL, MASK_TRUE_ANY, SUM_R and BLEND_R macros.
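
The idea behind reducing horizontal sums can be illustrated with a minimal sketch. It is not CHarm's code: the 4-lane vec_t type, the helper functions and the hsum_calls counter are all hypothetical stand-ins written in portable C, with hsum playing the role that a macro such as SUM_R might play. The first routine reduces a vector to a scalar in every loop iteration; the second accumulates lane-wise and reduces only once after the loop.

```c
#include <stddef.h>

#define LANES 4                     /* hypothetical SIMD width */

typedef struct { double v[LANES]; } vec_t;   /* stand-in for a SIMD register */

size_t hsum_calls = 0;              /* counts horizontal sums, for illustration only */

static vec_t vec_load(const double *p)
{
    vec_t r;
    for (size_t k = 0; k < LANES; k++)
        r.v[k] = p[k];
    return r;
}

static vec_t vec_add(vec_t a, vec_t b)
{
    for (size_t k = 0; k < LANES; k++)
        a.v[k] += b.v[k];
    return a;
}

static vec_t vec_mul(vec_t a, vec_t b)
{
    for (size_t k = 0; k < LANES; k++)
        a.v[k] *= b.v[k];
    return a;
}

/* Horizontal sum: adds all lanes of one vector into a scalar. */
static double hsum(vec_t a)
{
    double s = 0.0;
    hsum_calls++;
    for (size_t k = 0; k < LANES; k++)
        s += a.v[k];
    return s;
}

/* One horizontal sum per iteration. Assumes n is a multiple of LANES. */
double dot_hsum_each(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i += LANES)
        s += hsum(vec_mul(vec_load(x + i), vec_load(y + i)));
    return s;
}

/* Lane-wise accumulation, single horizontal sum at the end.
 * Assumes n is a multiple of LANES. */
double dot_hsum_once(const double *x, const double *y, size_t n)
{
    vec_t acc = {{0.0}};
    for (size_t i = 0; i < n; i += LANES)
        acc = vec_add(acc, vec_mul(vec_load(x + i), vec_load(y + i)));
    return hsum(acc);
}
```

On real hardware the lane-wise add is typically a single instruction, whereas a horizontal sum usually needs several shuffles and adds, so moving the reduction out of the loop tends to pay off. Note that the two routines add the terms in a different order, so for general inputs their results can differ by rounding error.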