Releases: blazej-bucha/charm
CHarm 0.4.4
This release adds support for SSE4.1 and fixes likely wrong compilation of CHarm by gcc (12.2.0) on Debian 12 which may occur when -O3
and -ffast-math
are enabled and all SIMD instruction sets are disabled.
-
Added support for SSE4.1.
-
On Debian 12, gcc (12.2.0) seems to incorrectly compile
src/shs/shs_point_kernel.c
if the-O3
and-ffast-math
flags are enabled and all SIMD instructions are disabled. Although-ffast-math
does not guarantee full IEEE compliance, likely causing slight increase in numerical errors, this time the code is indeed mostly likely compiled incorrectly as revealed bymake check
. The code itself seems to be correct. This was confirmed by successfully compiling and testing CHarm with-O3
and-ffast-math
and no SIMD instructions on Manjaro with gcc 14.2.1 and also on Debian with clang 14.0.6. This is why one should always run the test suite bymake check
before installing and using the library.The issue was introduced in v0.4.3, which required some modifications to
shs_point_kernel.c
. The published PyHarm wheels for v0.4.3 are not affected by this issue, as they rely on AVX and alsomake check
during the build on GitHub passed all tests.
CHarm 0.4.3
This release adds support for MPI parallelization on distributed-memory systems and fixes one bug. Added is support for Python 3.13 while support for Python 3.9 and macOS 10.x is dropped. The API is backward compatible except for one minor change (see below).
-
Added support for MPI parallelization.
Spherical harmonic synthesis/analysis of point data can now also be conducted on systems with distributed memory. These could be, for instance, nodes of high-performance computing clusters or a few ordinary PCs interconnected via some network protocol (e.g., SSH). MPI makes it possible to distribute spherical harmonic coefficients and the signal to be analyzed/synthesized among a number of independent shared-memory systems that can communicate together. As a result, it is now fairly possible to perform spherical harmonic transforms up to degrees as high as 100,000 and beyond.
Full documentation and cookbook-style examples are available at https://www.charmlib.org.
-
Added module
mpi
. -
Added routines:
charm_crd_point_gl_shape
,charm_crd_point_dh1_shape
,charm_crd_point_dh2_shape
,charm_misc_buildopt_simd_vector_size
,charm_misc_buildopt_mpi
.
-
Added members to structures:
charm_shc
,charm_point
,charm_err
.
-
Bug fix in
pyharm.shc.Shc.set_coeffs()
by @avocakerok (#1). -
Added support for Python 3.13, dropped support for Python 3.9 and macOS 10.x.
-
Added support for the
gravity_constant
keyword that appears ingfc
models of celestial bodies. -
Improved test suite.
-
Docs improvements.
-
Renamed
charm_err.issaturated
tocharm_err.saturated
.
CHarm 0.4.2
This release adds support for NEON SIMD CPU instructions and improves performance on x86_64.
-
Added support for NEON SIMD instructions on ARM64 CPUs (v8 or newer).
-
Improved performance of spherical harmonic analysis and synthesis of point data values (up to ~20 %, depending on the processor). In spherical harmonic analysis, expensive horizontal sums of SIMD vectors (
SUM_R
) were reduced (~10 % improvement). In SIMD macros computing Legendre functions, some unnecessary blends were removed by suitable initializations (~20 % improvements in analysis and synthesis). -
The internal parameter
SIMD_BLOCK
was split toSIMD_BLOCK_A
andSIMD_BLOCK_S
that are used with spherical harmonic analysis and synthesis, respectively. After the improvements from the previous bullet point, the optimal value ofSIMD_BLOCK_A
seem to be about twice that ofSIMD_BLOCK_S
. This further improves the performance of spherical harmonic analysis by about 10 %. -
Added tests of
MASK_TRUE_ALL
,MASK_TRUE_ANY
,SUM_R
andBLEND_R
macros.
CHarm 0.4.1
This is a maintenance release enabling the installation of PyHarm using pip
. A few minor bugs were additionally fixed and some minor improvements were applied.
-
On Linux (x86_64), macOS (x86_64, ARM64) and Windows (x86_64), PyHarm can now be installed using
pip install pyharm
. -
In PyHarm, the
pathname
input parameter to theto_file
method of theShc
class no longer requires to specify./
when saving to the current working directory.pathname
may now contain only the file name, implying the coefficients should be saved to the current working directory. -
Fixed outside-of-bounds reads bug in the spherical harmonic synthesis at Driscoll--Healy grids. The outputs of the synthesis were correct but various errors might occur rarely (e.g., segmentation fault).
-
Removed variable-length arrays from one specific internal routine of CHarm. CHarm is now completely free from variable-length arrays.
-
Functions returning NaN now use the
NAN
macro frommath.h
. Previously,0.0 / 0.0
was used to get NaN. Now this is only a fallback if theNAN
macro is not found inmath.h
. -
The internal parameter
SIMD_BLOCK
is now set to4
for all kinds of AVX instruction sets. This value seems to perform the best overall. -
Changed PyHarm building process. The C-part of PyHarm is now compiled from within Python as an extension module.
-
Lib names no longer have the
_omp
suffix, even if CHarm is compiled with OpenMP support. Depending on the precision, the names arelibcharmf
,libcharm
andlibcharmq
. This affects the way how CHarm is linked. -
Various documentation improvements.
CHarm v0.4.0
This release adds new functions to synthesize the full first- and second-order gradients in the local north-oriented reference frame (LNOF) at evaluation points. The API is backward compatible except for a minor change in the misc
module.
-
Added functions to compute point values of the full first- and second-order gradients in LNOF and of the first- and second-order derivatives with respect to the spherical coordinates.
In CHarm, the new functions are placed in the
shs
module:charm_shs_point_grad1
(the full first-order gradient in LNOF),charm_shs_point_grad2
(the full second-order gradient in LNOF),charm_shs_point_guru
(guru interface for first- and second-order derivatives with respect to the spherical coordinates).
In PyHarm, the new functions are:
pyharm.shs.point_grad1
,pyharm.shs.point_grad2
,pyharm.shs.point_guru
.
Example codes are provided in the cookbook.
-
Added members
npoint
andncell
to thecharm_point
andcharm_cell
structures, respectively. The new members represent the total number of points/cells stored by the structures both for scattered points/cells and grids. They are particularly useful when allocating the memory for the signal to be synthesized at the points/cells represented by the structures. -
Values
CHARM_SHC_NMAX_MODEL
CHARM_SHC_NMAX_ERROR
are now symbolic constants instead of enumerations. This is because ISO C restricts enumerator values to the range of
int
, but we need the maximum values ofunsigned long
. The latter thus do not fall within the range that can be portably stored by enumerations. -
Renamed
charm_misc_print_version
to
charm_misc_print_info
which better describes its purpose. Now it also prints various kind of useful user-defined compilation flags (
CFLAGS
,CPPFLAGS
, etc). -
Added functions
charm_misc_get_version
to return a string specifying the CHarm version number determined on compilation time. -
Added function
charm_misc_buildopt_version_fftw
that returns a string specifying the FFTW version that was used to compile CHarm. -
Changed value of the internal parameter
SIMD_BLOCK
from8
to2
. On some recent processors, this may improve the performance up to 40 %, while no significant deteriorations was encountered on older CPUs. In the future, it would be nice to tune this parameter for the host's CPU during compilation.
CHarm v0.3.1
This release adds new functionalities to read/write spherical harmonic coefficients and modifies the API of the read/write routines.
-
Added support to read the ICGEM's time variable gravity field models. Supported are both the
icgem1.0
andicgem2.0
formats. -
Added support to get the maximum harmonic degree of coefficients from data files without the need to initialize a
charm_shc
structure. In CHarm, this can be done using the functions to read spherical harmonic coefficients from data files. In PyHarm, a new methodnmax_from_file
was added to theShc
class to this end. -
Added support to read and write spherical harmonic coefficients in the
dov
text format (degree, order, value). -
PyHarm functions to read spherical harmonic coefficients
pyharm.shc.Shc.from_file_gfc
,pyharm.shc.Shc.from_file_tbl
,pyharm.shc.Shc.from_file_bin
,pyharm.shc.Shc.from_file_mtx
were merged into a single function
pyharm.shc.Shc.from_file
.
PyHarm functions to write spherical harmonic coefficients
pyharm.shc.Shc.to_file_tbl
,pyharm.shc.Shc.to_file_bin
,pyharm.shc.Shc.to_file_mtx
were merged into a single function
pyharm.shc.Shc.to_file
.
-
Renamed symbolic constants
CHARM_SHC_WRITE_TBL_N
,CHARM_SHC_WRITE_TBL_M
,
to
CHARM_SHC_WRITE_N
,CHARM_SHC_WRITE_M
,
so that both can be used with the
tbl
anddov
formats.
CHarm v0.3.0
This release significantly improves performance of point synthesis and analysis, fixes a few bugs and makes a few modifications of the API.
-
Fixed memory leak in
charm_integ_pn1m1pn2m2
. -
Renamed symbolic constants
CHARM_LEG_PNMJ_ORDER_MNJ
,CHARM_LEG_PNMJ_ORDER_MJN
to
CHARM_LEG_PMNJ
,CHARM_LEG_PMJN
,
respectively.
-
Removed the
xnum
module from the public API. It offered only little added value, as all its functions are easy to implement. This makes the API cleaner, especially in long terms, given that much more interesting modules are yet to come. -
The function
charm_leg_pnmj_length
was removed from the API. The number of coefficients stored in thecharm_pnmj
structure is already provided in itsnpnmj
member. -
Added functions to check various features of CHarm that are determined during the compilation time:
charm_misc_buildopt_precision
,charm_misc_buildopt_omp_charm
,charm_misc_buildopt_omp_fftw
,charm_misc_buildopt_simd
,charm_misc_buildopt_isfinite
.
-
Switched to
sphinx_book_theme
for html docs. -
Added support for polar optimization to increase computational speed (Eqs. 7 and 8 of Reinecke and Seljebotn 2013). As long as the polar optimization parameters (see the
glob
module) are chosen appropriately, this technique improves the performance and still keeps an excellent accuracy. However, unwisely chosen tuning parameters may negatively affect the output accuracy.By default, the polar optimization is turned off.
References:
- Reinecke, M., Seljebotn, D. S. (2013) Libsharp -- spherical harmonic transforms revisited. Astronomy and Astrophysics 554, A112, doi: 10.1051/0004-6361/201321494.
-
Improved caching for point synthesis and analysis by the blocking technique. No effort has yet been made to apply the cache blocking also for cell analysis and synthesis, which therefore offer a sub-optimal implementation. This might change in the future if the performance of cell transforms becomes an issue or a limitation.
-
Added automatic dynamical switching to the computation of Legendre functions (Fukushima, 2016).
References:
- Fukushima, T. (2016) Numerical computation of point values, derivatives, and integrals of associated Legendre function of the first kind and point values and derivatives of oblate spheroidal harmonics of the second kind of high degree and order. In: Rizos, C., Willis, P. (eds): IAG 150 Years: Proceedings of the 2013 IAG Scientific Assembly, Potsdam, Germany, 1--6 September, 2013, 143:193--197. https://doi.org/10.1007/1345_2015_124
-
Added dynamic scheduling to parallel for loops in shs. This slightly improves the performance, because the computation time of Legendre functions varies across the meridian (mainly due to X-numbers, dynamical switching and, if applied, polar optimization). In sha, the dynamic scheduling has been in use right from the beginning.
-
Introduced
NEG_R
macro to properly switch the sign of SIMD vectors. -
Improved test suite. The functions are now smaller and cleaner. Some new tests were also added, mostly to check various custom allocation functions.
-
New benchmarks are available at https://www.charmlib.org/build/html/benchmarks.html.
CHarm 0.2.0
CHarm now has a Python wrapper called PyHarm. This step necessitated several modifications of the CHarm API, some of which are not backward compatible. The changes are listed below.
-
The
charm_crd
structure was replaced by new structures:charm_point
andcharm_cell
,
which distinguish between evaluation points and evaluation cells.
-
Functions:
charm_shc_init
,charm_crd_init
,charm_leg_pnmj_init
were replaced by
charm_shc_calloc
,charm_crd_calloc
,charm_leg_pnmj_calloc
.
The new functions behave in the same fashion as the old ones, including their interface.
The following functions have been added:
charm_shc_malloc
,charm_crd_malloc
,charm_leg_pnmj_malloc
,
which behave similarly as their
*_calloc
counterparts, but provide uninitialized memory.Functions:
charm_shc_init
,charm_crd_init
have now a different meaning. They can be used to create the respective structures from data arrays provided by the user.
-
Changed API of the following functions to read/write spherical harmonic coefficients from/to files:
charm_shc_read_bin
,charm_shc_read_gfc
,charm_shc_read_mtx
,charm_shc_read_tbl
,charm_shc_write_bin
,charm_shc_write_mtx
,charm_shc_write_tbl
.
Previously, it was necessary to open the stream for the input/output file, call the CHarm function to read/write the coefficients and, finally, close the stream. All these steps had to be done by the user. Now, the user only calls the function to read/write the coefficients and specifies the file's path name as one of the input parameters. Opening and closing the stream is done by CHarm.
This should considerably simplify making the Python wrapper, given than wrapping
TYPE *
pointer withctypes
is not that trivial. -
Symbolic constants
CHARM_CRD_CELLS_*
andCHARM_CRD_POINTS_*
were renamed toCHARM_CRD_CELL_*
andCHARM_CRD_POINT_*
. -
The
nmj_order
member ofcharm_pnmj
was renamed toordering
. -
Bug fix for
integ_pn1m1pn2m2
. The routine now works correctly with both ordering schemes of Fourier coefficients of Legendre functions. -
Bug fix for
shc_read_bin
. The function no longer throws an error when reading coefficients up to a maximum degree that is lower than the one in the binary file. -
Fix a few leg_pnmj tests that did not actually run.
-
Fixed use-after-free bug in degree amplitude tests.
-
Two members,
nc
andns
, were added to thecharm_shc
structure to represent the total number of spherical harmonic coefficientsCnm
andSnm
, respectively. This does not break the previous API. -
Added support for Fortran's
D
andd
decimal exponents when reading from text files. -
Compiling CHarm as a shared library is no longer disabled by default.
-
The documentation can now be build with
make html
instead ofmake docs
.