Skip to content

Releases: blazej-bucha/charm

CHarm 0.4.4

12 Feb 10:03
v0.4.4
d082c52
Compare
Choose a tag to compare

This release adds support for SSE4.1 and fixes likely wrong compilation of CHarm by gcc (12.2.0) on Debian 12 which may occur when -O3 and -ffast-math are enabled and all SIMD instruction sets are disabled.

  • Added support for SSE4.1.

  • On Debian 12, gcc (12.2.0) seems to incorrectly compile src/shs/shs_point_kernel.c if the -O3 and -ffast-math flags are enabled and all SIMD instructions are disabled. Although -ffast-math does not guarantee full IEEE compliance, likely causing slight increase in numerical errors, this time the code is indeed mostly likely compiled incorrectly as revealed by make check. The code itself seems to be correct. This was confirmed by successfully compiling and testing CHarm with -O3 and -ffast-math and no SIMD instructions on Manjaro with gcc 14.2.1 and also on Debian with clang 14.0.6. This is why one should always run the test suite by make check before installing and using the library.

    The issue was introduced in v0.4.3, which required some modifications to shs_point_kernel.c. The published PyHarm wheels for v0.4.3 are not affected by this issue, as they rely on AVX and also make check during the build on GitHub passed all tests.

CHarm 0.4.3

03 Dec 07:34
v0.4.3
587af1f
Compare
Choose a tag to compare

This release adds support for MPI parallelization on distributed-memory systems and fixes one bug. Added is support for Python 3.13 while support for Python 3.9 and macOS 10.x is dropped. The API is backward compatible except for one minor change (see below).

  • Added support for MPI parallelization.

    Spherical harmonic synthesis/analysis of point data can now also be conducted on systems with distributed memory. These could be, for instance, nodes of high-performance computing clusters or a few ordinary PCs interconnected via some network protocol (e.g., SSH). MPI makes it possible to distribute spherical harmonic coefficients and the signal to be analyzed/synthesized among a number of independent shared-memory systems that can communicate together. As a result, it is now fairly possible to perform spherical harmonic transforms up to degrees as high as 100,000 and beyond.

    Full documentation and cookbook-style examples are available at https://www.charmlib.org.

  • Added module mpi.

  • Added routines:

    • charm_crd_point_gl_shape,
    • charm_crd_point_dh1_shape,
    • charm_crd_point_dh2_shape,
    • charm_misc_buildopt_simd_vector_size,
    • charm_misc_buildopt_mpi.
  • Added members to structures:

    • charm_shc,
    • charm_point,
    • charm_err.
  • Bug fix in pyharm.shc.Shc.set_coeffs() by @avocakerok (#1).

  • Added support for Python 3.13, dropped support for Python 3.9 and macOS 10.x.

  • Added support for the gravity_constant keyword that appears in gfc models of celestial bodies.

  • Improved test suite.

  • Docs improvements.

  • Renamed charm_err.issaturated to charm_err.saturated.

CHarm 0.4.2

05 Aug 12:56
v0.4.2
8b4e1d2
Compare
Choose a tag to compare

This release adds support for NEON SIMD CPU instructions and improves performance on x86_64.

  • Added support for NEON SIMD instructions on ARM64 CPUs (v8 or newer).

  • Improved performance of spherical harmonic analysis and synthesis of point data values (up to ~20 %, depending on the processor). In spherical harmonic analysis, expensive horizontal sums of SIMD vectors (SUM_R) were reduced (~10 % improvement). In SIMD macros computing Legendre functions, some unnecessary blends were removed by suitable initializations (~20 % improvements in analysis and synthesis).

  • The internal parameter SIMD_BLOCK was split to SIMD_BLOCK_A and SIMD_BLOCK_S that are used with spherical harmonic analysis and synthesis, respectively. After the improvements from the previous bullet point, the optimal value of SIMD_BLOCK_A seem to be about twice that of SIMD_BLOCK_S. This further improves the performance of spherical harmonic analysis by about 10 %.

  • Added tests of MASK_TRUE_ALL, MASK_TRUE_ANY, SUM_R and BLEND_R macros.

CHarm 0.4.1

02 Jul 11:55
v0.4.1
4aeb9bf
Compare
Choose a tag to compare

This is a maintenance release enabling the installation of PyHarm using pip. A few minor bugs were additionally fixed and some minor improvements were applied.

  • On Linux (x86_64), macOS (x86_64, ARM64) and Windows (x86_64), PyHarm can now be installed using pip install pyharm.

  • In PyHarm, the pathname input parameter to the to_file method of the Shc class no longer requires to specify ./ when saving to the current working directory. pathname may now contain only the file name, implying the coefficients should be saved to the current working directory.

  • Fixed outside-of-bounds reads bug in the spherical harmonic synthesis at Driscoll--Healy grids. The outputs of the synthesis were correct but various errors might occur rarely (e.g., segmentation fault).

  • Removed variable-length arrays from one specific internal routine of CHarm. CHarm is now completely free from variable-length arrays.

  • Functions returning NaN now use the NAN macro from math.h. Previously, 0.0 / 0.0 was used to get NaN. Now this is only a fallback if the NAN macro is not found in math.h.

  • The internal parameter SIMD_BLOCK is now set to 4 for all kinds of AVX instruction sets. This value seems to perform the best overall.

  • Changed PyHarm building process. The C-part of PyHarm is now compiled from within Python as an extension module.

  • Lib names no longer have the _omp suffix, even if CHarm is compiled with OpenMP support. Depending on the precision, the names are libcharmf, libcharm and libcharmq. This affects the way how CHarm is linked.

  • Various documentation improvements.

CHarm v0.4.0

07 Feb 09:39
v0.4.0
4ed1ce1
Compare
Choose a tag to compare

This release adds new functions to synthesize the full first- and second-order gradients in the local north-oriented reference frame (LNOF) at evaluation points. The API is backward compatible except for a minor change in the misc module.

  • Added functions to compute point values of the full first- and second-order gradients in LNOF and of the first- and second-order derivatives with respect to the spherical coordinates.

    In CHarm, the new functions are placed in the shs module:

    • charm_shs_point_grad1 (the full first-order gradient in LNOF),
    • charm_shs_point_grad2 (the full second-order gradient in LNOF),
    • charm_shs_point_guru (guru interface for first- and second-order derivatives with respect to the spherical coordinates).

    In PyHarm, the new functions are:

    • pyharm.shs.point_grad1,
    • pyharm.shs.point_grad2,
    • pyharm.shs.point_guru.

    Example codes are provided in the cookbook.

  • Added members npoint and ncell to the charm_point and charm_cell structures, respectively. The new members represent the total number of points/cells stored by the structures both for scattered points/cells and grids. They are particularly useful when allocating the memory for the signal to be synthesized at the points/cells represented by the structures.

  • Values

    • CHARM_SHC_NMAX_MODEL
    • CHARM_SHC_NMAX_ERROR

    are now symbolic constants instead of enumerations. This is because ISO C restricts enumerator values to the range of int, but we need the maximum values of unsigned long. The latter thus do not fall within the range that can be portably stored by enumerations.

  • Renamed

    • charm_misc_print_version

    to

    • charm_misc_print_info

    which better describes its purpose. Now it also prints various kind of useful user-defined compilation flags (CFLAGS, CPPFLAGS, etc).

  • Added functions charm_misc_get_version to return a string specifying the CHarm version number determined on compilation time.

  • Added function charm_misc_buildopt_version_fftw that returns a string specifying the FFTW version that was used to compile CHarm.

  • Changed value of the internal parameter SIMD_BLOCK from 8 to 2. On some recent processors, this may improve the performance up to 40 %, while no significant deteriorations was encountered on older CPUs. In the future, it would be nice to tune this parameter for the host's CPU during compilation.

CHarm v0.3.1

12 Sep 11:32
v0.3.1
716bd9d
Compare
Choose a tag to compare

This release adds new functionalities to read/write spherical harmonic coefficients and modifies the API of the read/write routines.

  • Added support to read the ICGEM's time variable gravity field models. Supported are both the icgem1.0 and icgem2.0 formats.

  • Added support to get the maximum harmonic degree of coefficients from data files without the need to initialize a charm_shc structure. In CHarm, this can be done using the functions to read spherical harmonic coefficients from data files. In PyHarm, a new method nmax_from_file was added to the Shc class to this end.

  • Added support to read and write spherical harmonic coefficients in the dov text format (degree, order, value).

  • PyHarm functions to read spherical harmonic coefficients

    • pyharm.shc.Shc.from_file_gfc,
    • pyharm.shc.Shc.from_file_tbl,
    • pyharm.shc.Shc.from_file_bin,
    • pyharm.shc.Shc.from_file_mtx

    were merged into a single function

    • pyharm.shc.Shc.from_file.

    PyHarm functions to write spherical harmonic coefficients

    • pyharm.shc.Shc.to_file_tbl,
    • pyharm.shc.Shc.to_file_bin,
    • pyharm.shc.Shc.to_file_mtx

    were merged into a single function

    • pyharm.shc.Shc.to_file.
  • Renamed symbolic constants

    • CHARM_SHC_WRITE_TBL_N,
    • CHARM_SHC_WRITE_TBL_M,

    to

    • CHARM_SHC_WRITE_N,
    • CHARM_SHC_WRITE_M,

    so that both can be used with the tbl and dov formats.

CHarm v0.3.0

16 Jun 12:41
v0.3.0
6cac613
Compare
Choose a tag to compare

This release significantly improves performance of point synthesis and analysis, fixes a few bugs and makes a few modifications of the API.

  • Fixed memory leak in charm_integ_pn1m1pn2m2.

  • Renamed symbolic constants

    • CHARM_LEG_PNMJ_ORDER_MNJ,
    • CHARM_LEG_PNMJ_ORDER_MJN

    to

    • CHARM_LEG_PMNJ,
    • CHARM_LEG_PMJN,

    respectively.

  • Removed the xnum module from the public API. It offered only little added value, as all its functions are easy to implement. This makes the API cleaner, especially in long terms, given that much more interesting modules are yet to come.

  • The function charm_leg_pnmj_length was removed from the API. The number of coefficients stored in the charm_pnmj structure is already provided in its npnmj member.

  • Added functions to check various features of CHarm that are determined during the compilation time:

    • charm_misc_buildopt_precision,
    • charm_misc_buildopt_omp_charm,
    • charm_misc_buildopt_omp_fftw,
    • charm_misc_buildopt_simd,
    • charm_misc_buildopt_isfinite.
  • Switched to sphinx_book_theme for html docs.

  • Added support for polar optimization to increase computational speed (Eqs. 7 and 8 of Reinecke and Seljebotn 2013). As long as the polar optimization parameters (see the glob module) are chosen appropriately, this technique improves the performance and still keeps an excellent accuracy. However, unwisely chosen tuning parameters may negatively affect the output accuracy.

    By default, the polar optimization is turned off.

    References:

    • Reinecke, M., Seljebotn, D. S. (2013) Libsharp -- spherical harmonic transforms revisited. Astronomy and Astrophysics 554, A112, doi: 10.1051/0004-6361/201321494.
  • Improved caching for point synthesis and analysis by the blocking technique. No effort has yet been made to apply the cache blocking also for cell analysis and synthesis, which therefore offer a sub-optimal implementation. This might change in the future if the performance of cell transforms becomes an issue or a limitation.

  • Added automatic dynamical switching to the computation of Legendre functions (Fukushima, 2016).

    References:

    • Fukushima, T. (2016) Numerical computation of point values, derivatives, and integrals of associated Legendre function of the first kind and point values and derivatives of oblate spheroidal harmonics of the second kind of high degree and order. In: Rizos, C., Willis, P. (eds): IAG 150 Years: Proceedings of the 2013 IAG Scientific Assembly, Potsdam, Germany, 1--6 September, 2013, 143:193--197. https://doi.org/10.1007/1345_2015_124
  • Added dynamic scheduling to parallel for loops in shs. This slightly improves the performance, because the computation time of Legendre functions varies across the meridian (mainly due to X-numbers, dynamical switching and, if applied, polar optimization). In sha, the dynamic scheduling has been in use right from the beginning.

  • Introduced NEG_R macro to properly switch the sign of SIMD vectors.

  • Improved test suite. The functions are now smaller and cleaner. Some new tests were also added, mostly to check various custom allocation functions.

  • New benchmarks are available at https://www.charmlib.org/build/html/benchmarks.html.

CHarm 0.2.0

10 Jan 06:58
v0.2.0
2487743
Compare
Choose a tag to compare

CHarm now has a Python wrapper called PyHarm. This step necessitated several modifications of the CHarm API, some of which are not backward compatible. The changes are listed below.

  • The charm_crd structure was replaced by new structures:

    • charm_point and
    • charm_cell,

    which distinguish between evaluation points and evaluation cells.

  • Functions:

    • charm_shc_init,
    • charm_crd_init,
    • charm_leg_pnmj_init

    were replaced by

    • charm_shc_calloc,
    • charm_crd_calloc,
    • charm_leg_pnmj_calloc.

    The new functions behave in the same fashion as the old ones, including their interface.

    The following functions have been added:

    • charm_shc_malloc,
    • charm_crd_malloc,
    • charm_leg_pnmj_malloc,

    which behave similarly as their *_calloc counterparts, but provide uninitialized memory.

    Functions:

    • charm_shc_init,
    • charm_crd_init

    have now a different meaning. They can be used to create the respective structures from data arrays provided by the user.

  • Changed API of the following functions to read/write spherical harmonic coefficients from/to files:

    • charm_shc_read_bin,
    • charm_shc_read_gfc,
    • charm_shc_read_mtx,
    • charm_shc_read_tbl,
    • charm_shc_write_bin,
    • charm_shc_write_mtx,
    • charm_shc_write_tbl.

    Previously, it was necessary to open the stream for the input/output file, call the CHarm function to read/write the coefficients and, finally, close the stream. All these steps had to be done by the user. Now, the user only calls the function to read/write the coefficients and specifies the file's path name as one of the input parameters. Opening and closing the stream is done by CHarm.

    This should considerably simplify making the Python wrapper, given than wrapping TYPE * pointer with ctypes is not that trivial.

  • Symbolic constants CHARM_CRD_CELLS_* and CHARM_CRD_POINTS_* were renamed to CHARM_CRD_CELL_* and CHARM_CRD_POINT_*.

  • The nmj_order member of charm_pnmj was renamed to ordering.

  • Bug fix for integ_pn1m1pn2m2. The routine now works correctly with both ordering schemes of Fourier coefficients of Legendre functions.

  • Bug fix for shc_read_bin. The function no longer throws an error when reading coefficients up to a maximum degree that is lower than the one in the binary file.

  • Fix a few leg_pnmj tests that did not actually run.

  • Fixed use-after-free bug in degree amplitude tests.

  • Two members, nc and ns, were added to the charm_shc structure to represent the total number of spherical harmonic coefficients Cnm and Snm, respectively. This does not break the previous API.

  • Added support for Fortran's D and d decimal exponents when reading from text files.

  • Compiling CHarm as a shared library is no longer disabled by default.

  • The documentation can now be build with make html instead of make docs.