Multi-dimensional Parallel Solver for x64/arm/arm64 with MPI/CUDA support

@zer011b released this 13 Nov 09:58

Introducing version 1.1!

Arm32 and Arm64 support

Arm32 and Arm64 are now fully supported in sequential mode, in addition to x64, which was previously the only supported architecture. Both native and cross compilation are supported for arm32 and arm64, with both gcc and clang toolchains.

Performance improvements

This release is up to 25% faster in various scenarios on all architectures.

Improved command line interface

Coordinate setup is now simpler. For example, the single option --num-cuda-threads x:<Nx>,y:<Ny>,z:<Nz> can be used instead of the three options --num-cuda-threads-x <Nx> --num-cuda-threads-y <Ny> --num-cuda-threads-z <Nz>.
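
To illustrate how the combined form maps onto the three per-axis options, here is a minimal Python sketch. It is not the solver's own parser: the function names parse_coordinate_option and to_legacy_flags are hypothetical, and the handling of whitespace, duplicate axes and malformed input is an assumption made for the example.

```python
def parse_coordinate_option(value):
    """Parse a string such as 'x:4,y:8,z:2' into {'x': 4, 'y': 8, 'z': 2}.

    Hypothetical helper: the accepted axes and error handling are assumptions.
    """
    result = {}
    for part in value.split(","):
        axis, sep, number = part.partition(":")
        axis = axis.strip()
        if not sep or axis not in ("x", "y", "z"):
            raise ValueError(f"expected <axis>:<value>, got {part!r}")
        if axis in result:
            raise ValueError(f"axis {axis!r} given more than once")
        result[axis] = int(number)  # int() raises ValueError on non-numeric input
    return result


def to_legacy_flags(prefix, value):
    """Expand the combined form into the per-axis flags it replaces."""
    flags = []
    for axis, count in parse_coordinate_option(value).items():
        flags += [f"{prefix}-{axis}", str(count)]
    return flags
```

For instance, to_legacy_flags("--num-cuda-threads", "x:4,y:8,z:2") yields the argument list for --num-cuda-threads-x 4 --num-cuda-threads-y 8 --num-cuda-threads-z 2, which is the older three-option spelling described above.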

Build system simplification

Some build flags are now set automatically, and some have been removed entirely. More build flags will be revised in future releases.

Test system improvements

The test process is now fully documented, and the test system has been changed to support cross-architecture testing.

Code unification, code improvements and various bug fixes