Multi-dimensional Parallel Solver for x64/arm/arm64 with MPI/CUDA support
Introducing version 1.1!
Arm32 and Arm64 support
Arm32 and Arm64 are now fully supported in sequential mode, in addition to x64, which was previously the only supported architecture. Both native and cross compilation for arm32 and arm64 are supported with the gcc and clang toolchains.
Performance improvements
This release runs up to 25% faster, depending on the scenario, on all supported architectures.
Improved command line interface
Coordinate setup is now simpler. For example, the single option
--num-cuda-threads x:<Nx>,y:<Ny>,z:<Nz>
can now be used instead of the three separate options
--num-cuda-threads-x <Nx> --num-cuda-threads-y <Ny> --num-cuda-threads-z <Nz>
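To illustrate the combined per-axis format, here is a minimal Python sketch of how a value such as x:<Nx>,y:<Ny>,z:<Nz> could be split into per-axis counts. It is not the solver's own parser: the function name parse_axis_option is hypothetical, and the validation rules (positive integers only, no repeated axis, axes may be omitted) are assumptions made for the example.

```python
def parse_axis_option(value):
    """Parse a string like 'x:4,y:4,z:2' into {'x': 4, 'y': 4, 'z': 2}.

    Hypothetical helper for illustration only; the validation rules
    below are assumptions, not the solver's documented behaviour.
    """
    result = {}
    for part in value.split(","):
        # Split each 'axis:count' pair at the first colon.
        axis, sep, number = part.partition(":")
        axis = axis.strip().lower()
        if not sep or axis not in ("x", "y", "z"):
            raise ValueError(f"expected <axis>:<count>, got {part!r}")
        if axis in result:
            raise ValueError(f"axis {axis!r} given more than once")
        count = int(number)  # raises ValueError on non-numeric input
        if count <= 0:
            raise ValueError(f"count for axis {axis!r} must be positive")
        result[axis] = count
    return result
```

For example, parse_axis_option("x:4,y:4,z:2") returns {'x': 4, 'y': 4, 'z': 2}, which carries the same information as the three separate per-axis options.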
Build system simplification
Some build flags are now set automatically, and some have been removed entirely. More build flags will be revised in the future.
Test system improvements
The test process is now fully documented, and the test system has been changed to support cross-architecture testing.