From ec95290d5d882eeee5659b7090aa9a3bc50bf7b9 Mon Sep 17 00:00:00 2001
From: Miroslav Stoyanov
Date: Thu, 21 Mar 2024 11:22:04 -0400
Subject: [PATCH 1/2] updated the acceleration install docs

---
 Doxygen/Installation.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Doxygen/Installation.md b/Doxygen/Installation.md
index 9220db330..27909919a 100644
--- a/Doxygen/Installation.md
+++ b/Doxygen/Installation.md
@@ -25,9 +25,10 @@ Recommended additional features:
 * [OpenMP](https://en.wikipedia.org/wiki/OpenMP) implementation (usually included with the compiler)
 
 Optional features:
+* Acceleration using [OpenMP](https://www.openmp.org/) multicore algorithms (CPU only); the OpenMP standard is supported by most major compilers.
 * Acceleration using Nvidia [linear algebra libraries](https://developer.nvidia.com/cublas) and custom [CUDA kernels](https://developer.nvidia.com/cuda-zone)
 * Acceleration using AMD ROCm [linear algebra libraries](https://rocsparse.readthedocs.io/en/master/) and custom [HIP kernels](https://rocmdocs.amd.com/en/latest/ROCm_API_References/HIP-API.html)
-* Acceleration using Intel OneAPI [oneMKL](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html) and custom [DPC++ kernels](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
+* Acceleration using Intel OneAPI [oneMKL](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html) and custom [SYCL kernels](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
 * GPU out-of-core algorithms using the [UTK MAGMA library](http://icl.cs.utk.edu/magma/)
 * Basic [Python matplotlib](https://matplotlib.org/) support
 * Fully featured [MATLAB/Octave](https://www.gnu.org/software/octave/) interface via wrappers around the command-line tool
@@ -100,7 +101,11 @@ ROCm capabilities require CMake 3.21.
 ```
 * Acceleration options:
-    * OpenMP allows Tasmanian to use more than one CPU core, which greatly increases the performance
+    * OpenMP allows Tasmanian to use more than one CPU core, which greatly increases the performance.
+      While many of the Tasmanian algorithms have been parallelized, the built-in C++ algorithms are usually sequential.
+      This is most notable in the case of `std::sort` but affects others as well.
+      Some compilers support parallel standard algorithms, but those in turn require additional compiler flags.
+        * for GCC add `-D_GLIBCXX_PARALLEL` to the `CMAKE_CXX_FLAGS`
     * Basic Linear Algebra Subroutines (BLAS) is a standard with many implementations,
       e.g., [https://www.openblas.net/](https://www.openblas.net/);
       optimized BLAS improves the performance when using evaluate commands on grids with many points
       or working with models with many outputs

From 6207002b77ac574b732f51c93e5424e24d7ce40f Mon Sep 17 00:00:00 2001
From: Miroslav Stoyanov
Date: Thu, 21 Mar 2024 11:27:15 -0400
Subject: [PATCH 2/2] updated the changelog with new parallel algs

---
 CHANGELOG.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 20e2622fe..846fc2f53 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,10 @@
 Changelog for version 8.1
 --------------
 
+* added more multicore cpu support
+    * parallelized setting surplus refinement
+    * compatibility with gcc parallel STL algorithms
+
 * implemented a new algorithm for global sparse Kronecker
     * significant speedup when loading needed values
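
Note on the `-D_GLIBCXX_PARALLEL` flag documented in PATCH 1/2: the sketch below is not taken from the Tasmanian sources; it only illustrates what the flag changes for ordinary user code. The helper names `sorted_copy` and `parallel_mode_enabled` are hypothetical. With GCC's libstdc++, compiling the same file with `-fopenmp -D_GLIBCXX_PARALLEL` lets the library substitute a multi-threaded implementation for `std::sort`; without those flags the call stays sequential, and the result is identical either way.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not a Tasmanian function): returns a sorted copy.
// The call to std::sort is unchanged by the flag; under GCC with
// -fopenmp -D_GLIBCXX_PARALLEL, libstdc++ parallel mode may dispatch it
// to a multi-threaded implementation for sufficiently large inputs.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Hypothetical helper: reports whether this translation unit was compiled
// with the _GLIBCXX_PARALLEL macro defined (e.g., via CMAKE_CXX_FLAGS).
constexpr bool parallel_mode_enabled() {
#ifdef _GLIBCXX_PARALLEL
    return true;
#else
    return false;
#endif
}
```

Because the flag is a preprocessor definition, it has to reach every translation unit that should use the parallel algorithms, which is why the documentation places it in `CMAKE_CXX_FLAGS` rather than in a single source file.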