{toolchain} gobff/2020.11 + gobff/2020.06-amd (toolchains with BLIS + libFLAME) #11761
…20a-amd.eb, BLIS-0.8.0-GCC-9.3.0.eb, BLIS-2.2-GCC-9.3.0-amd.eb, FFTW-3.3.8-gompi-2020a-amd.eb, HPL-2.3-gobff-2020a-amd.eb, HPL-2.3-gobff-2020a.eb, libFLAME-2.2-GCC-9.3.0-amd.eb, libFLAME-5.2.0-GCC-9.3.0.eb, make-4.3-GCC-9.3.0.eb, ScaLAPACK-2.1.0-gompi-2020a-bf.eb, ScaLAPACK-2.2-gompi-2020a-amd.eb
This pull request adds two toolchains:
Both toolchains use GNU Compiler with OpenMPI, BLIS, libFLAME, ScaLAPACK and FFTW.
I tested the performance on JUSUF @ JSC. JUSUF uses modified toolchains; however, the HPL and dgemm performance relies almost entirely on the math libraries. So the following numbers can be seen as comparison between
Performance on JUSUF @ JSC (theoretical PEAK: 4608 GFLOPS) with HPL using 1 node with 32 ranks and 4 cores each, with ~20% of memory (51.2 GB out of 256 GB) used (pinning with OMP_PROC_BIND=TRUE and OMP_PLACES=cores):
gpsmkl/2020 (MKL)
gpsbff/2020 (BLIS)
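For reference, the quoted peak and memory footprint can be reproduced with a few lines of arithmetic. This is only a sketch under assumptions not stated in the post: a base clock of 2.25 GHz and 16 double-precision FLOP per cycle per core (the usual figures for a dual EPYC 7742 node), GB meaning 10^9 bytes, and an HPL matrix of 8-byte doubles. The resulting N is what the stated memory footprint would imply, not a value reported by the author; the function names are hypothetical.

```python
import math


def peak_gflops(cores, ghz, flop_per_cycle):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOP per cycle."""
    return cores * ghz * flop_per_cycle


def hpl_problem_size(mem_bytes, bytes_per_element=8):
    """Largest N such that an N x N matrix of doubles fits in mem_bytes."""
    return math.isqrt(mem_bytes // bytes_per_element)


# 32 MPI ranks with 4 cores each fill the 128 cores of one node.
cores = 32 * 4

# Assumption: 2.25 GHz base clock, 16 DP FLOP/cycle/core (2 x 256-bit FMA).
peak = peak_gflops(cores, 2.25, 16)

# Assumption: 51.2 GB = 51.2e9 bytes, all of it used for the HPL matrix.
n = hpl_problem_size(51_200_000_000)

print(f"cores={cores} peak={peak} GFLOPS N={n}")
```

Under these assumptions the peak matches the 4608 GFLOPS quoted above, and the memory footprint corresponds to a square matrix of order 80000.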
Performance on JUSUF @ JSC (theoretical PEAK: 4608 GFLOPS) with DGEMM with m=n=k=10240 using 1 node with 128 cores (pinning with OMP_PROC_BIND=TRUE and OMP_PLACES=cores):
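To put DGEMM timings in relation to the peak, the standard operation count of 2·m·n·k floating-point operations can be used. The sketch below is illustrative only: the helper names are hypothetical, and it does not reproduce any measured number from this PR; it just converts a wall-clock time into GFLOPS and gives the lower bound on run time at the quoted peak.

```python
def dgemm_flop(m, n, k):
    """Floating-point operations for C = A*B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def gflops(flop, seconds):
    """Convert an operation count and a wall-clock time into GFLOPS."""
    return flop / seconds / 1e9


flop = dgemm_flop(10240, 10240, 10240)

# Lower bound on the run time if the node sustained its theoretical peak.
peak = 4608.0  # GFLOPS, as quoted in the post
min_seconds = flop / (peak * 1e9)

print(f"{flop} FLOP, at least {min_seconds:.3f} s at peak")
```

A measured time t for one DGEMM call then gives `gflops(flop, t)`, which can be compared directly against the peak.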
|
I'm just wondering what the benefit is of including libFLAME in a toolchain, i.e. why not simply use |
@bartoldeman |
By the way my testing on a dual AMD 7452 showed this for HPL some months ago:
I agree that AMD BLIS is the way to go for optimal performance on AMD chips since MKL slightly edged it out here but depended on undocumented settings. |
ah yes I missed that it replaces LAPACK if you use its included lapack2flame. |
That is very interesting! Yes, I also used the threaded BLIS library. |
As far as I understand, this is the way suggested by AMD. At least this is how I read the AMD Optimized CPU Libraries User Guide: |
@SebastianAchilles did you also compare FFTW (either vanilla or AMD's fork) with MKL DFT? |
@migueldiascosta That is a very interesting question. Do you have a specific benchmark in mind? I used the 3d complex-to-complex benchmark from
These are the results on a dual AMD EPYC 7742 (JUSUF @ JSC):
|
@SebastianAchilles I think that regarding FFT benchmarks, a while back we used
more importantly for us, the FFT related timings in (material science) application benchmarks seemed to show that MKL (when forcing AVX2 with
and to be clear, this is a bit orthogonal to the PR - having this toolchain in eb would be very useful in any case |
a reminder that with EB's HierarchicalMNS, above |
Test report by @migueldiascosta |
@boegelbot please test @ generoso |
@boegel: Request for testing this PR well received on generoso PR test command '
Test results coming soon (I hope)... - notification for comment with ID 735396322 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegelbot |
Test report by @boegel |
…asyconfigs into 20201125113818_new_pr_BLIS080
I don't understand the dependency conflict. It is not possible to distinguish toolchains with a |
@SebastianAchilles The easyconfigs you're adding violate a policy we're trying to maintain where there's only a single version of a dependency in each "generation" of easyconfig files. We do this to minimize the number of conflicts between easyconfigs from the same generation. We should probably add exceptions for
Before we do that, we should agree on the naming scheme we'll use. The
Can you post an overview of the toolchains you're adding here in a comment, and how they compare with standard |
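To illustrate the kind of conflict this policy is about, here is a minimal sketch that groups easyconfig file names by software name and reports any software that appears with more than one version. It is not EasyBuild's actual test code: the regular expression and the function name are hypothetical, and the parsing assumes the simple name-version-toolchain[-suffix].eb pattern seen in the files listed for this PR.

```python
import re
from collections import defaultdict

# Hypothetical pattern: <name>-<version>-<toolchain and optional suffix>.eb
EC_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\d.]*)-(?P<rest>.+)\.eb$")


def find_multi_version(filenames):
    """Return {software name: sorted versions} for names with several versions."""
    versions = defaultdict(set)
    for filename in filenames:
        match = EC_RE.match(filename)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        versions[match.group("name")].add(match.group("version"))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


# File names taken from the (truncated) list of files touched by this PR.
files = [
    "BLIS-0.8.0-GCC-9.3.0.eb",
    "BLIS-2.2-GCC-9.3.0-amd.eb",
    "FFTW-3.3.8-gompi-2020a-amd.eb",
    "HPL-2.3-gobff-2020a-amd.eb",
    "HPL-2.3-gobff-2020a.eb",
    "libFLAME-2.2-GCC-9.3.0-amd.eb",
    "libFLAME-5.2.0-GCC-9.3.0.eb",
    "make-4.3-GCC-9.3.0.eb",
    "ScaLAPACK-2.1.0-gompi-2020a-bf.eb",
    "ScaLAPACK-2.2-gompi-2020a-amd.eb",
]

for name, found in find_multi_version(files).items():
    print(f"{name}: {', '.join(found)}")
```

On this list the sketch flags BLIS, libFLAME and ScaLAPACK, which are exactly the packages that come in a vanilla and an AMD variant in the same generation.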
@boegel Sure, I'll try to elaborate.
The
The The
My initial idea was to offer both toolchains, so that people can choose which variant they prefer. However, I didn't measure a performance difference in
What is your opinion? Do you want to add exceptions? Or do we want to try making the optimization depend on the system where the toolchain is used, e.g. something like a conditional easyconfig?
Regarding the naming scheme: I am not convinced that the names or suffixes I came up with are the best ones. An alternative idea I have is to rename |
…s to avoid tests tripping over two BLIS/libFLAME variants
…riable for amd-fftw version in source tarball)
@boegelbot please test @ generoso |
@boegel: Request for testing this PR well received on generoso PR test command '
Test results coming soon (I hope)... - notification for comment with ID 741991017 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegel |
This should be good to go now: the necessary exceptions in the tests have been added to allow the
I'll squeeze this in for the upcoming EasyBuild v4.3.2, so we can start experimenting with this BLIS-based toolchain. Thanks a lot for the contribution @SebastianAchilles ! |
Test report by @boegel |
Going in, thanks @SebastianAchilles! |
Test report by @boegelbot |
(created using eb --new-pr)
depends on PR easybuilders/easybuild-framework#3505