Adding GROMACS #61
Conversation
repo/gromacs/application.py
Outdated
```python
figure_of_merit('Nanosecs per day', log_file=log_str,
                fom_regex=r'Performance:\s+' +
                          r'(?P<ns_per_day>[0-9]+\.[0-9]+).*',
                group_name='ns_per_day', units='ns/day')
```
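As a quick sanity check of the regex, here is a sketch matching it against a sample `Performance:` line (the line and its numbers are illustrative, not taken from a real run):

```python
import re

# Hypothetical GROMACS md.log line of the form mdrun prints at the end of a run.
log_line = "Performance:       123.456       0.194"

# Same pattern as in the figure_of_merit() call above.
fom_regex = r"Performance:\s+(?P<ns_per_day>[0-9]+\.[0-9]+).*"

match = re.search(fom_regex, log_line)
print(match.group("ns_per_day"))  # → 123.456
```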
Note that this is the only FOM that really matters.
@pszi1ard Do you think it would be less confusing if we remove the rest of the FOMs?
```yaml
processes_per_node: ['8', '4']
n_nodes: ['1', '2']
threads_per_node_core: ['2', '4']
omp_num_threads: '{threads_per_node_core} * {n_nodes}'
```
Can you explain what this means? Isn't `omp_num_threads` here the same as `OMP_NUM_THREADS`, i.e. the threads per MPI rank?
@pszi1ard `OMP_NUM_THREADS` (set from the value of `omp_num_threads`) is the number of threads per MPI rank. For this particular experiment config we scale `omp_num_threads` with `n_nodes`: for 1 node we use 2 and 4 threads, while for 2 nodes we use 4 and 8 threads. In both cases the total number of MPI ranks is 8, but since the latter case uses 2 nodes, we can assign the additional cores to more OpenMP threads. This can be changed per the experiment's requirements.
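To make the expansion concrete, this is a sketch of how the `'{threads_per_node_core} * {n_nodes}'` expression plays out over the experiment matrix above (the loop is illustrative, not Ramble's actual expansion code):

```python
# Values from the experiment config above.
n_nodes_values = [1, 2]
threads_per_node_core_values = [2, 4]

# omp_num_threads = threads_per_node_core * n_nodes for each matrix point:
# 1 node -> 2 and 4 threads per rank; 2 nodes -> 4 and 8 threads per rank.
for n_nodes in n_nodes_values:
    for threads_per_node_core in threads_per_node_core_values:
        omp_num_threads = threads_per_node_core * n_nodes
        print(f"n_nodes={n_nodes} threads_per_node_core={threads_per_node_core} "
              f"-> OMP_NUM_THREADS={omp_num_threads}")
```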
OK, I see. That's an atypical way of scaling to multiple nodes in MD/GROMACS. I suggest using a fixed `OMP_NUM_THREADS` and scaling the number of ranks instead. On CPU-only systems, small values of `OMP_NUM_THREADS` (1-3) are typically best until the case starts running out of parallelism (limiting domain decomposition); at that point increasing `OMP_NUM_THREADS` can help to keep the number of ranks fixed while still using more resources.
On GPU-accelerated machines, the decomposition, and hence the rank count, is determined by the number of GPU devices (typically 1 or 2 ranks per device).
Co-authored-by: Szilárd Páll <pall.szilard@gmail.com>
```yaml
packages:
- lapack
- default-mpi
- fftw-omp
```
@rfhaque Can we use spack to build fftw instead?
```diff
@@ -6,12 +6,16 @@
 spack:
   packages:
     default-compiler:
-      spack_spec: xl@16.1.1-2022.08.19-cuda{default_cuda_version}
+      spack_spec: clang@16.0.6-cuda{default_cuda_version}
```
xl should stay the default compiler on Sierra. Please create a `compiler-clang` entry to use for Gromacs if you must.
Comments starting with "minor" or "for a later PR" aren't blocking for this PR.
Let me know if you disagree with any of these or if the requests seem unclear.
```diff
@@ -32,8 +32,12 @@ packages:
   mvapich2:
     buildable: false
     externals:
-    - spec: mvapich2@2.3.7%intel@2021.6.0
+    - spec: mvapich2@2.3.7-gcc-11.2.1
```
I think you might mean `mvapich2@2.3.7%gcc@11.2.1` vs `mvapich2@2.3.7-gcc-11.2.1`; in the former case, Spack will know that `mvapich2` was compiled with `gcc`.
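To illustrate the difference between the two spellings, here is a toy parser (not Spack's real spec grammar, just enough to show why only the `%` form carries a compiler association; the regex is an assumption for illustration):

```python
import re

def split_spec(spec):
    """Split 'name@version[%compiler]' into parts (illustrative only)."""
    m = re.match(
        r"(?P<name>[\w-]+?)@(?P<version>[\w.-]+?)(?:%(?P<compiler>[\w.@-]+))?$",
        spec,
    )
    return m.groupdict() if m else None

# '%' marks the compiler, so Spack can associate gcc@11.2.1 with the build:
print(split_spec("mvapich2@2.3.7%gcc@11.2.1")["compiler"])    # → gcc@11.2.1

# Without '%', '-gcc-11.2.1' is just part of the version string:
print(split_spec("mvapich2@2.3.7-gcc-11.2.1")["compiler"])    # → None
```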
```yaml
externals:
- spec: intel-oneapi-mkl@2022.1.0
  prefix: /usr/tce/packages/mkl/mkl-2022.1.0
buildable: false
```
I think it will be cleaner to reorganize

```yaml
blas:
  externals:
  - spec: intel-oneapi-mkl@2022.1.0
    prefix: /usr/tce/packages/mkl/mkl-2022.1.0
  buildable: false
lapack:
  externals:
  - spec: intel-oneapi-mkl@2022.1.0
    prefix: /usr/tce/packages/mkl/mkl-2022.1.0
  buildable: false
intel-oneapi-mkl:
  externals:
  - spec: intel-oneapi-mkl@2022.1.0
    prefix: /usr/tce/packages/mkl/mkl-2022.1.0/
  buildable: false
```

as

```yaml
blas:
  buildable: false
lapack:
  buildable: false
intel-oneapi-mkl:
  externals:
  - spec: intel-oneapi-mkl@2022.1.0
    prefix: /usr/tce/packages/mkl/mkl-2022.1.0/
  buildable: false
```

`intel-oneapi-mkl` provides both `blas` and `lapack`, so this reduces redundancy.
```yaml
flags:
  cflags: -g -O2
  cxxflags: -g -O2 -std=c++17
  fflags: ''
```
(minor) you can remove this line entirely if you want.
```diff
@@ -65,3 +65,19 @@ compilers:
       modules: []
```
(for another PR) I noticed when looking above that there are 3 xl/cuda compiler combos - is that still necessary?
```diff
@@ -70,4 +75,8 @@ packages:
       prefix: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2022.08.19-cuda-10.1.243
       extra_attributes:
         ldflags: "-lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm"
+    - spec: spectrum-mpi@2022.08.19-clang16.0.6-cuda-11.8.0
```
If you want a build of a package with clang to use this, then you'd want to make the compiler association explicit using `%`, like `spectrum-mpi@2022.08.19%clang16.0.6-cuda-11.8.0`.
(question, no change required) Do you know if it's strictly required to use/link an MPI compiled w/ clang when building dependents with clang?
repo/gromacs/application.py
Outdated
```python
class Gromacs(SpackApplication):
    '''Define a Gromacs application'''
```
(minor) Are comments required? This doesn't add much by itself.
repo/gromacs/package.py
Outdated
```python
options.append("-DGMX_LAPACK_USER={0}".format(self.spec["lapack"].libs.joined(";")))
target = self.spec.target
if "+cuda" in self.spec and target.family == "ppc64le":
    options.append("-DGMX_EXTERNAL_LAPACK:BOOL=OFF")
```
If I understand correctly, this is to disable the use of external lapack on a specific system: LLNL-Sierra-IBM-power9-V100-Infiniband/spack.yaml. I think it would be useful to add an explicit comment about that so it's easier to correct in the future.
repo/cusolver/package.py
Outdated
```python
from spack.package import *


class Cusolver(Package):
```
Does anything use this? Is it still needed?
```yaml
externals:
- spec: fftw@3.3.10
  prefix: /usr/tcetmp/packages/fftw/fftw-3.3.10-xl-2023.06.28
buildable: false
lapack:
```
While this PR disables use of external blas/lapack, I think it would be worth reorganizing it:

```yaml
blas:
  buildable: false
lapack:
  buildable: false
netlib-lapack:
  externals:
  - spec: netlib-lapack@3.6.0
    prefix: /usr/tcetmp/packages/lapack/lapack-3.11.0-gcc-11.2.1/
```

(This config worked for me when using external blas/lapack, and it also avoids any potential issues with listing `lapack-xl` as an external, which isn't a Spack package.)
```diff
@@ -193,6 +193,13 @@ packages:
     prefix: /opt/cray/pe/libsci/23.05.1.4/gnu/10.3/x86_64/
   lapack:
     buildable: false
+  hypre:
```
Please remove hypre from this list; all variants are already set to `amdgpu_target=gfx90a`.
@rfhaque You need rocmcc, and you should compile both ACPP/hipsycl and GROMACS with it.
ref pszi1ard@2398aba#diff-fd7d81e7edf97a63d331611675196dd0ab63b3ccc6f6dee144bd01eb36438fa2R113
* remove version locking from experiment config; achieve this with a mixin package
* add needed externals, proper dependency reference for mixin
* also align rocmcc version w/ hip version
Working to add a new problem definition for GROMACS. Will upstream to Ramble at a later point.