Add LAMMPS tests to EESSI test-suite #131
Conversation
These tests are CPU only; to run LAMMPS with GPU you need the executable
The tests are now compatible with the LAMMPS package in EESSI and https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LAMMPS/LAMMPS-2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.eb. So I think it is ready for review.
@laraPPr Can we somehow avoid including the
We discussed this in the last test-suite sync and agreed that it was okay for this one. I've already minimized its impact as much as possible by using a ReFrame option so that it is not copied to the stage directory. But we can discuss it again tomorrow.
Can you add a small README file that mentions where we got the input files, when they were obtained, from which version, what the SHA256 checksums are, etc.?
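(As a side note, one minimal way to generate those SHA256 checksums for the README could be the sketch below; the file names in.lj and in.rhodo are assumptions about which input files are shipped with the test.)

```python
import hashlib
from pathlib import Path

# Assumed input file names for the lj and rhodo benchmarks; adjust to the
# files actually shipped with the test.
for path in [Path('in.lj'), Path('in.rhodo')]:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    print(f'{digest}  {path.name}')
```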
```python
@performance_function('img/s')
def perf(self):
    regex = r'^(?P<perf>[.0-9]+)% CPU use with [0-9]+ MPI tasks x [0-9]+ OpenMP threads'
```
This doesn't look right: this is the %CPU usage. Performance should be one of the values reported on this line instead:
Performance: 0.823 ns/day, 29.175 hours/ns, 4.761 timesteps/s, 152.338 katom-step/s
Performance: 205379.307 tau/day, 475.415 timesteps/s, 15.213 Matom-step/s
This one, then?
Your line comes from the lj test, while mine comes from the rhodo test. Let's take timesteps/s, as this unit is available for both tests?
I've now added tau/day for lj and ns/day for rhodo, but I can also take the timesteps/s.
I just checked, and both tau/day for lj and ns/day for rhodo scale in exactly the same way as timesteps/s, so it doesn't really matter. I do have a slight preference for timesteps/s as it is easy to understand. Otherwise, can you add a comment explaining what exactly tau/day means?
I also do not know what tau/day is, and Google does not have a ready explanation, so I changed it to timesteps/s for both lj and rhodo.
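(For illustration, a minimal sketch of what a timesteps/s performance function could look like, using ReFrame's sn.extractsingle on the LAMMPS log; the class name and regex are assumptions for the sketch, not a quote from this PR.)

```python
import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import performance_function


class LAMMPSPerfExample(rfm.RunOnlyRegressionTest):
    '''Illustrative only: shows one way the performance metric could be parsed.'''

    @performance_function('timesteps/s')
    def perf(self):
        # Matches e.g. "Performance: 205379.307 tau/day, 475.415 timesteps/s, ..."
        # and works for both the lj and rhodo inputs.
        regex = r'^Performance:.*?(?P<perf>[0-9.]+) timesteps/s'
        return sn.extractsingle(regex, self.stdout, 'perf', float)
```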
Just noticed something going really wrong when testing LAMMPS, related to issue #132
I'm also seeing this one, but I'm not sure if I'm triggering it, because I don't see it when there are no duplicate modules, and I also cannot find this error in any of the test reports from the CI that I am running.
EDIT: The sbatch: error: message triggers a bug in ReFrame older than 4.3 (I am using 4.2). See reframe-hpc/reframe#2885
Seems to work well, though I haven't tested on GPUs; our GPU nodes are too busy at the moment.
The only thing I'm still missing is an energy sanity check. I would use
For rhodo, the total energy is also very stable, with only the last digit showing slight differences:
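(A rough sketch of what such an energy check might look like, assuming the default thermo output where TotEng is the fifth column; the class name, regex, and reference value are placeholders, not the numbers from this PR.)

```python
import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import sanity_function


class LAMMPSEnergyCheckExample(rfm.RunOnlyRegressionTest):
    '''Illustrative only: one way the total-energy check could be written.'''

    @sanity_function
    def assert_total_energy(self):
        # With LAMMPS' default thermo output (Step Temp E_pair E_mol TotEng Press),
        # TotEng is the fifth column; take the value of the last reported step.
        toteng = sn.extractall(
            r'^\s*\d+\s+\S+\s+\S+\s+\S+\s+(?P<toteng>-?[0-9.]+)',
            self.stdout, 'toteng', float)
        # Placeholder reference value, with a 1% tolerance on either side.
        return sn.assert_reference(toteng[-1], -4.62, -0.01, 0.01)
```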
I see Sam gave great comments on the content. I'm not familiar with LAMMPS, but I at least checked that the runs succeeded. All succeeded, both on Karolina:
And on Snellius:
The failure was due to a node failure; a rerun of that particular test succeeded. So... everything looks good from my side :)
Still to do: add a check that a CUDA build is built with Kokkos, since this test requires it.
```python
num_default = 0  # If this test already has executable opts, they must have come from the command line
hooks.check_custom_executable_opts(self, num_default=num_default)
if not self.has_custom_executable_opts:
    # should also check if the lammps is installed with kokkos.
```
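(A hypothetical way to address that comment, and the CUDA/Kokkos to-do above, could be a hook along these lines; the module_name attribute and the module-name heuristic are assumptions, not the PR's final code.)

```python
from reframe.core.builtins import run_after

# Illustrative only; assumed to live inside the LAMMPS test class.
@run_after('setup')
def skip_cuda_builds_without_kokkos(self):
    module = self.module_name.lower()  # assumed attribute holding the module under test
    if 'cuda' in module and 'kokkos' not in module:
        self.skip(msg='CUDA builds of LAMMPS without Kokkos are not supported by this test')
```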
Reminder for me to also look at this one again
Karolina, LAMMPS/2Aug2023_update2-foss-2023a-kokkos
1 rank, rhodo:
128 ranks, rhodo:
16*128 ranks, rhodo:
1 rank, lj:
128 ranks, lj:
16*128 ranks, lj:
Snellius, LAMMPS/2Aug2023_update2-foss-2023a-kokkos
128 ranks, rhodo:
16*128 ranks, rhodo:
1 rank, lj:
128 ranks, lj:
16*128 ranks, lj:
Just as another check, I ran with an older version of LAMMPS (
128 ranks, lj:
128 ranks, rhodo:
Seems the total energy is consistent across versions. But I am getting:
I'm not entirely sure why, because I do see:
And that seems to match...?
With yet another module,
4 ranks (gpus), rhodo:
So, also consistent results for
It's the comma; you can remove that one.
Ah, the trailing comma, you're right!
I also tried to run on GPUs. Works fine, except multinode
Added the energy check and added support for LAMMPS with the GPU package (no Kokkos).
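(For reference, a rough sketch of how the two GPU flavours could be driven from one hook; the -k/-sf/-pk flag values come from the LAMMPS command-line documentation, while the module_name check and one-GPU-per-rank choice are assumptions.)

```python
from reframe.core.builtins import run_after

# Illustrative only; assumed to live inside the LAMMPS test class.
@run_after('setup')
def set_accelerator_opts(self):
    if 'kokkos' in self.module_name.lower():  # assumed module-name heuristic
        # Kokkos build: enable the Kokkos suffix with one GPU per rank.
        self.executable_opts += ['-k', 'on', 'g', '1', '-sf', 'kk']
    else:
        # GPU-package build (no Kokkos): use the gpu suffix and package instead.
        self.executable_opts += ['-sf', 'gpu', '-pk', 'gpu', '1']
```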
Since this is pure MPI, it's probably good to call hooks.set_compact_process_binding.
Also, please call req_memory_per_node and specify how much memory the test needs. This makes sure the test gets skipped on systems with insufficient memory.
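(A minimal sketch of wiring in those two hooks; the 2 GiB-per-task figure is a placeholder, and the eessi.testsuite.hooks import path follows the test suite's usual convention.)

```python
from eessi.testsuite import hooks
from reframe.core.builtins import run_after

# Illustrative only; assumed to live inside the LAMMPS test class.
@run_after('setup')
def set_binding_and_memory(self):
    # Bind MPI ranks compactly to cores, since the test is pure MPI.
    hooks.set_compact_process_binding(self)
    # Request enough memory per node so the test is skipped on systems
    # that cannot provide it (placeholder: 2 GiB per task, in MiB).
    hooks.req_memory_per_node(self, app_mem_req=2048 * self.num_tasks_per_node)
```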
Ok, did two more successful runs: on Snellius & Karolina. All good. Going in, thanks @laraPPr!