
Fix MPI tests on Apple Silicon #1922

Open
sloede opened this issue Apr 29, 2024 · 10 comments
Labels: bug, parallelization, testing

Comments

sloede (Member) commented Apr 29, 2024

Not everything works on ARM:

LoadError: User-defined reduction operators are currently not supported on non-Intel architectures.
See JuliaParallel/MPI.jl#404 for more details.

see https://github.com/trixi-framework/Trixi.jl/actions/runs/8865649949/job/24342153523?pr=1562#step:7:1124

We need to update our CI settings to reflect this - can we still use older versions/architectures for some macOS tests?

Originally posted by @ranocha in #1562 (comment)

sloede added the bug label Apr 29, 2024
sloede (Member, Author) commented Apr 29, 2024

See also #1838.

ranocha added the testing and parallelization labels Apr 29, 2024
benegee (Contributor) commented Aug 21, 2024

I just came across this error when trying to run Trixi on a Grace Hopper node.

In my case it was caused by the MPI reductions for stepsize computation:

dt = MPI.Allreduce!(Ref(dt), min, mpi_comm())[]

A workaround is to use Base.min instead of Trixi's

@inline min(args...) = @fastmath min(args...)

But of course this forfeits the performance gain of the fastmath version.

Maybe we could select the min implementation depending on the current architecture?
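
For context, here is a minimal, self-contained sketch of the failure mode and the Base.min workaround (the local fastmin below merely stands in for Trixi's overridden min, and MPI.COMM_WORLD replaces Trixi's mpi_comm(); an illustration, not Trixi code):

using MPI

MPI.Init()
comm = MPI.COMM_WORLD

# Stand-in for Trixi's overridden min: any function other than Base.min forces
# MPI.jl to create a user-defined reduction operator, which errors on non-Intel
# architectures (see JuliaParallel/MPI.jl#404).
@inline fastmin(args...) = @fastmath min(args...)

dt = 1.0 / (MPI.Comm_rank(comm) + 1)  # rank-local candidate time step

# Fails on aarch64, because a custom operator would be needed for fastmin:
# dt = MPI.Allreduce!(Ref(dt), fastmin, comm)[]

# Works everywhere: MPI.jl maps Base.min to the built-in MPI.MIN operation.
dt = MPI.Allreduce!(Ref(dt), Base.min, comm)[]

println("rank ", MPI.Comm_rank(comm), ": global dt = ", dt)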

ranocha (Member) commented Aug 21, 2024

Could we use Base.min for the MPI reduction explicitly? I would not expect significant performance gains there with our version.

benegee (Contributor) commented Aug 22, 2024

Yes, this works! Also, not too many lines are affected: 2393f83
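
For illustration, the kind of change involved might look roughly like this (a sketch based on the Allreduce call quoted earlier, not the actual diff in 2393f83):

# Before: the unqualified min resolves to Trixi's @fastmath override, so MPI.jl
# has to build a user-defined operator (unsupported on non-Intel architectures):
# dt = MPI.Allreduce!(Ref(dt), min, mpi_comm())[]

# After: Base.min is mapped by MPI.jl to the built-in MPI.MIN operation:
dt = MPI.Allreduce!(Ref(dt), Base.min, mpi_comm())[]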

ranocha (Member) commented Aug 22, 2024

Could you please add some comments explaining why we need the Base. prefix there and make a PR?

ranocha (Member) commented Aug 27, 2024

Ping @benegee - would be a nice PR

benegee (Contributor) commented Sep 3, 2024

@vchuravy miraculously just created JuliaParallel/MPI.jl#871

This could be used to fix the follow-up issues that appeared in #2054.

vchuravy (Member) commented Sep 4, 2024

I think my PR is ready for a first test in a user package.

benegee (Contributor) commented Sep 12, 2024

Intermediate status

I split this into smaller issues and PRs:

  • LoopVectorization error in AMR on ARM #2075 is an entirely new error, which we encountered during our tests when switching to aarch64. So far I have no idea how to fix it.

  • Use Base.min / Base.max in MPI reductions #2054 (which solves my initial problem with the custom min and max operations on ARM) could be merged, in my opinion. Changing Trixi.min to Base.min is only a mild change and is not performance-critical here.

  • Use new MPI custom ops in mpi reduce #2066 is the first PR to resolve the remaining issues with SVectors in MPI reductions for the integrated quantities in the analysis callbacks. This PR is based on @vchuravy's magical new macro add macro to create custom Ops also on aarch64 JuliaParallel/MPI.jl#871. I cannot really judge this approach, but to me it seems very elegant: it requires only a few changes in our code, and it would be easy to define more complex custom reduction operators analogously in the future. It could also be used to solve the min/max issue above. Of course, the PR would have to be accepted into MPI.jl first.

  • Reinterpret SVector as pointer in mpi reduce #2067 is the second PR to resolve the SVector issue. It is based on @ranocha's and @vchuravy's suggestion to reinterpret the data as Ptr{Float64}. To me this seems like the C approach: it just works and does not require any additional tricks. The code currently looks clumsy because of the distinction between scalars and vectors, but some Julian wizard could probably come up with a single expression here. (A rough sketch of the underlying idea follows after this list.)
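
As a rough illustration of the idea behind the second approach (a hedged sketch only: it copies into a plain Vector{Float64} via collect instead of the Ptr{Float64} reinterpretation discussed in #2067, and the values are placeholders):

using MPI, StaticArrays

MPI.Init()
comm = MPI.COMM_WORLD

local_integrals = SVector(1.0, 2.0, 3.0)  # placeholder per-rank integrated quantities

# Hand MPI a plain Float64 buffer so that the built-in MPI.SUM applies and no
# custom reduction operator has to be created on aarch64.
buffer = collect(local_integrals)
MPI.Allreduce!(buffer, +, comm)
global_integrals = SVector{3, Float64}(buffer)

println("rank ", MPI.Comm_rank(comm), ": ", global_integrals)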

I could use some advice here on how to proceed. @vchuravy @ranocha @sloede

ranocha (Member) commented Sep 12, 2024

Please ping me for a review of the PR solving your current issue. We can discuss the rest later.
It would be good to hear from @vchuravy what he thinks about his MPI.jl PR to decide how to proceed with the other parts.
