Use Base.min / Base.max in MPI reductions #2054

benegee · 2024-08-30T10:42:35Z

We can use this workaround to resolve one part of #1922.

MPI.jl's reductions currently do not work with custom operators (such as Trixi's own min/max) on ARM, so this PR passes Base.min / Base.max instead.
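A minimal sketch of why the explicit `Base.` qualification matters, assuming (as the description above suggests) that a library recognizes `Base.min` by its function type and treats every other function as a custom operator. `LocalOps` and `builtin_op` are hypothetical stand-ins for illustration only; they are not names from Trixi.jl or MPI.jl.

```julia
module LocalOps
# Hypothetical package-local replacement, standing in for a package's own `min`.
# Because Base.min is not imported here, this creates a *new* function.
@inline min(a, b) = ifelse(a < b, a, b)
end

# Hypothetical stand-in for a library that maps known functions to built-in operators
builtin_op(::typeof(Base.min)) = :builtin
builtin_op(::Function) = :custom

builtin_op(Base.min)      # :builtin
builtin_op(LocalOps.min)  # :custom, although both compute a minimum
```

Both functions return the same value for ordinary inputs, but only `Base.min` hits the specialized method, which is the distinction this workaround relies on.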

github-actions · 2024-08-30T10:42:49Z

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

The PR has a single goal that is clear from the PR title and/or description.
All code changes represent a single set of modifications that logically belong together.
No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

The code can be understood easily.
Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
There are no redundancies that can be removed by simple modularization/refactoring.
There are no leftover debug statements or commented code sections.
The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

New functions and types are documented with a docstring or top-level comment.
Relevant publications are referenced in docstrings (see example for formatting).
Inline comments are used to document longer or unusual code sections.
Comments describe intent ("why?") and not just functionality ("what?").
If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

The PR passes all tests.
New or modified lines of code are covered by tests.
New or modified tests run in less than 10 seconds.

Performance

There are no type instabilities or memory allocations in performance-critical parts.
If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

The correctness of the code was verified using appropriate tests.
If new equations/methods are added, a convergence test has been run and the results are posted in the PR.

Created with ❤️ by the Trixi.jl community.

codecov · 2024-08-30T11:25:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.32%. Comparing base (e4040e7) to head (ffad95a).
Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2054   +/-   ##
=======================================
  Coverage   96.32%   96.32%           
=======================================
  Files         470      470           
  Lines       37486    37486           
=======================================
  Hits        36107    36107           
  Misses       1379     1379

Flag	Coverage Δ
unittests	`96.32% <100.00%> (ø)`


ranocha

Thanks! Shall we also switch to macos-latest in Trixi.jl/.github/workflows/ci.yml (line 93 in 2ac203e: os: macos-13)?

src/callbacks_step/analysis.jl

src/callbacks_step/analysis_dg2d_parallel.jl

src/callbacks_step/analysis_dg3d_parallel.jl

src/callbacks_step/stepsize_dg2d.jl

src/callbacks_step/stepsize_dg3d.jl

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

benegee · 2024-08-30T13:09:49Z

> Thanks! Shall we also switch to macos-latest in Trixi.jl/.github/workflows/ci.yml (line 93 in 2ac203e: os: macos-13)?

Fine with me! However, I am not quite clear as to why this fixed things in the past.

Also, should we not test ARM here, besides / instead of x86?

ranocha · 2024-08-30T17:53:09Z

macos-latest is macos-14, which is only available on ARM. We should also delete the confusing arch specification when updating it.
macos-13 runs on x86 (Intel) hardware, which is why pinning to it fixed the issue in the past.

ranocha · 2024-08-31T06:29:33Z

There are still some user-defined MPI reductions in the integration methods.

Ideally, we should also fix those.
If the current implementation already enables you to do something that has not been possible before, we can merge the fixes (without the switch to macos-latest) and fix the remaining issues later in another PR.

What do you prefer?

benegee · 2024-08-31T09:00:19Z

Indeed! This time the problem is not the operator but the operands. E.g., where the current CI fails, we are dealing with buf::Base.RefValue{StaticArraysCore.SVector{4, Float64}}. I am not sure whether every vector-like data structure would fail, but I suppose so, cf. https://github.com/JuliaParallel/MPI.jl/blob/780aaa0fdb768713a329659338a9c9cde23c41a8/src/operators.jl#L59C1-L59C110

For my current work I only need the fixes in my personal branch, where I tested this initially. So I am in favor of fixing all occurrences.

I do not have a great idea though. Just reducing each entry in the vector individually would of course be an option. A nicer solution would probably be to define custom reduction operators ourselves, as done here https://juliaparallel.org/MPI.jl/stable/examples/03-reduce/.

ranocha · 2024-08-31T12:17:18Z

Did you test the example with macos?

benegee · 2024-08-31T14:05:13Z

No, I do not have a mac. But I could try with the GH system.

ranocha · 2024-08-31T17:33:27Z

That would be great 👍

benegee · 2024-08-31T21:58:05Z

It does not work. Another individual operator does not help. Instead one would need to directly generate an (MPI.jl) Op object.

vchuravy · 2024-09-05T11:21:21Z

src/auxiliary/mpi.jl

+function reduce_vector_plus(x, y)
+    x .+ y
+end
+MPI.@Op(reduce_vector_plus, SVector)


I don't think that will work...

You would need to say:
MPI.@Op(reduce_vector_plus, SVector{3, Float32})

See https://github.com/JuliaParallel/MPI.jl/blob/780aaa0fdb768713a329659338a9c9cde23c41a8/src/operators.jl#L84

Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...)

> I don't think that will work...

You are absolutely right.

> Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...)

I was just trying to understand this. Is there no supertype?

Sadly that doesn't work.

We generate a "wrapper" that looks like this:

function (w::OpWrapper{F,T})(_a::Ptr{Cvoid}, _b::Ptr{Cvoid}, _len::Ptr{Cint}, t::Ptr{MPI_Datatype}) where {F,T}
    len = unsafe_load(_len)
    @assert isconcretetype(T)
    a = Ptr{T}(_a)
    b = Ptr{T}(_b)
    for i = 1:len
        unsafe_store!(b, w.f(unsafe_load(a,i), unsafe_load(b,i)), i)
    end
    return nothing
end

So we get two pointers to an array of data, and we must reinterpret the pointers to a concrete type so that we can load from them. Maybe one could use t to identify which Julia type one ought to use, but that would be less efficient.

If so, would it make sense to convert the SVectors to plain Vectors in our MPI routines to make our life easier and fix this issue?

IIUC you currently have data = Vector{SVector{5, Float64}}, you could reinterpret that to Ptr{Float64} as long as your reduce_vector function is not using the fact that the datatype is a SVector.

We do not need any special SVector functionality here.
But we need to know the number of elements?

Only when @vchuravy mentioned the OpWrapper I realized that it already iterates through something like a vector.
_len will be 1 in the case of our SVectors, but it carries the right number when using Vectors (where does this actually come from?). So, using a Vector or reinterpreting the SVector as Ptr{Float64} seems to make the reduction work without a custom operator (currently tree 2d only).
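The reinterpretation idea can be illustrated without MPI. The following is a rough sketch under stated assumptions: `NTuple{4, Float64}` stands in for `SVector{4, Float64}` (an SVector wraps such a tuple), and the hypothetical helper `reduce_elementwise!` only emulates the element loop of the wrapper shown above; it is not MPI.jl code.

```julia
# One "vector-valued" entry per buffer, as in the failing reduction
local_a = [(1.0, 2.0, 3.0, 4.0)]
local_b = [(10.0, 20.0, 30.0, 40.0)]

# Flat Float64 views onto the same memory: length 4 instead of length 1
flat_a = reinterpret(Float64, local_a)
flat_b = reinterpret(Float64, local_b)

# Hypothetical helper emulating the wrapper loop: b[i] = op(a[i], b[i])
function reduce_elementwise!(op, a, b)
    for i in eachindex(a, b)
        b[i] = op(a[i], b[i])
    end
    return b
end

reduce_elementwise!(+, flat_a, flat_b)
local_b[1]  # (11.0, 22.0, 33.0, 44.0)
```

With the flat view, the loop sees four scalar elements and a plain `+` is sufficient, which matches the observation that no custom operator is needed in that case.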

Doing this here now: #2067

ranocha · 2024-09-05T12:02:38Z

src/callbacks_step/analysis_dg2d_parallel.jl

@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                            normalize = normalize)

    # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())


This is the place where we need the vector reduction. Currently, local_integral can be a Float64 in some cases (when we compute the total entropy) or an SVector (when we compute the total mass of all conserved quantities). What I'm suggesting is to reduce collect(local_integral) instead of Ref(local_integral). That should work, shouldn't it?

Yeah, that should work, but of course it would require an extra allocation.

That's true. I'm just looking for a solution that is at the Pareto front of optimality in terms of code complexity, code generality, and efficiency. While the @Op approach is likely best in terms of efficiency, I have some doubts about the code complexity and generality: shall we do it for SVector{N, T} for N in 1:10 (or more?) and T in (Float32, Float64), and maybe also scalars? Will we need something else? The difficulty is that Trixi.jl is meant to be a library and not a single code for a specific application.

It's annoying that MPI doesn't specify a "reverse" translation of MPI_Datatype.
We could maybe have a dictionary where we do MPI_Datatype => Type and then we can use that to get a concrete type, but that would cause a dynamic dispatch...

Turns out MPI.jl has support for reverse translations.

I just pushed a commit that allows for @Op(+, Any).

Nice! Can we please test this here, @benegee?

Doing this here now: #2066

ranocha · 2024-09-05T12:03:14Z

src/callbacks_step/analysis_dg2d_parallel.jl

@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                            normalize = normalize)

    # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())
    if mpi_isroot()
        integral = convert(typeof(local_integral), global_integral[])


If we do this, we may have to use a special handling if local_integral isa Real
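A rough sketch of what the plain-Vector variant with special handling for scalars could look like. All names (`pack`, `unpack`, `reduce_two_ranks`) are hypothetical, `NTuple{3, Float64}` stands in for an SVector, and the actual MPI reduction is replaced by a simple sum over two pretend ranks; this is not the code of this PR.

```julia
# Pack a local result into a plain Vector that can be reduced element by element
pack(x::Real) = [x]
pack(x) = collect(x)

# Convert the reduced buffer back to the type of the local result
unpack(::Type{T}, buf) where {T <: Real} = convert(T, only(buf))
unpack(::Type{T}, buf) where {T} = T(Tuple(buf))

# Stand-in for the reduction over two ranks (no MPI involved)
reduce_two_ranks(a, b) = pack(a) .+ pack(b)

# Scalar case, e.g. a total entropy
entropy = unpack(Float64, reduce_two_ranks(1.5, 2.5))  # 4.0

# Vector case, e.g. the total mass of all conserved quantities
mass = unpack(NTuple{3, Float64},
              reduce_two_ranks((1.0, 2.0, 3.0), (0.5, 0.5, 0.5)))  # (1.5, 2.5, 3.5)
```

Note that `pack` allocates a new Vector on every call, which is the extra allocation mentioned in the discussion.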

ranocha

Thanks!

benegee added 3 commits August 30, 2024 12:29

use Base.min/max in MPI.Allreduce

4a137bd

MPI.jl's reduce currently does not work for custom operators (such as Trixi's min/max) on ARM

add comments

ed2540d

explain workdaround

cc1147d

typo

fb8a769

ranocha requested changes Aug 30, 2024


benegee and others added 2 commits August 30, 2024 14:59

Apply suggestions from code review

7cd8456

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

switch to macos-latest in mpi tests

246f3ac

benegee added 2 commits August 30, 2024 22:03

remove arch specification for macos-latest

1d6e7f9

macos-latest is 14, which is ARM

readd arch, required by julia-actions/setup-julia

db209ec

benegee mentioned this pull request Sep 3, 2024

Fix MPI tests on Apple Silicon #1922

Open

Merge branch 'main' into bg/base_min_in_mpi_reduce

e6152c7

vchuravy reviewed Sep 5, 2024


ranocha reviewed Sep 5, 2024


back to macos-13 and x64

6c659f5

benegee force-pushed the bg/base_min_in_mpi_reduce branch from 6140e98 to 6c659f5 Compare September 5, 2024 16:27

This was referenced Sep 5, 2024

Use new MPI custom ops in mpi reduce #2066

Draft

Reinterpret SVector as pointer in mpi reduce #2067

Draft

DanielDoehring added the parallelization Related to MPI, threading, tasks etc. label Sep 6, 2024

Merge branch 'main' into bg/base_min_in_mpi_reduce

ffad95a

benegee requested a review from ranocha September 12, 2024 21:02

ranocha approved these changes Sep 13, 2024


ranocha merged commit 148dd67 into main Sep 13, 2024
38 checks passed

ranocha deleted the bg/base_min_in_mpi_reduce branch September 13, 2024 06:36
