Port MPI to Taal #292

Merged
merged 32 commits into dev from taal_mpi on Nov 12, 2020
Changes from 13 commits

Commits (32)
dfb810e
WIP: MPI in Taal
ranocha Nov 4, 2020
e1c48ce
debugging MPI in Taal
ranocha Nov 4, 2020
b8efa06
fix prolong2mpiinterfaces
ranocha Nov 5, 2020
b6d09e6
printing only on root
ranocha Nov 5, 2020
308a600
enable saving callbacks again
ranocha Nov 5, 2020
02f3afd
test MPI in Taal
ranocha Nov 5, 2020
14ffea4
Merge branch 'dev' into taal_mpi
ranocha Nov 5, 2020
3fab78b
test errors only on MPI root
ranocha Nov 5, 2020
c3b034b
Merge branch 'dev' into taal_mpi
ranocha Nov 5, 2020
7923711
adapt output of test_trixi_include to MPI
ranocha Nov 8, 2020
995bec7
Merge branch 'dev' into taal_mpi
ranocha Nov 8, 2020
c9aa3f1
update MPI docs to Taal
ranocha Nov 8, 2020
f18c077
port latency improvements to Taal MPI
ranocha Nov 8, 2020
baa50a2
fix AnalysisCallback initialization with MPI
ranocha Nov 12, 2020
c33ccc9
fix MPI residual calculation
ranocha Nov 12, 2020
c4967bc
use mpi_println
ranocha Nov 12, 2020
7d80448
fix parallel load_mesh
ranocha Nov 12, 2020
cdd6a81
switch order of conditions in test_trixi_include
ranocha Nov 12, 2020
2414997
Update test/test_examples_2d_parallel.jl
ranocha Nov 12, 2020
86aeee1
Update src/callbacks/steady_state.jl
ranocha Nov 12, 2020
0982d66
expand comment on partition!
ranocha Nov 12, 2020
35b9494
Update src/callbacks/save_solution_dg.jl
ranocha Nov 12, 2020
a03436f
Update src/callbacks/analysis.jl
ranocha Nov 12, 2020
314fca8
TODO notes for #328
ranocha Nov 12, 2020
f176993
Merge branch 'dev' into taal_mpi
ranocha Nov 12, 2020
b7b76ff
fix merge conflicts
ranocha Nov 12, 2020
c1ea4b0
move mesh saving/loading to mesh_io.jl
ranocha Nov 12, 2020
7b745c3
add note on hybrid parallelism
ranocha Nov 12, 2020
5486c22
add note on precompile assertions
ranocha Nov 12, 2020
f774c9d
return NaN for non-root ranks in reductions
ranocha Nov 12, 2020
544e834
Update src/mesh/mesh.jl
ranocha Nov 12, 2020
4c716ea
fix parallel reductions
ranocha Nov 12, 2020
8 changes: 4 additions & 4 deletions docs/src/parallelization.md
@@ -6,7 +6,7 @@ Many compute-intensive loops in Trixi.jl are parallelized using the
support provided by Julia. You can recognize those loops by the
`Threads.@threads` macro prefixed to them, e.g.,
```julia
Threads.@threads for element_id in 1:dg.n_elements
Threads.@threads for element in eachelement(dg, cache)
...
end
```
@@ -51,7 +51,7 @@ To start Trixi in parallel with MPI, there are three options:
julia> using MPI

julia> mpiexec() do cmd
run(`$cmd -n 3 $(Base.julia_cmd()) --project=@. -e 'using Trixi; trixi_include("examples/2d/parameters.toml")'`)
run(`$cmd -n 3 $(Base.julia_cmd()) --threads=1 --project=@. -e 'using Trixi; trixi_include(default_example())'`)
end
```
The parameter `-n 3` specifies that Trixi should run with three processes (or
@@ -73,7 +73,7 @@ To start Trixi in parallel with MPI, there are three options:
Then, to execute Trixi in parallel, execute the following command from your
command line:
```bash
mpiexecjl -n 3 julia --project=@. -e 'using Trixi; trixi_include("examples/2d/parameters.toml")'
mpiexecjl -n 3 julia --threads=1 --project=@. -e 'using Trixi; trixi_include(default_example())'
```
3. **Run interactively with `tmpi` (Linux/MacOS only):** If you are on a
Linux/macOS system, you have a third option which lets you run Julia in
@@ -95,7 +95,7 @@ To start Trixi in parallel with MPI, there are three options:
Finally, you can start and control multiple Julia REPLs simultaneously by
running
```bash
tmpi 3 julia --project=@.
tmpi 3 julia --threads=1 --project=@.
```
This will start Julia inside `tmux` three times and multiplexes all commands
you enter in one REPL to all other REPLs (try for yourself to understand what
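For reference, the launch pattern documented above can be run as a self-contained snippet (a sketch assuming MPI.jl is installed in the active project; `default_example()` is the helper the updated docs use):

```julia
# Launch Trixi with 3 MPI ranks, one Julia thread each (assumes MPI.jl is
# installed in the active project; `default_example()` is the helper used
# in the updated docs).
using MPI

mpiexec() do cmd
    run(`$cmd -n 3 $(Base.julia_cmd()) --threads=1 --project=@. -e 'using Trixi; trixi_include(default_example())'`)
end
```

Pinning `--threads=1` keeps each MPI rank single-threaded, which avoids oversubscribing cores; the PR also adds a dedicated note on hybrid (MPI plus threads) parallelism to the docs.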
6 changes: 4 additions & 2 deletions src/callbacks/alive.jl
@@ -49,12 +49,14 @@ end
function (alive_callback::AliveCallback)(integrator)
@unpack t, dt, iter = integrator

if isfinished(integrator)
# Checking for floating point equality is OK here as `DifferentialEquations.jl`
# sets the time exactly to the final time in the last iteration
if isfinished(integrator) && mpi_isroot()
println("-"^80)
println("Trixi simulation run finished. Final time: ", integrator.t, " Time steps: ", integrator.iter)
println("-"^80)
println()
else
elseif mpi_isroot()
runtime_absolute = 1.0e-9 * (time_ns() - alive_callback.start_time)
@printf("#t/s: %6d | dt: %.4e | Sim. time: %.4e | Run time: %.4e s\n",
iter, dt, t, runtime_absolute)
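The diff above gates terminal output on `mpi_isroot()` so that only one rank prints. As a rough sketch of the underlying idea (a minimal stand-in built directly on MPI.jl, not Trixi's actual helper, whose details may differ):

```julia
# Minimal stand-in for root-only output with MPI.jl; Trixi's real
# `mpi_isroot`/`mpi_println` helpers may differ in detail.
using MPI

MPI.Init()

is_root() = MPI.Comm_rank(MPI.COMM_WORLD) == 0  # rank 0 acts as root

# Only rank 0 writes to the terminal; all other ranks stay silent.
is_root() && println("Running on ", MPI.Comm_size(MPI.COMM_WORLD), " ranks")
```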
2 changes: 1 addition & 1 deletion src/callbacks/amr.jl
@@ -161,7 +161,7 @@ function (amr_callback::AMRCallback)(u_ode::AbstractVector, mesh::TreeMesh,
lambda = @timeit_debug timer() "indicator" controller(u, mesh, equations, dg, cache,
t=t, iter=iter)

leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)
@boundscheck begin
@assert axes(lambda) == axes(leaf_cell_ids) ("Indicator (axes = $(axes(lambda))) and leaf cell (axes = $(axes(leaf_cell_ids))) arrays have different axes")
end
4 changes: 2 additions & 2 deletions src/callbacks/amr_dg1d.jl
@@ -22,7 +22,7 @@ function refine!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{1},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
@@ -139,7 +139,7 @@ function coarsen!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{1},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
4 changes: 2 additions & 2 deletions src/callbacks/amr_dg2d.jl
@@ -22,7 +22,7 @@ function refine!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{2},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
@@ -166,7 +166,7 @@ function coarsen!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{2},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
4 changes: 2 additions & 2 deletions src/callbacks/amr_dg3d.jl
@@ -22,7 +22,7 @@ function refine!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{3},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
@@ -179,7 +179,7 @@ function coarsen!(u_ode::AbstractVector, adaptor, mesh::TreeMesh{3},
old_u = wrap_array(old_u_ode, mesh, equations, dg, cache)

# Get new list of leaf cells
leaf_cell_ids = leaf_cells(mesh.tree)
leaf_cell_ids = local_leaf_cells(mesh.tree)

# re-initialize elements container
@unpack elements = cache
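The AMR callbacks above switch from `leaf_cells` to `local_leaf_cells`, so each rank only refines or coarsens the leaf cells it owns. A conceptual sketch of that filtering, using a toy data structure rather than Trixi's `TreeMesh` internals (all names below are illustrative, not Trixi's API):

```julia
# Conceptual sketch only: with a domain-decomposed tree, AMR must touch only
# the leaf cells owned by the current MPI rank. Toy structure, not Trixi's.
struct ToyTree
    is_leaf::Vector{Bool}  # leaf flag per cell
    rank::Vector{Int}      # owning MPI rank per cell
end

local_leaf_cells_sketch(tree::ToyTree, my_rank::Integer) =
    [c for c in eachindex(tree.is_leaf) if tree.is_leaf[c] && tree.rank[c] == my_rank]

# Example: four cells split across two ranks.
tree = ToyTree([true, false, true, true], [0, 0, 1, 1])
@assert local_leaf_cells_sketch(tree, 0) == [1]
@assert local_leaf_cells_sketch(tree, 1) == [3, 4]
```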
135 changes: 68 additions & 67 deletions src/callbacks/analysis.jl
@@ -93,45 +93,45 @@ function initialize!(cb::DiscreteCallback{Condition,Affect!}, u_ode, t, integrat

# write header of output file
open(joinpath(output_directory, analysis_filename), "w") do io
@printf(io, "#%-8s", "timestep")
@printf(io, " %-14s", "time")
@printf(io, " %-14s", "dt")
mpi_isroot() && @printf(io, "#%-8s", "timestep")
mpi_isroot() && @printf(io, " %-14s", "time")
mpi_isroot() && @printf(io, " %-14s", "dt")
if :l2_error in analysis_errors
for v in varnames_cons(equations)
@printf(io, " %-14s", "l2_" * v)
mpi_isroot() && @printf(io, " %-14s", "l2_" * v)
end
end
if :linf_error in analysis_errors
for v in varnames_cons(equations)
@printf(io, " %-14s", "linf_" * v)
mpi_isroot() && @printf(io, " %-14s", "linf_" * v)
end
end
if :conservation_error in analysis_errors
for v in varnames_cons(equations)
@printf(io, " %-14s", "cons_" * v)
mpi_isroot() && @printf(io, " %-14s", "cons_" * v)
end
end
if :residual in analysis_errors
for v in varnames_cons(equations)
@printf(io, " %-14s", "res_" * v)
mpi_isroot() && @printf(io, " %-14s", "res_" * v)
end
end
if :l2_error_primitive in analysis_errors
for v in varnames_prim(equations)
@printf(io, " %-14s", "l2_" * v)
mpi_isroot() && @printf(io, " %-14s", "l2_" * v)
end
end
if :linf_error_primitive in analysis_errors
for v in varnames_prim(equations)
@printf(io, " %-14s", "linf_" * v)
mpi_isroot() && @printf(io, " %-14s", "linf_" * v)
end
end

for quantity in analysis_integrals
@printf(io, " %-14s", pretty_form_ascii(quantity))
mpi_isroot() && @printf(io, " %-14s", pretty_form_ascii(quantity))
end

println(io)
mpi_isroot() && println(io)
end

end
@@ -155,18 +155,18 @@ function (analysis_callback::AnalysisCallback)(integrator)

@timeit_debug timer() "analyze solution" begin
# General information
println()
println("-"^80)
mpi_isroot() && println()
mpi_isroot() && println("-"^80)
# TODO: Taal refactor, polydeg is specific to DGSEM
println(" Simulation running '", get_name(equations), "' with POLYDEG = ", polydeg(solver))
println("-"^80)
println(" #timesteps: " * @sprintf("% 14d", iter) *
" " *
" run time: " * @sprintf("%10.8e s", runtime_absolute))
println(" dt: " * @sprintf("%10.8e", dt) *
" " *
" Time/DOF/rhs!: " * @sprintf("%10.8e s", runtime_relative))
println(" sim. time: " * @sprintf("%10.8e", t))
mpi_isroot() && println(" Simulation running '", get_name(equations), "' with polydeg = ", polydeg(solver))
mpi_isroot() && println("-"^80)
mpi_isroot() && println(" #timesteps: " * @sprintf("% 14d", iter) *
" " *
" run time: " * @sprintf("%10.8e s", runtime_absolute))
mpi_isroot() && println(" dt: " * @sprintf("%10.8e", dt) *
" " *
" Time/DOF/rhs!: " * @sprintf("%10.8e s", runtime_relative))
mpi_isroot() && println(" sim. time: " * @sprintf("%10.8e", t))

# Level information (only show for AMR)
uses_amr = false
@@ -190,20 +190,20 @@ function (analysis_callback::AnalysisCallback)(integrator)
max_level = max(max_level, current_level)
end

println(" #elements: " * @sprintf("% 14d", nelements(solver, cache)))
mpi_isroot() && println(" #elements: " * @sprintf("% 14d", nelements(solver, cache)))
for level = max_level:-1:min_level+1
println(" ├── level $level: " * @sprintf("% 14d", count(isequal(level), levels)))
mpi_isroot() && println(" ├── level $level: " * @sprintf("% 14d", count(isequal(level), levels)))
end
println(" └── level $min_level: " * @sprintf("% 14d", count(isequal(min_level), levels)))
mpi_isroot() && println(" └── level $min_level: " * @sprintf("% 14d", count(isequal(min_level), levels)))
end
println()
mpi_isroot() && println()

# Open file for appending and store time step and time information
if analysis_callback.save_analysis
if analysis_callback.save_analysis && mpi_isroot()
io = open(joinpath(analysis_callback.output_directory, analysis_callback.analysis_filename), "a")
@printf(io, "% 9d", iter)
@printf(io, " %10.8e", t)
@printf(io, " %10.8e", dt)
mpi_isroot() && @printf(io, "% 9d", iter)
mpi_isroot() && @printf(io, " %10.8e", t)
mpi_isroot() && @printf(io, " %10.8e", dt)
end

# the time derivative can be unassigned before the first step is made
@@ -220,109 +220,109 @@
# Variable names required for L2 error, Linf error, and conservation error
if any(q in analysis_errors for q in
(:l2_error, :linf_error, :conservation_error, :residual))
print(" Variable: ")
mpi_isroot() && print(" Variable: ")
for v in eachvariable(equations)
@printf(" %-14s", varnames_cons(equations)[v])
mpi_isroot() && @printf(" %-14s", varnames_cons(equations)[v])
end
println()
mpi_isroot() && println()
end

# Calculate L2/Linf errors, which are also returned by analyze_solution
l2_error, linf_error = calc_error_norms(u, t, analyzer, semi)

# L2 error
if :l2_error in analysis_errors
print(" L2 error: ")
mpi_isroot() && print(" L2 error: ")
for v in eachvariable(equations)
@printf(" % 10.8e", l2_error[v])
analysis_callback.save_analysis && @printf(io, " % 10.8e", l2_error[v])
mpi_isroot() && @printf(" % 10.8e", l2_error[v])
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", l2_error[v])
end
println()
mpi_isroot() && println()
end

# Linf error
if :linf_error in analysis_errors
print(" Linf error: ")
mpi_isroot() && print(" Linf error: ")
for v in eachvariable(equations)
@printf(" % 10.8e", linf_error[v])
analysis_callback.save_analysis && @printf(io, " % 10.8e", linf_error[v])
mpi_isroot() && @printf(" % 10.8e", linf_error[v])
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", linf_error[v])
end
println()
mpi_isroot() && println()
end

# Conservation error
if :conservation_error in analysis_errors
@unpack initial_state_integrals = analysis_callback
state_integrals = integrate(integrator.u, semi)

print(" |∑U - ∑U₀|: ")
mpi_isroot() && print(" |∑U - ∑U₀|: ")
for v in eachvariable(equations)
err = abs(state_integrals[v] - initial_state_integrals[v])
@printf(" % 10.8e", err)
analysis_callback.save_analysis && @printf(io, " % 10.8e", err)
mpi_isroot() && @printf(" % 10.8e", err)
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", err)
end
println()
mpi_isroot() && println()
end

# Residual (defined here as the vector maximum of the absolute values of the time derivatives)
if :residual in analysis_errors
print(" max(|Uₜ|): ")
mpi_isroot() && print(" max(|Uₜ|): ")
for v in eachvariable(equations)
# Calculate maximum absolute value of Uₜ
@views res = maximum(abs, view(du, v, ..))
@printf(" % 10.8e", res)
analysis_callback.save_analysis && @printf(io, " % 10.8e", res)
mpi_isroot() && @printf(" % 10.8e", res)
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", res)
end
println()
mpi_isroot() && println()
end

# L2/L∞ errors of the primitive variables
if :l2_error_primitive in analysis_errors || :linf_error_primitive in analysis_errors
l2_error_prim, linf_error_prim = calc_error_norms(cons2prim, semi, t)

print(" Variable: ")
mpi_isroot() && print(" Variable: ")
for v in eachvariable(equations)
@printf(" %-14s", varnames_prim(equations)[v])
mpi_isroot() && @printf(" %-14s", varnames_prim(equations)[v])
end
println()
mpi_isroot() && println()

# L2 error
if :l2_error_primitive in analysis_errors
print(" L2 error prim.: ")
mpi_isroot() && print(" L2 error prim.: ")
for v in eachvariable(equations)
@printf("%10.8e ", l2_error_prim[v])
analysis_callback.save_analysis && @printf(io, " % 10.8e", l2_error_prim[v])
mpi_isroot() && @printf("%10.8e ", l2_error_prim[v])
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", l2_error_prim[v])
end
println()
mpi_isroot() && println()
end

# L∞ error
if :linf_error_primitive in analysis_errors
print(" Linf error pri.:")
mpi_isroot() && print(" Linf error pri.:")
for v in eachvariable(equations)
@printf("%10.8e ", linf_error_prim[v])
analysis_callback.save_analysis && @printf(io, " % 10.8e", linf_error_prim[v])
mpi_isroot() && @printf("%10.8e ", linf_error_prim[v])
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", linf_error_prim[v])
end
println()
mpi_isroot() && println()
end
end


# additional
for quantity in analysis_integrals
res = analyze(quantity, du, u, t, semi)
@printf(" %-12s:", pretty_form_utf(quantity))
@printf(" % 10.8e", res)
analysis_callback.save_analysis && @printf(io, " % 10.8e", res)
println()
mpi_isroot() && @printf(" %-12s:", pretty_form_utf(quantity))
mpi_isroot() && @printf(" % 10.8e", res)
analysis_callback.save_analysis && mpi_isroot() && @printf(io, " % 10.8e", res)
mpi_isroot() && println()
end
end # GC.@preserve du_ode

println("-"^80)
println()
mpi_isroot() && println("-"^80)
mpi_isroot() && println()

# Add line break and close analysis file if it was opened
if analysis_callback.save_analysis
if analysis_callback.save_analysis && mpi_isroot()
println(io)
close(io)
end
@@ -392,4 +392,5 @@ pretty_form_ascii(::Val{:linf_divb}) = "linf_divb"
# specialized implementations specific to some solvers
include("analysis_dg1d.jl")
include("analysis_dg2d.jl")
include("analysis_dg2d_parallel.jl")
include("analysis_dg3d.jl")