Parallel postprocessing on Vorna #27

Closed
henry2004y opened this issue Sep 9, 2021 · 5 comments
Labels
bug Something isn't working

Comments

henry2004y (Owner) commented Sep 9, 2021

For some unknown reason, Julia cannot launch more than 8 worker processes on Vorna, which has 16 cores per node split across 2 CPUs. Weird.

henry2004y added the bug label Sep 9, 2021

henry2004y commented Sep 9, 2021

Worker 16 terminated.
ERROR: LoadError: ProcessExitedException(16)
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:369
 [2] macro expansion
   @ ./task.jl:388 [inlined]
 [3] _require_callback(mod::Base.PkgId)
   @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/Distributed.jl:76
 [4] #invokelatest#2
   @ ./essentials.jl:708 [inlined]
 [5] invokelatest
   @ ./essentials.jl:706 [inlined]
 [6] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:920
 [7] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:901
 [8] top-level scope
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:204
in expression starting at /wrk/users/hongyang/result/demo_1d2d_parallel_pyplot.jl:6
      From worker 16:	OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 8: Resource temporarily unavailable
      From worker 16:	OpenBLAS blas_thread_init: RLIMIT_NPROC 200 current, 257413 max
...
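
The OpenBLAS messages above hint that every worker tries to start its own BLAS thread pool and runs into the RLIMIT_NPROC process limit. As an aside (this workaround is not from the thread, just a common mitigation), BLAS can be capped to a single thread on each worker:

using Distributed, LinearAlgebra

addprocs(8)                          # hypothetical: 8 local workers on one Vorna node
@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(1)  # keep each worker from spawning its own BLAS thread pool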

I found that ClusterManagers.jl is the recommended way to handle cross-node cluster jobs. I have tested it on Vorna with 2 nodes (32 cores) successfully. One slightly annoying thing is that a standard output file is created for every worker process Julia launches. Maybe there is also a way to turn it off?
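
For reference, here is a minimal launch sketch with ClusterManagers.jl; the Slurm partition, walltime, and node count are placeholders, and the keyword arguments are forwarded to srun:

using Distributed, ClusterManagers

# Request 32 workers across 2 nodes through Slurm (flags are illustrative).
addprocs(SlurmManager(32), nodes=2, partition="short", t="00:30:00")

@everywhere using Vlasiator, PyPlot   # load the packages on every worker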

Also note that someone complained about the slowness of the first @everywhere call: with a job of over 1000 cores, the first @everywhere call took more than an hour.


henry2004y commented Sep 9, 2021

Another observation is that, as time goes by, generating each plot takes longer and longer on Vorna. I am not sure whether this is related to the machine itself or to a memory leak.

I am not sure how to address this. Shall I turn on the profiler?


henry2004y commented Sep 30, 2021

After looking more into multiprocessing, I realized that my current implementation, which broadcasts parameters with a pattern like

const cmap = matplotlib.cm.turbo
@everywhere cmap = $cmap

is actually type-unstable. This can be verified by changing the type of cmap on a remote process: Julia does not complain, because the @everywhere assignment creates a non-constant global, and that is bad for performance.

I need to think of a better way to handle parameters.
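
One possible way to make such parameters type-stable (a sketch only, assuming the values can be fixed once when the workers start) is to declare them as constants on every process instead of interpolating into a plain global:

using Distributed
@everywhere using PyPlot

# A `const` global has a fixed type on every worker, unlike the
# `@everywhere cmap = $cmap` pattern, which creates an untyped global.
@everywhere const cmap = matplotlib.cm.turbo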


By making the common parameters type-stable, we get some improvement in memory usage. For example, for the 2D parallel contour plots with PyPlot, the previous version has

julia> @time include("demo_2d_parallel_pyplot.jl")
Total number of files: 3
Running with 1 workers...
      From worker 2:	filename = ./bulk.0001347.vlsv
      From worker 2:	filename = ./bulk.0001348.vlsv
      From worker 2:	filename = ./bulk.0001492.vlsv
┌ Warning: Less than 1GB free memory detected. Using memory-mapped I/O!
└ @ Vlasiator ~/.julia/packages/Vlasiator/muKF0/src/vlsv/vlsvreader.jl:127
Finished!
 54.947133 seconds (26.79 M allocations: 1.597 GiB, 1.19% gc time, 5.08% compilation time)

while the modified version has

julia> @time include("/home/hongyang/Vlasiator/Vlasiator.jl/examples/demo_2d_parallel_pyplot.jl")
Total number of files: 3
Running with 1 workers...
      From worker 2:	file = ./bulk.0001347.vlsv
      From worker 2:	file = ./bulk.0001348.vlsv
      From worker 2:	file = ./bulk.0001492.vlsv
┌ Warning: Less than 1GB free memory detected. Using memory-mapped I/O!
└ @ Vlasiator ~/.julia/packages/Vlasiator/muKF0/src/vlsv/vlsvreader.jl:127
Finished!
 53.315913 seconds (17.11 M allocations: 1.021 GiB, 0.65% gc time, 5.20% compilation time)

henry2004y added a commit that referenced this issue Sep 30, 2021
henry2004y (Owner Author)

Now we have learned

  • how to improve memory usage by taking advantage of type stability
  • how to run multi-node jobs with ClusterManagers

I would consider this done.

henry2004y reopened this Oct 6, 2021

henry2004y commented Oct 6, 2021

The parallel contour plotting still gets slower as time progresses. Now I have a hypothesis: Matplotlib's default behavior is to append new artists to the existing canvas rather than replace them, so with more frames the plots pile up on top of each other and create a huge memory burden.
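
If that hypothesis holds, a minimal sketch of the kind of fix is to reuse one figure and clear its axes between frames; `files` and the actual plotting call below are placeholders:

using PyPlot

fig, ax = plt.subplots()
for file in files                     # `files`: the list of .vlsv files (assumption)
    ax.cla()                          # clear the previous frame instead of drawing on top of it
    # ... plot the contour for `file` into `ax` ...
    fig.savefig("frame_$(basename(file)).png")
end
close(fig)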


Confirmed by tests.
Old method (overlapping):

Total number of files: 3
Running with 1 workers...
      From worker 2:	filename = ./bulk.0001347.vlsv
      From worker 2:	filename = ./bulk.0001348.vlsv
┌ Warning: Less than 1GB free memory detected. Using memory-mapped I/O!
└ @ Vlasiator ~/.julia/packages/Vlasiator/muKF0/src/vlsv/vlsvreader.jl:127
      From worker 2:	filename = ./bulk.0001492.vlsv
Finished in 38.37s.
 60.756959 seconds (26.83 M allocations: 1.599 GiB, 1.22% gc time, 5.21% compilation time)

New method (no overlapping):

Total number of files: 3
Running with 1 workers...
      From worker 2:	file = ./bulk.0001347.vlsv
      From worker 2:	file = ./bulk.0001348.vlsv
┌ Warning: Less than 1GB free memory detected. Using memory-mapped I/O!
└ @ Vlasiator ~/.julia/packages/Vlasiator/muKF0/src/vlsv/vlsvreader.jl:127
      From worker 2:	file = ./bulk.0001492.vlsv
Finished in 26.08s.
 50.044524 seconds (17.15 M allocations: 1.023 GiB, 0.70% gc time, 6.37% compilation time)

This change makes it 13x faster with 8 workers on Vorna!
