MPI jobs in R: performance and behavior #7

Open
Laurae2 opened this issue May 14, 2019 · 0 comments
For information, in R you can run MPI jobs instead of socket clusters if you hit the limit of 125 communication sockets.

However, MPI spreads the work differently, and this must be taken into account. The work distribution could be more realistic with MPI, but its performance for sharing data is another story.

[Figure: MPI chart behavior]

To use MPI in R, add the MPI type to the cluster spawn:

makeCluster(my_threads, type = "MPI")
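As a minimal sketch of that cluster spawn (not the exact benchmark code): `parallel::makeCluster` hands any type other than "PSOCK" and "FORK" to the snow package, so `type = "MPI"` should need both snow and Rmpi installed. `pick_cluster_type` and `spawn_cluster` are hypothetical helper names used only for illustration.

```r
library(parallel)

# Hypothetical helper: use MPI only when snow and Rmpi are installed.
# system.file() checks for the package without loading it, so MPI is
# not initialized just by asking.
pick_cluster_type <- function(prefer_mpi = TRUE) {
  has_mpi <- nzchar(system.file(package = "snow")) &&
    nzchar(system.file(package = "Rmpi"))
  if (prefer_mpi && has_mpi) "MPI" else "PSOCK"
}

# Hypothetical helper: spawn the cluster with the chosen type.
spawn_cluster <- function(my_threads, type = pick_cluster_type()) {
  makeCluster(my_threads, type = type)
}
```

Falling back to "PSOCK" keeps the same script usable on machines without an MPI installation, at the cost of being subject to the socket limit again.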

BIG WARNING: You should close the MPI cluster with Rmpi::mpi.exit(), otherwise it crashes (it could still crash even with Rmpi::mpi.exit()). I recommend using that command only at the very end (don't use stopCluster(cl)), and checking afterwards whether any zombie R processes are left in the wild.
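A rough sketch of that shutdown sequence, under stated assumptions: `close_cluster` and `count_r_processes` are hypothetical helpers, the cluster type is passed explicitly rather than detected, and the zombie check assumes a Linux machine with `pgrep` where the workers show up under the process name "R".

```r
# Hypothetical helper: shut down a cluster according to its type.
close_cluster <- function(cl, type = c("PSOCK", "MPI")) {
  type <- match.arg(type)
  if (type == "MPI") {
    # MPI: skip stopCluster(cl) and call mpi.exit() once, at the very end.
    Rmpi::mpi.exit()
  } else {
    parallel::stopCluster(cl)
  }
  invisible(type)
}

# Hypothetical helper: count processes named exactly "R" (assumes pgrep).
# Returns NA when pgrep is unavailable or its output cannot be parsed.
count_r_processes <- function() {
  out <- tryCatch(
    suppressWarnings(
      system2("pgrep", c("-c", "-x", "R"), stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  if (length(out) == 0) NA_integer_ else suppressWarnings(as.integer(out[1]))
}
```

Comparing `count_r_processes()` before spawning the cluster and after closing it gives a quick hint whether workers were left behind. The count includes the current R session and any unrelated R sessions on the machine.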

Because MPI and R sockets spread the work differently at the beginning, the timings for items 1 to 7 are entirely different between the two (the "[Parallel]" lines are separate: they cover creating the processes and sending data to each process). MPI spawns processes faster than R sockets, but communicating with R socket processes is faster than with MPI processes. The huge difference in sending data might also be because we are using serialization v3 in R, which is now the default as of R 3.6.
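To get a feel for the serialization part of that hypothesis, here is a small sketch that times `serialize()` under versions 2 and 3 on a stand-in object. `serialize_cost` is a hypothetical helper and the random matrix is an assumption, not the benchmark data. It only measures serialization itself, not the MPI or socket transfer.

```r
# Hypothetical helper: time in-memory serialization for one format version.
serialize_cost <- function(obj, version) {
  elapsed <- system.time(
    bytes <- serialize(obj, connection = NULL, version = version)
  )[["elapsed"]]
  c(version = version, seconds = elapsed, size_bytes = length(bytes))
}

# Stand-in data (assumption): a 10000 x 100 numeric matrix.
x <- matrix(rnorm(1e6), ncol = 100)

# One row per serialization version.
rbind(serialize_cost(x, 2), serialize_cost(x, 3))
```

If both versions take about the same time and produce about the same size here, the gap in "Sending Data Time" is more likely to come from the transport than from the serialization format.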

MPI:

[Parallel] 58 Process(es) Creation Time: 1.102s
[Parallel] Sending Hardware Specifications Time: 24.290s
[Parallel] Sending Data Time: 59.740s
1. Total Time (Σ=6518.746, μ=61.498, σ=9.142)
2. Model Time (Σ=5917.713, μ=55.827, σ=8.112)
3. Matrix Train Build Time (Σ=58.603, μ=0.553, σ=0.113)
4. Matrix Test Build Time (Σ=57.090, μ=0.539, σ=0.115)
5. Predict Time (Σ=357.093, μ=3.369, σ=1.331)
6. Garbage Collector Time (Σ=15.051, μ=0.142, σ=0.040)
7. Metric (Σ=77.781147, μ=0.733784, σ=0.000353)

R socket:

[Parallel] 58 Process(es) Creation Time: 8.146s
[Parallel] Sending Hardware Specifications Time: 0.007s
[Parallel] Sending Data Time: 1.428s
1. Total Time (Σ=6774.940, μ=63.915, σ=20.778)
2. Model Time (Σ=6126.667, μ=57.799, σ=18.843)
3. Matrix Train Build Time (Σ=52.500, μ=0.495, σ=0.154)
4. Matrix Test Build Time (Σ=48.795, μ=0.460, σ=0.110)
5. Predict Time (Σ=320.306, μ=3.022, σ=1.262)
6. Garbage Collector Time (Σ=14.658, μ=0.138, σ=0.048)
7. Metric (Σ=77.781523, μ=0.733788, σ=0.000356)
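To make the overhead comparison explicit, the "[Parallel]" timings from the two logs can be put side by side. This is only arithmetic on the numbers reported above. The data frame layout and column names are my own choice.

```r
# [Parallel] timings in seconds, copied from the MPI and R socket logs.
timings <- data.frame(
  step = c("Process creation",
           "Sending hardware specifications",
           "Sending data"),
  mpi = c(1.102, 24.290, 59.740),
  socket = c(8.146, 0.007, 1.428)
)

# How many times slower (> 1) or faster (< 1) MPI is than R sockets.
timings$mpi_over_socket <- round(timings$mpi / timings$socket, 2)
print(timings)

# Total [Parallel] overhead per backend, in seconds.
overhead <- colSums(timings[, c("mpi", "socket")])
print(overhead)
```

In these two runs, MPI creates the 58 processes about 7.4 times faster, but sending data is about 42 times slower, so the total setup overhead is about 85 s for MPI against under 10 s for R sockets.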