Add thread pinning optimization #233

tkoskela · 2023-01-27T16:32:00Z

I've added thread-pinning optimizations (thanks to @giordano) that could improve performance on Archer2. I would like to test the performance with the following configurations:

8 ranks per node, 16 threads per rank
16 ranks per node, 8 threads per rank
32 ranks per node, 4 threads per rank
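
As a quick sanity check on the three layouts above, here is a minimal sketch of the arithmetic. It assumes a node has 128 cores, which is not stated in this thread; `CORES_PER_NODE`, `LAYOUTS` and `cores_used` are hypothetical names used only for illustration.

```julia
# Assumption (not stated in the PR): one compute node has 128 cores.
const CORES_PER_NODE = 128

# (ranks per node, threads per rank) pairs listed above.
const LAYOUTS = [(8, 16), (16, 8), (32, 4)]

# Number of cores a layout occupies on one node.
cores_used(ranks, threads) = ranks * threads

for (ranks, threads) in LAYOUTS
    println(ranks, " ranks x ", threads, " threads = ",
            cores_used(ranks, threads), " cores")
end
```

Under that assumption, each layout fills the node exactly, so the three runs differ only in how the cores are split between ranks and threads.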

giordano · 2023-02-28T17:26:33Z

extra/weak_scaling/run_particleda.jl

+comm = MPI.COMM_WORLD
+mpi_size = MPI.Comm_size(comm)
+my_rank = MPI.Comm_rank(comm)
+
+cores_per_numa = 16
+threads_per_rank = Threads.nthreads()
+ranks_per_numa = div(cores_per_numa, threads_per_rank)
+
+# Pin threads so that the threads of an MPI rank are pinned to cores with
+# contiguous IDs. This ensures that
+#  - When running 16 or fewer threads per rank, all threads are pinned to the same
+#    NUMA region as their master (sharing a memory controller within Infinity Fabric)
+#  - When running 8 or fewer threads per rank, all threads are pinned to the same
+#    Core Complex Die
+#  - When running 4 or fewer threads per rank, all threads are pinned to the same
+#    Core Complex (sharing an L3 cache)
+
+my_numa, my_id_in_numa = divrem(my_rank, ranks_per_numa) .+ (1, 0)
+pinthreads( numa( my_numa, 1:Threads.nthreads() ) .+ threads_per_rank .* my_id_in_numa )
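
To make the index arithmetic in the hunk above easier to follow, here is a rough sketch that reproduces it without MPI or ThreadPinning. It assumes that NUMA domain `n` (1-based) owns the contiguous, 0-based core IDs `(n-1)*16` to `n*16 - 1`; the real call to `numa(...)` returns whatever IDs the machine reports, so treat this only as an illustration. `pinned_cores` is a hypothetical helper, not part of the PR.

```julia
# Hypothetical helper: core IDs that the threads of `rank` would be pinned to,
# ASSUMING NUMA domain n (1-based) owns the contiguous 0-based core IDs
# (n-1)*cores_per_numa : n*cores_per_numa - 1.
function pinned_cores(rank::Integer, threads_per_rank::Integer;
                      cores_per_numa::Integer = 16)
    ranks_per_numa = div(cores_per_numa, threads_per_rank)
    # Same arithmetic as in the diff: 1-based NUMA index, 0-based slot within it.
    my_numa, my_id_in_numa = divrem(rank, ranks_per_numa) .+ (1, 0)
    # Stand-in for numa(my_numa, 1:threads_per_rank) under the assumption above.
    first_cores = (my_numa - 1) * cores_per_numa .+ (0:threads_per_rank - 1)
    # Shift by one block of threads for every earlier rank in the same NUMA domain.
    return collect(first_cores .+ threads_per_rank * my_id_in_numa)
end

for rank in 0:3
    println("rank ", rank, " -> cores ", pinned_cores(rank, 8))
end
```

With 8 threads per rank and the assumed numbering, ranks 0 and 1 share the first NUMA domain (cores 0 to 7 and 8 to 15), and rank 2 starts the second domain at core 16.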


With ThreadPinning v0.7.3 you can simply use pinthreads(:affinitymask).
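
A minimal sketch of what that could look like in the script, assuming ThreadPinning v0.7.3 or later is installed and that the job launcher has already set a suitable affinity mask for each rank. The `getcpuids()` call is only there to inspect the result and is not part of the suggestion itself.

```julia
using ThreadPinning

# Pin each Julia thread according to the affinity mask the process was started
# with (for example, one set by the job launcher), as suggested above.
pinthreads(:affinitymask)

# Print the CPU ID each Julia thread is now running on, to confirm the placement.
println(getcpuids())
```

This would replace the manual NUMA arithmetic in the hunk above, at the cost of relying on the launcher to place the ranks sensibly.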

matt-graham · 2023-05-25T09:09:16Z

I think the failed CI jobs on the nightly build here may have been due to the same upstream problems that were causing issues in #236 (comment), but we've now exceeded the 30-day window for re-running workflows.

Otherwise this looks good to merge to me, apart from @giordano's suggestion above to use pinthreads(:affinitymask). I'm not sure there is any way to trigger a new workflow run other than pushing a new commit to this branch, as we don't have a workflow_dispatch trigger specified to allow running it manually.

Also, I just noticed this is set to merge into the scaling_archer2 branch. As the extra/weak_scaling/run_particleda.jl script currently on master hasn't been updated for the changes in #232 and so doesn't run, it would probably be worth merging both those changes and the ones here into master?

Add thread pinning optimization

e54259e

tkoskela requested a review from DanGiles January 27, 2023 16:32

Undo changes to test environment

9efcbff

tkoskela force-pushed the tk/thread_pinning branch from f07b921 to 9efcbff Compare January 27, 2023 16:33

giordano changed the title from "Tk/thread pinning" to "Add thread pinning optimization" Jan 27, 2023

giordano mentioned this pull request Jan 27, 2023

Run MPI tests with Base.julia_cmd instead of Base.julia_exename #234

Merged

giordano reviewed Feb 28, 2023
