Add thread pinning optimization #233
base: scaling_archer2
using MPI, ThreadPinning  # MPI.Init() is assumed to have been called earlier

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
my_rank = MPI.Comm_rank(comm)

cores_per_numa = 16
threads_per_rank = Threads.nthreads()  # assumed <= cores_per_numa
ranks_per_numa = div(cores_per_numa, threads_per_rank)

# Pin threads so that the threads of an MPI rank are pinned to cores with
# contiguous IDs. This ensures that
# - when running 16 or fewer threads per rank, all threads are pinned to the same
#   NUMA region as their master thread (sharing a memory controller within the Infinity Fabric)
# - when running 8 or fewer threads per rank, all threads are pinned to the same
#   Core Complex Die (CCD)
# - when running 4 or fewer threads per rank, all threads are pinned to the same
#   Core Complex (CCX, sharing an L3 cache)

# 1-based NUMA domain index and 0-based position of this rank within that domain
my_numa, my_id_in_numa = divrem(my_rank, ranks_per_numa) .+ (1, 0)
# First nthreads cores of the domain, shifted by one block per preceding rank
# (assumes the core IDs within a NUMA domain are contiguous)
pinthreads(numa(my_numa, 1:Threads.nthreads()) .+ threads_per_rank .* my_id_in_numa)
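The rank-to-core arithmetic in the pinning code can be checked without MPI or ThreadPinning. The following is a minimal plain-Julia sketch, assuming (as the code comment does) 16 cores per NUMA domain with contiguous, 0-based core IDs, plus 4-core CCX and 8-core CCD blocks aligned to the core numbering. `rank_cores` and `same_block` are hypothetical helpers; `rank_cores` replays the `divrem` step and the offset, with a plain range standing in for `numa(my_numa, 1:nthreads)`.

```julia
# Hypothetical helper that replays the PR's pinning arithmetic without MPI or
# ThreadPinning. ASSUMPTION: NUMA domain n (1-based) owns the contiguous,
# 0-based core IDs cores_per_numa*(n-1) : cores_per_numa*n - 1.
function rank_cores(rank::Integer, threads_per_rank::Integer; cores_per_numa::Integer = 16)
    @assert 1 <= threads_per_rank <= cores_per_numa "a rank must fit in one NUMA domain"
    ranks_per_numa = div(cores_per_numa, threads_per_rank)
    # 1-based NUMA index and 0-based slot of this rank inside that domain
    my_numa, my_id_in_numa = divrem(rank, ranks_per_numa) .+ (1, 0)
    # Stand-in for numa(my_numa, 1:threads_per_rank): first cores of the domain
    numa_first = cores_per_numa * (my_numa - 1) .+ (0:threads_per_rank-1)
    # Same offset as in the PR: skip one block of cores per earlier rank
    return numa_first .+ threads_per_rank * my_id_in_numa
end

# True if all cores fall inside one aligned block of `size` cores.
# ASSUMPTION: 4-core CCX, 8-core CCD and 16-core NUMA blocks.
same_block(cores, size) = length(unique(div.(cores, size))) == 1

for t in (4, 8, 16), rank in 0:3
    c = rank_cores(rank, t)
    println("threads=$t rank=$rank cores=$(first(c))-$(last(c)) ",
            "CCX=$(same_block(c, 4)) CCD=$(same_block(c, 8)) NUMA=$(same_block(c, 16))")
end
```

Under these assumptions, with 4 threads per rank the first four ranks share NUMA domain 1, each inside its own 4-core block, while with 16 threads per rank every rank gets a whole NUMA domain to itself.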
With ThreadPinning v0.7.3 you can simply use pinthreads(:affinitymask).
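As I understand the suggestion, pinthreads(:affinitymask) pins the Julia threads of each process to the CPUs in that process's affinity mask (as set by the launcher, e.g. Slurm's srun), so the placement comes from the launcher rather than from the divrem arithmetic. The plain-Julia sketch below only models that idea and does not call ThreadPinning. It assumes threads are assigned to the mask's CPUs in ascending order; `mask_cpus` and `rank_mask` are hypothetical helpers, and the masks are made-up values for contiguous per-rank binding.

```julia
# Hypothetical model of pinning "within the affinity mask": thread i is
# placed on the i-th CPU (ascending order) whose bit is set in the mask.
# This does NOT call ThreadPinning; the masks below are made-up values.
mask_cpus(mask::Unsigned) = [i for i in 0:(8 * sizeof(mask) - 1) if (mask >> i) & 1 == 1]

# Hypothetical per-rank mask, as a launcher might set it when binding each
# rank to `threads_per_rank` contiguous CPUs.
rank_mask(rank::Integer, threads_per_rank::Integer) =
    ((UInt64(1) << threads_per_rank) - 1) << (threads_per_rank * rank)

for rank in 0:3
    for (thread, cpu) in enumerate(mask_cpus(rank_mask(rank, 4)))
        println("rank $rank, thread $thread -> CPU $cpu")
    end
end
```

If the launcher's masks already encode contiguous per-rank blocks, this gives the same placement as the manual NUMA arithmetic while staying correct if the job layout changes.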
I think the failed CI jobs on the nightly build here may have been caused by the same upstream problems that were causing issues in #236 (comment), but we've now exceeded the 30-day window for re-running workflows. Otherwise this looks good to merge to me, apart from @giordano's suggestion above to use … Also, I just noticed this is set to merge into …
I've added thread pinning optimizations (thanks to @giordano) that could improve performance on Archer2. I would like to test the performance with