Add comment on explicit assumption for computing the local rank
Caspar van Leeuwen committed May 3, 2024
1 parent fce2a45 · commit b868cc1
Showing 1 changed file with 4 additions and 0 deletions.
@@ -109,6 +109,10 @@
 print("Listing visible devices")
 for i in range(torch.cuda.device_count()):
     print(f"Device {i}: {torch.cuda.device(i)}")
+# This assumes a compact mapping of ranks to the available hardware,
+# e.g. ranks 0-x to node 1, ranks x-y to node 2, etc.
+# Assuming the set_compact_process_binding hook from the EESSI test suite is called,
+# this condition should be satisfied.
 local_rank = rank - visible_gpus * (rank // visible_gpus)
 torch.cuda.set_device(local_rank)
 print("Listing visible devices after setting one")
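
For illustration only (the node and GPU counts below are assumed, not taken from this commit): under a compact rank placement, subtracting visible_gpus * (rank // visible_gpus) from the global rank reduces it to its on-node index, which is identical to rank % visible_gpus. A minimal standalone sketch:

    # Minimal sketch (assumed values, not from the commit): verify that the
    # local-rank formula equals a plain modulo under compact rank-to-node mapping.
    visible_gpus = 4   # GPUs visible per node (assumption for illustration)
    for rank in range(8):  # 8 global ranks packed compactly onto 2 nodes
        local_rank = rank - visible_gpus * (rank // visible_gpus)
        assert local_rank == rank % visible_gpus
        print(f"global rank {rank} -> node {rank // visible_gpus}, local GPU {local_rank}")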
