Features/425 resplit rework #520
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #520   +/-   ##
=======================================
  Coverage   96.40%   96.40%
=======================================
  Files          75       75
  Lines       14834    14834
=======================================
  Hits        14300    14300
  Misses        534      534
=======================================

Continue to review full report at Codecov.
…holtz-analytics/heat into features/425-resplit-rework
I am happy that it makes resplit work, but I do have two concerns:
-
The complexity of this whole approach is O(p); Alltoall would optimally do it in O(log(p)). The performance difference we are seeing here is perhaps due to asynchronous operations rather than the breakdown into separate sends.
-
The memory footprint of this operation is unfortunate. We need two buffers anyway, i.e. in and out, but not a doubled input. If we are thinking about GPUs, where memory is already scarce, a doubled input array might be too much.
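The tradeoff described above can be sketched with mpi4py; this is illustrative only, not the PR's code, and the one-element-per-rank buffers are a toy stand-in for real tile data. The first pattern posts p separate point-to-point messages per rank, while the second uses a single collective that a good MPI implementation completes in O(log(p)) communication rounds.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

send = np.full(size, rank, dtype=np.float64)  # one element per target rank
recv = np.empty(size, dtype=np.float64)

# Pattern 1: p separate non-blocking sends plus p receives, O(p) per rank.
reqs = [comm.Isend(send[i : i + 1], dest=i) for i in range(size)]
for i in range(size):
    comm.Recv(recv[i : i + 1], source=i)
MPI.Request.Waitall(reqs)

# Pattern 2: one collective exchange; optimized implementations need only
# O(log(p)) rounds for the same data movement.
recv2 = np.empty(size, dtype=np.float64)
comm.Alltoall(send, recv2)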
@@ -2425,17 +2428,34 @@ def __redistribute_shuffle(self, snd_pr, send_amt, rcv_pr, snd_dtype):
        if snd_pr > rcv_pr:  # data passed from a higher rank (append to bottom)
            self.__array = torch.cat((self.__array, data), dim=self.split)

    def resplit_(self, axis=None):
Why was the default parameter removed?
Because this function is designated as in-place by the trailing underscore. (I believe that this comment may be outdated though.)
Sorry, I missed this; fixing now.
w_size = arr.comm.size
for ax in range(arr.numdims):
    if arr.split is None or not ax == arr.split:
        size = arr.gshape[ax]
Can we recycle splits_and_chunks from communication.py?
To use that function we would also need to loop over the number of ranks. That function could be changed to use this approach though, and then it could be reused here.
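For illustration, the even per-dimension chunking under discussion amounts to something like the following sketch; even_chunk_sizes is a hypothetical helper, not heat's actual splits_and_chunks.

def even_chunk_sizes(length, n_procs):
    """Split `length` items over `n_procs` ranks as evenly as possible."""
    base, extra = divmod(length, n_procs)
    # the first `extra` ranks receive one additional element each
    return [base + (1 if r < extra else 0) for r in range(n_procs)]

# e.g. 10 rows over 4 ranks -> [3, 3, 2, 2]
print(even_chunk_sizes(10, 4))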
        self.__tile_dims = tile_dims

    @staticmethod
    def set_tile_locations(split, tile_dims, arr):
This should probably be internal, i.e. have two leading underscores.
It is external so that the resplit function can use it to determine where the tiles will go.
heat/core/tiling.py (outdated)
lkey.extend([slice(0, None)] * (arr.numdims - len(key)))
key = lkey
for d in range(arr.numdims):
    # todo: implement advanced indexing (lists of positions to iterate through)
Please transform this into an issue if generally approved.
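For context on the advanced-indexing TODO: "lists of positions" refers to NumPy-style integer-array indexing. A minimal PyTorch illustration, not the PR's code:

import torch

t = torch.arange(12).reshape(3, 4)
print(t[[0, 2]])           # rows 0 and 2
print(t[[0, 2], [1, 3]])   # elements (0, 1) and (2, 3)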
[2] tensor([[19., 20.],
[2]         [26., 27.]])
"""
# todo: strides can be implemented using a list of slices for each dimension
Please turn this into an issue if generally approved.
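For context on the strides TODO: a per-dimension stride can indeed be expressed as one stepped slice per axis. A minimal PyTorch illustration, not the PR's code:

import torch

t = torch.arange(36).reshape(6, 6)
strides = (2, 3)  # every 2nd row, every 3rd column
key = tuple(slice(0, None, s) for s in strides)
print(t[key])  # rows 0, 2, 4 and columns 0, 3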
heat/core/dndarray.py (outdated)
to_send = self.tiles[key]
if spr == rank and spr != rpr:
    waits.append(self.comm.Isend(to_send.clone(), dest=rpr, tag=rank))
elif spr == rpr == rank:
I am not sure that this is what you want here. This can only evaluate to True on rank 1 and above.
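For reference on the condition being discussed: Python evaluates chained comparisons pairwise, so the expression is True exactly when all three values are equal. A minimal illustration:

spr, rpr, rank = 2, 2, 2
print(spr == rpr == rank)              # True
print((spr == rpr) and (rpr == rank))  # True, the equivalent expansion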
…g oop resplit to prev commits
OK, I've fleetingly looked at the code and run a few tests resplitting a 50,000 x 50,000 float tensor from split 1 to split 0. On 15 nodes this implementation isn't that much slower than the Alltoallv implementation. I'm for approving now and improving later, especially because of the space requirements. We need a working resplit. Thanks @coquelin77!
Description
Changes the resplit function to work with a tiling function and to send tiles around. This will likely not be as efficient as an Alltoallv, but it maintains the order of the elements.
Issue/s resolved: #425 #476
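A minimal usage sketch of the reworked in-place resplit, assuming heat's public API around the time of this PR; run under MPI, e.g. mpirun -n 4:

import heat as ht

a = ht.zeros((100, 100), split=0)  # distributed along rows
a.resplit_(1)                      # redistribute in place along columns
print(a.split)                     # 1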
Changes proposed:
Type of change
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
No.