
Features/425 resplit rework #520

Merged
38 commits merged into master from features/425-resplit-rework on Apr 20, 2020
Conversation

coquelin77
Member

Description

Changes the resplit function to work with a tiling class, communicating individual tiles rather than using Alltoallv. This will likely be less efficient than an Alltoallv, but it maintains the global order of the data.

Issue/s resolved: #425 #476

Changes proposed:

  • create a tiling class which stores the tile divisions along what would be the split axis in every dimension
  • resplit between two axes now uses a home-brew approach in which individual tiles are communicated (no Alltoallv)
  • both in-place and out-of-place resplit now require that the free space on a node is at least twice the local data size
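The tile-based approach described above can be illustrated with a single-process NumPy simulation. This is a hedged sketch, not heat's actual implementation: the names `tile_slices` and `resplit_1_to_0` are hypothetical, the "processes" are simulated locally, and the MPI sends are replaced by slicing. It shows why sending tiles in rank order preserves the global order of the data (the case from issue #425).

```python
import numpy as np

def tile_slices(length, nparts):
    # Divide `length` into `nparts` contiguous chunks whose sizes differ by at most 1,
    # mimicking how a split axis is chunked across ranks.
    base, rem = divmod(length, nparts)
    starts = [0]
    for p in range(nparts):
        starts.append(starts[-1] + base + (1 if p < rem else 0))
    return [slice(starts[p], starts[p + 1]) for p in range(nparts)]

def resplit_1_to_0(global_arr, nprocs):
    # Simulate resplitting a 2D array from split=1 (column blocks) to
    # split=0 (row blocks) by exchanging tiles in rank order.
    rows, cols = global_arr.shape
    col_sl = tile_slices(cols, nprocs)   # current split: axis 1
    row_sl = tile_slices(rows, nprocs)   # target split: axis 0
    # local chunks before the resplit, one per simulated "process"
    locals_in = [global_arr[:, col_sl[p]] for p in range(nprocs)]
    # each receiver r collects its row tile from every sender s, in rank order,
    # so the column order of the result matches the global order
    locals_out = [
        np.concatenate([locals_in[s][row_sl[r], :] for s in range(nprocs)], axis=1)
        for r in range(nprocs)
    ]
    return np.concatenate(locals_out, axis=0)

a = np.arange(7 * 5).reshape(7, 5)
assert np.array_equal(resplit_1_to_0(a, 3), a)  # global order is preserved
```

Because each receiver concatenates the incoming tiles in sender-rank order, the reassembled array matches the original, which is the ordering guarantee a naive buffer-packed Alltoallv did not give here.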

Type of change

Remove irrelevant options:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation update

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@codecov

codecov bot commented Apr 1, 2020

Codecov Report

Merging #520 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #520   +/-   ##
=======================================
  Coverage   96.40%   96.40%           
=======================================
  Files          75       75           
  Lines       14834    14834           
=======================================
  Hits        14300    14300           
  Misses        534      534           

Last update dc7248e...dc7248e.


@Markus-Goetz Markus-Goetz left a comment


I am happy that it makes resplit work. But I do have two concerns:

  1. The complexity of this whole approach is O(p), whereas an Alltoall would optimally do it in O(log(p)). The performance difference we are seeing here is perhaps due to async operations rather than the breakdown into separate sends.

  2. The memory footprint of this operation is unfortunate. We need to have two buffers anyway, i.e. in and out, but not a doubled input. If we are thinking about GPUs, where memory is scarce anyway, a doubled input array might be too much.

@@ -2425,17 +2428,34 @@ def __redistribute_shuffle(self, snd_pr, send_amt, rcv_pr, snd_dtype):
    if snd_pr > rcv_pr:  # data passed from a higher rank (append to bottom)
        self.__array = torch.cat((self.__array, data), dim=self.split)

def resplit_(self, axis=None):
Member

Why was the default parameter removed?

Member Author

Because this function is determined to be in-place by the underscore. (I believe that this comment may be outdated, though.)

Member Author

Sorry, I missed this; fixing now.

w_size = arr.comm.size
for ax in range(arr.numdims):
    if arr.split is None or not ax == arr.split:
        size = arr.gshape[ax]
Member

Can we recycle splits_and_chunks from communication.py?

Member Author

To use that function we would need to loop over the number of ranks as well. Although, if that function were changed to use this method, then it could be reused here.

heat/core/tiling.py (resolved)
self.__tile_dims = tile_dims

@staticmethod
def set_tile_locations(split, tile_dims, arr):
Member

should probably be internal, two leading underscores

Member Author

It is kept external so that the resplit function can use it to generate where the tiles will go.

CHANGELOG.md (resolved)
heat/core/dndarray.py (outdated, resolved)
heat/core/manipulations.py (resolved)
lkey.extend([slice(0, None)] * (arr.numdims - len(key)))
key = lkey
for d in range(arr.numdims):
    # todo: implement advanced indexing (lists of positions to iterate through)
Member

Please transform into issue if generally approved

[2] tensor([[19., 20.],
[2]         [26., 27.]])
"""
# todo: strides can be implemented using a list of slices for each dimension
Member

Please turn into issue if generally approved

to_send = self.tiles[key]
if spr == rank and spr != rpr:
    waits.append(self.comm.Isend(to_send.clone(), dest=rpr, tag=rank))
elif spr == rpr == rank:
Member

I am not sure that this is what you want here. This can only evaluate to True on rank 1 and above.
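For reference, Python evaluates `spr == rpr == rank` as a chained comparison, i.e. `spr == rpr and rpr == rank`, not as `(spr == rpr) == rank`. The sketch below only illustrates that language semantics; the function name `is_local_tile` is a hypothetical stand-in for the branch in the snippet above, not heat's actual code.

```python
# Chained comparison: True exactly when all three names are equal,
# including on rank 0.
def is_local_tile(spr, rpr, rank):
    return spr == rpr == rank

assert is_local_tile(0, 0, 0)       # satisfiable on rank 0 as well
assert is_local_tile(2, 2, 2)
assert not is_local_tile(0, 1, 1)   # sender != receiver

# The non-chained reading would behave differently:
assert ((0 == 0) == 1) is True      # True == 1, which compares bool to int
```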

@ClaudiaComito
Contributor

OK, I've fleetingly looked at the code and run a few tests resplitting a 50,000x50,000 float tensor from 1 to 0. On 15 nodes this implementation isn't that much slower than the Alltoallv implementation. I'm for approving now and improving later, especially because of the space requirements. We need a working resplit. Thanks @coquelin77!
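A back-of-the-envelope estimate for the benchmark above, combined with the "free space at least twice the local data size" requirement from the PR description. This assumes float32 (4 bytes per element) and 15 equal shards; the actual dtype of the test is not stated, so treat the numbers as an assumption.

```python
# Per-node memory estimate for a 50,000 x 50,000 float32 tensor on 15 nodes.
elements = 50_000 * 50_000
local_gb = elements * 4 / 15 / 1e9      # one shard, 4 bytes/element
needed_gb = 2 * local_gb                # "at least twice the local data size"
print(f"{local_gb:.2f} GB local, {needed_gb:.2f} GB free required")
# prints "0.67 GB local, 1.33 GB free required"
```

For float64 both numbers double, which is where the GPU-memory concern raised in the review becomes relevant.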

@Markus-Goetz Markus-Goetz merged commit f530f79 into master Apr 20, 2020
@Markus-Goetz Markus-Goetz deleted the features/425-resplit-rework branch April 20, 2020 09:06
Successfully merging this pull request may close these issues.

Resplit from 1 -> 0 gives wrong order
3 participants