Features/425 resplit rework #520
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #520   +/-   ##
=======================================
  Coverage   96.40%   96.40%
=======================================
  Files          75       75
  Lines       14834    14834
=======================================
  Hits        14300    14300
  Misses        534      534
=======================================

Continue to review full report at Codecov.
…holtz-analytics/heat into features/425-resplit-rework
I am happy that it makes resplit work, but I do have two concerns:
-
The complexity of this whole approach is O(p); Alltoall would optimally do it in O(log(p)). The performance difference we are seeing here is perhaps due to asynchronous operations rather than the breakdown into separate sends.
-
The memory footprint of this operation is unfortunate. We need two buffers anyway, i.e. in and out, but not a doubled input. If we are thinking about GPUs, where memory is already scarce, a doubled input array might be too much.
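The tradeoff described above can be sketched with mpi4py; this is illustrative only, not the PR's code, and the one-element-per-rank buffers are a toy stand-in for real tile data. The first pattern posts p separate point-to-point messages per rank, while the second uses a single collective that a good MPI implementation completes in O(log(p)) communication rounds.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

send = np.full(size, rank, dtype=np.float64)  # one element per target rank
recv = np.empty(size, dtype=np.float64)

# Pattern 1: p separate non-blocking sends plus p receives, O(p) per rank.
reqs = [comm.Isend(send[i : i + 1], dest=i) for i in range(size)]
for i in range(size):
    comm.Recv(recv[i : i + 1], source=i)
MPI.Request.Waitall(reqs)

# Pattern 2: one collective exchange; optimized implementations need only
# O(log(p)) rounds for the same data movement.
recv2 = np.empty(size, dtype=np.float64)
comm.Alltoall(send, recv2)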
@@ -2425,17 +2428,34 @@ def __redistribute_shuffle(self, snd_pr, send_amt, rcv_pr, snd_dtype):
        if snd_pr > rcv_pr:  # data passed from a higher rank (append to bottom)
            self.__array = torch.cat((self.__array, data), dim=self.split)

    def resplit_(self, axis=None):
Why was the default parameter removed?
Because this function is designated as in-place by the trailing underscore. (I believe that this comment may be outdated though.)
Sorry, I missed this; fixing now.
w_size = arr.comm.size
for ax in range(arr.numdims):
    if arr.split is None or not ax == arr.split:
        size = arr.gshape[ax]
Can we recycle splits_and_chunks from communication.py?
To use that function we would also need to loop over the number of ranks. That function could be changed to use this approach though, and then it could be reused here.
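For illustration, the even per-dimension chunking under discussion amounts to something like the following sketch; even_chunk_sizes is a hypothetical helper, not heat's actual splits_and_chunks.

def even_chunk_sizes(length, n_procs):
    """Split `length` items over `n_procs` ranks as evenly as possible."""
    base, extra = divmod(length, n_procs)
    # the first `extra` ranks receive one additional element each
    return [base + (1 if r < extra else 0) for r in range(n_procs)]

# e.g. 10 rows over 4 ranks -> [3, 3, 2, 2]
print(even_chunk_sizes(10, 4))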
        self.__tile_dims = tile_dims

    @staticmethod
    def set_tile_locations(split, tile_dims, arr):
This should probably be internal, i.e. have two leading underscores.
It is external so that the resplit function can use it to determine where the tiles will go.
heat/core/tiling.py (outdated)
lkey.extend([slice(0, None)] * (arr.numdims - len(key)))
key = lkey
for d in range(arr.numdims):
    # todo: implement advanced indexing (lists of positions to iterate through)
Please transform this into an issue if generally approved.
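For context on the advanced-indexing TODO: "lists of positions" refers to NumPy-style integer-array indexing. A minimal PyTorch illustration, not the PR's code:

import torch

t = torch.arange(12).reshape(3, 4)
print(t[[0, 2]])           # rows 0 and 2
print(t[[0, 2], [1, 3]])   # elements (0, 1) and (2, 3)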
[2] tensor([[19., 20.],
[2]         [26., 27.]])
"""
# todo: strides can be implemented using a list of slices for each dimension
Please turn this into an issue if generally approved.
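For context on the strides TODO: a per-dimension stride can indeed be expressed as one stepped slice per axis. A minimal PyTorch illustration, not the PR's code:

import torch

t = torch.arange(36).reshape(6, 6)
strides = (2, 3)  # every 2nd row, every 3rd column
key = tuple(slice(0, None, s) for s in strides)
print(t[key])  # rows 0, 2, 4 and columns 0, 3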
heat/core/dndarray.py (outdated)
to_send = self.tiles[key]
if spr == rank and spr != rpr:
    waits.append(self.comm.Isend(to_send.clone(), dest=rpr, tag=rank))
elif spr == rpr == rank:
I am not sure that this is what you want here. This can only evaluate to True on rank 1 and above.
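For reference on the condition being discussed: Python evaluates chained comparisons pairwise, so the expression is True exactly when all three values are equal. A minimal illustration:

spr, rpr, rank = 2, 2, 2
print(spr == rpr == rank)              # True
print((spr == rpr) and (rpr == rank))  # True, the equivalent expansion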
…g oop resplit to prev commits
OK, I've fleetingly looked at the code and run a few tests resplitting a 50,000 x 50,000 float tensor from split 1 to split 0. On 15 nodes this implementation isn't that much slower than the Alltoallv implementation. I'm for approving now and improving later, especially because of the space requirements. We need a working resplit. Thanks @coquelin77!
Description
Changes the resplit function to work with a tiling function and to send tiles around. This will likely not be as efficient as an Alltoallv, but it maintains the order of the elements.
Issue/s resolved: #425 #476
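A minimal usage sketch of the reworked in-place resplit, assuming heat's public API around the time of this PR; run under MPI, e.g. mpirun -n 4:

import heat as ht

a = ht.zeros((100, 100), split=0)  # distributed along rows
a.resplit_(1)                      # redistribute in place along columns
print(a.split)                     # 1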
Changes proposed:
Type of change
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
No.