
ht.array, closed loophole allowing DNDarray construction with incompatible shapes of local arrays #1034

Merged

59 commits merged into release/1.2.x on Nov 3, 2022

Conversation

Collaborator

@Mystic-Slice Mystic-Slice commented Oct 21, 2022

Description

The reuse of the gshape array as the receiving buffer made the code a bit too difficult to understand, so I created a new array, neighbour_shape, to hold the neighbour's shape.

Using MPI.SUM to find out whether any process has encountered a mismatch between its local shape and its neighbour's leads to errors because of integer limitations. When two or more processes find a mismatch, their huge negative sentinel values are added together; due to integer overflow, the sum can wrap around to a positive value, so no error is raised when one should be. This is easily solved by using MPI.MIN instead: the minimum stays negative as soon as any single process reports a mismatch.
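The overflow can be reproduced without MPI. A minimal NumPy sketch (illustration only, not Heat code; the sentinel value is made up) showing that a 32-bit SUM of two large negative sentinels wraps to a positive number, while MIN stays negative:

```python
import numpy as np

# Two ranks each report a shape mismatch with a large negative sentinel.
sentinels = np.array([-2_000_000_000, -2_000_000_000], dtype=np.int32)

# MPI.SUM-style reduction in 32-bit arithmetic: the sum wraps around
# modulo 2**32 and comes out positive, so the mismatch is silently missed.
summed = sentinels.sum(dtype=np.int32)
print(int(summed))  # 294967296

# MPI.MIN-style reduction: a single negative contribution keeps the
# result negative, so the mismatch is always detected.
print(int(sentinels.min()))  # -2000000000
```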

One more thing I don't understand is why gshape[i] is allowed to be one less than lshape[i] on line 405. I know there must be a reason for it; I just don't see it.

Code snippet:

import heat as ht
import torch

rank = ht.communication.MPI_WORLD.rank
dim = rank+1

larray = [[0]*dim]*dim  # rank r builds an (r+1) x (r+1) nested list
print("Rank", rank, larray)

d = ht.array(larray, is_split=0)

# literally any operation that involves communication 
# between processes will crash
# like this simple print statement
print(d)
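The snippet's local chunks can be inspected without MPI. A plain-Python illustration (assuming two ranks) of why they are incompatible under is_split=0: sizes may differ along the split axis, but every other dimension must agree across ranks.

```python
# Shapes of the local chunks built by the snippet above, per rank.
shapes = []
for rank in range(2):
    dim = rank + 1
    larray = [[0] * dim] * dim
    shapes.append((len(larray), len(larray[0])))
    print("Rank", rank, "local shape:", shapes[-1])

# Rank 0 holds a (1, 1) chunk, rank 1 a (2, 2) chunk. Along is_split=0
# the first dimension may differ, but the second must match across
# ranks; 1 != 2, so DNDarray construction should fail.
```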

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

skip ci

ClaudiaComito and others added 30 commits April 27, 2022 14:53
* wip: Initial release draft and changelog updater actions configuration

* doc: pr title style guide in contributing.md

* ci: improved release draft templates

* ci: extra release draft categories

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* doc: parallel tutorial note mentioning local and global printing

* doc: extended local print note with ``ht.local_printing()``

* Fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Updated the tutorial document.

1. Corrected the spelling mistake -> (sigular to single)
2. Corrected the statement -> the number of dimensions is the rank of the array.
3. Made 2 more small changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
- [github.com/psf/black: 22.3.0 → 22.6.0](psf/black@22.3.0...22.6.0)
mtar and others added 11 commits September 30, 2022 08:26
* Check for split in `__reduce_op`

* Check whether x is distributed

Co-authored-by: mtar <m.tarnawa@fz-juelich.de>

Co-authored-by: mtar <m.tarnawa@fz-juelich.de>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Update ci workflow action

* Update codecov.yml
* Fix `all`

* Fix `any`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add distributed tests

* Expanded tests for combination of axis/split axis

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Co-authored-by: mtar <m.tarnawa@fz-juelich.de>
updates:
- [github.com/psf/black: 22.8.0 → 22.10.0](psf/black@22.8.0...22.10.0)
…e-config

[pre-commit.ci] pre-commit autoupdate
@Mystic-Slice Mystic-Slice changed the title from "Clean up is split" to "Fixed code that verifies lshape while using is_split" Oct 21, 2022

@codecov

codecov bot commented Oct 21, 2022

Codecov Report

Merging #1034 (aeb9ef3) into main (ae9dd50) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1034   +/-   ##
=======================================
  Coverage   91.68%   91.68%           
=======================================
  Files          65       65           
  Lines        9978     9978           
=======================================
  Hits         9148     9148           
  Misses        830      830           
Flag Coverage Δ
unit 91.68% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
heat/core/factories.py 100.00% <100.00%> (ø)


@ClaudiaComito ClaudiaComito added bug Something isn't working and removed chore labels Oct 24, 2022
Contributor

@ClaudiaComito ClaudiaComito left a comment


Thanks a lot @Mystic-Slice for tackling this. For me it's good to go, just 2 small changes please:

  • Update PR title so that it makes sense in the automated changelog.
  • Update error message (see in-line comment)

Again, great job and thank you so much!

-reduction_buffer = np.array(gshape[is_split])
-comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.SUM)
+reduction_buffer = np.array(neighbour_shape[is_split])
+comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.MIN)
 if reduction_buffer < 0:
     raise ValueError("unable to construct tensor, shape of local data chunk does not match")
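To make the fixed check concrete, here is a hypothetical single-process sketch (function and variable names invented, not Heat's actual API) that simulates the neighbour-shape comparison and the MIN-style reduction over a list of per-rank local shapes. The real code exchanges shapes with an MPI neighbour; the wraparound pairing below is a simplification.

```python
def check_local_shapes(all_lshapes, is_split=0):
    """Simulate the neighbour-shape validation, treating each entry of
    all_lshapes as the local shape reported by one MPI rank."""
    sentinels = []
    for rank, lshape in enumerate(all_lshapes):
        neighbour = all_lshapes[(rank + 1) % len(all_lshapes)]
        # All dimensions except the split axis must match the neighbour.
        ok = all(
            l == n
            for i, (l, n) in enumerate(zip(lshape, neighbour))
            if i != is_split
        )
        # On mismatch, contribute a negative sentinel to the reduction.
        sentinels.append(lshape[is_split] if ok else -1)
    # MPI.MIN equivalent: one negative value fails the check on all ranks.
    if min(sentinels) < 0:
        raise ValueError(
            "Unable to construct DNDarray. Local data slices have "
            "inconsistent shapes or dimensions."
        )

check_local_shapes([(2, 3), (1, 3)])    # consistent: passes silently
# check_local_shapes([(1, 1), (2, 2)])  # inconsistent: raises ValueError
```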
Contributor


Let's rephrase this. How about "Unable to construct DNDarray. Local data slices have inconsistent shapes or dimensions."? Or something similar.

Historical note: in the very early phase, our DNDarray class was called Tensor. Any reference to "tensor" that doesn't refer to torch.tensor is outdated; feel free to update it to DNDarray if you spot one.

Copy link
Collaborator Author


Thanks @ClaudiaComito. I have made the change.

for i in range(length):
    if i == is_split:
        continue
    elif lshape[i] != gshape[i] and lshape[i] - 1 != gshape[i]:
Contributor


Good catch @Mystic-Slice !

@Mystic-Slice Mystic-Slice changed the title from "Fixed code that verifies lshape while using is_split" to "ht.array, closed loophole allowing DNDarray construction with incompatible shapes of local arrays" Oct 24, 2022
@Mystic-Slice
Collaborator Author

I have made the changes. @ClaudiaComito
Thanks for the review.

@ClaudiaComito ClaudiaComito changed the base branch from main to release/1.2.x October 28, 2022 07:05
@ClaudiaComito ClaudiaComito merged commit 771a557 into release/1.2.x Nov 3, 2022
@ClaudiaComito ClaudiaComito deleted the CleanUpIsSplit branch November 3, 2022 10:45
@mtar mtar removed the PR talk label Sep 11, 2023
Labels: bug (Something isn't working), GSoC
7 participants