ht.array, closed loophole allowing DNDarray construction with incompatible shapes of local arrays
#1034
* wip: Initial release draft and changelog updater actions configuration * doc: pr title style guide in contibuting.md * ci: improved release draft templates * ci: extra release draft categories * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* doc: parallel tutorial note metioning local and global printing * doc: extenden local print note with ``ht.local_printing()`` * Fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Updated the tutorial document. 1. Corrected the spelling mistake -> (sigular to single) 2. Corrected the statement -> the number of dimensions is the rank of the array. 3. Made 2 more small changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates: - [github.com/psf/black: 22.3.0 → 22.6.0](psf/black@22.3.0...22.6.0)
* Check for split in `__reduce_op` * Check whether x is distributed Co-authored-by: mtar <m.tarnawa@fz-juelich.de> Co-authored-by: mtar <m.tarnawa@fz-juelich.de> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* Update ci worflow action * Update codecov.yml
* Fix `all` * Fix `any` * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add distributed tests * Expanded tests for combination of axis/split axis Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com> Co-authored-by: mtar <m.tarnawa@fz-juelich.de>
updates: - [github.com/psf/black: 22.8.0 → 22.10.0](psf/black@22.8.0...22.10.0)
…e-config [pre-commit.ci] pre-commit autoupdate
Codecov Report
@@           Coverage Diff           @@
##             main    #1034   +/-   ##
=======================================
  Coverage   91.68%   91.68%
=======================================
  Files          65       65
  Lines        9978     9978
=======================================
  Hits         9148     9148
  Misses        830      830
Thanks a lot @Mystic-Slice for tackling this. For me it's good to go, just 2 small changes please:
- Update PR title so that it makes sense in the automated changelog.
- Update error message (see in-line comment)
Again, great job and thank you so much!
heat/core/factories.py
Outdated
- reduction_buffer = np.array(gshape[is_split])
- comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.SUM)
+ reduction_buffer = np.array(neighbour_shape[is_split])
+ comm.Allreduce(MPI.IN_PLACE, reduction_buffer, MPI.MIN)
  if reduction_buffer < 0:
      raise ValueError("unable to construct tensor, shape of local data chunk does not match")
Let's rephrase this. How about "Unable to construct DNDarray. Local data slices have inconsistent shapes or dimensions."? Or something similar.
Historical note: in the very early phase, our DNDarray class was called Tensor. Any reference to "tensor" that doesn't refer to torch.tensor is outdated; feel free to update it to DNDarray if you spot one.
Thanks @ClaudiaComito. I have made the change.
for i in range(length):
    if i == is_split:
        continue
    elif lshape[i] != gshape[i] and lshape[i] - 1 != gshape[i]:
Good catch @Mystic-Slice !
I have made the changes. @ClaudiaComito
Description
Reusing the gshape array as the receive buffer made the code hard to follow, so I created a new array, neighbour_shape, to hold, well, the neighbour's shape.

Using MPI.SUM to find out whether any process has encountered a mismatch between its local shape and its neighbour's leads to errors because of integer limitations. When two or more processes find a mismatch, their huge negative values are added together and, I guess because of integer overflow, they wrap around and we are finally left with a positive value. So no error is raised when one should be. This can be easily solved by using MPI.MIN instead.

One more thing that I don't understand is why gshape[i] is allowed to be one less than lshape[i] in line 405. I know there must be a reason for it; I just don't understand why.

Code snippet:
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no
skip ci
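
The overflow argument from the Description can be reproduced without MPI. The sketch below is not code from this PR: it emulates an int64 reduction with NumPy, and it assumes that a process flags a mismatch with the most negative int64 value (the PR text only says "huge negative values"). The helper name emulate_allreduce and the per-rank values are hypothetical.

```python
import numpy as np

# Assumption: a rank that detects a mismatch flags it with the most negative
# int64 value; the PR text only says "huge negative values".
SENTINEL = np.iinfo(np.int64).min


def emulate_allreduce(per_rank_values, op):
    """Emulate an int64 Allreduce over one value per rank, without MPI."""
    values = np.array(per_rank_values, dtype=np.int64)
    if op == "sum":
        # Integer reductions on NumPy arrays wrap around silently on overflow.
        return int(values.sum(dtype=np.int64))
    if op == "min":
        return int(values.min())
    raise ValueError(f"unsupported op: {op}")


# Four hypothetical ranks: two flag a mismatch, two report a valid extent of 5.
flags = [SENTINEL, 5, SENTINEL, 5]

summed = emulate_allreduce(flags, "sum")  # the two sentinels cancel modulo 2**64
lowest = emulate_allreduce(flags, "min")  # the sentinel survives the reduction

print("SUM detects mismatch:", summed < 0)
print("MIN detects mismatch:", lowest < 0)
```

Under these assumptions the summed result is non-negative whenever an even number of ranks flag a mismatch, so a `< 0` check misses it, while the minimum stays negative as long as at least one rank flags a mismatch.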