Fix edge-case contiguity mismatch for Allgatherv #1058
Conversation
Codecov Report
```diff
@@             Coverage Diff              @@
##       release/1.2.x    #1058      +/-  ##
=============================================
+ Coverage        91.76%   91.80%   +0.03%
=============================================
  Files               65       65
  Lines            10024    10075      +51
=============================================
+ Hits              9199     9249      +50
- Misses             825      826       +1
```
Thank you @ClaudiaComito
You added extra lines for arrays with different splits when the number of MPI processes is one. It doesn't make much sense to pass an argument when the split won't take place on one process. What do you think about disallowing splits, or automatically setting the value to None, when only a single process is involved at array creation time? It would save us some tests/checks.
```python
# simple case, contiguous memory can be transmitted as is
if is_contiguous is None:
    # determine local contiguity
    is_contiguous = obj.is_contiguous()
```
What happens if the value is different on the processes? How likely is it?
Thanks @mtar, that's a great question. The obvious case in which this might happen is a dimension permutation, and that is dealt with in this PR. Outside of that, we simply fall back to the previous implementation.
I could add a global check that sets is_contiguous to False if the local contiguities differ among processes.
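A minimal sketch of such a global check (an assumption, not the PR's implementation): reduce the per-process contiguity flags so that every rank agrees on a single is_contiguous value before building the Allgatherv buffers.

```python
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_chunk = torch.arange(6).reshape(2, 3)    # stand-in for the local torch tensor
local_flag = int(local_chunk.is_contiguous())  # 1 if contiguous, 0 otherwise

# MPI.MIN over the 0/1 flags acts as a logical AND across ranks:
# a single non-contiguous rank forces the strided path everywhere.
is_contiguous = bool(comm.allreduce(local_flag, op=MPI.MIN))
```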
> What do you think about disallowing splits or automatically setting the value to None when only a single process is involved at array creation time?
This is a general discussion worth having, maybe not re: this bug fix.
My main argument against setting all splits to None when running on 1 MPI process is that it would be confusing for users while they are testing their code (potentially on 1 process, or even interactively).
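For illustration, the kind of interactive single-process session meant here (assuming heat's usual array-creation API): with one process the array is not actually distributed, but the requested split attribute is preserved.

```python
import heat as ht

x = ht.arange(10, split=0)  # split along dim 0, even when run on a single process
print(x.split)              # 0 -- the requested split is kept, nothing is redistributed
print(x.larray)             # the full local torch tensor lives on this one process
```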
Anyway, let's discuss it in a separate Issue.
As far as I'm concerned, I'm done with this PR.
> I could add a global check that sets is_contiguous to False if the local contiguities differ among processes.
I've decided not to add (yet another) global check for contiguous status for now, as I can't think of an appropriate edge case to test it. We are already testing for column-first memory layout operations. If anybody can think of something, let me know.
Co-authored-by: mtar <m.tarnawa@fz-juelich.de>
Description
Function `communication.mpi_type_and_elements_of()` calculates the type and number of elements that will be sent/received in an `Allgatherv` call. How the number of elements is calculated depends on whether the input object (most likely a torch Tensor) is contiguous. In some edge cases, i.e. in case of a singleton split dimension, `torch.Tensor.is_contiguous()` might return `True` on one process while it is `False` on others. This results in a mismatch of the send/recv element counts among processes in `Allgatherv` and a resulting deadlock (see #1057).

Issue/s resolved: #1057
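The singleton-dimension edge case can be illustrated in plain torch (an assumed reproduction, not code taken from this PR): after a dimension permutation, a local chunk whose split dimension has size 1 still reports contiguous memory, while a larger chunk does not.

```python
import torch

# Two local chunks of a tensor split along dim 0: one of size 2, one of size 1.
chunk_rank0 = torch.arange(6).reshape(2, 3).permute(1, 0)  # (2, 3) -> (3, 2)
chunk_rank1 = torch.arange(3).reshape(1, 3).permute(1, 0)  # (1, 3) -> (3, 1)

print(chunk_rank0.is_contiguous())  # False: the permutation produced strided memory
print(chunk_rank1.is_contiguous())  # True: the size-1 dimension leaves the layout unchanged

# If each rank derived its Allgatherv element counts from its own flag,
# the counts would disagree across ranks and the collective would hang.
```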
Changes proposed:
- `is_contiguous` boolean as kwarg for `communication.as_buffer()` and `communication.mpi_type_and_elements_of()`. It is set to False on all processes if the object's dimensions have been permuted, independently of the size of the dimension (see the sketch after this list).
- Changes in `dndarray.resplit_()`, `manipulations.resplit()`, `linalg.matmul()`, and tests.
- In `test_suites.basic_tests.TestCase.assert_array_equal`, local tensors are compared to the relevant slices of the numpy reference array, instead of gathering the distributed DNDarray every time. TODO: the same should be implemented in "Avoid unnecessary `gather`ing of distributed operand in mixed distributed/non-distributed `logical` functions" (#1064).
- `ht.allclose` now works on operands with different dtypes as well. Related to "ht.allclose should factor in the dtype of the inputs when determining the limits" (#889).
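A simplified, hypothetical sketch of the kwarg pattern from the first bullet above (not heat's actual signature or implementation): the caller that performed the permutation overrides the per-process contiguity detection so that every rank takes the same code path.

```python
import torch

def mpi_type_and_elements_of_sketch(obj, is_contiguous=None):
    # hypothetical stand-in for the real function; names and return value are illustrative
    if is_contiguous is None:
        # no caller decision: fall back to the local (per-process) check
        is_contiguous = obj.is_contiguous()
    # the real function would build the MPI datatype and element counts here;
    # this sketch only reports which code path would be taken
    return "contiguous path" if is_contiguous else "strided (vector datatype) path"

local = torch.arange(6).reshape(2, 3).permute(1, 0)
print(mpi_type_and_elements_of_sketch(local))                       # decided per process
print(mpi_type_and_elements_of_sketch(local, is_contiguous=False))  # forced by the caller
```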
Type of change
Memory requirements
NA
Performance
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no