Skip unnecessary assertions and enable non-blocking data transfers #195
Summary
This backward-compatible contribution improves performance by skipping assertions when they are unnecessary and by enabling truly non-blocking data transfers to the GPU.
Problem
When profiling torch-geometric code with nvprof, I found that setting `non_blocking=True` for data transfers still blocks the current CUDA stream. The underlying reason is that when the `to` method is called to initiate a data transfer, we construct a new `SparseStorage` object, which repeats assertions that have already been evaluated on this data. The particular assertions run when the `row` or `col` tensors are present and check that their content conforms with `sparse_sizes`. The problem with such assertions is that they require a blocking round-trip of communication between the CPU and GPU, which defeats the purpose of `non_blocking=True`.
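The snippet below is a minimal sketch of this blocking pattern; it is illustrative only, and the exact assertion inside `SparseStorage` may differ slightly.

```python
import torch

device = torch.device('cuda')
sparse_sizes = (1000, 1000)

# Pinned host memory is a prerequisite for a truly asynchronous copy.
row_cpu = torch.randint(0, 1000, (100_000,), dtype=torch.long).pin_memory()

# This copy is issued asynchronously on the current stream ...
row = row_cpu.to(device, non_blocking=True)

# ... but a validation check of this shape has to read the result back to
# the CPU (via int()/.item()), which synchronizes the stream and makes the
# transfer effectively blocking.
assert int(row.max()) < sparse_sizes[0]
```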
Solution
The solution to the assertion problem is to use a `trust_data` construction argument as an invariant that tells us whether we need to run these assertions. Data transfers or dtype changes do not need these assertions, because the matrix structure was already checked upon construction. Certain other operations such as `eye` do not need them either, because they are correct by construction.

In addition, I refactored the dtype and data transfer API to align with `torch.Tensor` and avoid the construction of dummy tensor objects, removing wasted time in the PyTorch allocator.
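As a rough sketch of the idea (not the actual `SparseStorage` code), a `trust_data` flag lets the constructor skip the device-synchronizing checks whenever the caller already knows the data is consistent, so internal operations like `to` can stay asynchronous:

```python
import torch

class Storage:
    def __init__(self, row, col, sparse_sizes, trust_data=False):
        if not trust_data:
            # Only freshly supplied user data needs these blocking checks.
            assert int(row.max()) < sparse_sizes[0]
            assert int(col.max()) < sparse_sizes[1]
        self.row, self.col = row, col
        self.sparse_sizes = sparse_sizes

    def to(self, device, non_blocking=False):
        # The structure was validated at construction time, so the copied
        # tensors can be re-wrapped with trust_data=True, keeping the
        # transfer asynchronous.
        return Storage(self.row.to(device, non_blocking=non_blocking),
                       self.col.to(device, non_blocking=non_blocking),
                       self.sparse_sizes, trust_data=True)
```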
Code changes

- `trust_data` invariant to skip blocking assertions, when unnecessary, during construction of `SparseStorage` objects.
- Dtype and data transfer API aligned with `torch.Tensor` while maintaining backward compatibility (see the usage sketch after this list).
- `WITH_SYMBOLS` option to allow for linking without stripping symbols in `setup.py`.
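The following usage sketch assumes the refactored API mirrors `torch.Tensor.to`; the exact keyword names accepted by `SparseTensor` may differ.

```python
import torch
from torch_sparse import SparseTensor

index = torch.tensor([[0, 1, 1], [1, 0, 2]])
value = torch.ones(3)
mat = SparseTensor(row=index[0], col=index[1], value=value,
                   sparse_sizes=(3, 3))

# With the assertions skipped on transfer, this call no longer forces a
# CPU-GPU round trip and can overlap with other work on the stream.
mat = mat.to('cuda', non_blocking=True)
```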