Features/372 data layout #423
Conversation
…nderlying torch tensor and no longer gets the wrong info from the numpyfied DNDarray
…of memory layout (row- or column-major).
…ng layout (e.g. from tensor creation with ndmin > tensor.shape).
…rides already matching requested order), removed "NotImplementedError" for order K.
… a given dimension
…t relevant for now
Codecov Report
@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
+ Coverage   98.07%   98.09%   +0.02%
==========================================
  Files          55       55
  Lines       11414    11548     +134
==========================================
+ Hits        11194    11328     +134
  Misses        220      220
Continue to review full report at Codecov.
heat/core/dndarray.py
Outdated
tuple of ints: bytes to step in each dimension when traversing a tensor.
numpy-like usage: self.strides
"""
stride = np.array(self._DNDarray__array.stride())
can we use torch here instead of numpy? numpy doesn't like GPUs
There is actually no need to return an array (NumPy or PyTorch) at all, as stride() already returns a tuple.
@krajsek this is the numpy-like version of stride (output in bytes); I need to be able to multiply it by the element size.
@coquelin77 if you expect problems, I'll use a list comprehension instead of numpy. Must we be wary of every numpy call then, or only when dealing with large arrays?
o.k., I see

> @krajsek this is the numpy-like version of stride (output in bytes); I need to be able to multiply it by the element size.
> @coquelin77 if you expect problems, I'll use a list comprehension instead of numpy. Must we be wary of every numpy call then, or only when dealing with large arrays?
Yes, we should always be wary of using numpy; we should aim to primarily use pytorch functions. Also, if I don't misunderstand, wrapping this in a numpy array doesn't really get us anything, since it's already a torch tensor.
Hi Daniel. This:

`self._DNDarray__array.stride()`

is a tuple. In order to provide numpy-like information (output in bytes), this tuple has to be multiplied by the element size of the tensor (a scalar, in bytes). I need to turn the tuple into something that can be multiplied by a scalar. I'll use a list comprehension and be done with it.
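For illustration, a minimal sketch of how the byte-wise strides could be computed without numpy. Assumptions: the underlying torch tensor lives in `self._DNDarray__array` (hence the name-mangled attribute below), and the property name `strides` follows this PR's description; `torch.Tensor.element_size()` returns the per-element size in bytes.

```python
import torch

class DNDarray:
    def __init__(self, array: torch.Tensor):
        self.__array = array  # name-mangled to _DNDarray__array

    @property
    def strides(self):
        """tuple of ints: bytes to step in each dimension when traversing a tensor."""
        element_steps = self.__array.stride()   # per-dimension steps, in elements
        itemsize = self.__array.element_size()  # size of one element, in bytes
        return tuple(step * itemsize for step in element_steps)
```

For a contiguous 3×4 float32 tensor this yields (16, 4), matching numpy's strides, without any numpy call.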
@@ -22,3 +24,44 @@ def copy(a):
    return dndarray.DNDarray(
        a._DNDarray__array.clone(), a.shape, a.dtype, a.split, a.device, a.comm
can you add a.order to this?
hm, but order is not a tensor attribute. I'm not sure I see the need for this. The way to check the memory layout would be to call a.stride() (or use a.strides if you want to be numpy-like).
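For illustration, a quick layout check; a sketch assuming the order keyword and the strides property introduced by this PR, with heat's default float32 dtype (4 bytes per element):

```python
import heat as ht

a = ht.zeros((3, 4), order="F")  # column-major (Fortran-style) layout
print(a.strides)                 # numpy-like, in bytes: (4, 12) for float32

b = ht.zeros((3, 4))             # default order="C", row-major layout
print(b.strides)                 # (16, 4) for float32
```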
I went through the code and it looks good, but I did not test it myself.
…row-major and column-major order
Description
Addresses #372. By default, the memory layout of torch tensors is row-major (C-style), where the data are stored first dimension first (rows first in 2 dimensions). Some algorithms may run faster on a column-major memory layout (Fortran-style), where data are stored last dimension first (columns first in 2 dimensions).
In NumPy, the user can define the memory layout of an array thanks to the keyword argument `order`: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

This PR introduces the keyword argument `order` for heat factories, hence the possibility of column-major memory layout for heat tensors. Currently, the default is order="C", with the possibility of specifying order="F". Because heat factories are using torch.clone() internally, and clone() does not preserve the memory layout of the tensor for the time being, I'm leaving the implementation of option order="K" for later. This issue will probably be fixed in one of the next PyTorch releases.

I'm also introducing the DNDarray properties stride (torch-like) and strides (numpy-like, output in bytes).
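As a usage sketch of the new keyword, mirroring the NumPy semantics linked above. Assumption: integer input is inferred as int64 (8 bytes per element), following torch's default; the byte counts below depend on that.

```python
import numpy as np
import heat as ht

data = [[1, 2, 3], [4, 5, 6]]

na = np.array(data, order="F")  # NumPy reference: columns contiguous
print(na.strides)               # (8, 16) for int64

ha = ht.array(data, order="F")  # heat factory with the new order keyword
print(ha.strides)               # same column-major byte steps
```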
Fixes: #372, #403, #404
Changes proposed:
- keyword argument `order` for all factories potentially giving rise to tensors with ndim > 1

Type of change
Select relevant options.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
Are all split configurations tested and accounted for?
[x] yes [ ] no
Does this change require a documentation update outside of the changes proposed?
[x] yes [ ] no
Does this change modify the behaviour of other functions?
[ ] yes [x] no
Are there code practices which require justification?
[ ] yes [x] no