[GPU::DeviceArray] Expand the API of the DeviceArray to ease data transfer and allocations #4689
Comments
Instead of pointers to memory, what about using iterators?
Thank you so much for taking a look at this @kunaltyagi - I very much appreciate it! I also first thought of using iterators instead of pointers, but the DeviceArray API offers pointers, not iterators. For comparison, the thrust device vector offers iterators, and it would be nice for our DeviceArray to have them too. Nonetheless, pointers are, of course, sufficient.
I'd have preferred iterators, but raw pointers are also OK (given the existing API). Perhaps, instead of device pointer begin and end, we can offer device_offset begin and end? This would avoid one check (the validity of the device-side pointers).
In contrast to my expectations, adding a resize functionality to the device array,
A device-side resize might be faster than the data upload/download latency.
I was thinking of expanding the API of the DeviceArray class by two functions. Firstly, a member function to download only part of the device data to the host. Secondly, a function to resize the DeviceArray. Expanding the API is motivated by a performance analysis (#4677) we made and to bring the API of the device array in line with STL semantics.
In more detail, the signature of the download function could be:
One would pass the two positions on the device array that delimit the data to be downloaded, plus a pointer to the host array where the data is written. At the moment we only have two download functions, and both download all the data, which can be more than necessary.
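To make the idea concrete, here is a minimal host-only sketch of a partial download. Everything in it is an assumption for illustration: `MockDeviceArray` is a hypothetical stand-in that keeps its "device" buffer in host memory, the offset-based `download(begin, end, host_out)` signature follows the `device_offset` suggestion from the comments rather than any existing PCL API, and `std::copy` stands in for a real device-to-host copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device array. The storage lives on the host so
// the sketch runs without a GPU; it is not the PCL DeviceArray.
template <typename T>
class MockDeviceArray {
public:
  explicit MockDeviceArray(const std::vector<T>& host) : data_(host) {}

  std::size_t size() const { return data_.size(); }

  // Assumed signature: copy the elements in [begin, end) to host_out.
  // Returns false if the range is invalid or host_out is null.
  bool download(std::size_t begin, std::size_t end, T* host_out) const {
    if (begin > end || end > data_.size() || host_out == nullptr) {
      return false;
    }
    // A real implementation would issue a device-to-host copy of
    // (end - begin) elements here instead of std::copy.
    std::copy(data_.begin() + begin, data_.begin() + end, host_out);
    return true;
  }

private:
  std::vector<T> data_;  // pretend this is device memory
};
```

With offsets, a single bounds check against `size()` is enough to validate the request, which is the advantage over passing device pointers mentioned in the comments.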
The resize function is simply:
I think implementing a resize function is more involved as it also requires adding another member variable. Currently, the size and capacity of the device array are equal. I think they would have to be different for a resize function to work. What do other people think about this? Does this seem to be a good idea? How could the proposal be improved, and what else do we need to think about? I'd be happy to implement this, but wanted to discuss it first.
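As a rough illustration of why size and capacity need to be tracked separately, here is a host-only sketch. `MockResizableArray` and its members are hypothetical names, and the host allocation and `std::copy` stand in for a device allocation and a device-to-device copy; this is one possible shape under those assumptions, not a proposal for the final implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical host-only model of a device buffer with separate size and
// capacity. It is not the PCL DeviceArray.
template <typename T>
class MockResizableArray {
public:
  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  T* data() { return buffer_.get(); }

  // Shrinking, or growing within capacity, only changes size_.
  // Growing past capacity reallocates and preserves the first size_ elements.
  void resize(std::size_t new_size) {
    if (new_size > capacity_) {
      // Stands in for a device allocation plus a device-to-device copy.
      std::unique_ptr<T[]> grown(new T[new_size]());
      std::copy(buffer_.get(), buffer_.get() + size_, grown.get());
      buffer_ = std::move(grown);
      capacity_ = new_size;
    }
    // Note: elements exposed by growing within capacity keep stale values.
    size_ = new_size;
  }

private:
  std::unique_ptr<T[]> buffer_;
  std::size_t size_ = 0;      // number of elements in use
  std::size_t capacity_ = 0;  // number of elements allocated
};
```

In this model a shrink costs nothing beyond updating `size_`, and only a grow past `capacity_` pays for a reallocation and copy.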