Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU::DeviceArray] Expand the API of the DeviceArray to ease data transfer and allocations #4689

Open
FabianSchuetze opened this issue Apr 3, 2021 · 5 comments
Labels
kind: request Type of issue status: triage Labels incomplete

Comments

@FabianSchuetze
Copy link
Contributor

FabianSchuetze commented Apr 3, 2021

I was thinking of expanding the API of the DeviceArray class by two functions. Firstly, a member function to download only part of the device data to the host. Secondly, a function to resize the DeviceArray. Expanding the API is motivated by a performance analysis (#4677) we made and to bring the API of the device array in line with STL semantics.

In more detail, the signature of the download function could be:

DeviceArray<T>::download(const T* const device_begin, const T* const device_end, T* host_ptr);

One would pass the two points on the device array from which data should be downloaded and copied and a pointer to the host array at which location data is inserted. At the moment we only have two download functions

/** \brief Downloads data from internal buffer to CPU memory
* \param host_ptr pointer to buffer to download               
* */
void download(T *host_ptr) const;

/** \brief Downloads data from internal buffer to CPU memory
* \param data  host vector to download to                 
* */
template<typename A>
void download(std::vector<T, A>& data) const;

download function that downloads all the data which can be more than necessary.

The resize function is simply:

DeviceArray<T>::resize(int N);

I think implementing a resize function is more involved as it also requires adding another member variable. Currently, the size and capacity of the device array are equal. I think they would have to be different for a resize function to work. What do other people think about this? Does this seem to be a good idea? How could the proposal be improved, and what else do we need to think about? I'd be happy to implement this, but wanted to discuss it first.

@kunaltyagi
Copy link
Member

Instead of pointers to memory, what about using iterators?

@FabianSchuetze
Copy link
Contributor Author

Thank you so much for taking a look at this @kunaltyagi - I very much appreciate it!

I was also first thinking of using iterators instead of pointers, but the DeviceArray API does not offer iterators but does offer pointers.

For comparison, the thrust device vector, for example, offers iterators, and it would be nice for our DeviceArray to have iterators too. Nonetheless, pointers themselves are, of course, sufficient.

@kunaltyagi
Copy link
Member

I'd have preferred iterators, but raw pointers is also ok (given existing API).

Perhaps, instead of device pointer begin and end, we can offer device_offset begin and end? This will not require one check (validity of the device side pointers)

bool download(std::size_t device_begin_offset, std::size_t device_end_offset, T* host_ptr);
// returns false if end_offset < begin_offset

@FabianSchuetze
Copy link
Contributor Author

FabianSchuetze commented May 10, 2021

In contrast to my expectations, adding a resize functionality to the device array, DeviceArray<T>::resize(int N) to avoid recurring allocations did not substantially speed up some code I tested.

@kunaltyagi
Copy link
Member

Device side resize might be faster than data upload/download latency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: request Type of issue status: triage Labels incomplete
Projects
None yet
Development

No branches or pull requests

2 participants