-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenCL] Implement urEnqueueUSMMemcpy2D and allow large fill patterns. #976
Conversation
source/adapters/opencl/enqueue.cpp
Outdated
auto DeleteCallback = [](cl_event, cl_int, void *pUserData) { | ||
delete[] static_cast<uint64_t *>(pUserData); | ||
}; | ||
CL_RETURN_ON_FAILURE( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call delete[] HostBuffer
if this call fails. Otherwise there is a potential memory leak. Probably we need to wait until clEnqueueWriteBuffer
is finished to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added here and in the usm one
source/adapters/opencl/usm.cpp
Outdated
auto Info = new DeleteCallbackInfo{USMFree, CLContext, HostBuffer}; | ||
|
||
auto DeleteCallback = [](cl_event, cl_int, void *pUserData) { | ||
static_cast<DeleteCallbackInfo *>(pUserData)->execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this works, wouldn't it be more intuitive to specify a destructor in DeleteCallbackInfo and then calling delete on the object (instead of using delete this
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this looks a lot nicer
source/adapters/opencl/usm.cpp
Outdated
cl_context CLContext; | ||
void *HostBuffer; | ||
void execute() { | ||
USMFree(CLContext, HostBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned with the fact that CLContext is required to call USMFree. What will happen if the user calls UrContextRelease
after this function returns?
Did you try to use clEnqueueMemcpyINTEL() with a host pointer allocated with new
? That would avoid having to use HostMemAlloc / USMFree. I'm under the impression that that might work but not 100%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I was adding a destructor anyway I gave the struct a proper constructor with does a retain on the context (with matching Release in the destructor), should prevent the released context scenario. Using a new
pointer would probably work but it seems sketchier.
source/adapters/opencl/usm.cpp
Outdated
Events.data(), cl_adapter::cast<cl_event *>(phEvent)); | ||
} | ||
for (const auto &E : Events) { | ||
clReleaseEvent(E); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should add CL_RETURN_ON_ERROR to this call as well. Same for line 441
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added here, I'm going to leave 441 as it is to prioritise returning the error from the memcpy
Normally OpenCL limits fill type operations to a max pattern size of 128, this patch includes a workaround to extend that.
e3ee65a
to
603dcfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Normally OpenCL limits fill type operations to a max pattern size of 128, this patch includes a workaround to extend that.