How to share Big Array, like a lookup table among various kernel calls #55
`__constant`-qualified arrays are limited in size (the OpenCL spec guarantees only 64 KB for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE), and they are not shared between different kernels anyway.
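As a back-of-the-envelope check (assuming the 64 KB spec minimum; real devices may report more via CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE), the element count you can fit depends on the element size:

```python
# Hypothetical arithmetic: how many elements fit in a 64 KB __constant buffer.
# 64 KB is the OpenCL spec minimum for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE;
# query the device for the actual value.
CONSTANT_BUFFER_BYTES = 64 * 1024

def max_elements(element_size_bytes):
    """Number of elements of the given size that fit in the constant buffer."""
    return CONSTANT_BUFFER_BYTES // element_size_bytes

print(max_elements(4))  # 32-bit float/int -> 16384 elements
print(max_elements(1))  # single bytes    -> 65536 elements
```

So "64k elements" only holds for byte-sized elements; for a big float lookup table the limit is hit much sooner, which is why a global-memory kernel parameter is the better fit.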
Every kernel has its own compilation unit. So to avoid duplicating big data, pass it as a kernel parameter backed by a ClArray. Since the data never changes, you copy it only once, or even initialize it on the GPU side, then use it with the read/write flags disabled.
Just make sure the data is initialized before the array is used (and set those flags only after that state change). This makes all kernels use the same parameter data without duplicating it per kernel; it is still duplicated per GPU, which is a hardware limitation. To overcome that, you can use the zero-copy flag.
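A minimal OpenCL C sketch of that pattern (the kernel names, `some_init`, and `LUT_SIZE` are hypothetical; on the host side the buffer would be backed by a single ClArray bound to both kernels):

```c
// Hypothetical kernels sharing one lookup table passed as a parameter.
// The host allocates ONE buffer, runs initLut once, then binds the same
// buffer to useLut on every call, so the data is not duplicated per kernel.
__kernel void initLut(__global float *lut)
{
    int i = get_global_id(0);
    lut[i] = some_init(i);          /* one-time GPU-side initialization */
}

__kernel void useLut(__global const float *lut, __global float *out)
{
    int i = get_global_id(0);
    out[i] = lut[i % LUT_SIZE];     /* read-only lookup afterwards */
}
```

This kernel source is only a fragment; it needs a host program (or the library's wrapper) to build and enqueue it.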
As you might guess, zero copy is for "streaming" calculations where each element is read only once (you are free to read/write multiple times, but performance is bad for that) and data moves to the GPU only when it is needed. It is direct RAM access by every GPU that uses the buffer. Reading the same cell from multiple GPUs is legal, but one GPU writing while another reads it concurrently is illegal. Sorry for the late reply.
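In raw OpenCL host-API terms (a sketch, not this library's wrapper; `ctx`, `table`, `n`, and `err` are assumed to exist), zero copy is typically requested at buffer creation time:

```c
/* Sketch using the raw OpenCL host API.
 * CL_MEM_USE_HOST_PTR asks the runtime to use the host allocation
 * directly, so GPUs stream from host RAM instead of holding per-GPU
 * copies. The host pointer should stay valid (and ideally be
 * page-aligned) for the buffer's lifetime. */
cl_mem lut = clCreateBuffer(ctx,
                            CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                            n * sizeof(float),
                            table,    /* existing host array */
                            &err);
```

Whether an access is truly zero copy still depends on the driver and on how the buffer is mapped, so it is worth profiling on the target hardware.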
Using the read-only flag enables an optimization for cases like:
This is for hardware performance, not software; hence the flag setting. On the software side, decorating the parameter with `const` (and maybe `restrict` too) should enable Nvidia's fast data-path optimizations, or AMD's equivalent, for kernel-side data loading.
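That decoration looks like this in OpenCL C (a sketch with a hypothetical kernel; on Nvidia, `const` plus `restrict` lets the compiler route loads through the read-only, non-coherent cache):

```c
/* `const` marks the buffer read-only and `restrict` promises it is not
 * aliased by `out`; together they let the compiler use the read-only
 * data path on Nvidia hardware, with an analogous effect on AMD. */
__kernel void lookup(__global const float * restrict lut,
                     __global float * restrict out)
{
    int i = get_global_id(0);
    out[i] = lut[i];
}
```

`restrict` is only safe when the buffers genuinely do not overlap; passing the same buffer for both arguments would be undefined behavior.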
No, there is no equivalent of setting constant arrays from a CPU command (like cudaMemcpyToSymbol), I'm sorry.
If it is an "initialize once, use always" scenario, then I'd do this:
that for loop of yours could change the flag in its first step; maybe that's all that is needed. But unique shared variables have to be used as kernel parameters. If it is an "initialize frequently, load it always" scenario:
If GPU data duplication is an issue (because of the shared-distributed architecture running in the background),
Lastly, only OpenCL 2.0 supports static variables at global (program) scope, and I have not tested it. I guess it works, but only for the same kernel (a kernel2 would still see its own separate copy), and OpenCL 2.0 still limits it to a constant initializer expression. I think CUDA is much more advanced than OpenCL in this regard, since you can even change GPU constant arrays from the host; you don't need to worry about an equivalent of cudaMemcpyToSymbol.
Thank you for your time.
Hi Tugrul,
Is it possible to share some read-only array (big size) between kernel calls, something like cudaMemcpyToSymbol, in your OpenCL 1.2 API?
Or is it possible to use one kernel to load the big array and a second kernel to access it?
Kindly look at the code below.