cuLaunchCooperativeKernelMultiDevice #478

Closed
maximusgrey opened this issue Nov 8, 2017 · 11 comments

Comments

@maximusgrey

I have been working with CUDA9 / JavaCPP for a few days and got everything up and running very fast. Thank you!

However, I cannot seem to get cuda.cuLaunchCooperativeKernelMultiDevice() working. It takes a CUDA_LAUNCH_PARAMS as the first argument and the array size as the second argument, but what I need is an array of CUDA_LAUNCH_PARAMS. I tried via PointerPointer but that did not fix things.

Does anyone have a solution on how to call cuLaunchCooperativeKernelMultiDevice for multiple devices?

@saudet saudet added the question label Nov 9, 2017
@saudet
Member

saudet commented Nov 9, 2017

CUDA_LAUNCH_PARAMS is a Pointer, which can point to a native array. To allocate an array of size 10, for example, we can call new CUDA_LAUNCH_PARAMS(10).
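
As an illustration of what "a Pointer that points to a native array" means, here is a plain-JDK sketch that needs neither JavaCPP nor CUDA: one contiguous allocation, with element i starting at i * sizeof(struct). The class name `LaunchParamsArrayModel`, the 56-byte element size, and the field offset are assumptions for a typical 64-bit ABI, not values read from JavaCPP. In the real API the allocation would be `new CUDA_LAUNCH_PARAMS(10)`, and selecting an element would presumably go through `position(i)`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of an array of CUDA_LAUNCH_PARAMS structs in one block of memory.
final class LaunchParamsArrayModel {
    // Assumed layout on a 64-bit ABI: 8-byte pointers, natural alignment.
    static final int SIZEOF = 56;            // assumed sizeof(CUDA_LAUNCH_PARAMS)
    static final int GRID_DIM_X_OFFSET = 8;  // assumed: right after the 8-byte CUfunction

    private final ByteBuffer memory;

    LaunchParamsArrayModel(int count) {
        // One contiguous, zero-filled allocation holding all elements back to back.
        memory = ByteBuffer.allocateDirect(count * SIZEOF).order(ByteOrder.nativeOrder());
    }

    // Byte offset at which element i starts.
    static int elementOffset(int i) {
        return i * SIZEOF;
    }

    void setGridDimX(int i, int value) {
        memory.putInt(elementOffset(i) + GRID_DIM_X_OFFSET, value);
    }

    int getGridDimX(int i) {
        return memory.getInt(elementOffset(i) + GRID_DIM_X_OFFSET);
    }

    public static void main(String[] args) {
        // One entry per device, each with its own grid width.
        LaunchParamsArrayModel params = new LaunchParamsArrayModel(2);
        params.setGridDimX(0, 64);
        params.setGridDimX(1, 128);
        System.out.println(params.getGridDimX(0) + " " + params.getGridDimX(1));
    }
}
```

The point of the model is that the launch function only needs the base address and the element count; every per-device entry is reachable from there by fixed-size steps.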

@saudet saudet closed this as completed Nov 9, 2017
@maximusgrey
Author

Thanks for the feedback. I got it to work!

@maximusgrey
Author

maximusgrey commented Nov 9, 2017

@saudet, one more question. I have the array of CUDA_LAUNCH_PARAMS working. I can also set all grid and block variables, and the kernels execute correctly on the GPUs.

Next up is setting the kernel parameters. But each time I set a kernel parameter like launchParams.kernelParams(0, new LongPointer(new long[1])), I instantly get a SIGSEGV crash.

@saudet
Member

saudet commented Nov 9, 2017 via email

@maximusgrey
Author

maximusgrey commented Nov 9, 2017

NVIDIA doc says you have to make this struct:

typedef struct CUDA_LAUNCH_PARAMS_st {
    CUfunction function;         /**< Kernel to launch */
    unsigned int gridDimX;       /**< Width of grid in blocks */
    unsigned int gridDimY;       /**< Height of grid in blocks */
    unsigned int gridDimZ;       /**< Depth of grid in blocks */
    unsigned int blockDimX;      /**< X dimension of each thread block */
    unsigned int blockDimY;      /**< Y dimension of each thread block */
    unsigned int blockDimZ;      /**< Z dimension of each thread block */
    unsigned int sharedMemBytes; /**< Dynamic shared-memory size per thread block in bytes */
    CUstream hStream;            /**< Stream identifier */
    void **kernelParams;         /**< Array of pointers to kernel parameters */
} CUDA_LAUNCH_PARAMS;

So I want to set the "void **kernelParams;" pointer. However, the cuda.java code only provides these options:

public native Pointer kernelParams(int i);
public native CUDA_LAUNCH_PARAMS kernelParams(int i, Pointer kernelParams);
@MemberGetter public native @cast("void**") PointerPointer kernelParams();

So how should I proceed?

@saudet
Member

saudet commented Nov 9, 2017

You'll need to allocate your own PointerPointer and pass that...

@maximusgrey
Author

Like this? All variants give a SIGSEGV:
launchParams.kernelParams(0, new PointerPointer(new IntPointer(new int[1])));
launchParams.kernelParams(0, new PointerPointer(new Pointer()));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new Pointer() }));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new IntPointer(new int[1]) }));

I would also expect kernelParams(0, pointer) to take a normal pointer, and kernelParams() to return the entire array as a PointerPointer?
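
For reference, the `void **kernelParams` field in the struct quoted earlier is a pointer to a table of pointers: one slot per kernel argument, each slot holding the address of that argument's storage. The following plain-JDK sketch models that double indirection by using offsets into a single ByteBuffer as stand-in addresses. The class `KernelArgsTableModel` and its arena are hypothetical and are not part of JavaCPP or CUDA.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model: offsets into one ByteBuffer stand in for native addresses.
final class KernelArgsTableModel {
    private final ByteBuffer arena = ByteBuffer.allocate(256).order(ByteOrder.nativeOrder());
    private int next = 8; // offset 0 is left unused so it can play the role of "null"

    // Reserve size bytes, rounded up to a multiple of 8, and return their stand-in address.
    long alloc(int size) {
        int address = next;
        next += (size + 7) / 8 * 8;
        return address;
    }

    // Store one int argument and return the address of its storage.
    long storeIntArg(int value) {
        long address = alloc(Integer.BYTES);
        arena.putInt((int) address, value);
        return address;
    }

    // Build the table that kernelParams would point to: one address per argument.
    long buildTable(long... argAddresses) {
        long table = alloc(argAddresses.length * Long.BYTES);
        for (int i = 0; i < argAddresses.length; i++) {
            arena.putLong((int) table + i * Long.BYTES, argAddresses[i]);
        }
        return table;
    }

    // Follow both levels: table slot i, then the argument storage it points to.
    int readIntArg(long table, int i) {
        long argAddress = arena.getLong((int) table + i * Long.BYTES);
        return arena.getInt((int) argAddress);
    }

    public static void main(String[] args) {
        KernelArgsTableModel model = new KernelArgsTableModel();
        long table = model.buildTable(model.storeIntArg(7), model.storeIntArg(42));
        System.out.println(model.readIntArg(table, 0) + " " + model.readIntArg(table, 1));
    }
}
```

In this picture, the struct field holds only the address of the table. The table itself has to exist before any slot in it can be written.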

@saudet saudet added the bug label Nov 9, 2017
@saudet
Member

saudet commented Nov 9, 2017

That is indeed an issue. We'll have to fix this.

@saudet saudet reopened this Nov 9, 2017
@saudet
Member

saudet commented Nov 9, 2017

In the meantime, we can work around that by using Loader.sizeof(CUDA_LAUNCH_PARAMS.class) and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams") with new BytePointer(launchParams).putPointer(..., kernelParams).
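
To show the arithmetic behind this workaround, here is a plain-JDK sketch that models what `Loader.sizeof` and `Loader.offsetof` would report for the struct and where the pointer value ends up. It assumes a 64-bit ABI with 8-byte, naturally aligned pointers. The class `KernelParamsFieldModel` is hypothetical, and the index formula (element index times struct size plus field offset, counted from the array base) is an assumption about the argument elided above; with a Pointer already positioned on one element, the index may be just the field offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the workaround: store an address directly in the kernelParams field.
final class KernelParamsFieldModel {
    static final int POINTER_SIZE = 8; // assumption: 64-bit ABI, 8-byte aligned pointers

    // Round offset up to the next multiple of alignment.
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Models Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams") under the assumed ABI.
    static int kernelParamsOffset() {
        int offset = POINTER_SIZE;            // CUfunction function
        offset += 7 * Integer.BYTES;          // gridDimX ... sharedMemBytes
        offset = align(offset, POINTER_SIZE); // padding before the next pointer
        offset += POINTER_SIZE;               // CUstream hStream
        return offset;                        // void **kernelParams starts here
    }

    // Models Loader.sizeof(CUDA_LAUNCH_PARAMS.class) under the assumed ABI.
    static int structSize() {
        return align(kernelParamsOffset() + POINTER_SIZE, POINTER_SIZE);
    }

    // Byte index of the kernelParams field of element i, counted from the array base.
    static int fieldIndex(int i) {
        return i * structSize() + kernelParamsOffset();
    }

    public static void main(String[] args) {
        ByteBuffer structs = ByteBuffer.allocateDirect(2 * structSize())
                                       .order(ByteOrder.nativeOrder());
        long fakeTableAddress = 0x1000L; // stand-in for the address of a PointerPointer
        structs.putLong(fieldIndex(1), fakeTableAddress); // entry for the second device
        System.out.println(kernelParamsOffset() + " " + structSize() + " "
                + Long.toHexString(structs.getLong(fieldIndex(1))));
    }
}
```

Writing through a byte view this way bypasses the generated accessors entirely, which is why it sidesteps the setter problem discussed in this thread.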

@maximusgrey
Author

Thanks for the feedback and yes it works!

saudet added a commit to bytedeco/javacpp that referenced this issue Nov 13, 2017
@saudet
Member

saudet commented Jan 17, 2018

The fix is included in version 1.4, although that release now provides wrappers for CUDA 9.1:
http://search.maven.org/#search%7Cga%7C1%7Cbytedeco%20cuda
Thanks for reporting and testing this out!

@saudet saudet closed this as completed Jan 17, 2018