GPU support #4

Open
adayton1 opened this issue Nov 13, 2024 · 6 comments

Comments

@adayton1

adayton1 commented Nov 13, 2024

Are there any plans for adding GPU support to verdict? My GPU-accelerated code hits a big slowdown when I have to switch to host-only execution to call some verdict functions. It looks to me as if porting verdict would mostly involve adding __host__ __device__ specifiers to functions and switching from std:: math functions to the corresponding C versions.
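
A minimal sketch of the kind of change described above, under stated assumptions: the macro name `EXAMPLE_HOST_DEVICE` and both helper functions are hypothetical and are not taken from the Verdict source or from any pull request. The macro expands to `__host__ __device__` only when a CUDA or HIP compiler is in use, so the same code still builds with a plain host compiler, and the math call uses the C `sqrt` from `<math.h>` rather than `std::sqrt`.

```cpp
#include <cassert>
#include <math.h>

// Hypothetical macro name: expands to nothing under a plain host compiler.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Illustrative helper (not a Verdict function): length of the edge a -> b.
EXAMPLE_HOST_DEVICE inline double example_edge_length(const double a[3], const double b[3])
{
  const double dx = b[0] - a[0];
  const double dy = b[1] - a[1];
  const double dz = b[2] - a[2];
  // C math function from <math.h> instead of std::sqrt.
  return sqrt(dx * dx + dy * dy + dz * dz);
}

// Illustrative helper (not a Verdict function): longest edge of a closed
// polygon whose nodes are listed in order, e.g. the four nodes of a quad.
EXAMPLE_HOST_DEVICE inline double example_longest_edge(const double coords[][3], int num_nodes)
{
  assert(num_nodes > 0);
  double longest = 0.0;
  for (int i = 0; i < num_nodes; ++i)
  {
    const int j = (i + 1) % num_nodes; // wrap around to close the polygon
    const double len = example_edge_length(coords[i], coords[j]);
    if (len > longest)
    {
      longest = len;
    }
  }
  return longest;
}
```

With annotations like these, a function can be called from host code and from inside a device kernel without a separate implementation.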

@clintonstimpson
Collaborator

Hi @adayton1, we do not currently have plans to add GPU support to Verdict, though there is a chance we could soon use Verdict on the GPU ourselves. Do you have specific changes you want to see?

@adayton1
Author

I use the hex and quad functions, but it's trivial enough to port the whole library. I put up pull request #5.

@dtaller

dtaller commented Dec 16, 2024

Hi. Do you know if the verdict pull request with GPU support, #5, will get merged in?

@clintonstimpson
Collaborator

Yes, it will be merged; it was merged internally last week. We'll be pushing a new update of the verdict source to this GitHub repo soon.

@dtaller

dtaller commented Dec 16, 2024

Thanks!

@liu15

liu15 commented Dec 19, 2024

I've been testing Alan's implementation with ROCm on MI300A (rzadams) and found that stack memory was exhausted by the hex_distortion (and quad_distortion) calls unless I specified 128K(!) of GPU stack memory. This is due to the arrays that are sized maxTotalNumberGaussPoints * maxNumberNodes and maxNumberNodes * maxNumberNodes.

For our use, we only have linear elements, so setting maxTotalNumberGaussPoints and maxNumberNodes to 8 fixed the issue for us, but obviously that is not a general solution.

I'm not sure what the preferred "correct" fix is: a compile-time switch, multiple implementations, or a templated API.
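
To make the templated-API option concrete, here is a rough sketch under stated assumptions. The names `ExampleDistortionScratch`, `example_scratch_bytes`, and `example_fill_scratch` are hypothetical and do not exist in Verdict. Only the two array shapes come from the comment above. The loop body is a placeholder, not the distortion metric. The idea is that the bounds become template parameters, so a caller with only linear elements can instantiate with 8 and 8 while higher-order callers pick larger bounds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scratch storage. The two array shapes mirror the ones named
// in the comment above; the type itself is not part of Verdict.
template <int MaxNodes, int MaxGaussPoints>
struct ExampleDistortionScratch
{
  static_assert(MaxNodes > 0 && MaxGaussPoints > 0, "bounds must be positive");
  double shape_function[MaxGaussPoints][MaxNodes];
  double node_weights[MaxNodes][MaxNodes];
};

// Bytes of local storage one call would need for its scratch arrays.
template <int MaxNodes, int MaxGaussPoints>
constexpr std::size_t example_scratch_bytes()
{
  return sizeof(ExampleDistortionScratch<MaxNodes, MaxGaussPoints>);
}

// Placeholder body (not the real metric): fills part of the scratch and sums
// it, only to show local arrays whose size is fixed by template parameters.
template <int MaxNodes, int MaxGaussPoints>
double example_fill_scratch(int num_nodes, int num_gauss_points)
{
  assert(num_nodes > 0 && num_nodes <= MaxNodes);
  assert(num_gauss_points > 0 && num_gauss_points <= MaxGaussPoints);
  ExampleDistortionScratch<MaxNodes, MaxGaussPoints> scratch{};
  double sum = 0.0;
  for (int g = 0; g < num_gauss_points; ++g)
  {
    for (int n = 0; n < num_nodes; ++n)
    {
      scratch.shape_function[g][n] = 1.0 / num_nodes;
      sum += scratch.shape_function[g][n];
    }
  }
  return sum;
}
```

In this sketch, `example_scratch_bytes<8, 8>()` is 1024 bytes on a platform with 8-byte doubles, and the value grows with the product of the bounds. A compile-time switch would give the same saving with less API churn, but it fixes one bound for the whole build.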
