Use local memory in cuda band-matrix solve #1738

Closed
Tracked by #2632
charleskawczynski opened this issue May 21, 2024 · 0 comments · Fixed by #1735
@charleskawczynski
Member

Our CUDA implementation of the band-matrix solve currently accesses global memory in both the forward and backward solves, which is slow. We should at least use some form of local memory (e.g., MArray or CuStaticSharedMemory) to improve memory access speed.
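To illustrate the idea, here is a minimal C++ sketch of the access pattern for the tridiagonal special case of a band matrix. It is not the project's code: the name `solve_tridiagonal_local`, the array layout, and the use of the Thomas algorithm are all assumptions made for illustration. Each global entry is read once, the forward and backward sweeps run entirely on fixed-size local buffers, and the solution is written back once.

```cpp
#include <cstddef>

// Hypothetical helper (illustration only): solves one tridiagonal system of
// compile-time size N with the Thomas algorithm.
// Assumed layout: lower[i] multiplies x[i-1] (lower[0] is ignored),
// diag[i] multiplies x[i], upper[i] multiplies x[i+1] (upper[N-1] is ignored).
// The pointer arguments stand in for arrays living in slow (global) memory.
template <std::size_t N>
void solve_tridiagonal_local(const double* lower, const double* diag,
                             const double* upper, const double* rhs,
                             double* x) {
    static_assert(N >= 1, "system must have at least one unknown");

    // Fixed-size scratch buffers. In a CUDA kernel, a per-thread array like
    // this plays the role of an MArray; a __shared__ array is the alternative.
    double c[N];  // modified upper diagonal
    double d[N];  // modified right-hand side, later the solution

    // Forward sweep: every input entry is read from the slow arrays once.
    c[0] = (N > 1) ? upper[0] / diag[0] : 0.0;
    d[0] = rhs[0] / diag[0];
    for (std::size_t i = 1; i < N; ++i) {
        const double a = lower[i];
        const double denom = diag[i] - a * c[i - 1];
        c[i] = (i + 1 < N) ? upper[i] / denom : 0.0;
        d[i] = (rhs[i] - a * d[i - 1]) / denom;
    }

    // Backward sweep: touches only the local buffers.
    for (std::size_t i = N - 1; i-- > 0;) {
        d[i] -= c[i] * d[i + 1];
    }

    // Single write-back of the solution to the slow output array.
    for (std::size_t i = 0; i < N; ++i) {
        x[i] = d[i];
    }
}
```

Calling `solve_tridiagonal_local<3>(lower, diag, upper, rhs, x)` solves a 3-by-3 system. The sketch does no pivoting, so it assumes the matrix is well conditioned for the Thomas algorithm (for example, diagonally dominant).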
