
Limit GPU memory usage? #75

Open
sef43 opened this issue Jan 9, 2024 · 4 comments

sef43 commented Jan 9, 2024

Hello,

When running on a GPU that might be doing something else, I sometimes see out-of-memory errors:

```
CUDA Error of GINTint2e_jk_kernel: out of memory
```

Is it possible to specify a hard limit on the amount of memory used by these kernels?

wxj6000 (Collaborator) commented Jan 9, 2024

GPU memory is mostly allocated via CuPy. If you want the GPU to remain available for other work, you can set a memory limit through CuPy: https://docs.cupy.dev/en/stable/user_guide/memory.html#limiting-gpu-memory-usage
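For reference, a minimal sketch of the CuPy limit described in that doc (the 4 GiB value is illustrative; requires a CUDA-capable environment):

```python
import cupy

# Cap the default CuPy memory pool at 4 GiB (value is illustrative)
pool = cupy.get_default_memory_pool()
pool.set_limit(size=4 * 1024**3)

# The same cap can be applied without code changes via the environment:
#   export CUPY_GPU_MEMORY_LIMIT="4294967296"   # bytes
```

Note this only bounds allocations that go through CuPy's pool; memory allocated directly by CUDA kernels is not affected, which is the caveat in the next paragraph.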

Although the GINT* kernels do not allocate global memory explicitly, they allocate a lot of local memory for high angular momenta. That local memory is ultimately backed by global memory, so for high angular momenta you may still hit the 'out of memory' issue.

sef43 (Author) commented Jan 9, 2024

Thank you for the explanation.

sef43 (Author) commented Oct 2, 2024

Hello, I am reopening this issue.

I have found that if I turn on CUDA MPS and limit the number of active threads with `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50`, a calculation that usually fails with `CUDA Error of GINTint2e_jk_kernel: out of memory` will succeed (taking only about 1.5x longer, not 2x longer).

My understanding is that this reduces the amount of local/shared memory in use at once, which stops the errors at the expense of runtime.
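For anyone reproducing the workaround, a sketch of how it might be set up (the daemon invocation and the script name are assumptions about the local setup; exact steps depend on the driver configuration):

```shell
# Start the CUDA MPS control daemon (requires suitable permissions)
nvidia-cuda-mps-control -d

# Cap each MPS client at ~50% of the SM threads, then run the job
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
python run_calculation.py   # hypothetical driver script
```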

Is it possible to make a similar modification at runtime, or at compile time, in the code?

Maybe these values?

```c
// threads for GPU
#define THREADSX 16
#define THREADSY 16
#define THREADS (THREADSX * THREADSY)
#define MAX_STREAMS 16
```
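As a sketch of the kind of compile-time change this suggests (untested; the 8x8 value is illustrative, and other parts of the kernels may assume 16x16 blocks):

```c
// Hypothetical halved block dimensions: an 8x8 block runs 64 threads
// instead of 256, shrinking the per-block local-memory footprint
// roughly 4x, analogous to the MPS 50% active-thread cap
#define THREADSX 8
#define THREADSY 8
#define THREADS (THREADSX * THREADSY)
#define MAX_STREAMS 16
```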

wxj6000 (Collaborator) commented Oct 3, 2024

This is a good suggestion. If some of the threads are turned off, there is no need to allocate local memory for them. We can take it as one of the possible solutions.
