Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA: fix --split-mode row race condition #9413

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

JohannesGaessler
Copy link
Collaborator

Fixes #8801 (comment) .

The problem is that --split-mode row uses multiple CUDA streams to overlap data transfer with computation but on CUDA GPUs that are Volta or newer each of these streams allocates a temporary buffer for the stream-k fixup. And because it's not safe to allocate temporary buffers with multiple CUDA streams in parallel this leads to a race condition. However, because both stream-k decomposition and --split-mode row mitigates tail effects it's fine to just not use stream-k if multiple parallel CUDA streams are in use; the performance difference with 3x RTX 4090 is ~1%.

Long-term it may make sense to think about how we can safely handle multiple parallel CUDA streams. I think a good approach would be to move the parallelism from something CUDA-specific to something at the level of GGML compute graphs. That would also enable the reuse of the code for other backends.

@JohannesGaessler JohannesGaessler added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Sep 10, 2024
@slaren
Copy link
Collaborator

slaren commented Sep 10, 2024

You could use a different pool per stream. Since the pools are objects now, it should be a simple change. I thought about moving the implementation of tensor parallelism to ggml_backend_sched, but it is a lot of work for no clear benefit outside of datacenter GPUs with very fast interconnects, and I think it will make backend-specific optimizations harder to implement.

@github-actions github-actions bot added the Nvidia GPU Issues specific to Nvidia GPUs label Sep 10, 2024
@JohannesGaessler JohannesGaessler merged commit 5af118e into ggerganov:master Sep 11, 2024
52 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Nvidia GPU Issues specific to Nvidia GPUs Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: b3188 breaks row split mode for multiple GPUs
2 participants