
cuda : fix tensor size calculation for non-split buffer #5145

Merged
slaren merged 1 commit into master from sl/cuda-alloc-size-fix on Jan 26, 2024

Conversation

@slaren (Collaborator) commented on Jan 26, 2024:

Fixes #5137
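For context, a minimal sketch of the kind of calculation the title refers to. This is not the actual patch, and every name in it (ROW_PADDING, tensor_alloc_size) is illustrative: the idea is that quantized tensors in a non-split CUDA buffer are allocated with extra bytes past the last row so that kernels reading whole padded rows never run off the end of the allocation, and the reported allocation size has to account for that padding consistently.

```cpp
// Illustrative sketch only: hypothetical names, not llama.cpp identifiers.
#include <cstddef>
#include <cstdint>

constexpr int64_t ROW_PADDING = 512; // assumed row alignment for the kernels

// bytes required for the tensor data itself
size_t tensor_nbytes(int64_t ne0, int64_t ne1, size_t type_size) {
    return static_cast<size_t>(ne0) * static_cast<size_t>(ne1) * type_size;
}

// bytes to reserve in the buffer: round the last row up to the assumed
// alignment so reads of the padded row tail stay inside the allocation
size_t tensor_alloc_size(int64_t ne0, int64_t ne1, size_t type_size) {
    size_t size = tensor_nbytes(ne0, ne1, type_size);
    if (ne0 % ROW_PADDING != 0) {
        size += static_cast<size_t>(ROW_PADDING - ne0 % ROW_PADDING) * type_size;
    }
    return size;
}
```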

@ikawrakow (Contributor) left a comment:

It works now. Great!

So, the issue was the async memset?

@slaren (Collaborator, Author) commented on Jan 26, 2024:

Not quite. The async memset was also not good, but it shouldn't have caused issues. I wrote an explanation of the problem in issue #5137 (comment).
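As a rough illustration of the pattern under discussion, assuming (as the comments suggest) that the bytes past a tensor's data are zero-cleared when the tensor is placed in the buffer; init_tensor_padding and its parameters are hypothetical names, not the actual llama.cpp code:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical sketch: clear the padding region past the tensor data.
void init_tensor_padding(void * data, size_t nbytes, size_t padded_size) {
    if (padded_size > nbytes) {
        // a plain cudaMemset is issued on the legacy default stream, so the
        // cleared padding is ordered before later default-stream work without
        // an explicit synchronization call
        cudaMemset(static_cast<char *>(data) + nbytes, 0, padded_size - nbytes);
    }
}
```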

@slaren merged commit 62fead3 into master on Jan 26, 2024 (53 of 54 checks passed).
@slaren deleted the sl/cuda-alloc-size-fix branch on January 26, 2024 at 17:59.
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jan 31, 2024
@LostRuins (Collaborator) commented:

Hi @slaren,

@Nexesenex reported that this specific commit caused a major speed regression with multi-GPU CUDA. I personally have not noticed any slowdown on my single GPU, but a few others have mentioned a recent speed regression too.

I'm wondering whether the change from cudaMemsetAsync to cudaMemset could have had an adverse impact on performance.
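For reference, a self-contained sketch of the semantic difference between the two calls (the buffer and its size are illustrative). cudaMemset is ordered on the legacy default stream, while cudaMemsetAsync is enqueued on a caller-supplied stream and needs explicit synchronization before the memory is read elsewhere:

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t buf_size = 1 << 20;
    void * dev_ptr = nullptr;
    cudaMalloc(&dev_ptr, buf_size);

    // ordered on the legacy default stream: later work submitted to that
    // stream observes the cleared memory without extra synchronization
    cudaMemset(dev_ptr, 0, buf_size);

    // enqueued on a user stream; returns immediately, so consumers on other
    // streams must synchronize before reading the buffer
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemsetAsync(dev_ptr, 0, buf_size, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev_ptr);
    return 0;
}
```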

@slaren (Collaborator, Author) commented on Jan 31, 2024:

The simple answer is no.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Feb 5, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Development

Successfully merging this pull request may close these issues:
Partial GPU offload broken for certain number of offloaded layers (#5137)
3 participants