
cuda : fix tensor size calculation for non-split buffer #5145

Merged
slaren merged 1 commit into master from sl/cuda-alloc-size-fix on Jan 26, 2024

Conversation

@slaren (Collaborator) commented on Jan 26, 2024:

Fixes #5137
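For context, a minimal sketch of the kind of calculation the title refers to. This is not the actual patch, and every name in it (ROW_PADDING, tensor_alloc_size) is illustrative: the idea is that quantized tensors in a non-split CUDA buffer are allocated with extra bytes past the last row so that kernels reading whole padded rows never run off the end of the allocation, and the reported allocation size has to account for that padding consistently.

```cpp
// Illustrative sketch only: hypothetical names, not llama.cpp identifiers.
#include <cstddef>
#include <cstdint>

constexpr int64_t ROW_PADDING = 512; // assumed row alignment for the kernels

// bytes required for the tensor data itself
size_t tensor_nbytes(int64_t ne0, int64_t ne1, size_t type_size) {
    return static_cast<size_t>(ne0) * static_cast<size_t>(ne1) * type_size;
}

// bytes to reserve in the buffer: round the last row up to the assumed
// alignment so reads of the padded row tail stay inside the allocation
size_t tensor_alloc_size(int64_t ne0, int64_t ne1, size_t type_size) {
    size_t size = tensor_nbytes(ne0, ne1, type_size);
    if (ne0 % ROW_PADDING != 0) {
        size += static_cast<size_t>(ROW_PADDING - ne0 % ROW_PADDING) * type_size;
    }
    return size;
}
```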

@ikawrakow (Contributor) left a comment:

It works now. Great!

So, the issue was the async memset?

@slaren (Collaborator, Author) commented on Jan 26, 2024:

Not quite. The async memset was also not good, but it shouldn't have caused issues. I wrote an explanation of the problem in issue #5137 (comment).
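As a rough illustration of the pattern under discussion, assuming (as the comments suggest) that the bytes past a tensor's data are zero-cleared when the tensor is placed in the buffer; init_tensor_padding and its parameters are hypothetical names, not the actual llama.cpp code:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical sketch: clear the padding region past the tensor data.
void init_tensor_padding(void * data, size_t nbytes, size_t padded_size) {
    if (padded_size > nbytes) {
        // a plain cudaMemset is issued on the legacy default stream, so the
        // cleared padding is ordered before later default-stream work without
        // an explicit synchronization call
        cudaMemset(static_cast<char *>(data) + nbytes, 0, padded_size - nbytes);
    }
}
```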

@slaren merged commit 62fead3 into master on Jan 26, 2024 (53 of 54 checks passed).
@slaren deleted the sl/cuda-alloc-size-fix branch on January 26, 2024 at 17:59.
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jan 31, 2024
@LostRuins (Collaborator) commented:

Hi @slaren,

@Nexesenex reported that this specific commit caused a major speed regression with multi-GPU CUDA. I personally have not noticed any slowdown on my single GPU, but a few others have mentioned a recent speed regression too.

I'm wondering whether the change from cudaMemsetAsync to cudaMemset could have had an adverse impact on performance.
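For reference, a self-contained sketch of the semantic difference between the two calls (the buffer and its size are illustrative). cudaMemset is ordered on the legacy default stream, while cudaMemsetAsync is enqueued on a caller-supplied stream and needs explicit synchronization before the memory is read elsewhere:

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t buf_size = 1 << 20;
    void * dev_ptr = nullptr;
    cudaMalloc(&dev_ptr, buf_size);

    // ordered on the legacy default stream: later work submitted to that
    // stream observes the cleared memory without extra synchronization
    cudaMemset(dev_ptr, 0, buf_size);

    // enqueued on a user stream; returns immediately, so consumers on other
    // streams must synchronize before reading the buffer
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemsetAsync(dev_ptr, 0, buf_size, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev_ptr);
    return 0;
}
```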

@slaren (Collaborator, Author) commented on Jan 31, 2024:

The simple answer is no.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Feb 5, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Development

Successfully merging this pull request may close these issues:
Partial GPU offload broken for certain number of offloaded layers (#5137)
3 participants