Add Command R Plus support #6491
Changes from all commits: 2efcd87, fbab984, e4b2e2d, c354db7, 553b09b, f3532ff, ec613b8, c2658c3, 6745ea7, 26e8f23, ce9413d, 78819c0, 8b6577b, d292407, ea1aeba
```diff
@@ -1225,7 +1225,7 @@ static void ggml_cuda_op_mul_mat_cublas(

         // the main device has a larger memory buffer to hold the results from all GPUs
         // ldc == nrows of the matrix that cuBLAS writes into
-        int ldc = id == ctx.device ? ne0 : row_diff;
+        int64_t ldc = id == ctx.device ? ne0 : row_diff;

         const int compute_capability = ggml_cuda_info().devices[id].cc;
```

Review discussion on this change:

Reviewer: It would be great to update the PR description to summarize why we upcast all `int` params to `int64_t` in this context.

Author: @phymbert I reverted that one in dranger003@835d702, but it looks like the PR got merged before it could be pulled. My knowledge of this code is nowhere near on par with those who wrote it, so I definitely rely on your reviews. I looked at some of the values in the debugger, but since so many were overflowing I had to change them in batches, which means I most likely changed some that didn't need to be changed. Hopefully this makes sense. I can submit another PR to master with that last commit; without it, perplexity was still broken when using CUDA for this model.

Reviewer: Thanks for the explanation. Please raise it with @ggerganov, as I am not familiar with the CommandR+ work.

Author: Thanks, I opened a PR as a follow-up (#6563).
```diff
@@ -1377,8 +1377,8 @@ static void ggml_cuda_op_mul_mat(
     const int64_t ne0 = dst->ne[0];
     const int64_t ne1 = dst->ne[1];

-    const int nb2 = dst->nb[2];
-    const int nb3 = dst->nb[3];
+    const int64_t nb2 = dst->nb[2];
+    const int64_t nb3 = dst->nb[3];

     GGML_ASSERT(ggml_backend_buffer_is_cuda(dst->buffer));
     GGML_ASSERT(ggml_backend_buffer_is_cuda(src1->buffer));
```
Reviewer: Would be nice to update the comment.