Replies: 1 comment
Pinging @0cc4m as our discussion led me to investigate this :)
On my 7900XTX I found that setting MMVQ_MAX_BATCH_SIZE to 3 in mmvq.cuh leads to a sizable performance increase for batch sizes 4 through 8. I would like to implement this optimization and open a PR for it, but I am not quite sure how to proceed:
What would be the best way to implement this? Would setting the value conditionally based on the device ID be acceptable? Should I apply it to just the 7900XTX, to all RDNA3 cards, or even to all AMD cards? Looking forward to some pointers on how to proceed. Thanks!
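To make the question concrete, here is a rough sketch of what a per-architecture limit could look like. Everything in it is hypothetical: the `GpuArch` enum, the helper names, and the assumed default of 8 are for illustration only and are not taken from mmvq.cuh. Only the value 3 for RDNA3 comes from the measurement above.

```cpp
// Hypothetical sketch: none of these names exist in mmvq.cuh.
// Coarse architecture classes a backend might distinguish at runtime.
enum class GpuArch { NVIDIA, AMD_RDNA2, AMD_RDNA3, AMD_OTHER };

// Assumed value of the existing compile-time constant (an assumption,
// not verified against the header).
constexpr int MMVQ_MAX_BATCH_SIZE_DEFAULT = 8;

// Largest batch size for which the MMVQ kernels would still be chosen.
// Only RDNA3 is lowered to 3, matching the 7900XTX measurement; all
// other architectures keep the assumed default.
constexpr int mmvq_max_batch_size(GpuArch arch) {
    return arch == GpuArch::AMD_RDNA3 ? 3 : MMVQ_MAX_BATCH_SIZE_DEFAULT;
}

// Dispatch predicate: use MMVQ only when the batch fits under the
// per-architecture limit; larger batches would fall through to
// whatever other matrix multiplication path the backend provides.
constexpr bool should_use_mmvq(GpuArch arch, int batch_size) {
    return batch_size <= mmvq_max_batch_size(arch);
}
```

With this shape, widening the change from "just the 7900XTX" to "all RDNA3" or "all AMD" would only mean changing the condition in `mmvq_max_batch_size`. Any kernel template instantiations would still have to cover batch sizes up to the largest limit in use.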