AMD support is completely broken - no load is placed on GPU #1684
Comments
@Expro did you set up gpu_layers in the model file? https://localai.io/features/gpu-acceleration/#model-configuration Maybe we should just expose that option from the CLI, or we should default to a high number, since that seems harmless even when there is no GPU.
I did set up gpu_layers in the model file, despite the documentation stating in at least two places that gpu_layers is only used with cuBLAS, so not for AMD. I swapped between a build for a single backend and one for multiple backends without touching the models, and one of them ran on the GPU while the other didn't. Defaulting to a high number of layers seems like a good idea, and so does exposing it through an environment variable and the CLI.
I'd also set threads to 0 and ensure gpu_layers is set to something like 100-120 in your case, to keep it off the CPU. I don't have an AMD GPU to reproduce this issue, so I'm taking a bit of a stab in the dark.
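To make the suggestion above concrete, a minimal model definition might look like the sketch below. It follows the YAML layout described at https://localai.io/features/gpu-acceleration/#model-configuration; the model name, the GGUF file, and the exact values are placeholders rather than a tested configuration.

```yaml
# models/mistral.yaml -- hypothetical model definition, values are illustrative
name: mistral
parameters:
  model: mistral-7b-instruct.Q4_K_M.gguf
context_size: 2048
# Offload (effectively) all layers to the GPU; a value higher than the model's
# actual layer count just offloads everything, so 100-120 keeps a 7B model on the GPU.
gpu_layers: 100
# As suggested above: avoid CPU worker threads so inference stays on the GPU.
threads: 0
f16: true
```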
This is the same sort of issue that I have had for a while; it is adjacent to #1592
I will be rebuilding this workstation soon and may have time after next week to do some more build tests and debugging.
This issue should be considered resolved; I have been continuously using my Radeon VII for at least two months now without issue.
Right, closing then, and thanks @jtwolfe for confirming!
LocalAI version:
Tested from 2.6.x up to the most recent commit.
Environment, CPU architecture, OS, and Version:
x64, Fedora 39
Describe the bug
Successfully compiled LocalAI with both CLBlast and hipBLAS for the llama.cpp backend. With DEBUG=true, the logs say layers are offloaded to the GPU, but that's a lie: GPU monitoring tools show 0% load the whole time. Placing other workloads on the GPU does show load in the monitoring tools, so this is not an issue with the monitoring tools.
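For context, the builds and the check were roughly as follows. This is a sketch, assuming the BUILD_TYPE values documented for LocalAI's Makefile and standard AMD monitoring tools (rocm-smi, radeontop); exact flags may differ between versions.

```bash
# Build the llama.cpp backend with hipBLAS (ROCm) support
make BUILD_TYPE=hipblas build

# Alternatively, build with CLBlast
make BUILD_TYPE=clblas build

# Run with debug logging to see the "offloading layers to GPU" messages
DEBUG=true ./local-ai --models-path ./models

# In a second terminal, watch GPU utilization while sending a request;
# with this bug it stays at 0% even though the log claims layers were offloaded
watch -n 1 rocm-smi   # or: radeontop
```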
To Reproduce
Expected behavior
Load is placed on the GPU.
EDIT: It seems the bug only triggers when building for a single backend. I rebuilt LocalAI for all backends and this time it works, despite using the same backend as in the single-backend build.
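For reference, the difference between the two builds described in the EDIT is roughly the one below. The GRPC_BACKENDS variable and the backend path are assumptions based on how LocalAI's Makefile selects which gRPC backends to compile; check the Makefile of your checkout for the exact names.

```bash
# Build every backend (the variant that ended up using the GPU)
make BUILD_TYPE=hipblas build

# Build only the llama.cpp backend (the variant where no GPU load was observed);
# GRPC_BACKENDS and the backend path below are assumptions, not verified flags
make BUILD_TYPE=hipblas GRPC_BACKENDS=backend-assets/grpc/llama-cpp build
```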
Considering that AMD GPUs give cheaper access to a bigger pool of VRAM, it would be very beneficial to have them properly supported.