Better Support for AMD and ROCM via docker containers. #1592
ps. love the work @mudler |
It should be noted that while I do see models load onto the card whenever there is an API call, and there are computations being performed that push the card to 200 W of consumption, there is never any return from the API call and the apparent inference never terminates. |
I don't have an AMD card to test with, so this card is up for grabs. Things are moving fast, right, but building-wise this is a good time window: there are no plans to change that code area in the short term.
A good starting point would be in this section: Line 159 in 9c2d264
|
@jamiemoller you could use https://github.com/wuxxin/aur-packages/blob/main/localai-git/PKGBUILD as a starting point; it's a (feature-limited) Arch Linux package of LocalAI for CPU, CUDA and ROCm. There are binaries available via arch4edu. See #1437 |
Please do work on that. I've been trying to put any load on an AMD GPU for a week now. Building from source on Ubuntu for CLBlast fails in so many ways it's not even funny. |
I have a feeling that it will be better to start from here (or something similar) |
Made some progress on #1595 (thanks to @fenfir for having started this up), but I don't have an AMD video card. However, CI seems to pass and container images are being built just fine. I will merge as soon as the v2.8.2 images are out - @jamiemoller @Expro could you give the images a shot as soon as they are on master? |
Sure, I will take them for a spin. Thanks for working on that. |
hipblas images are pushed now:
|
Unfortunately, not working as intended. GPU was detected, but nothing was offloaded:
Tested with the integrated phi-2 model with `gpu_layers` specified. |
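For reference, a minimal sketch of the kind of model config being described here - the backend name, file name and layer count are illustrative assumptions, not the exact configuration that was tested:

```yaml
# models/phi-2.yaml - hypothetical example
name: phi-2
backend: llama-cpp          # assumption: the llama.cpp backend
f16: true
gpu_layers: 33              # number of layers to offload to the GPU
parameters:
  model: phi-2.Q4_K_M.gguf  # assumption: quantized GGUF file in the models dir
```

The exact layer count depends on the model size and available VRAM.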
The ROCm docker image does appear to load the model, however there is a gRPC error that I have encountered which causes the call to terminate before inference. I am moving to 22.04 with ROCm 6.0.0 on the host to make sure there are no version compatibility issues. Note: the new Vulkan implementation of llama.cpp seems to work flawlessly. |
I'm trying to work on the hipblas version, but I am confused about where the Dockerfiles used to generate the latest images such as "quay.io/go-skynet/local-ai:master-hipblas" are located. One thing I noticed is that the latest hipblas images are still using ROCm v6.0.0 while v6.0.3 is now out, but I have been unable to locate a Dockerfile in the git repo that installs any version of ROCm, so it would appear the Dockerfile being used is hosted elsewhere? I would appreciate it if someone could point me to the latest Dockerfile used to generate the hipblas images. Thank you. |
Newer does not equal better. That said, x.x.Y releases (changes in the Y component) are usually hotfixes and usually only apply to some very specific edge cases. Can you clarify any issues you may have with 6.0.0 that are resolved in 6.0.3? |
I think I just discovered the cause of my issue... I have yet to test whether a tailored build including gfx906 will work, but this may be a good candidate for inclusion in the next hipblas build. Details for reference: currently, under 6.0.0, the following LLVM targets are supported. I may not have time to test an amendment myself - @fenfir, might you be able to test this? |
OK, so FYI:
EDIT: 'waaaaaaiiiiit a second'... I think I'm being dense. |
@Expro take a look at my previous posts; maybe they will help you solve this. Ping me if you like, maybe I can help. |
@mudler before I spend the time, are there any immediate plans for expanded k8s docs or AMD-specific docs? |
Hey @jtwolfe, thanks for deep-diving into this. I don't have an AMD card to test things out with, so I refrained from writing documentation that I couldn't test. Any help in that area is greatly appreciated. |
ack. |
@bunder2015 @mudler please note that neither ROCm 5 nor ROCm 6 officially supports the 6000-series chips. I'm not saying that they won't work, just make sure to consider this. Edit: This may be a good time to dig a bit further into the Vulkan implementation; it's definitely WAY more compatible and may be a good way of including ARM chips with Vulkan-capable GPU cores. I'm specifically thinking about all of these new fancy ARM laptops that everyone is producing now. |
😮💨 AMD's not making this easy... Thanks for the charts, I only suggested the 6 series because it was PCIE gen4 and not the latest (ie expensive) chips. Looks like Radeon VII's are still out there for the same price bracket, even though it will be deprecated by rocm eventually. |
Just for some further clarification: this is only true for ROCm on Linux; they are supported by ROCm on Windows. Also, from personal experience with my 6950XT*, I have not had issues that I can pin on ROCm when trying to use anything that advertised ROCm support (Text-Gen-UI, Ollama, Llamafile, Comfy-UI, SD-Gen-UI, SD-Next, etc.), and even some that don't. Edit:
devices:
- /dev/dri
- /dev/kfd |
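To put that fragment in context, here is a minimal, untested docker-compose sketch for running the hipblas image with those devices passed through. The image tag, port, volume path and `video` group are assumptions drawn from this thread, not a verified configuration:

```yaml
services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-hipblas  # tag mentioned earlier in the thread
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models   # models path used elsewhere in this thread
    devices:
      - /dev/dri                 # GPU render nodes
      - /dev/kfd                 # ROCm compute interface
    group_add:
      - video                    # assumption: often needed for ROCm device access
```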
@Airradda TRUE! I forget that Windows exists sometimes :P The question would then be which approach is better: use Windows with older cards to host, or ensure uniform compatibility with newer cards on Windows or Linux platforms /shrug. This should be continued as a background discussion (e.g. for containers and k8s; if there were a Windows-core based container then maybe this could work, but it would need a Windows host, and the k8s-on-Windows problem is a whole other kettle of fish). @bunder2015 R-VIIs are pretty solid options as long as you can get a good one (aka a non-mining-rig card; those were often configured at lower memory voltage and higher frequencies and had a habit of hard-locking if not under load). I don't begrudge AMD their flippy-floppy nonsense and lack of support for older cards given how much has changed in their architecture recently, but it does seem like they kind of settled recently on whatever makes the gfx1100 LLVM target work. I just hope RDNA3 is a little more extensible going forward so we get a good 5 years of support for current cards. I expect that the reason they haven't been able to capture more of the market share at their current price point is that CUDA has been so over-developed that you could almost shake it up and have a fully functional OS fall out with all the spare code that's floating around in it. AMD have had to figure out what to do first, then how best to do it, and do it cheaper... I don't envy them. ps. wow @mudler that was quick XD |
If you have time, it would be appreciated if you could add to the documentation on compatibility <3 |
and sent. I hope that it's enough to get you something usable. Cheers |
Just saw it now - cool man, thank you! I'm getting my hands on one of these; what do you suggest in that range? I don't have much experience with the AMD series but will have a look over this weekend.
What are you running? |
No problem, glad I can help somehow.
I'm running LocalAI on a Threadripper 2950X, 128 GB of memory, and a Radeon VII... it's only capable of PCIe gen3, so the VII seemed like the most I could use until I upgrade the whole system. Cheers |
Howdy! I'm trying to figure out building hipblas with support for whisper.cpp, but I seem to be getting an error when running any audio transcription. I have a Radeon 7600 XT, which is gfx1102. Ollama is working great, and I've been able to compile and run whisper.cpp and get it to offload to the card just fine. I can get LocalAI working with the card on llama3 and the default function model, and I can get piper working, but I think that's still on the CPU. I'm running the quay.io/go-skynet/local-ai:v2.18.1-aio-gpu-hipblas container. With that setup I get the following error when I try to run the example gb1.ogg from the startup docs:

```
root@7c141b63533a:/build/models/localai/localai# curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
{"error":{"code":500,"message":"rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
```

With REBUILD unset I got the following error:
Same error with more context:
And so I tried a bunch of things, and I keep getting this identical error. I tried a bunch of different flags, I tried setting external environment variables (which I didn't keep good records of), I tried replacing ROCm with my local copy, and I tried bumping the version of whisper to the latest one I can build separately. The next thing I haven't tried is placing the libraries from the working whisper.cpp build into the LocalAI container. I am always getting this error, and I seem to have no ability to affect it if my build is successful. Right now my environment is:

```
DEBUG=true
BUILD_PARALLELISM=12
BUILD_TYPE=hipblas
ROCM_HOME=/opt/rocm-6.1.2
ROCM_PATH=/opt/rocm-6.1.2
REBUILD=true
GO_TAGS=tts
GPU_TARGETS=gfx1102
HSA_OVERRIDE_GFX_VERSION=11.0.0
BUILD_SHARED_LIBS=ON
WHISPER_CPP_VERSION=1c31f9d4a8936aec550e6c4dc9ca5cae3b4f304a
MODELS_PATH=/build/models/localai/localai/
```

And with all those changes I still get the same exact error message.
How can I help? |
Facing the same issue with 19.2 and an AMD Ryzen Pro 8700GE with its GFX1103 Radeon 870M:
Docker compose:
rocm-smi on the host:
|
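For anyone hitting this on an iGPU or other card that ROCm does not officially support (gfx1103 here, gfx1102 above), a common workaround is to override the reported GFX version, as done in the environment listed earlier in this thread. A hedged, untested compose-environment sketch; the override value is an assumption and may need adjusting for your device:

```yaml
services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-hipblas
    environment:
      - DEBUG=true
      - HSA_OVERRIDE_GFX_VERSION=11.0.0  # spoof an officially supported RDNA3 target (assumption)
    devices:
      - /dev/dri
      - /dev/kfd
```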
Some of the dependencies in `requirements.txt`, even if generic, pull in CUDA libraries down the line. This change moves almost all GPU-specific libs to the build-type and tries a safer approach: `requirements.txt` now lists only "first-level" dependencies (for instance, grpc), while library dependencies are moved down to the respective build-type `requirements.txt` to avoid any mixing. This should fix #2737 and #1592. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Morning @mudler, I think I finally figured out building images - I was able to build sha-f7ffa9c-hipblas-ffmpeg from scratch without quay... We seem to be on the right track with the recent fixes, but there seems to be one more issue with diffusers and torchvision... 😅 |
that's good feedback, thanks for testing it! Attempted a fix for it in #3202 as we used to pin to nightly before the switch to |
Thanks, I gave that a shot, but it's throwing an error with |
Hi, I tried again without the |
that should be added to the |
Looks like we're still having issues building... Cheers |
ok - seems about:
It seems it was caused because
which obviously is not working, as torchvision is too old (and not from rocm6x). I've now pinned the packages to specific versions in 0c0bc18, and that now pulls things in correctly. |
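As an illustration of that pinning approach (not the exact contents of 0c0bc18 - the index URL and versions below are assumptions), a ROCm-specific requirements file typically pulls torch and torchvision from the ROCm wheel index so the two stay in sync:

```
# hypothetical requirements-hipblas.txt sketch; versions are illustrative only
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.3.0+rocm6.0
torchvision==0.18.0+rocm6.0
```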
Thanks for the work on this, hopefully we got it this time... 😅 I'll pull again and see how far things go... I'll try to report back in ~4 hours... Cheers |
Presently it is very hard to get a docker container to build with the rocm backend; some elements seem to fail independently during the build process.
There are other related projects with functional docker implementations that do work with rocm out of the box (e.g. llama.cpp).
I would like to work on this myself; however, between the speed at which things change in this project and the amount of free time I have to work on it, I am left only to ask for it.
If there are already good 'stable' methods for building a docker implementation with rocm underneath, it would be very much appreciated if they could be better documented. 'arch' helps nobody who wants to run on a more enterprise-oriented OS like RHEL or SLES.
Presently I have defaulted back to using textgen as it has a mostly functional API, but its feature set is kind of woeful (still better than running llama.cpp directly, imo).
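For readers landing on this issue: the discussion in this thread eventually converges on the prebuilt hipblas images plus an optional in-container rebuild. A hedged sketch of that setup - the image tag and environment values are taken from comments in this thread and are not a verified recipe:

```yaml
services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-hipblas
    environment:
      - BUILD_TYPE=hipblas
      - REBUILD=true            # recompile the backends inside the container at startup
      - GPU_TARGETS=gfx1102     # set to your card's LLVM target (example value)
    devices:
      - /dev/dri
      - /dev/kfd
    volumes:
      - ./models:/build/models
```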