Running locally will not use GPU #269
Comments
Same problem here. GPU dedicated memory usage is only 12% of 12288 MB. Maybe a missing parameter to the llama-cpp API, like "--gpu-layers"?
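For what it's worth, llama-cpp-python does expose a layer-offload parameter (n_gpu_layers). A quick standalone sanity check of GPU offload, independent of Open Interpreter, could look like the sketch below; the model path and the value 35 are placeholders, not anything from this thread.

```bash
# Standalone test of GPU offload in llama-cpp-python (model path is a placeholder).
# With a cuBLAS build, the verbose load log should report layers offloaded to CUDA.
python -c "from llama_cpp import Llama; llm = Llama(model_path='/path/to/model.gguf', n_gpu_layers=35, verbose=True); print(llm('print(1+1)', max_tokens=8))"
```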
Same problem. CUDA drivers 12.0, works with oobabooga, but can't run CodeLlama 34B on open-interpreter; I can see the model is loaded in RAM even if GPU is selected. Nvidia A6000, Ubuntu 22.04.
Same here @guillaumenaud, works fine on oobabooga with several different models.
Had the same issue. You need to ensure this command comes back with 'True':
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
Here's what I had to do:
pip uninstall llama-cpp-python
...
The output of the last command should be: ...
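The reinstall command and its expected output were cut off above. At the time, the commonly documented way to rebuild llama-cpp-python against cuBLAS looked roughly like this sketch; it is not necessarily the exact command from this comment, and it requires the CUDA toolkit (nvcc) to be installed first.

```bash
pip uninstall -y llama-cpp-python
# Build llama-cpp-python from source with the cuBLAS backend enabled.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
# Should now print True:
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
```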
Thanks @vrijsinghani, I tried your code, but it came back with False. No mention of ... Thanks anyway for the advice.
Thanks for the reply. Just reinstalled a fresh Nvidia driver (535) and CUDA 12.2, nvcc now shows the correct bin, and I can see my card in nvidia-smi too. I followed your instructions (inside the activated venv); it was indeed showing False, and still is afterwards (still not working).
For Linux, please run the following command, which will create ...
Here it is, Jordan (took ages to create).
Can you please run `which python`?
(base) bob@BobLinuxMint:~$ which python
Is this what you mean?
Yes, thanks! And just for good measure, can you also do `which interpreter`?
(base) bob@BobLinuxMint:~$ which interpreter
This is a tough one. Your logs indicate that GPU support was compiled for ... Your paths to ... Just to make sure, can you please create a conda env to test this all in?
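The exact commands did not survive here. Judging from the follow-up further down (the env name oitest and Python 3.11 are taken from that later comment), the test environment was set up roughly like this sketch:

```bash
conda create -n oitest python=3.11
conda activate oitest
pip install open-interpreter
# Verify which llama-cpp-python build the env actually picks up:
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
interpreter --local
```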
Thanks for all your time and effort Jordan. Here are the final few lines...
Downloading typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Well, I don't like to do this, but if you can't get it to work in a virtual environment like conda, then I'm out of ideas. You might be able to get support from the https://github.com/abetlen/llama-cpp-python or https://github.com/ggerganov/llama.cpp repos.
Thanks Jordan
Lol, I forgot to install the cuda-toolkit X-)
sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc
That works for me :-)
Thanks @ahoepf, nearly worked! Still no joy, but getting closer...
Make a check with: ...
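The actual check did not survive extraction; given how the rest of the thread verifies GPU support, it was most likely the same cuBLAS flag check used earlier (an assumption):

```bash
# Should print True if llama-cpp-python was built against cuBLAS.
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
```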
Hrm, that was interesting. It failed when I was in my env, but I tried it outside of it and it succeeded, so I ran interpreter and it used my GPU >__<
@Videoteq So, I've got a PR that should fix this, but there's also a way to fix this with conda. In a conda env, run:
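The command itself was cut off here. Based on the AVX-only variant given later in the thread, it was presumably the prebuilt cuBLAS wheel index; the AVX2/cu122 part below is an assumption, matched to CPU and CUDA version.

```bash
# Assumed reconstruction: prebuilt cuBLAS wheels (AVX2 build, CUDA 12.2).
pip install --force-reinstall llama-cpp-python --prefer-binary \
  --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu122
```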
That should make GPU work. My PR will eliminate the need for conda, and if you're interested I'd appreciate it if you're able to test the steps in the PR and give some feedback.
@jordanbtucker Following previous attempts I switched to Windows (I have 2 SSDs in my PC, one Linux, the other Windows). When I rebooted into Linux only one of my two screens was working, so I had to reinstall the Nvidia drivers and generally faff around with the settings for a while. Anyway, once back to two screens I tried running --local and this time it all worked and installed llama-cpp. But then the terminal quit, so I tried again; all went well, but again the terminal quit at the last step. Tried your code above, but this time it failed with the error shown below.
Open Interpreter will use Code Llama for local execution. Use your arrow keys to ...
[?] Parameter count (smaller is faster, larger is more capable): 7B
[?] Quality (smaller is faster, larger is more capable): Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (...: Y
Model found at /home/bob/.local/share/Open
Anyhow, thanks for all your help Jordan.
@jordanbtucker The plot thickens... Ran your code, but in a new env:
conda create -n oitest python=3.11
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
Result:
Successfully built llama-cpp-python
Good so far...
(oitest) bob@BobLinuxMint:~$ pip install open-interpreter
Then:
(oitest) bob@BobLinuxMint:~$ interpreter --local
Open Interpreter will use Code Llama for local execution. Use your arrow keys to ...
[?] Parameter count (smaller is faster, larger is more capable): 7B
[?] Quality (smaller is faster, larger is more capable): Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (...: Y
Model found at /home/bob/.local/share/Open
Ran your code in the oitest env. Result:
Successfully installed diskcache-5.6.3 llama-cpp-python-0.1.85+cu122 numpy-1.25.2 typing-extensions-4.7.1
(oitest) bob@BobLinuxMint:~$ interpreter --local
Open Interpreter will use Code Llama for local execution. Use your arrow keys to ...
[?] Parameter count (smaller is faster, larger is more capable): 7B
[?] Quality (smaller is faster, larger is more capable): Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (...: Y
Model found at /home/bob/.local/share/Open
Requirement already satisfied: llama-cpp-python in ./anaconda3/envs/oitest/lib/python3.11/site-packages (0.1.85+cu122)
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
▌ Failed to install TheBloke/CodeLlama-7B-Instruct-GGUF.
Common Fixes: You can follow our simple setup docs at the link below to resolve ...
https://github.com/KillianLucas/open-interpreter/tree/main/docs
If you've tried that and you're still getting an error, we have likely not built (...
Running language models locally is a difficult task! If you have insight into ...
Press enter to switch to GPT-4 (recommended).
● Welcome to Open Interpreter.
────────────────────────────────────────
▌ OpenAI API key not found
To use GPT-4 (recommended) please provide an OpenAI API key.
To use Code-Llama (free but less capable) press enter.
Thanks for testing. It could be an issue with AVX2 support. Can you please run the following command and post its output:
grep flags /proc/cpuinfo
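A narrower check than reading through the full flags dump (assuming a Linux system) would be to grep for the avx2 flag directly; it prints nothing if the CPU lacks AVX2.

```bash
# Prints "avx2" once if the CPU advertises AVX2, otherwise prints nothing.
grep -o 'avx2' /proc/cpuinfo | sort -u
```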
Here it is, Jordan. A very long output!
(base) bob@BobLinuxMint:~$ grep flags /proc/cpuinfo
...
@jordanbtucker I have an ancient (but updated) PC which might cause snags. Here's the spec:
System: ...
@Videoteq Yep, your CPU doesn't support AVX2. Run the following command in your conda env:
pip install --force-reinstall llama-cpp-python --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu122
I updated my PR to account for this. Hopefully I can merge it before the next release.
(base) bob@BobLinuxMint:~$ conda activate oitest
(oitest) bob@BobLinuxMint:~$ pip install --force-reinstall llama-cpp-python --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu122
Looking in indexes: https://pypi.org/simple, https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu122
(oitest) bob@BobLinuxMint:~$ interpreter --local
Open Interpreter will use Code Llama for local execution. Use your arrow keys to ...
[?] Parameter count (smaller is faster, larger is more capable): 7B
[?] Quality (smaller is faster, larger is more capable): Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (...: Y
Model found at /home/bob/.local/share/Open
Requirement already satisfied: llama-cpp-python in ./anaconda3/envs/oitest/lib/python3.11/site-packages (0.1.85+cu122)
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
During handling of the above exception, another exception occurred:
Traceback (most recent call last): ...
▌ Failed to install TheBloke/CodeLlama-7B-Instruct-GGUF.
Common Fixes: You can follow our simple setup docs at the link below to resolve ...
https://github.com/KillianLucas/open-interpreter/tree/main/docs
If you've tried that and you're still getting an error, we have likely not built (...
Running language models locally is a difficult task! If you have insight into ...
Press enter to switch to GPT-4 (recommended).
Thanks. In my case, with an FX-8350 CPU, which does NOT support AVX2, and an RTX 4090 GPU, running Linux Mint 21 (Ubuntu 22), these commands can be used to install open-interpreter to run on the GPU (note that I keep my conda env at a path rather than under the default name; most people probably don't want to do this):
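The commands themselves were lost in extraction. Going by the constraints described (no AVX2, a CUDA 12.x wheel, a conda env kept at a path) and the prompt shown in the reply below, they presumably looked something like this sketch; the env path is taken from that reply and everything else is an assumption.

```bash
# Hypothetical reconstruction; the env path matches the prompt in the reply below.
conda create -p /home/bob/conda-oitest python=3.11
conda activate /home/bob/conda-oitest
pip install open-interpreter
# AVX-only (no AVX2) prebuilt cuBLAS wheel for CUDA 12.2:
pip install --force-reinstall llama-cpp-python --prefer-binary \
  --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu122
```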
Hey @iateadonut, it worked! Here's a little test run with a small model, TheBloke/CodeLlama-7B-Instruct-GGUF.
(/home/bob/conda-oitest) bob@BobLinuxMint:~$ interpreter --local
Open Interpreter will use Code Llama for local execution. Use your arrow keys to ...
[?] Parameter count (smaller is faster, larger is more capable): 7B
[?] Quality (smaller is faster, larger is more capable): Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (...: Y
Model found at /home/bob/.local/share/Open
▌ Model set to TheBloke/CodeLlama-7B-Instruct-GGUF
Open Interpreter will require approval before running code. Use interpreter -y ...
Press CTRL-C to exit.
Plan:
1 Check if the user is in the correct directory.
Let me know if you have any questions or need clarification on this plan.
ls -l | grep "^d" | wc -l
Would you like to run this code? (y/n) y
ls -l | grep "^d" | wc -l
Traceback (most recent call last): ...
ls -l | grep "^d" | wc -l
Traceback (most recent call last): ...
ls -l | grep "^d" | wc -l
Traceback (most recent call last): ...
It looks like there was a syntax error in the code. The error message ...
To fix this issue, I would suggest using a different approach to achieve the ...
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include directories
directories = [file for file in files if ...]
print(len(directories))
Would you like to run this code? (y/n) y
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include directories
directories = [file for file in files if ...]
print(len(directories))
3
3 directories are found on the desktop.
ls -l | grep "^l" | wc -l
Would you like to run this code? (y/n) y
ls -l | grep "^l" | wc -l
Traceback (most recent call last): ...
ls -l | grep "^l" | wc -l
Traceback (most recent call last): ...
ls -l | grep "^l" | wc -l
Traceback (most recent call last): ...
3 directories are found on the desktop. However, there seems to be a syntax ...
To fix this issue, I would suggest using a different approach to achieve the ...
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include shortcuts
shortcuts = [file for file in files if ...]
print(len(shortcuts))
Would you like to run this code? (y/n) y
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include shortcuts
shortcuts = [file for file in files if ...]
print(len(shortcuts))
3
3 shortcuts are found on the desktop.
3 shortcuts are found on the desktop. Here they are:
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include shortcuts
shortcuts = [file for file in files if ...]
print(shortcuts)
Would you like to run this code? (y/n) y
import os
# Get a list of all files on the desktop
files = os.listdir('/home/bob/Desktop')
# Filter the list to only include shortcuts
shortcuts = [file for file in files if ...]
print(shortcuts)
['oobabooga_linux', 'Documents', 'AI']
3 shortcuts are found on the desktop. Here they are: ['oobabooga_linux', 'Documents', 'AI']
Installing the toolkit did it for me. Thanks @ahoepf, now the command python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)" does return True. However, when I run interpreter -l I get to go through the options, finally choose the GPU, and then the interpreter just exits silently.
OK, after running ...
In Summary: ...
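The summary itself was truncated. Pieced together from the steps that worked earlier in the thread, it was roughly the following sketch (pick the AVX or AVX2 wheel to match your CPU):

```bash
# Recap of the working recipe from this thread (sketch, not the original summary).
sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc
pip install --force-reinstall llama-cpp-python --prefer-binary \
  --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu122
python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"  # should print True
interpreter --local
```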
Does anyone know if the same strategy will work with Intel GPUs? I.e. install Intel drivers and toolkit, then recompile llama.cpp.
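Not confirmed by anyone in this thread, but llama.cpp at that time also shipped an OpenCL (CLBlast) backend, so one untested possibility for Intel GPUs would be to rebuild llama-cpp-python against it:

```bash
# Untested sketch for Intel GPUs via llama.cpp's OpenCL/CLBlast backend of that era.
sudo apt install opencl-headers ocl-icd-opencl-dev libclblast-dev
CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install --force-reinstall llama-cpp-python --no-cache-dir
```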
Describe the bug
Running CodeLlama .gguf models locally on Linux: the interpreter prompt asks whether to use the GPU (in my case an RTX 3060) but then doesn't use it, so it's really slow. :-(
Reproduce
N/A
Expected behavior
Expect the GPU to be used.
Screenshots
No response
Open Interpreter version
0.1.3
Python version
3.11.4
Operating System name and version
Linux Mint 21.2 Cinnamon 5.8.4 Kernel 5.15.0-83-generic
Additional context
All 8 cores in i7 running at very high utilization.
GPU dedicated memory usage: 12% of 12288 MB
GPU utilization: 4% to 30%