feat: support metal GPU acceleration (#3)

* feat: support metal gpu acceleration
* fix: fix ci
* fix: fix ci
* fix: fix cuda dependencies
* chore: tune ruff
* chore: update readme and gifs
* chore: update gifs
umbertogriffo · May 22, 2024 · 011fa5a
1 parent 915c805
commit 011fa5a

Showing 10 changed files with 286 additions and 70 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -37,12 +37,17 @@ jobs:
         id: llama-cpp-version
         run: echo "llama-cpp-version=$(cat version/llama_cpp)" >> "$GITHUB_OUTPUT"
 
+      - name: Get ctransformers version
+        id: ctransformers-version
+        run: echo "ctransformers-version=$(cat version/ctransformers)" >> "$GITHUB_OUTPUT"
+
       # Installing dependencies and llama-cpp-python without NVIDIA CUDA acceleration.
       - name: Setup environment
         run: |
           poetry lock --check
           poetry install --no-root --no-ansi
-          . .venv/bin/activate && pip3 install llama-cpp-python~=${{ steps.llama-cpp-version.outputs.llama-cpp-version }}
+          . .venv/bin/activate && pip3 install llama-cpp-python==${{ steps.llama-cpp-version.outputs.llama-cpp-version }}
+          . .venv/bin/activate && pip3 install ctransformers==${{ steps.ctransformers-version.outputs.ctransformers-version }}
 
       - name: Run tests
         run: |

diff --git a/.gitignore b/.gitignore
@@ -68,6 +68,9 @@ instance/
 # Scrapy stuff:
 .scrapy
 
+# Ruff stuff:
+.ruff_cache
+
 # Sphinx documentation
 docs/_build/
 

diff --git a/Makefile b/Makefile
@@ -1,25 +1,39 @@
 .PHONY: check install setup update test clean
 
-file=version/llama_cpp
-llama_cpp_version=`cat $(file)`
+llama_cpp_file=version/llama_cpp
+llama_cpp_version=`cat $(llama_cpp_file)`
+
+ctransformers_file=version/ctransformers
+ctransformers_version=`cat $(ctransformers_file)`
 
 check:
 	which pip3
 	which python3
 
-install:
+install_cuda:
 	echo "Installing..."
 	mkdir -p .venv
 	poetry config virtualenvs.in-project true
-	poetry install --no-root --no-ansi
-	echo "Installing llama-cpp-python with pip to get NVIDIA CUDA acceleration"
+	poetry install --extras "cuda-acceleration" --no-root --no-ansi
+	echo "Installing llama-cpp-python and ctransformers with pip to get NVIDIA CUDA acceleration"
 	. .venv/bin/activate && CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip3 install llama-cpp-python==$(llama_cpp_version)
+	. .venv/bin/activate && pip3 install ctransformers[cuda]==$(ctransformers_version)
+
+install_metal:
+	echo "Installing..."
+	mkdir -p .venv
+	poetry config virtualenvs.in-project true
+	poetry install --no-root --no-ansi
+	echo "Installing llama-cpp-python and ctransformers with pip to get Metal GPU acceleration for macOS systems only (it doesn't install CUDA dependencies)"
+	. .venv/bin/activate && CMAKE_ARGS="-DLLAMA_METAL=on" pip3 install llama-cpp-python==$(llama_cpp_version)
+	. .venv/bin/activate && CT_METAL=1 pip install ctransformers==$(ctransformers_version) --no-binary ctransformers
 
 install_pre_commit:
 	poetry run pre-commit install
 	poetry run pre-commit install --hook-type pre-commit
 
-setup: install install_pre_commit
+setup_cuda: install_cuda install_pre_commit
+setup_metal: install_metal install_pre_commit
 
 update:
 	poetry lock --no-update
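
Review note: the `install_cuda` and `install_metal` targets in the Makefile diff above differ only in the Poetry extras, the environment variables set for the build, and the pip arguments. A minimal Python sketch of that mapping follows. The helper names `read_version` and `install_plan` and the `X.Y.Z` placeholder versions are hypothetical and not part of this commit; the flags and package specifiers are copied from the diff, and the virtualenv activation step is left out.

```python
from pathlib import Path


def read_version(path: str) -> str:
    """Return the pinned version stored in a one-line file such as version/llama_cpp."""
    return Path(path).read_text().strip()


def install_plan(
    backend: str, llama_cpp_version: str, ctransformers_version: str
) -> list[tuple[dict[str, str], list[str]]]:
    """Return (extra environment, command) pairs mirroring the two Makefile targets.

    Hypothetical helper: it only describes the commands, it does not run them.
    """
    llama_cpp = f"llama-cpp-python=={llama_cpp_version}"
    if backend == "cuda":
        return [
            ({}, ["poetry", "install", "--extras", "cuda-acceleration", "--no-root", "--no-ansi"]),
            ({"CMAKE_ARGS": "-DLLAMA_CUBLAS=on"}, ["pip3", "install", llama_cpp]),
            ({}, ["pip3", "install", f"ctransformers[cuda]=={ctransformers_version}"]),
        ]
    if backend == "metal":
        return [
            ({}, ["poetry", "install", "--no-root", "--no-ansi"]),
            ({"CMAKE_ARGS": "-DLLAMA_METAL=on"}, ["pip3", "install", llama_cpp]),
            # The Makefile builds ctransformers from source with Metal enabled.
            (
                {"CT_METAL": "1"},
                ["pip", "install", f"ctransformers=={ctransformers_version}", "--no-binary", "ctransformers"],
            ),
        ]
    raise ValueError(f"unknown backend: {backend!r} (expected 'cuda' or 'metal')")


if __name__ == "__main__":
    # Placeholder versions; the Makefile reads the real ones from version/*.
    for env, cmd in install_plan("metal", "X.Y.Z", "X.Y.Z"):
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        print(f"{prefix} {' '.join(cmd)}".strip())
```

Keeping the plan as data makes it easy to see that both targets pin the same two packages and differ only in build configuration.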

diff --git a/README.md b/README.md
@@ -4,11 +4,14 @@
 [![Code style: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 
 > [!IMPORTANT]
-> Disclaimer: The code has been tested on `Ubuntu 22.04.2 LTS` running on a Lenovo Legion 5 Pro
-> with twenty `12th Gen Intel® Core™ i7-12700H` and an `NVIDIA GeForce RTX 3060`.
+> Disclaimer:
+> The code has been tested on
+>   * `Ubuntu 22.04.2 LTS` running on a Lenovo Legion 5 Pro with twenty `12th Gen Intel® Core™ i7-12700H` and an `NVIDIA GeForce RTX 3060`.
+>   * `MacOS Sonoma 14.3.1` running on a MacBook Pro M1 (2020).
+>
 > If you are using another Operating System or different hardware, and you can't load the models, please
-> take a look either at the official CTransformers's GitHub [issue](https://github.com/marella/ctransformers/issues).
-> or at the official Llama Cpp Python's GitHub [issue](https://github.com/abetlen/llama-cpp-python/issues)
+> take a look either at the official Llama Cpp Python's GitHub [issue](https://github.com/abetlen/llama-cpp-python/issues).
+> or at the official CTransformers's GitHub [issue](https://github.com/marella/ctransformers/issues)
 
 > [!WARNING]
 > Note: it's important to note that the large language model sometimes generates hallucinations or false information.
@@ -32,9 +35,8 @@
 ## Introduction
 
 This project combines the power of [CTransformers](https://github.com/marella/ctransformers), [Lama.cpp](https://github.com/abetlen/llama-cpp-python),
-[LangChain](https://python.langchain.com/docs/get_started/introduction.html) (only used for document chunking and
-querying the Vector Database, and we plan to eliminate it entirely), [Chroma](https://github.com/chroma-core/chroma) and
-[Streamlit](https://discuss.streamlit.io/) to build:
+[LangChain](https://python.langchain.com/docs/get_started/introduction.html) (only used for document chunking and querying the Vector Database, and we plan to eliminate it entirely),
+[Chroma](https://github.com/chroma-core/chroma) and [Streamlit](https://discuss.streamlit.io/) to build:
 * a Conversation-aware Chatbot (ChatGPT like experience).
 * a RAG (Retrieval-augmented generation) ChatBot.
 
@@ -65,7 +67,7 @@ To deal with context overflows, we implemented two approaches:
 ## Prerequisites
 
 * Python 3.10+
-* GPU supporting CUDA 12 and up.
+* GPU supporting CUDA 12 and up
 * Poetry 1.7.0
 
 ### Install Poetry
@@ -90,8 +92,11 @@ To easily install the dependencies we created a make file.
 
 * Check: ```make check```
   * Use it to check that `which pip3` and `which python3` points to the right path.
-* Setup: ```make setup```
-  * Creates an environment and installs all dependencies.
+* Setup:
+  * Setup with NVIDIA CUDA acceleration: ```make setup_cuda```
+    * Creates an environment and installs all dependencies with NVIDIA CUDA acceleration.
+  * Setup with Metal GPU acceleration: ```make setup_metal```
+    * Creates an environment and installs all dependencies with Metal GPU acceleration for macOS system only.
 * Update: ```make update```
   * Update an environment and installs all updated dependencies.
 * Tidy up the code: ```make tidy```
@@ -149,7 +154,6 @@ streamlit run chatbot/rag_chatbot_app.py -- --model openchat --k 2 --synthesis-s
 
 ![rag_chatbot_example.gif](images%2Frag_chatbot_example.gif)
 
-
 ## How to debug the Streamlit app on Pycharm
 
 ![debug_streamlit.png](images/debug_streamlit.png)

diff --git a/demo.md b/demo.md
@@ -2,7 +2,7 @@
 
 - Tell me something about Italy
 - How many people live there?
-- How can I get a Golden visa?
+- Can you tell me the names of the countries that share a border with Italy?
 - Could you please remind me about the topic we were discussing earlier?
 
 # Story Chatbot - 2
@@ -38,3 +38,9 @@ is ensured that the atomicAdd operation is run after the execution of ids[idx] =
 Insult me. I want you to call me the worst curse words and insults you can.
 Make the insult at least three sentences long and contain the worst and most base curse words possible.
 Make it X-rated and disgusting.
+
+# Story Rag Chatbot - 1
+
+- Tell me something about the Blendle Social Code
+- What is the party budget?
+- Could you please remind me about the topic we were discussing earlier?
diff --git a/images/conversation-aware-chatbot.gif b/images/conversation-aware-chatbot.gif
diff --git a/images/rag_chatbot_example.gif b/images/rag_chatbot_example.gif