
Commit

refactor: enabled flash attention for OpenChat 3.6
umbertogriffo committed Jun 29, 2024
1 parent b14cd85 commit 21cc6de
Showing 2 changed files with 15 additions and 14 deletions.
28 changes: 14 additions & 14 deletions README.md
@@ -133,18 +133,18 @@ format.

### Supported Models

| 🤖 Model | Supported | Model Size | Notes and link to the model |
|----------|-----------|------------|------------------------------|
| `llama-3` Meta Llama 3 Instruct | ✅ | 8B | Less accurate than OpenChat - [link](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF) |
| `openchat-3.6` **Recommended** - OpenChat 3.6 20240522 | ✅ | 8B | [link](https://huggingface.co/bartowski/openchat-3.6-8b-20240522-GGUF) |
| `openchat-3.5` - OpenChat 3.5 0106 | ✅ | 7B | [link](https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF) |
| `starling` Starling Beta | ✅ | 7B | It's trained from `Openchat-3.5-0106` and recommended if you prefer more verbosity than OpenChat - [link](https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF) |
| `neural-beagle` NeuralBeagle14 | ✅ | 7B | [link](https://huggingface.co/TheBloke/NeuralBeagle14-7B-GGUF) |
| `dolphin` Dolphin 2.6 Mistral DPO Laser | ✅ | 7B | [link](https://huggingface.co/TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GGUF) |
| `zephyr` Zephyr Beta | ✅ | 7B | [link](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF) |
| `mistral` Mistral OpenOrca | ✅ | 7B | [link](https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF) |
| `phi-3` Phi-3 Mini 4K Instruct | ✅ | 3.8B | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) |
| `stablelm-zephyr` StableLM Zephyr OpenOrca | ✅ | 3B | [link](https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF) |
| 🤖 Model | Supported | Model Size | Notes and link to the model |
|----------|-----------|------------|------------------------------|
| `llama-3` Meta Llama 3 Instruct | ✅ | 8B | Less accurate than OpenChat - [link](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF) |
| `openchat-3.6` **Recommended** - OpenChat 3.6 | ✅ | 8B | [link](https://huggingface.co/bartowski/openchat-3.6-8b-20240522-GGUF). Flash attention enabled by default. |
| `openchat-3.5` - OpenChat 3.5 | ✅ | 7B | [link](https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF) |
| `starling` Starling Beta | ✅ | 7B | It's trained from `Openchat-3.5-0106` and recommended if you prefer more verbosity than OpenChat - [link](https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF) |
| `neural-beagle` NeuralBeagle14 | ✅ | 7B | [link](https://huggingface.co/TheBloke/NeuralBeagle14-7B-GGUF) |
| `dolphin` Dolphin 2.6 Mistral DPO Laser | ✅ | 7B | [link](https://huggingface.co/TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GGUF) |
| `zephyr` Zephyr Beta | ✅ | 7B | [link](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF) |
| `mistral` Mistral OpenOrca | ✅ | 7B | [link](https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF) |
| `phi-3` Phi-3 Mini 4K Instruct | ✅ | 3.8B | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) |
| `stablelm-zephyr` StableLM Zephyr OpenOrca | ✅ | 3B | [link](https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF) |

## Example Data

@@ -165,7 +165,7 @@ python chatbot/memory_builder.py --chunk-size 1000
To interact with a GUI type:

```shell
streamlit run chatbot/chatbot_app.py -- --model openchat
streamlit run chatbot/chatbot_app.py -- --model openchat-3.6 --max-new-tokens 1024
```

![conversation-aware-chatbot.gif](images/conversation-aware-chatbot.gif)
@@ -175,7 +175,7 @@ streamlit run chatbot/chatbot_app.py -- --model openchat
To interact with a GUI type:

```shell
streamlit run chatbot/rag_chatbot_app.py -- --model openchat --k 2 --synthesis-strategy async_tree_summarization
streamlit run chatbot/rag_chatbot_app.py -- --model openchat-3.6 --k 2 --synthesis-strategy async_tree_summarization
```

![rag_chatbot_example.gif](images/rag_chatbot_example.gif)
1 change: 1 addition & 0 deletions chatbot/bot/model/settings/openchat.py
@@ -73,6 +73,7 @@ class OpenChat36Settings(Model):
"n_ctx": 4096, # The max sequence length to use - note that longer sequence lengths require much more resources
"n_threads": 8, # The number of CPU threads to use, tailor to your system and the resulting performance
"n_gpu_layers": 50, # The number of layers to offload to GPU, if you have GPU acceleration available
"flash_attn": True, # Use flash attention.
}
config_answer = {"temperature": 0.7, "stop": []}
system_template = (

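For context, the config keys above mirror the keyword arguments of llama-cpp-python's `Llama` constructor. The sketch below is an illustration only, not the repository's actual loader code: it assumes the GGUF model is loaded through llama-cpp-python and shows how `flash_attn=True` could be passed along with the other settings. The model path and the prompt are placeholders.

```python
# Illustrative sketch, assuming the GGUF model is loaded with llama-cpp-python,
# whose Llama constructor accepts the same keys as OpenChat36Settings.config
# (n_ctx, n_threads, n_gpu_layers, flash_attn).
from llama_cpp import Llama

config = {
    "n_ctx": 4096,       # max sequence length
    "n_threads": 8,      # CPU threads to use
    "n_gpu_layers": 50,  # layers offloaded to the GPU
    "flash_attn": True,  # flash attention, enabled by this commit
}

llm = Llama(
    model_path="models/openchat-3.6-8b-20240522-Q4_K_M.gguf",  # hypothetical local path
    **config,
)

# create_chat_completion applies the chat template bundled in the GGUF metadata.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what does flash attention speed up?"}],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```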