- Chat with the TinyLlama chat model
- High-throughput inference via vLLM
- Simple UI/UX
- Configurable LLM sampling values (temperature, top-p, max tokens); see the sampling sketch after this list
- Import/export conversations
- Web search / RAG
- CPU mode support for the TinyLlama chat GGUF model; see the CPU sketch after this list
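
A minimal sketch of how the configurable values could map onto vLLM's `SamplingParams`; the model ID, prompt string, and default values here are assumptions, not the app's exact code:

```python
# Minimal vLLM sketch (GPU path); model ID and values are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# The three UI-configurable values: temperature, top-p, max tokens.
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

# TinyLlama chat uses a Zephyr-style prompt template.
prompt = "<|user|>\nHello!</s>\n<|assistant|>\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```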
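For the CPU/GGUF mode, one common approach is llama-cpp-python rather than vLLM; the file path below is hypothetical and the settings are illustrative:

```python
# CPU sketch, assuming llama-cpp-python and a locally downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,
)

# Same UI-configurable sampling values as the GPU path.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```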
Colab --> https://colab.research.google.com/drive/1OaWYiHBt-nkSNCik6H0lhAWcpLCYvauq#scrollTo=oHb4LKvLy5aD

You need an ngrok auth token, since the app is served with Flask from Google Colab. If you want to run it completely locally, copy the code into a default Flask template and run it; a minimal sketch of the serving setup follows.
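
This sketch assumes pyngrok for the tunnel; the `/chat` endpoint and echo handler are illustrative, not the notebook's exact code:

```python
# Flask served from Colab through an ngrok tunnel (requires an auth token).
from flask import Flask, jsonify, request
from pyngrok import ngrok

app = Flask(__name__)

@app.route("/chat", methods=["POST"])  # hypothetical endpoint
def chat():
    prompt = request.json.get("prompt", "")
    # Call the model here; echoing keeps the sketch self-contained.
    return jsonify({"reply": f"echo: {prompt}"})

ngrok.set_auth_token("YOUR_NGROK_AUTH_TOKEN")  # paste your token here
public_url = ngrok.connect(5000)               # tunnel to the Flask port
print("Public URL:", public_url)

app.run(port=5000)
```

To run fully locally, drop the pyngrok lines and open http://localhost:5000 directly.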