Serge - LLaMA made easy 🦙

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

  • SvelteKit frontend
  • Redis for storing chat history & parameters
  • FastAPI + LangChain for the API, wrapping calls to llama.cpp through its Python bindings

Getting started

Deploy Serge with Cloudmos, then click the deployment's URI in Cloudmos to open the chat interface.

The API documentation can be found at http://localhost:8008/api/docs
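FastAPI generates an OpenAPI schema, and the docs page above is a rendering of it. To list the available endpoints from a script, a minimal sketch could fetch that schema and print each operation. The schema URL below is an assumption (FastAPI's default is /openapi.json, but an app that serves its docs under /api may serve the schema there too), so check the docs page for the actual location.

```python
import json
import urllib.request

# Hypothetical schema location; confirm it against the /api/docs page.
SCHEMA_URL = "http://localhost:8008/api/openapi.json"

# Keys of an OpenAPI path item that are operations (others, such as
# "parameters" or "summary", are metadata and are skipped).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_schema(url: str = SCHEMA_URL) -> dict:
    """Download and parse the OpenAPI JSON document."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    endpoints = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return sorted(endpoints)


if __name__ == "__main__":
    for endpoint in list_endpoints(fetch_schema()):
        print(endpoint)
```

Keeping the parsing in list_endpoints separate from the network call makes it easy to reuse on a schema saved to disk.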

Models

Currently the following models are supported:

  • GPT4-Alpaca-LoRA-30B
  • Alpaca-LoRA-65B
  • OpenAssistant-30B
  • GPT4All-13B
  • Stable-Vicuna-13B
  • Guanaco-7B
  • Guanaco-13B
  • Guanaco-33B
  • Guanaco-65B

If you have existing weights from another project, you can add them to the serge_weights volume using docker cp.
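The copy uses the standard docker cp SRC_PATH CONTAINER:DEST_PATH form, which writes into the container's filesystem and therefore into any volume mounted at the destination. The sketch below builds that command from Python. The container name "serge" and the mount path "/usr/src/app/weights" are hypothetical placeholders; replace them with the name of your running container and the path where it mounts serge_weights.

```python
import shlex
import subprocess

# Hypothetical values: substitute your container name and the path where
# the serge_weights volume is mounted inside that container.
CONTAINER = "serge"
WEIGHTS_DIR = "/usr/src/app/weights"


def docker_cp_command(local_file: str, container: str = CONTAINER,
                      dest_dir: str = WEIGHTS_DIR) -> list[str]:
    """Build a `docker cp SRC CONTAINER:DEST` command for one weights file."""
    # The trailing slash tells docker cp the destination is an existing directory.
    return ["docker", "cp", local_file, f"{container}:{dest_dir}/"]


def copy_weights(local_file: str, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it, raising on failure."""
    cmd = docker_cp_command(local_file)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    copy_weights("my-model.bin")  # hypothetical file name; dry run only
```

Running with dry_run=True first lets you check the container name and path before anything is copied.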

⚠️ A note on memory usage

LLaMA will crash if there is not enough free memory for your model.

  • 7B requires about 4.5GB of free RAM
  • 13B requires about 12GB free
  • 30B requires about 20GB free
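Since running out of memory ends in a crash rather than an error message, it can help to check free memory before loading a model. A minimal sketch, assuming a Linux host (it reads MemAvailable from /proc/meminfo) and treating the approximate figures above as GiB:

```python
# Approximate free-RAM requirements from the list above, treated as GiB.
REQUIRED_FREE_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}


def available_memory_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Return MemAvailable in GiB (Linux only; /proc/meminfo reports kB)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])
                return kib / (1024 ** 2)
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


def fits_in_memory(model_size: str, free_gb: float) -> bool:
    """True if free_gb meets the approximate requirement for model_size."""
    required = REQUIRED_FREE_GB.get(model_size)
    if required is None:
        raise ValueError(f"no memory figure listed for {model_size!r}")
    return free_gb >= required


if __name__ == "__main__":
    free = available_memory_gb()
    for size in REQUIRED_FREE_GB:
        print(size, "OK" if fits_in_memory(size, free) else "not enough free RAM")
```

The figures are rough, so a model that only just fits may still fail once other processes claim memory.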

Support

Feel free to join the Discord if you need help with the setup: https://discord.gg/62Hc6FEYQH

Contributing

Serge is always open for contributions! If you catch a bug or have a feature idea, feel free to open an issue or a PR.

What's next

  • Front-end to interface with the API
  • Pass model parameters when creating a chat
  • Manager for model files
  • Support for other models
  • LangChain integration
  • User profiles & authentication

And a lot more!