# Llama-v2-GPU-GTX-1650

Running Llama v2 with Llama.cpp on a GTX 1650 with 4 GB of VRAM.

## Setup

To expose your NVIDIA GPU and drivers to a Docker container, you need to install the NVIDIA Container Toolkit.
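With the toolkit installed on the host, the container gets GPU access through the compose file. As a sketch (the service name `app` is an assumption about this project's `docker-compose.yml`), Compose's documented device-reservation syntax looks like this:

```yaml
services:
  app:
    # Reserve one NVIDIA GPU for the container; requires the
    # NVIDIA Container Toolkit on the host.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```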

## Results

Llama.cpp detecting the cuBLAS backend:


After tuning the inference parameters:

```shell
N_GPU_LAYERS=35
N_BATCH=4096
N_THREADS=4
```
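These values presumably reach llama-cpp-python's `Llama` constructor as `n_gpu_layers`, `n_batch`, and `n_threads`. A minimal sketch of that plumbing, assuming a hypothetical helper that reads the environment with the GTX 1650 values above as defaults:

```python
import os

def llama_kwargs_from_env():
    """Hypothetical helper: collect tuned inference settings from the
    environment as keyword arguments for llama_cpp.Llama(...).
    Defaults mirror the values that worked on a 4 GB VRAM GTX 1650."""
    return {
        "n_gpu_layers": int(os.getenv("N_GPU_LAYERS", "35")),
        "n_batch": int(os.getenv("N_BATCH", "4096")),
        "n_threads": int(os.getenv("N_THREADS", "4")),
    }

# Usage (model path is illustrative):
# llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf", **llama_kwargs_from_env())
```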


### Streaming support

Demo video: gradio+llama_cpp-streaming.mp4
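The streaming pattern behind the demo can be sketched as follows: `llm(prompt, stream=True)` in llama-cpp-python yields completion chunks, and the Gradio callback accumulates them, yielding the growing text so the UI updates token by token. Since llama_cpp needs a model to run, a stub generator stands in for the model here:

```python
def fake_stream():
    """Stub standing in for llm(prompt, stream=True): yields chunks
    shaped like llama-cpp-python completion chunks."""
    for tok in ["Hello", ",", " world"]:
        yield {"choices": [{"text": tok}]}

def chat(prompt, stream=fake_stream):
    """Accumulate streamed chunks and yield the partial text each time;
    Gradio renders each yielded string as it arrives."""
    text = ""
    for chunk in stream():
        text += chunk["choices"][0]["text"]
        yield text
```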

### Generation Parameters


## Usage

Build the app image:

```shell
docker compose build
```

Get everything up and running:

```shell
docker compose down && docker compose up -d
```

### Have fun

Visit http://localhost:7861/ to access the Gradio chatbot UI.

## Contributing

### Installing pre-commit

Pre-commit is already part of this project's dependencies. If you would like to install it standalone, run:

```shell
pip install pre-commit
```

To activate pre-commit, run the following commands:

- Install the Git hooks:

  ```shell
  pre-commit install
  ```

- Update the current hooks:

  ```shell
  pre-commit autoupdate
  ```

To test your installation of pre-commit, run:

```shell
pre-commit run --all-files
```