
# Llama-v2-GPU-GTX-1650

Running Llama v2 with llama.cpp on a GTX 1650 with 4 GB of VRAM.

## Setup

To extend your NVIDIA GPU and its drivers to a Docker container, you need to install the NVIDIA Container Toolkit.
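With the toolkit installed, the container can request GPU access in `docker-compose.yml`. A minimal sketch of what that looks like (the service name and build context here are illustrative, not copied from this repo):

```yaml
services:
  app:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host.
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```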

## Results

llama.cpp detecting the cuBLAS backend:

*(screenshot: startup log showing cuBLAS initialization)*

Inference results after tuning the following parameters:

```
N_GPU_LAYERS=35
N_BATCH=4096
N_THREADS=4
```

*(screenshot: inference output after tuning)*
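Assuming the app uses llama-cpp-python (as the streaming demo filename suggests), these three environment variables map directly onto `Llama` constructor arguments. A minimal, stdlib-only sketch of that mapping (the variable names and model path are assumptions, not taken from this repo's code):

```python
import os

def load_llama_kwargs(env=os.environ):
    """Read tuning parameters from the environment, with the README's values as defaults."""
    return {
        # Layers offloaded to the GPU; 35 fits a quantized Llama-2-7B in 4 GB VRAM.
        "n_gpu_layers": int(env.get("N_GPU_LAYERS", 35)),
        # Prompt tokens processed per batch.
        "n_batch": int(env.get("N_BATCH", 4096)),
        # CPU threads for the layers that stay on the CPU.
        "n_threads": int(env.get("N_THREADS", 4)),
    }

kwargs = load_llama_kwargs()
# These would typically be passed straight to the model constructor, e.g.:
# from llama_cpp import Llama
# llm = Llama(model_path="models/llama-2-7b.Q4_K_M.gguf", **kwargs)  # path is hypothetical
```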

### Streaming support

*(video: gradio+llama_cpp-streaming.mp4)*

### Generation parameters

*(screenshot: Gradio generation-parameter controls)*
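Streaming in the UI boils down to yielding progressively longer partial completions, which Gradio renders as a "typing" effect. A stdlib-only sketch of the pattern (the real app presumably feeds llama.cpp's token stream into a Gradio generator callback; the names below are illustrative):

```python
def fake_token_stream():
    # Stand-in for llama.cpp's streaming output, which yields
    # chunks of generated text one at a time.
    for token in ["Hello", ",", " world", "!"]:
        yield token

def stream_reply(prompt):
    # Gradio re-renders each yielded value, so yielding the growing
    # string produces incremental output in the chat UI.
    partial = ""
    for token in fake_token_stream():
        partial += token
        yield partial

# list(stream_reply("hi")) -> ["Hello", "Hello,", "Hello, world", "Hello, world!"]
```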

## Usage

Build the app image:

```bash
docker compose build
```

Get everything up and running:

```bash
docker compose down && docker compose up -d
```

Have fun: visit http://localhost:7861/ to access the Gradio chatbot UI.

## Contributing

### Installing pre-commit

pre-commit is already among this project's dependencies. If you would like to install it standalone, run:

```bash
pip install pre-commit
```

To activate pre-commit, run the following commands:

- Install the Git hooks:

  ```bash
  pre-commit install
  ```

- Update the current hooks:

  ```bash
  pre-commit autoupdate
  ```

To test your pre-commit installation, run:

```bash
pre-commit run --all-files
```
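pre-commit reads its hooks from a `.pre-commit-config.yaml` file in the repository root. As a reference, a minimal configuration looks like this (the specific hooks listed are illustrative, not this repo's actual configuration):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```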
