# Llama-v2-GPU-GTX-1650

Running Llama v2 with Llama.cpp on a GTX 1650 with 4 GB of VRAM.

## Setup

To expose your NVIDIA GPU and drivers to a Docker container, you need to install the NVIDIA Container Toolkit.
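With the toolkit installed on the host, the container gets GPU access through the compose file. As a sketch (the service name `app` is an assumption about this project's `docker-compose.yml`), Compose's documented device-reservation syntax looks like this:

```yaml
services:
  app:
    # Reserve one NVIDIA GPU for the container; requires the
    # NVIDIA Container Toolkit on the host.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```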

## Results

Llama.cpp detecting the cuBLAS backend:


After tuning the inference parameters:

```shell
N_GPU_LAYERS=35
N_BATCH=4096
N_THREADS=4
```
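These values presumably reach llama-cpp-python's `Llama` constructor as `n_gpu_layers`, `n_batch`, and `n_threads`. A minimal sketch of that plumbing, assuming a hypothetical helper that reads the environment with the GTX 1650 values above as defaults:

```python
import os

def llama_kwargs_from_env():
    """Hypothetical helper: collect tuned inference settings from the
    environment as keyword arguments for llama_cpp.Llama(...).
    Defaults mirror the values that worked on a 4 GB VRAM GTX 1650."""
    return {
        "n_gpu_layers": int(os.getenv("N_GPU_LAYERS", "35")),
        "n_batch": int(os.getenv("N_BATCH", "4096")),
        "n_threads": int(os.getenv("N_THREADS", "4")),
    }

# Usage (model path is illustrative):
# llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf", **llama_kwargs_from_env())
```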


### Streaming support

Demo video: gradio+llama_cpp-streaming.mp4
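The streaming pattern behind the demo can be sketched as follows: `llm(prompt, stream=True)` in llama-cpp-python yields completion chunks, and the Gradio callback accumulates them, yielding the growing text so the UI updates token by token. Since llama_cpp needs a model to run, a stub generator stands in for the model here:

```python
def fake_stream():
    """Stub standing in for llm(prompt, stream=True): yields chunks
    shaped like llama-cpp-python completion chunks."""
    for tok in ["Hello", ",", " world"]:
        yield {"choices": [{"text": tok}]}

def chat(prompt, stream=fake_stream):
    """Accumulate streamed chunks and yield the partial text each time;
    Gradio renders each yielded string as it arrives."""
    text = ""
    for chunk in stream():
        text += chunk["choices"][0]["text"]
        yield text
```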

### Generation Parameters


## Usage

Build the app image:

```shell
docker compose build
```

Get everything up and running:

```shell
docker compose down && docker compose up -d
```

### Have fun

Visit http://localhost:7861/ to access the Gradio chatbot UI.

## Contributing

### Installing pre-commit

Pre-commit is already part of this project's dependencies. If you would like to install it standalone, run:

```shell
pip install pre-commit
```

To activate pre-commit, run the following commands:

- Install the Git hooks:

  ```shell
  pre-commit install
  ```

- Update the current hooks:

  ```shell
  pre-commit autoupdate
  ```

To test your installation of pre-commit, run:

```shell
pre-commit run --all-files
```