
... Continuous integration benchmarking for the LLM inference engine llama.cpp


⚠️ WARNING

Spinning up GPU virtual machines can get expensive 💰, so please proceed carefully...


About

ToDo...


Tech Stack

  • Azure cloud by Microsoft - we are targeting MS Azure exclusively (at this point)
  • Azure CLI - az is the command line tool used to manage Azure infrastructure
  • node.js - application logic, llama.cpp process control and workflow
  • InfluxDB - time series database (TSDB) for logs and metrics
  • CouchDB - simple, robust, noSQL database (distributable) - for config and test result summaries
  • Telegraf - for gathering machine and process telemetry
  • [Later] Grafana - for telemetry visualisation
  • [Later] Python & Jupyter - for benchmark data analysis

Architecture



❗️ Test Results Export

Test results are collated and pushed to the llama.cpp GitHub repo after each CI benchmarking run.
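
A minimal sketch of what this export step could look like from node.js, assuming the results are written into a local clone of the target repo; the `benchmark/` path, file name and commit message are placeholders, not the actual implementation:

```js
// Hypothetical sketch - collate a results summary and push it to a cloned repo.
// repoDir, the output path and the commit message are placeholders.
const { execSync } = require("child_process");
const fs = require("fs");
const path = require("path");

function exportResults(summary, repoDir) {
  const outFile = path.join(repoDir, "benchmark", "results-latest.json");
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  fs.writeFileSync(outFile, JSON.stringify(summary, null, 2));

  // commit and push the extract after each CI benchmarking run
  execSync("git add benchmark/results-latest.json", { cwd: repoDir });
  execSync('git commit -m "ci: update benchmark results"', { cwd: repoDir });
  execSync("git push", { cwd: repoDir });
}
```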


⚙️ conductor VM

⚙️ conductor Overview

  • always-on small cloud VM
  • Ubuntu 22.04
  • node.js, nvm & PM2 to manage cron & processes
  • InfluxDB database for logs & telemetry
  • CouchDB database for config & bench result summaries
  • Telegraf to gather machine telemetry

⚙️ conductor Functionality

  1. polls the GitHub API periodically for the latest llama.cpp release
  2. instructs Azure to create a GPU VM (⚙️ bench-runner) and configures infrastructure etc. via az CLI commands (see the sketch below)
  3. exports test result extracts and commits them to the llama.cpp GitHub repo
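
A minimal sketch of steps 1 and 2, assuming node 18+ for the built-in fetch; the resource group, VM image and VM size below are placeholder values, not the project's actual configuration:

```js
// Hypothetical sketch - poll GitHub for the latest release, then create a
// GPU VM with the az CLI. Resource group, image and size are placeholders.
const { execSync } = require("child_process");

async function latestLlamaCppRelease() {
  const res = await fetch(
    "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest",
    { headers: { "User-Agent": "llamabench" } } // GitHub API requires a User-Agent
  );
  const release = await res.json();
  return release.tag_name;
}

function createBenchRunner(vmName) {
  // ⚠️ this creates a billable GPU VM
  execSync(
    `az vm create --resource-group llamabench-rg --name ${vmName} ` +
      `--image Ubuntu2204 --size Standard_NC4as_T4_v3 --generate-ssh-keys`,
    { stdio: "inherit" }
  );
}
```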

📂 CouchDB - Config Database

  • bench test data
  • Azure machine image data
  • result summaries
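
A sketch of what one config document and its write into CouchDB could look like; the database name, document id and fields are illustrative only:

```js
// Hypothetical sketch - PUT a config document via CouchDB's HTTP API.
// Credentials, database name and document fields are placeholders.
async function saveConfig(doc) {
  const auth = Buffer.from("admin:password").toString("base64");
  await fetch(`http://localhost:5984/llamabench-config/${doc._id}`, {
    method: "PUT",
    headers: {
      Authorization: `Basic ${auth}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(doc),
  });
}

saveConfig({
  _id: "bench-config:open-llama-3b-q4_0",
  model: "./models/3B/open-llama-3b-q4_0.bin",
  threads: 8,
  vmImage: "Ubuntu2204",
  vmSize: "Standard_NC4as_T4_v3",
});
```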

📂 InfluxDB - Logging Database

  • llama.cpp stdout stream with timestamps
  • nvidia-smi metrics
  • CPU, GPU, RAM metrics & other machine metrics
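
A sketch of how a single timestamped stdout line could be pushed, assuming the InfluxDB v2 write API; the org, bucket and token are placeholders:

```js
// Hypothetical sketch - write one llama.cpp stdout line as a point using
// InfluxDB line protocol. Org, bucket and token are placeholders.
async function logStdoutLine(line) {
  const escaped = line.trim().replace(/"/g, '\\"');
  // line protocol: measurement,tag field="value" timestamp (ms precision)
  const point = `llamacpp_stdout,host=bench-runner line="${escaped}" ${Date.now()}`;

  await fetch(
    "http://localhost:8086/api/v2/write?org=llamabench&bucket=bench-logs&precision=ms",
    {
      method: "POST",
      headers: { Authorization: "Token <influx-token>" },
      body: point,
    }
  );
}
```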

⚙️ bench-runner VM

⚙️ bench-runner Overview

  • ephemeral VM
  • Ubuntu, Debian or Windows VM
  • node.js, nvm & PM2
  • nvidia drivers etc
  • make, gcc etc
  • Telegraf to gather machine telemetry & nvidia telemetry

⚙️ bench-runner Functionality

node.js runs a managed node.js sub-process:

  1. node.js pulls git code - e.g. git pull master-d7d2e6a (see the sketch below)
  2. node.js builds the code - e.g. make clean && make -j
  3. node.js runs llama.cpp - e.g. time ./perplexity -m ./models/3B/open-llama-3b-q4_0.bin -f build/wiki.test.raw.406 -t 8
  4. node.js sends stdout to 📂 InfluxDB
  5. node.js sends llama.cpp process results to 📂 CouchDB
  6. [optionally] node.js switches to a different branch - GOTO #1
  7. node.js sends "bench session end" signal to ⚙️ conductor / 📂 CouchDB / 📂 InfluxDB
  8. node.js shuts down the VM
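
A minimal sketch of this loop (steps 1-4 and 8); `pushLine` and `pushResult` stand in for the InfluxDB and CouchDB writers, and the repo path, model arguments and shutdown command are placeholders rather than the actual implementation:

```js
// Hypothetical sketch of the bench-runner loop. pushLine/pushResult stand in
// for the InfluxDB/CouchDB writers; paths and commands are placeholders.
const { execSync, spawn } = require("child_process");

function runBench(pushLine, pushResult) {
  const repo = "./llama.cpp";

  // 1-2: pull and build the code
  execSync("git pull", { cwd: repo, stdio: "inherit" });
  execSync("make clean && make -j", { cwd: repo, stdio: "inherit" });

  // 3: run the benchmark as a managed sub-process
  const bench = spawn(
    "./perplexity",
    ["-m", "./models/3B/open-llama-3b-q4_0.bin", "-f", "build/wiki.test.raw.406", "-t", "8"],
    { cwd: repo }
  );

  // 4: stream stdout to the logging database as it arrives
  bench.stdout.on("data", (chunk) => pushLine(chunk.toString()));

  bench.on("close", (code) => {
    // 5 & 7: record a result summary / end-of-session signal
    pushResult({ exitCode: code, finishedAt: new Date().toISOString() });
    // 8: shut the ephemeral VM down (deallocation could also be handled by ⚙️ conductor via az)
    execSync("sudo shutdown -h now");
  });
}
```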


Setup


To Do

  1. env.sample
  2. az scripts to build ⚙️conductor
  3. az scripts to get Azure (a) regions (b) sub-regions (c) image-types...