
... Continuous integration benchmarking for the LLM inference engine llama.cpp


⚠️ WARNING

Spinning up GPU virtual machines can get expensive 💰, so please proceed carefully...


About

ToDo...


Tech Stack

  • Azure cloud by Microsoft - we are targeting MS Azure exclusively (at this point)
  • Azure CLI - az is the command line tool used to manage Azure infrastructure
  • node.js - application logic, llama.cpp process control and workflow
  • InfluxDB - time series database (TSDB) for logs and metrics
  • CouchDB - simple, robust, noSQL database (distributable) - for config and test result summaries
  • Telegraf - for gathering machine and process telemetry
  • [Later] Grafana - for telemetry visualisation
  • [Later] Python & Jupyter - for benchmark data analysis

Architecture



❗️ Test Results Export

Test results are collated and pushed to the llama.cpp GitHub repo after each CI benchmarking run.
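
A minimal sketch of what this export step could look like from node.js, assuming the results are written into a local clone of the target repo; the `benchmark/` path, file name and commit message are placeholders, not the actual implementation:

```js
// Hypothetical sketch - collate a results summary and push it to a cloned repo.
// repoDir, the output path and the commit message are placeholders.
const { execSync } = require("child_process");
const fs = require("fs");
const path = require("path");

function exportResults(summary, repoDir) {
  const outFile = path.join(repoDir, "benchmark", "results-latest.json");
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  fs.writeFileSync(outFile, JSON.stringify(summary, null, 2));

  // commit and push the extract after each CI benchmarking run
  execSync("git add benchmark/results-latest.json", { cwd: repoDir });
  execSync('git commit -m "ci: update benchmark results"', { cwd: repoDir });
  execSync("git push", { cwd: repoDir });
}
```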


⚙️ conductor VM

⚙️ conductor Overview

  • always-on small cloud VM
  • Ubuntu 22.04
  • node.js, nvm & PM2 to manage cron & processes
  • InfluxDB database for logs & telemetry
  • CouchDB database for config & bench result summaries
  • Telegraf to gather machine telemetry

⚙️ conductor Functionality

  1. polls the GitHub API periodically for the latest llama.cpp release
  2. instructs Azure to create a GPU VM (⚙️ bench-runner) and configures infrastructure etc. via az CLI commands (see the sketch below)
  3. exports test result extracts and commits them to the llama.cpp GitHub repo
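
A minimal sketch of steps 1 and 2, assuming node 18+ for the built-in fetch; the resource group, VM image and VM size below are placeholder values, not the project's actual configuration:

```js
// Hypothetical sketch - poll GitHub for the latest release, then create a
// GPU VM with the az CLI. Resource group, image and size are placeholders.
const { execSync } = require("child_process");

async function latestLlamaCppRelease() {
  const res = await fetch(
    "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest",
    { headers: { "User-Agent": "llamabench" } } // GitHub API requires a User-Agent
  );
  const release = await res.json();
  return release.tag_name;
}

function createBenchRunner(vmName) {
  // ⚠️ this creates a billable GPU VM
  execSync(
    `az vm create --resource-group llamabench-rg --name ${vmName} ` +
      `--image Ubuntu2204 --size Standard_NC4as_T4_v3 --generate-ssh-keys`,
    { stdio: "inherit" }
  );
}
```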

📂 CouchDB - Config Database

  • bench test data
  • Azure machine image data
  • result summaries
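
A sketch of what one config document and its write into CouchDB could look like; the database name, document id and fields are illustrative only:

```js
// Hypothetical sketch - PUT a config document via CouchDB's HTTP API.
// Credentials, database name and document fields are placeholders.
async function saveConfig(doc) {
  const auth = Buffer.from("admin:password").toString("base64");
  await fetch(`http://localhost:5984/llamabench-config/${doc._id}`, {
    method: "PUT",
    headers: {
      Authorization: `Basic ${auth}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(doc),
  });
}

saveConfig({
  _id: "bench-config:open-llama-3b-q4_0",
  model: "./models/3B/open-llama-3b-q4_0.bin",
  threads: 8,
  vmImage: "Ubuntu2204",
  vmSize: "Standard_NC4as_T4_v3",
});
```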

📂 InfluxDB - Logging Database

  • llama.cpp stdout stream with timestamps
  • nvidia-smi metrics
  • CPU, GPU, RAM metrics & other machine metrics
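
A sketch of how a single timestamped stdout line could be pushed, assuming the InfluxDB v2 write API; the org, bucket and token are placeholders:

```js
// Hypothetical sketch - write one llama.cpp stdout line as a point using
// InfluxDB line protocol. Org, bucket and token are placeholders.
async function logStdoutLine(line) {
  const escaped = line.trim().replace(/"/g, '\\"');
  // line protocol: measurement,tag field="value" timestamp (ms precision)
  const point = `llamacpp_stdout,host=bench-runner line="${escaped}" ${Date.now()}`;

  await fetch(
    "http://localhost:8086/api/v2/write?org=llamabench&bucket=bench-logs&precision=ms",
    {
      method: "POST",
      headers: { Authorization: "Token <influx-token>" },
      body: point,
    }
  );
}
```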

⚙️ bench-runner VM

⚙️ bench-runner Overview

  • ephemeral VM
  • Ubuntu, Debian or Windows VM
  • node.js, nvm & PM2
  • nvidia drivers etc
  • make, gcc etc
  • Telegraf to gather machine telemetry & nvidia telemetry

⚙️ bench-runner Functionality

node.js runs a managed node.js sub-process:

  1. node.js pulls git code - e.g. git pull master-d7d2e6a (see the sketch below)
  2. node.js builds the code - e.g. make clean && make -j
  3. node.js runs llama.cpp - e.g. time ./perplexity -m ./models/3B/open-llama-3b-q4_0.bin -f build/wiki.test.raw.406 -t 8
  4. node.js sends stdout to 📂 InfluxDB
  5. node.js sends llama.cpp process results to 📂 CouchDB
  6. [optionally] node.js switches to a different branch - GOTO #1
  7. node.js sends "bench session end" signal to ⚙️ conductor / 📂 CouchDB / 📂 InfluxDB
  8. node.js shuts down the VM
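
A minimal sketch of this loop (steps 1-4 and 8); `pushLine` and `pushResult` stand in for the InfluxDB and CouchDB writers, and the repo path, model arguments and shutdown command are placeholders rather than the actual implementation:

```js
// Hypothetical sketch of the bench-runner loop. pushLine/pushResult stand in
// for the InfluxDB/CouchDB writers; paths and commands are placeholders.
const { execSync, spawn } = require("child_process");

function runBench(pushLine, pushResult) {
  const repo = "./llama.cpp";

  // 1-2: pull and build the code
  execSync("git pull", { cwd: repo, stdio: "inherit" });
  execSync("make clean && make -j", { cwd: repo, stdio: "inherit" });

  // 3: run the benchmark as a managed sub-process
  const bench = spawn(
    "./perplexity",
    ["-m", "./models/3B/open-llama-3b-q4_0.bin", "-f", "build/wiki.test.raw.406", "-t", "8"],
    { cwd: repo }
  );

  // 4: stream stdout to the logging database as it arrives
  bench.stdout.on("data", (chunk) => pushLine(chunk.toString()));

  bench.on("close", (code) => {
    // 5 & 7: record a result summary / end-of-session signal
    pushResult({ exitCode: code, finishedAt: new Date().toISOString() });
    // 8: shut the ephemeral VM down (deallocation could also be handled by ⚙️ conductor via az)
    execSync("sudo shutdown -h now");
  });
}
```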


Setup


To Do

  1. env.sample
  2. az scripts to build ⚙️conductor
  3. az scripts to get Azure (a) regions (b) sub-regions (c) image-types...