A helper project for running perplexity tests against llama.cpp, a leading inference engine for LLMs (Large Language Models) such as Llama 2.
Perplexity is the most commonly used measure of a language model's performance on a given text corpus: it measures how well the model predicts the contents of a dataset. Lower perplexity scores are better.
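For reference, perplexity over a tokenised corpus is `exp` of the negative mean log-probability the model assigns to each token. A minimal illustration in Python (not part of this project's code):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """token_logprobs: natural-log probability the model assigned to each token."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# A model that assigns every token probability 0.25 has perplexity 4.
print(perplexity([math.log(0.25)] * 100))  # -> 4.0
```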
See the llama.cpp discussions for background on the needs and motives for this project here and here.
This Python app wraps the llama.cpp `./perplexity` executable and uploads perplexity scores and test results as JSON to an Amazon S3 bucket for analysis.
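Conceptually, the wrapper does something like the sketch below. The model path, output-parsing regex, and bucket name are placeholders rather than this project's actual values, and llama.cpp's exact output format varies between versions:

```python
import json
import re
import subprocess
import boto3

# Run llama.cpp's perplexity tool (paths and flags are placeholders).
result = subprocess.run(
    ["./perplexity", "-m", "models/llama-2-7b.Q4_K_M.gguf", "-f", "wiki.test.raw.406"],
    capture_output=True, text=True, check=True,
)

# Recent llama.cpp builds print a line like "Final estimate: PPL = 5.9693";
# older builds format the result differently.
match = re.search(r"PPL = ([0-9.]+)", result.stdout + result.stderr)
score = {"model": "llama-2-7b.Q4_K_M", "perplexity": float(match.group(1))}

# Upload the result as JSON (placeholder bucket and key).
boto3.client("s3").put_object(
    Bucket="my-perplexity-results",
    Key="scores/llama-2-7b.json",
    Body=json.dumps(score),
)
```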
The standard llama.cpp perplexity test uses `wiki.test.raw.406`, i.e. 406 lines taken from `wiki.test.raw`.
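If you need to produce the truncated file yourself, something like this works (a sketch; it assumes the 406 lines are the first 406 of the corpus):

```python
# Keep the first 406 lines of wiki.test.raw as wiki.test.raw.406.
with open("wiki.test.raw", encoding="utf-8") as src, \
     open("wiki.test.raw.406", "w", encoding="utf-8") as dst:
    for i, line in enumerate(src):
        if i >= 406:
            break
        dst.write(line)
```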
Install the dependencies:

```bash
pip install -r requirements.txt
```
Copy `.env.example` (typically to `.env`) and update the config variables to suit your system.
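The app is expected to read these variables at startup. A minimal sketch using `python-dotenv`, with hypothetical variable names (`MODEL_PATH`, `S3_BUCKET`); check `.env.example` for the real ones:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

# Hypothetical variable names for illustration only.
model_path = os.environ["MODEL_PATH"]
s3_bucket = os.environ["S3_BUCKET"]
```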
You can point the script at an existing `wiki.test.raw` if you have one; otherwise, the script will download the test corpus automatically.
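The download-if-missing logic might look like the sketch below; the URL and the archive's internal layout are assumptions based on the public wikitext-2-raw dataset, and the script's actual source may differ:

```python
import os
import urllib.request
import zipfile

CORPUS = "wiki.test.raw"
ARCHIVE = "wikitext-2-raw-v1.zip"
URL = "https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip"  # assumed mirror

if not os.path.exists(CORPUS):
    urllib.request.urlretrieve(URL, ARCHIVE)
    with zipfile.ZipFile(ARCHIVE) as zf:
        # The zip is assumed to contain wikitext-2-raw/wiki.test.raw.
        with zf.open("wikitext-2-raw/wiki.test.raw") as src, open(CORPUS, "wb") as dst:
            dst.write(src.read())
```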
Run the scorecard:

```bash
python3 perplexity_scorecard.py
```
Coming soon: a llama.cpp perplexity leaderboard, plus Jupyter (`.ipynb`) analysis and charting examples.
PRs are welcome 😀