LLM Model Performance Comparison

Abhishek Gupta

Video Link :

Project Architecture:

Backend :

Flask API : A Flask API calls the psutil Python library to monitor the ollama-runner process.
Java Spring Boot : A Java Spring Boot REST API performs CRUD operations on the MySQL database.
Ollama : All LLMs are served through the Ollama application installed on my local machine. Each model can be queried by sending a prompt to its endpoint.
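
As a rough illustration of querying a model through Ollama, the sketch below posts a prompt to Ollama's `/api/generate` endpoint and measures the wall-clock time. The default local address `http://localhost:11434`, the helper names `buildGenerateRequest` and `queryModel`, and the non-streaming mode are assumptions, not details taken from this project.

```javascript
// Assumption: Ollama's default local address; the project may use another host or port.
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Hypothetical helper: builds the fetch arguments for one non-streaming query.
function buildGenerateRequest(model, prompt) {
  return {
    url: OLLAMA_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Hypothetical helper: sends the prompt and returns the generated text
// together with the elapsed wall-clock time in seconds.
async function queryModel(model, prompt) {
  const { url, options } = buildGenerateRequest(model, prompt);
  const started = Date.now();
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return { text: data.response, seconds: (Date.now() - started) / 1000 };
}
```

Calling `queryModel("mistral:latest", "Who was the first Roman emperor?")` would then resolve once the model has finished answering, provided Ollama is running locally.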

Database : A MySQL database stores the execution metrics:

executionId : Primary key for each execution.
modelName : Name of the LLM.
executionTime_avg_plus_sd : Average execution time + standard deviation.
cpuUsage_avg_plus_sd : Average CPU usage in percent + standard deviation.
gpuUsage_avg_plus_sd : Average GPU usage in MB + standard deviation.
energyUsage_avg_sd : Average energy usage in joules.

LLMs (Ollama Models) and description:

llama2:13b : Llama 2 model with 13B parameters.
llama2:latest : The latest Llama 2 tag, which is the 7B parameter model.
mistral:latest : The latest Mistral tag, which is the 7B parameter model.
wizard-vicuna-uncensored:13b : Wizard Vicuna, a 13B parameter model.
orca-mini:3b : Orca Mini, a 3B parameter model.
zephyr:7b : Zephyr, a 7B parameter model.

Question set for LLM evaluation:

I picked 10 questions from different sources: HellaSwag, MMLU, HumanEval, general knowledge, and logical reasoning.

1. A person sets a folded towel on the edge of a sink. What happens next?
- a) They turn on the faucet and the towel falls in.
- b) They leave the room.
- c) They adjust the towel and fold it again.
- d) They wet the towel.
2. A person goes to a vending machine and puts in a dollar bill. What happens next?
- a) They select a snack and it gets stuck.
- b) They receive their change and walk away.
- c) They push the button and receive a soda.
- d) The machine does not accept the bill.
3. Who was the first Roman emperor?
- a) Julius Caesar
- b) Augustus
- c) Nero
- d) Caligula
4. In "Moby Dick", who is the narrator of the story?
- a) Captain Ahab
- b) Ishmael
- c) Queequeg
- d) Herman Melville
5. Write a function in Python that takes a list of numbers and returns the largest number in the list.
6. Create a Python function that reverses a string.
7. What is the primary gas found in the Earth's atmosphere?
8. Which is the largest ocean on Earth?
9. If you have a bowl with six apples and you take away four, how many do you have?
- a) Two
- b) Four
- c) Six
- d) None
10. A farmer had 17 sheep, all but nine died. How many are left?
- a) 8
- b) 9
- c) 17
- d) None

Program flow and process:

Frontend: In the chat UI, select a model and click the Load Question button.
The questions from questions.json are loaded into the application and executed one by one on the selected model.
When each question starts executing, the Flask API's start_monitor endpoint is called. It uses psutil to monitor the ollama-runner process, sampling every 300 ms. Each sample contains cpu_usage in % and the RSS (resident set size) in bytes, which is later converted to MB for the calculations.
Once the query completes, or the process is no longer present on the system, the stop_monitoring API is called.
After all 10 questions have been executed, the mean and standard deviation are calculated for each metric.
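
The aggregation step above could look roughly like the following sketch. The helper names `summarize` and `bytesToMb` are hypothetical, and two details are assumptions because the README does not state them: the standard deviation is the population form (dividing by n), and 1 MB is taken as 1,000,000 bytes to match the 16 GB = 16000 MB figure used for the energy parameters.

```javascript
// Hypothetical helper: mean and standard deviation of one metric's samples.
function summarize(samples) {
  if (samples.length === 0) {
    throw new Error("summarize() needs at least one sample");
  }
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  // Assumption: population variance (divide by n, not n - 1).
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return { mean, sd: Math.sqrt(variance) };
}

// Hypothetical helper: converts an RSS reading from bytes to MB.
// Assumption: decimal megabytes (1 MB = 1,000,000 bytes).
function bytesToMb(bytes) {
  return bytes / 1_000_000;
}
```

The same `summarize` call can be applied to the per-question execution times, the CPU samples, and the converted memory samples.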

Energy calculation is based on my laptop's specifications, using the following parameters:


    // Idle CPU power
    const P_idle = 5;
    // Max CPU power
    const P_max = 25;
    // Idle memory power
    const P_mem_idle = 3;
    // Max memory power
    const P_mem_max = 8;

    // Total RAM memory in MB (16GB RAM)
    const totalMemoryMb = 16000;
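
The README does not spell out how these parameters are combined, so the following is only one plausible sketch. It assumes the values are in watts, that power scales linearly between idle and maximum with CPU load and memory use, and that energy is power multiplied by execution time. The function name `estimateEnergyJoules` is hypothetical.

```javascript
// Constants repeated from the parameters above so the sketch runs on its own.
// Assumption: all power values are in watts.
const P_idle = 5;
const P_max = 25;
const P_mem_idle = 3;
const P_mem_max = 8;
const totalMemoryMb = 16000;

// Clamp a ratio to [0, 1]; a per-process CPU reading can exceed 100%
// on multi-core machines, which this simple model does not handle.
function clamp01(x) {
  return Math.min(1, Math.max(0, x));
}

// Hypothetical helper: linear power model, energy (J) = power (W) * time (s).
function estimateEnergyJoules(cpuPercent, memoryMb, seconds) {
  const cpuLoad = clamp01(cpuPercent / 100);
  const memLoad = clamp01(memoryMb / totalMemoryMb);
  const cpuPower = P_idle + (P_max - P_idle) * cpuLoad;
  const memPower = P_mem_idle + (P_mem_max - P_mem_idle) * memLoad;
  return (cpuPower + memPower) * seconds;
}
```

Under this model, 50% CPU and 8000 MB of memory held for 10 seconds gives (15 + 5.5) * 10 = 205 J.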

Once the mean and standard deviation have been collected for every parameter, the data is posted to the MySQL database through the Java Spring Boot REST API.
Finally, the histograms for each parameter are updated in the dashboard.
