GitHub Repo

ExLlamaV2 uses custom kernels to speed up LLM inference under different quantizations. It supports a new "EXL2" format, which is based on the same optimization method as GPTQ and supports 2, 3, 4, 5, 6 and 8-bit quantization. For this benchmark implementation, we use 4-bit and 8-bit quantized versions of Llama2.
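To give a feel for what the bit widths listed above mean in practice, here is a rough back-of-the-envelope sketch. It assumes a round 7 billion parameters for a "7B" model and counts only the weights; real EXL2 files also carry quantization metadata, and inference needs extra memory for activations and the KV cache, so treat the numbers as lower bounds. The function name is illustrative and not part of ExLlamaV2.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (ignores all overhead)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 7e9  # assumption: a round figure for a "7B" model
    # 16 is included only as a float16 reference point for comparison.
    for bits in (2, 3, 4, 5, 6, 8, 16):
        print(f"{bits:>2}-bit: {weight_memory_gib(params, bits):5.2f} GiB")
```

Halving the bit width halves the weight footprint, which is why the 4-bit and 8-bit variants are the ones compared here.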

🚀 Running the ExLlamaV2 Benchmark.

We can run the ExLlamaV2 benchmark for two models: Llama2 7B Chat and Mistral-7B v0.1 Instruct. Here is how we run the benchmark for ExLlamaV2:

./bench_exllamav2/ \
  --prompt <value> \               # Enter a prompt string
  --max_tokens <value> \           # Maximum number of tokens to output
  --repetitions <value> \          # Number of repetitions to be made for the prompt.
  --device <cpu/cuda/metal> \      # The device on which we want to benchmark.
  --model_name <name-of-the-model> # The name of the model. (options: 'llama' for Llama2 and 'mistral' for Mistral-7B-v0.1)

To get started quickly you can simply run:

./bench_exllamav2/ -d cuda -n llama

This uses the default values (see the file) and runs the benchmarks for the Llama 2 and Mistral models. You can find all the benchmark results for Llama 2 7B Chat here, and for Mistral 7B Instruct here.
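If you want to launch several runs from a script, the flags listed above can be assembled programmatically. The following is a minimal sketch, not part of the benchmark code: `build_bench_command` is a hypothetical helper, the script path is supplied by the caller, and it assumes the long flags (`--device`, `--model_name`) behave the same as the short `-d` and `-n` forms used in the quick-start command.

```python
import shlex
from typing import List, Optional


def build_bench_command(
    script: str,
    prompt: Optional[str] = None,
    max_tokens: Optional[int] = None,
    repetitions: Optional[int] = None,
    device: Optional[str] = None,
    model_name: Optional[str] = None,
) -> List[str]:
    """Build an argument list; options left as None fall back to the script's defaults."""
    options = {
        "--prompt": prompt,
        "--max_tokens": max_tokens,
        "--repetitions": repetitions,
        "--device": device,
        "--model_name": model_name,
    }
    args = [script]
    for flag, value in options.items():
        if value is not None:
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # The script path is whatever launcher you use; it is a placeholder here.
    for model in ("llama", "mistral"):
        cmd = build_bench_command("<path-to-bench-script>", device="cuda", model_name=model)
        print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to actually execute it
```

Keeping the command as a list avoids quoting problems when the prompt contains spaces or quotes.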

💎 Quality Checks

Please note: we treat the outputs of the raw (float32 precision) weights from HuggingFace Transformers (PyTorch backend) as our ground truth. We compare this ground truth with the model's output in the other three precisions, i.e. float16, int8 and int4. This is our system prompt:

You answers should always be to the point, precise and not more than 2 sentences strictly
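The comparison in this section is done by reading the outputs side by side. As a rough aid, the sketch below shows one way to flag answers that break the two-sentence limit in the system prompt and to get a crude textual similarity against the ground truth. This is an assumption-laden illustration, not the method used for this benchmark: the sentence splitter is naive, `difflib` measures character overlap rather than meaning, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def count_sentences(text: str) -> int:
    """Naive sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def similarity(candidate: str, reference: str) -> float:
    """Character-level similarity in [0, 1]; a crude proxy, not a semantic score."""
    return SequenceMatcher(None, candidate, reference).ratio()


def check_answer(candidate: str, ground_truth: str, max_sentences: int = 2) -> dict:
    """Summarise how a quantized answer compares with the ground-truth answer."""
    n = count_sentences(candidate)
    return {
        "sentences": n,
        "within_limit": n <= max_sentences,
        "similarity": similarity(candidate, ground_truth),
    }


if __name__ == "__main__":
    ground_truth = "After the hunter shoots and kills one bird, 41 birds remain on the branch."
    int4_answer = "After the hunter shoots and misses two birds, 40 birds remain on the branch."
    print(check_answer(int4_answer, ground_truth))
```

A high similarity score does not imply a correct answer (the two answers above differ in the number that matters), so a check like this can only complement manual review.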

Model: Mistral 7B Instruct v0.1

Question int4 int8 Ground Truth
Question: I'm making pancakes for breakfast. I added a cup of flour, a teaspoon of salt, and a few tablespoons of sugar to a bowl. I stirred it together, then added a cup of milk, a beaten egg, and a few tablespoons of oil, and stirred until just mixed. Then I put 1/4 a cup on a hot frying pan, and flipped it when brown. But they're terrible! Why? List the main reason. Answer as much precise as possible with one sentence.
int4: The main reason for the terrible pancakes could be that the batter was too thick, which resulted in dense and dry pancakes. To fix this, you could add more milk or water to the batter to thin it out. Additionally, make sure to cook the pancakes on medium heat and flip them when the edges start to dry and the surface is bubbly. This will help ensure that the pancakes cook evenly and turn out fluffy and delicious. Finally, you could also try using a different type of flour, such as all-purpose flour, which may
int8: The main reason for the terrible pancakes is likely due to overmixing the batter, which can result in a tough and dense texture. It's important to mix the batter just until the ingredients are combined, and to avoid over-kneading or over-stirring. Additionally, using too much flour or sugar can also contribute to a dense and dry batter, so it's important to use the right amount of each ingredient. Finally, using a hot frying pan and flipping the pancakes too soon can also result in a tough and dry final product.
Ground Truth: The main reason for the terrible pancakes could be that the batter was too thick, which resulted in a dry and tough texture.
42 birds are sitting on a tree branch. A hunter passes, shoots one dead, and misses two. How many birds are left on the branch? Answer as much precise as possible with one sentence. One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. Answer: One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. Answer: One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. Answer: One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. Answer: One bird is left on the branch. The hunter shot One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. The other two birds have either been shot or flown away. Answer: One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. The other two birds have either been shot or flown away. Answer: One bird is left on the branch. The hunter shot one bird and missed two, so only one bird remains alive. The other two birds have either been shot or flown away. Answer: One bird is left on the branch.
How many mistakes can you find in the following sentence? Sentence: Its to much to loose if your talking about hundred’s of dollars. Answer as much precise as possible with one sentence. There are two grammatical errors in the sentence: "Its" should be "It's" and "to much" should be "too much". Answer: Two mistakes. < im_sep > <
Question: You are an expert on agricultural chemical compounds. Your job is to raise awareness about the excessive use of chemical pesticides, emphasizing the long-term negative effects on the body. Be clear and direct, avoid overly scientific explanations, and use simple terms understandable by people without a college education. Answer as much precise as possible with one sentence.
int4: Excessive use of chemical pesticides can lead to long-term health problems such as cancer and neurological disorders. Answer: Excessive use of chemical pesticides can lead to long-term health problems. Answer: Excessive use of chemical pesticides can lead to long-term health problems. Answer: Excessive use of chemical pesticides can lead to long-term health problems. Answer: Excessive use of chemical pesticides can lead to long-term health problems. Answer: Excessive use of chemical pesticides can lead to
int8: Excessive use of chemical pesticides can lead to long-term negative health effects such as cancer, neurological damage, and reproductive issues. Answer: Excessive use of chemical pesticides can lead to long-term negative health effects. Answer: Excessive use of chemical pesticides can lead to long-term negative health effects. Answer: Excessive use of chemical pesticides can lead to long-term negative health effects. Answer: Excessive use of chemical pesticides can lead to long-term negative health effects. Answer: Ex
Ground Truth: Excessive use of chemical pesticides can lead to long-term negative health effects, including cancer, neurological damage, and reproductive issues.
Question: During the Manhattan Project, John von Neumann suggested how to maximize the destruction caused by a detonating bomb. Which was his idea? Answer as much precise as possible with one sentence.
int4: John von Neumann proposed the use of a "spherical configuration" for the Manhattan Project's atomic bombs, which would maximize the destructive force of the explosion. This configuration involved placing the bomb's fissile material in a sphere, which would increase the amount of energy released by the explosion. This idea was later implemented in the design of the "Little Boy" and "Fat Man" atomic bombs used in the atomic bombings of Hiroshima and Nagasaki.

Note: This answer is based on historical records and accounts of John von Ne
int8: John von Neumann proposed the use of a spherical configuration of bombs to maximize the destructive force of a detonation. This is known as the "Neumann sphere" or "Neumann configuration". It involves arranging multiple bombs in a sphere around a central point, with each bomb detonating simultaneously to create a massive explosion. This configuration was designed to maximize the amount of energy released by the explosion, and was considered a key factor in the development of the atomic bomb. It was also used in the design of the hydrogen bomb.
Ground Truth: John von Neumann proposed the concept of a "superbomb" that would use multiple nuclear explosions to create a massive explosion, causing widespread destruction.

Model: Llama-2 7B Chat

Question int4 int8 Ground Truth
Question: I'm making pancakes for breakfast. I added a cup of flour, a teaspoon of salt, and a few tablespoons of sugar to a bowl. I stirred it together, then added a cup of milk, a beaten egg, and a few tablespoons of oil, and stirred until just mixed. Then I put 1/4 a cup on a hot frying pan, and flipped it when brown. But they're terrible! Why? List the main reason. Answer as much precise as possible with one sentence.
int4: The main reason your pancakes are terrible is because you overmixed the batter, resulting in a dense and tough pancake. Proper mixing should be done until just combined, allowing for a light and fluffy texture. (1 sentence)

Additional reasons could include using old or low-quality ingredients, not adjusting cooking time and temperature accordingly, or not using the right type of oil for frying. (2 sentences)

Please note that these are general tips and may vary depending on personal preferences and specific recipes.
int8: The main reason your pancakes are terrible is because you did not mix the ingredients well enough, resulting in a dense and flat pancake. Proper mixing helps to incorporate air bubbles into the batter, leading to a light and fluffy texture. 🥞👍 (1 sentence) 😊👍🥞👍😊👍🥞👍😊👍🥞👍
Ground Truth: The main reason your pancakes are terrible is because you did not mix the batter long enough, resulting in a dense and flat pancake.
Question: 42 birds are sitting on a tree branch. A hunter passes, shoots one dead, and misses two. How many birds are left on the branch? Answer as much precise as possible with one sentence.
int4: After the hunter shoots and misses two birds, 40 birds remain on the branch. (2 sentences)

Note: I will make sure to answer as precisely as possible within the given word limit. Please let me know if you have any further questions! 😊. 👍. 💡. 🔍. 🎯. 🚀. 💭. 🔔. 💬. 👀.
int8: After the hunter shoots and kills one bird, 41 birds remain on the branch. The hunter missed two birds, so they remain on the branch as well. Therefore, there are 41 birds left on the branch. (1 sentence) 🐦🔥🎯🌳🏹🐦💀👀🔪🐼🎯🌳🏹🐦💀👀
Ground Truth: After the hunter shoots and kills one bird, 41 birds remain on the branch.
Question: How many mistakes can you find in the following sentence? Sentence: Its to much to loose if your talking about hundred’s of dollars. Answer as much precise as possible with one sentence.
int4: There are 3 mistakes in the sentence:

1. "to much" should be "too much"
2. "loose" should be "lose"
3. The sentence is grammatically correct, but it is not clear what is being referred to. It would be more precise to say "hundreds of dollars" instead of just "hundreds".

Corrected sentence: "It's too much to lose if you're talking about hundreds of dollars." (1 sentence)

Please let me know if you have any other
int8: There are 3 mistakes in the sentence: "to" should be "too", "loose" should be "lose", and the phrase "hundred's of dollars" is incorrect, it should be "hundreds of dollars". (1 sentence) 😊. Please let me know if you have any other questions! 😃. 👍. 💡. 📝. 📞. 🤔. 💭. 👀
Ground Truth: There are 2 mistakes in the sentence: "to loose" should be "to lose" and "hundred's" should be "hundreds".
Question: You are an expert on agricultural chemical compounds. Your job is to raise awareness about the excessive use of chemical pesticides, emphasizing the long-term negative effects on the body. Be clear and direct, avoid overly scientific explanations, and use simple terms understandable by people without a college education. Answer as much precise as possible with one sentence.
int4: Excessive use of chemical pesticides can have severe long-term effects on the body, including increased risk of cancer, hormone disruption, and neurological problems. These chemicals can also contaminate soil, water, and air, causing widespread damage to the environment and human health. (1 sentence)

Chemical pesticides can also affect the nervous system, causing headaches, dizziness, and other neurological problems. These effects can be irreversible and can last a lifetime. (1 sentence
int8: Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to their toxic nature and potential for accumulation in the body over time. Proper use and alternative methods, such as integrated pest management, can help minimize exposure and protect human health. (2 sentences)

Chemical pesticides can persist in the environment and accumulate in soil, water, and food, leading to chronic exposure and potential health problems. The World
Ground Truth: Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to their toxic and persistent nature, which can accumulate in the body over time and cause irreversible damage. It is important to adopt sustainable and organic farming practices that prioritize the health of both people and the environment, such as crop rotation, composting, and integrated pest management, to minimize the use of harmful chemicals and promote a healthier
Question: During the Manhattan Project, John von Neumann suggested how to maximize the destruction caused by a detonating bomb. Which was his idea? Answer as much precise as possible with one sentence.
int4: John von Neumann proposed using a multi-stage detonation design to maximize the destruction caused by a detonating bomb, with each stage detonating in sequence to create a chain reaction and increase the overall explosive yield. (1 sentence)

Note: I will always strive to provide concise and accurate answers, but please let me know if you have any further questions or need more information. Thank you! 😊. 💡. 🔥. 💻. 👨
int8: John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction through the simultaneous detonation of a fission bomb and a fusion reaction, creating a massive explosion and radioactive fallout. This idea was later developed into the hydrogen bomb. (1 sentence)

Note: I have tried to provide a concise answer while still conveying the essential information. Please let me know if you have any further questions or if there's anything else I can help you with. 😊. 👍.
Ground Truth: John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction through the simultaneous detonation of a fission bomb and a fusion reaction, creating a chain reaction of nuclear reactions that would release an enormous amount of energy. This idea was later developed into the hydrogen bomb.

👀 Some points to note:

  1. ExLlamaV2 supports only quantized LLMs, so float32 and float16 are not supported here.
  2. ExLlamaV2 currently does not support Mac/Metal.
  3. Although it supports CPU, it is too slow to offload and run, so we did not include it in our benchmarks.
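The three points above can be turned into an early sanity check before a run is started. This is a hedged sketch only: `check_benchmark_config` and the precision labels are hypothetical names chosen for illustration, and the rules simply restate the notes above rather than anything enforced by ExLlamaV2 itself.

```python
from typing import Tuple

# Reasons taken from the notes above; the dictionary itself is illustrative.
EXCLUDED_DEVICES = {
    "metal": "ExLlamaV2 currently does not support Mac/Metal",
    "cpu": "CPU is supported but too slow, so it is excluded from these benchmarks",
}
QUANTIZED_PRECISIONS = {"int4", "int8"}


def check_benchmark_config(device: str, precision: str) -> Tuple[str, str]:
    """Return the normalised (device, precision) pair or raise ValueError."""
    device = device.lower()
    precision = precision.lower()
    if device in EXCLUDED_DEVICES:
        raise ValueError(f"device '{device}' is not benchmarked: {EXCLUDED_DEVICES[device]}")
    if device != "cuda":
        raise ValueError(f"unknown device '{device}'")
    if precision not in QUANTIZED_PRECISIONS:
        raise ValueError(f"precision '{precision}' is not benchmarked here; use int4 or int8")
    return device, precision


if __name__ == "__main__":
    print(check_benchmark_config("cuda", "int4"))
```

Failing fast like this avoids loading a model only to find out that the chosen device or precision is not covered.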