vllm-project · eldarkurtic · Jul 26, 2024 · Jul 25, 2024
diff --git a/examples/quantization_w8a8_int8/README.md b/examples/quantization_w8a8_int8/README.md
@@ -22,7 +22,7 @@ The example includes an end-to-end script for applying the quantization algorith
 python3 llama3_example.py
 ```
 
-The resulting model `Meta-Llama-3-8B-Instruct-W8A8-INT8` is ready to be loaded into vLLM.
+The resulting model `Meta-Llama-3-8B-Instruct-W8A8-Dynamic-Per-Token` is ready to be loaded into vLLM.
 
 ## Code Walkthough
 
@@ -49,7 +49,7 @@ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
 
 ### 2) Prepare Calibration Data
 
-Prepare the calibration data. When quantizing activations of a model to `fp8`, we need some sample data to estimate the activation scales. As a result, it is very useful to use calibration data that closely matches the type of data used in deployment. If you have fine-tuned a model, using a sample of your training data is a good idea.
+Prepare the calibration data. When quantizing activations of a model to `int8`, we need some sample data to estimate the activation scales. As a result, it is very useful to use calibration data that closely matches the type of data used in deployment. If you have fine-tuned a model, using a sample of your training data is a good idea.
 
 In our case, we are quantizing an Instruction tuned generic model, so we will use the `ultrachat` dataset. Some best practices include:
 * 512 samples is a good place to start (increase if accuracy drops)
@@ -114,15 +114,15 @@ model.save_pretrained(SAVE_DIR, save_compressed=True)
 tokenizer.save_pretrained(SAVE_DIR)
 ```
 
-We have successfully created an `fp8` model!
+We have successfully created a `w8a8` model with weights and activations quantized to 8-bit integers!
 
 ### 4) Evaluate Accuracy
 
 With the model created, we can now load and run in vLLM (after installing).
 
 ```python
 from vllm import LLM
-model = LLM("./Meta-Llama-3-8B-Instruct-W8A8-FP8")
+model = LLM("./Meta-Llama-3-8B-Instruct-W8A8-Dynamic-Per-Token")
 ```
 
 We can evaluate accuracy with `lm_eval` (`pip install lm_eval==v0.4.3`):