Commit

revert llm doc change

qinxuye committed Jan 26, 2025
1 parent e4ce8da commit da7beeb
Showing 67 changed files with 362 additions and 410 deletions.
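The change repeated across these files restores vLLM alongside Transformers in each model spec's engine list, with the parenthetical "(vLLM only available for quantization none)" on pytorch-format specs. The availability rule those lines encode can be sketched as follows (the helper name and rule encoding are illustrative only, not Xinference's actual API):

```python
def available_engines(model_format: str, quantization: str) -> list[str]:
    """Sketch of the engine-availability rule these doc lines restore."""
    if model_format == "pytorch":
        # vLLM serves pytorch-format weights only when they are unquantized;
        # Transformers handles 4-bit, 8-bit, and none alike.
        return ["vLLM", "Transformers"] if quantization == "none" else ["Transformers"]
    if model_format in ("gptq", "awq"):
        # Pre-quantized GPTQ/AWQ checkpoints are listed for both engines.
        return ["vLLM", "Transformers"]
    if model_format == "mlx":
        return ["MLX"]
    return []

print(available_engines("pytorch", "4-bit"))  # → ['Transformers']
print(available_engines("pytorch", "none"))   # → ['vLLM', 'Transformers']
```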
4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/baichuan-2-chat.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** baichuan-inc/Baichuan2-7B-Chat
 - **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 13 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 13
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** baichuan-inc/Baichuan2-13B-Chat
 - **Model Hubs**: `Hugging Face <https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat>`__, `ModelScope <https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat>`__

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/c4ai-command-r-v01.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 35 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 35
 - **Quantizations:** none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** CohereForAI/c4ai-command-r-v01
 - **Model Hubs**: `Hugging Face <https://huggingface.co/CohereForAI/c4ai-command-r-v01>`__, `ModelScope <https://modelscope.cn/models/mirror013/c4ai-command-r-v01-4bit>`__

@@ -68,7 +68,7 @@ Model Spec 4 (pytorch, 104 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 104
 - **Quantizations:** none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** CohereForAI/c4ai-command-r-plus
 - **Model Hubs**: `Hugging Face <https://huggingface.co/CohereForAI/c4ai-command-r-plus>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/c4ai-command-r-plus>`__

@@ -100,7 +100,7 @@ Model Spec 6 (gptq, 104 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 104
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** alpindale/c4ai-command-r-plus-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ>`__

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/code-llama-instruct.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** codellama/CodeLlama-7b-Instruct-hf
 - **Model Hubs**: `Hugging Face <https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-7b-Instruct-hf>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 13 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 13
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** codellama/CodeLlama-13b-Instruct-hf
 - **Model Hubs**: `Hugging Face <https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-13b-Instruct-hf>`__

@@ -52,7 +52,7 @@ Model Spec 3 (pytorch, 34 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 34
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** codellama/CodeLlama-34b-Instruct-hf
 - **Model Hubs**: `Hugging Face <https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-34b-Instruct-hf>`__

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/code-llama-python.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-7B-Python-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-7B-Python-fp16>`__, `ModelScope <https://modelscope.cn/models/Xorbits/CodeLlama-7B-Python-fp16>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 13 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 13
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-13B-Python-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-13b-Python-hf>`__

@@ -52,7 +52,7 @@ Model Spec 3 (pytorch, 34 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 34
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-34B-Python-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-34B-Python-fp16>`__

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/code-llama.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-7B-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-7B-fp16>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-7b-hf>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 13 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 13
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-13B-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-13B-fp16>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-13b-hf>`__

@@ -52,7 +52,7 @@ Model Spec 3 (pytorch, 34 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 34
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** TheBloke/CodeLlama-34B-fp16
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/CodeLlama-34B-fp16>`__, `ModelScope <https://modelscope.cn/models/AI-ModelScope/CodeLlama-34b-hf>`__

2 changes: 1 addition & 1 deletion doc/source/models/builtin/llm/codegeex4.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 9
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** THUDM/codegeex4-all-9b
 - **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/codegeex4-all-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b>`__

4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/codeqwen1.5-chat.rst
@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** Qwen/CodeQwen1.5-7B-Chat
 - **Model Hubs**: `Hugging Face <https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat>`__, `ModelScope <https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat>`__

@@ -52,7 +52,7 @@ Model Spec 3 (awq, 7 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 7
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** Qwen/CodeQwen1.5-7B-Chat-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat-AWQ>`__, `ModelScope <https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat-AWQ>`__

2 changes: 1 addition & 1 deletion doc/source/models/builtin/llm/codeqwen1.5.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** Qwen/CodeQwen1.5-7B
 - **Model Hubs**: `Hugging Face <https://huggingface.co/Qwen/CodeQwen1.5-7B>`__, `ModelScope <https://modelscope.cn/models/qwen/CodeQwen1.5-7B>`__

6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/codestral-v0.1.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 22 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 22
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** mistralai/Codestral-22B-v0.1
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mistralai/Codestral-22B-v0.1>`__

@@ -51,7 +51,7 @@ Model Spec 3 (mlx, 22 Billion)
 
 - **Model Format:** mlx
 - **Model Size (in billions):** 22
-- **Quantizations:** 4bit
+- **Quantizations:** 4-bit
 - **Engines**: MLX
 - **Model ID:** mlx-community/Codestral-22B-v0.1-4bit
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Codestral-22B-v0.1-4bit>`__
@@ -67,7 +67,7 @@ Model Spec 4 (mlx, 22 Billion)
 
 - **Model Format:** mlx
 - **Model Size (in billions):** 22
-- **Quantizations:** 8bit
+- **Quantizations:** 8-bit
 - **Engines**: MLX
 - **Model ID:** mlx-community/Codestral-22B-v0.1-8bit
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Codestral-22B-v0.1-8bit>`__
4 changes: 2 additions & 2 deletions doc/source/models/builtin/llm/deepseek-chat.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-llm-7b-chat
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 67 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 67
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-llm-67b-chat
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-chat>`__

20 changes: 10 additions & 10 deletions doc/source/models/builtin/llm/deepseek-coder-instruct.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 1_3 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 1_3
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-1.3b-instruct
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-instruct>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 6_7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 6_7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-6.7b-instruct
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-instruct>`__

@@ -52,7 +52,7 @@ Model Spec 3 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-7b-instruct-v1.5
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5>`__

@@ -68,7 +68,7 @@ Model Spec 4 (pytorch, 33 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 33
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-33b-instruct
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct>`__

@@ -148,7 +148,7 @@ Model Spec 9 (gptq, 1_3 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 1_3
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-1.3b-instruct-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-1.3b-instruct-GPTQ>`__

@@ -164,7 +164,7 @@ Model Spec 10 (gptq, 6_7 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 6_7
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-6.7B-instruct-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GPTQ>`__

@@ -180,7 +180,7 @@ Model Spec 11 (gptq, 33 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 33
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-33B-instruct-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-GPTQ>`__

@@ -196,7 +196,7 @@ Model Spec 12 (awq, 1_3 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 1_3
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-1.3b-instruct-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-1.3b-instruct-AWQ>`__

@@ -212,7 +212,7 @@ Model Spec 13 (awq, 6_7 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 6_7
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-6.7B-instruct-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-AWQ>`__

@@ -228,7 +228,7 @@ Model Spec 14 (awq, 33 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 33
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-33B-instruct-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-AWQ>`__

20 changes: 10 additions & 10 deletions doc/source/models/builtin/llm/deepseek-coder.rst
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 1_3 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 1_3
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-1.3b-base
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-base>`__

@@ -36,7 +36,7 @@ Model Spec 2 (pytorch, 6_7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 6_7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-6.7b-base
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-base>`__

@@ -52,7 +52,7 @@ Model Spec 3 (pytorch, 7 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 7
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-7b-base-v1.5
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-7b-base-v1.5>`__

@@ -68,7 +68,7 @@ Model Spec 4 (pytorch, 33 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 33
 - **Quantizations:** 4-bit, 8-bit, none
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
 - **Model ID:** deepseek-ai/deepseek-coder-33b-base
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/deepseek-coder-33b-base>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-base>`__

@@ -148,7 +148,7 @@ Model Spec 9 (gptq, 1_3 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 1_3
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-1.3b-base-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GPTQ>`__

@@ -164,7 +164,7 @@ Model Spec 10 (gptq, 6_7 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 6_7
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-6.7B-base-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-6.7B-base-GPTQ>`__

@@ -180,7 +180,7 @@ Model Spec 11 (gptq, 33 Billion)
 - **Model Format:** gptq
 - **Model Size (in billions):** 33
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-33B-base-GPTQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-33B-base-GPTQ>`__

@@ -196,7 +196,7 @@ Model Spec 12 (awq, 1_3 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 1_3
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-1.3b-base-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-AWQ>`__

@@ -212,7 +212,7 @@ Model Spec 13 (awq, 6_7 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 6_7
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-6.7B-base-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-6.7B-base-AWQ>`__

@@ -228,7 +228,7 @@ Model Spec 14 (awq, 33 Billion)
 - **Model Format:** awq
 - **Model Size (in billions):** 33
 - **Quantizations:** Int4
-- **Engines**: Transformers
+- **Engines**: vLLM, Transformers
 - **Model ID:** TheBloke/deepseek-coder-33B-base-AWQ
 - **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/deepseek-coder-33B-base-AWQ>`__
