tllm: A Journey to Create Your Own LLM Inference Server

In this journey you will learn about LLM inference, KV cache, static batching, continuous batching, and more, and eventually how to build your own LLM inference server.

Course Outline

GPT-2 inference
LLM sampling methods: top-p and top-k
KV cache for faster inference
Static batching
Continuous batching
Tensor parallelism across devices
PD (prefill/decode) disaggregation across devices
Speculative decoding
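
To give a feel for the sampling chapter, here is a minimal sketch of top-k and top-p (nucleus) filtering in plain Python. It is not the tllm implementation: the function names (softmax, filter_top_k_top_p, sample) are hypothetical, and it assumes the top-p threshold is measured against the full softmax distribution, whereas some libraries renormalise after the top-k cut first.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(logits, top_k=0, top_p=1.0):
    """Return {token_id: prob}, renormalised over the tokens that survive.

    top_k=0 disables the top-k cut; top_p=1.0 disables the nucleus cut.
    """
    probs = softmax(logits)
    # Token ids ordered from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(logits, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = filter_top_k_top_p(logits, top_k, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

For example, sample([2.0, 1.0, 0.0, -1.0], top_k=2) can only return token 0 or 1, because the two least likely tokens are dropped before sampling.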

Download the model

Go to ModelScope (魔搭社区) or Hugging Face and download the GPT-2 pytorch_model.bin file.
Move the downloaded file to the model/ folder.
Rename the file to gpt2_pytorch_model.bin.
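
The move and rename steps above can also be scripted. The following is a small sketch using only the Python standard library; the helper name place_checkpoint and the example download path are assumptions for illustration, not part of the tllm repository.

```python
from pathlib import Path
import shutil


def place_checkpoint(downloaded, model_dir="model",
                     target_name="gpt2_pytorch_model.bin"):
    """Move a downloaded checkpoint into model_dir under target_name.

    Returns the final path of the checkpoint.
    """
    src = Path(downloaded).expanduser()
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    dest_dir = Path(model_dir)
    # Create model/ if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / target_name
    # shutil.move also works when source and target are on different disks.
    shutil.move(str(src), str(dest))
    return dest
```

Run from the repository root, a call such as place_checkpoint("~/Downloads/pytorch_model.bin") would leave the weights at model/gpt2_pytorch_model.bin, assuming the browser saved the file to that (hypothetical) location.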

Run the project example

conda create --name tllm python=3.10

conda activate tllm

pip install -r requirement.txt

cd chapter1_gpt2_infer/

python main.py
