GitHub - woodx9/tllm: create your own llm inference server from scratch

tllm: A journey to create your own LLM inference server

On this journey you will learn about LLM inference, KV cache, static batching, continuous batching, and more, and eventually how to create your own LLM inference server.

Course Outline

GPT-2 inference
LLM sampling methods: top-p and top-k
KV cache to accelerate inference
Static batching
Continuous batching
Tensor device parallelism
PD (prefill/decode) device separation
Speculative decoding
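
The outline names top-k and top-p sampling without showing what they do. The following is a minimal standard-library sketch of the general idea, not the code from chapter2_topP_topK_sampling. The function names (`softmax`, `top_k_top_p_filter`, `sample`) and the convention that `top_k=0` disables top-k are assumptions made for illustration. Top-k keeps only the k most likely tokens; top-p (nucleus sampling) keeps the smallest set of most likely tokens whose cumulative probability reaches p. The surviving probabilities are then renormalized and one token is drawn.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Return {token_id: prob} after top-k, then top-p, renormalized.

    Assumption for this sketch: top_k=0 means "no top-k limit", and the
    top-p cutoff is measured on the original (unrenormalized) probabilities.
    """
    probs = softmax(logits)
    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_k_top_p_filter(logits, top_k=top_k, top_p=top_p)
    token_ids = list(dist)
    weights = [dist[i] for i in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]
```

With `top_k=1` the sampler reduces to greedy decoding, since only the most likely token survives the filter. Some implementations renormalize after top-k before applying the top-p cutoff; this sketch does not, which is a design choice rather than a requirement.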

Download model

Go to Hugging Face (or the ModelScope community) and download the GPT-2 pytorch_model.bin file.
Move the downloaded file to the model/ folder.
Rename the file to gpt2_pytorch_model.bin.
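
The move and rename steps above can also be scripted. This is a small sketch using only the Python standard library; the helper name `stage_checkpoint` and its parameters are hypothetical and not part of the repository. It assumes the file has already been downloaded manually and that the script is run from the repository root, so that `model/` resolves to the folder mentioned above.

```python
import shutil
from pathlib import Path


def stage_checkpoint(downloaded, model_dir="model",
                     target_name="gpt2_pytorch_model.bin"):
    """Move a downloaded checkpoint into model_dir under the expected name."""
    source = Path(downloaded).expanduser()
    if not source.is_file():
        raise FileNotFoundError(f"checkpoint not found: {source}")
    destination_dir = Path(model_dir)
    # Create model/ if it does not exist yet.
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / target_name
    # Move and rename in a single step.
    shutil.move(str(source), str(destination))
    return destination
```

For example, if the browser saved the file to `~/Downloads/pytorch_model.bin` (an assumed location), calling `stage_checkpoint("~/Downloads/pytorch_model.bin")` would leave it at `model/gpt2_pytorch_model.bin`.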

Run the project example

conda create --name tllm python=3.10

conda activate tllm

pip install -r requirements.txt

cd chapter1_gpt2_infer/

python main.py

Folders and files

chapter1_gpt2_infer
chapter2_topP_topK_sampling
chapter3_with_kv_cache
chapter4_static_batch
chapter5_continuous_batch
chapter6_DP_device_parallelism
chapter7_prefill_decode_device_separation
chapter8_speculative_decoding
img
model
.gitignore
README.md
requirements.txt
