Reorganize the user guide and update the get_started section #2038

Merged · 30 commits · Aug 7, 2024

31 changes: 14 additions & 17 deletions README.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
- \[2024/07\] 🎉🎉 Support Llama3.1 8B, 70B and its TOOLS CALLING
- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/serving/api_server_tools.md) of InternLM2.5
- \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
- \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
- \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] Support DeepSeek-VL offline inference pipeline and serving.
- \[2024/03\] Support VLM offline inference pipeline and serving.
- \[2024/02\] Support Qwen 1.5, Gemma, Mistral, Mixtral, Deepseek-MOE and so on.
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) seamless integration with [LMDeploy Serving Service](./docs/en/serving/api_server.md).
- \[2024/01\] Support for multi-model, multi-machine, multi-card inference services. For usage instructions, please refer to [here](./docs/en/serving/proxy_server.md)
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) seamless integration with [LMDeploy Serving Service](docs/en/llm/api_server.md).
- \[2024/01\] Support for multi-model, multi-machine, multi-card inference services. For usage instructions, please refer to [here](docs/en/llm/proxy_server.md)
- \[2024/01\] Support [PyTorch inference engine](./docs/en/inference/pytorch.md), developed entirely in Python, helping to lower the barriers for developers and enable rapid experimentation with new features and technologies.

</details>
@@ -167,19 +167,16 @@ They differ in the types of supported models and the inference data type. Please

## Installation

Install lmdeploy with pip ( python 3.8+) or [from source](./docs/en/build.md)
It is recommended to install lmdeploy using pip in a conda environment (Python 3.8 - 3.12):

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

Since v0.3.0, The default prebuilt package is compiled on **CUDA 12**. However, if CUDA 11+ is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
For more information on installing on CUDA 11+ platform, or for instructions on building from source, please refer to the [installation guide](./docs/en/installation.md).

## Offline Batch Inference

@@ -195,7 +192,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`

For more information about inference pipeline, please refer to [here](./docs/en/inference/pipeline.md).
For more information about inference pipeline, please refer to [here](docs/en/llm/pipeline.md).
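
For quick reference, a minimal offline-inference sketch in the spirit of the collapsed example above could look like this (the model name is illustrative; any model supported by LMDeploy can be substituted):

```python
# Minimal sketch of the offline pipeline API. The model name is an assumption;
# replace it with any model LMDeploy supports.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, please introduce yourself", "Shanghai is"])
print(response)
```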

# Tutorials

@@ -204,10 +201,10 @@ Please review [getting_started](./docs/en/get_started.md) section for the basic
For detailed user guides and advanced guides, please refer to our [tutorials](https://lmdeploy.readthedocs.io/en/latest/):

- User Guide
- [LLM Inference pipeline](./docs/en/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM Inference pipeline](./docs/en/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM Serving](docs/en/serving/api_server.md)
- [VLM Serving](docs/en/serving/api_server_vl.md)
- [LLM Inference pipeline](docs/en/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM Inference pipeline](docs/en/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM Serving](docs/en/llm/api_server.md)
- [VLM Serving](docs/en/multi_modal/api_server_vl.md)
- [Quantization](docs/en/quantization)
- Advance Guide
- [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
@@ -216,7 +213,7 @@ For detailed user guides and advanced guides, please refer to our [tutorials](ht
- [Add a new model](docs/en/advance/pytorch_new_model.md)
- gemm tuning
- [Long context inference](docs/en/advance/long_context.md)
- [Multi-model inference service](docs/en/serving/proxy_server.md)
- [Multi-model inference service](docs/en/llm/proxy_server.md)

# Third-party projects

31 changes: 14 additions & 17 deletions README_ja.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeployは[modelscope/swift](https://github.com/modelscope/swift)に統合され、VLMs推論のデフォルトアクセラレータとなりました
- \[2024/07\] 🎉🎉 Llama3.1 8B、70Bおよびそのツールコールをサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/serving/api_server_tools.md)をサポート
- \[2024/07\] [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e)全シリーズモデル、[InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md)およびInternLM2.5の[ファンクションコール](docs/en/llm/api_server_tools.md)をサポート
- \[2024/06\] PyTorchエンジンはDeepSeek-V2およびいくつかのVLMs、例えばCogVLM2、Mini-InternVL、LlaVA-Nextをサポート
- \[2024/05\] 複数のGPUでVLMsをデプロイする際にビジョンモデルをバランスさせる
- \[2024/05\] InternVL v1.5、LLaVa、InternLMXComposer2などのVLMsで4ビットの重みのみの量子化と推論をサポート
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] DeepSeek-VLのオフライン推論パイプラインとサービングをサポート
- \[2024/03\] VLMのオフライン推論パイプラインとサービングをサポート
- \[2024/02\] Qwen 1.5、Gemma、Mistral、Mixtral、Deepseek-MOEなどをサポート
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE)が[LMDeployサービングサービス](./docs/en/serving/api_server.md)とシームレスに統合されました
- \[2024/01\] 複数モデル、複数マシン、複数カードの推論サービスをサポート。使用方法は[こちら](./docs/en/serving/proxy_server.md)を参照してください
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE)が[LMDeployサービングサービス](./docs/en/llm/api_server.md)とシームレスに統合されました
- \[2024/01\] 複数モデル、複数マシン、複数カードの推論サービスをサポート。使用方法は[こちら](./docs/en/llm/proxy_server.md)を参照してください
- \[2024/01\] [PyTorch推論エンジン](./docs/en/inference/pytorch.md)をサポートし、完全にPythonで開発されており、開発者の障壁を下げ、新機能や技術の迅速な実験を可能にします

</details>
@@ -168,19 +168,16 @@ LMDeployは、[TurboMind](./docs/en/inference/turbomind.md)および[PyTorch](./

## インストール

pip(python 3.8+)を使用してlmdeployをインストールするか、[ソースからインストール](./docs/en/build.md)します
クリーンなconda環境(Python 3.8 - 3.12)でlmdeployをインストールすることをお勧めします。

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

v0.3.0以降、デフォルトのプリビルドパッケージは**CUDA 12**でコンパイルされています。ただし、CUDA 11+が必要な場合は、次のコマンドでlmdeployをインストールできます:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
v0.3.0から、デフォルトの事前構築済みパッケージはCUDA 12でコンパイルされています。
CUDA 11+プラットフォームでのインストールに関する情報、またはソースからのビルド手順については、[インストールガイドを](docs/en/installation.md)参照してください。

## オフラインバッチ推論

@@ -196,7 +193,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`

推論パイプラインに関する詳細情報は[こちら](./docs/en/inference/pipeline.md)を参照してください。
推論パイプラインに関する詳細情報は[こちら](./docs/en/llm/pipeline.md)を参照してください。

# チュートリアル

@@ -205,10 +202,10 @@ LMDeployの基本的な使用方法については、[getting_started](./docs/en
詳細なユーザーガイドと高度なガイドについては、[チュートリアル](https://lmdeploy.readthedocs.io/en/latest/)を参照してください:

- ユーザーガイド
- [LLM推論パイプライン](./docs/en/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM推論パイプライン](./docs/en/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLMサービング](docs/en/serving/api_server.md)
- [VLMサービング](docs/en/serving/api_server_vl.md)
- [LLM推論パイプライン](./docs/en/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM推論パイプライン](./docs/en/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLMサービング](docs/en/llm/api_server.md)
- [VLMサービング](docs/en/multi_modal/api_server_vl.md)
- [量子化](docs/en/quantization)
- 高度なガイド
- [推論エンジン - TurboMind](docs/en/inference/turbomind.md)
@@ -217,7 +214,7 @@ LMDeployの基本的な使用方法については、[getting_started](./docs/en
- [新しいモデルの追加](docs/en/advance/pytorch_new_model.md)
- gemmチューニング
- [長文推論](docs/en/advance/long_context.md)
- [マルチモデル推論サービス](docs/en/serving/proxy_server.md)
- [マルチモデル推論サービス](docs/en/llm/proxy_server.md)

# サードパーティプロジェクト

30 changes: 13 additions & 17 deletions README_zh-CN.md
@@ -28,7 +28,7 @@ ______________________________________________________________________

- \[2024/08\] 🔥🔥 LMDeploy现已集成至 [modelscope/swift](https://github.com/modelscope/swift),成为 VLMs 推理的默认加速引擎
- \[2024/07\] 🎉🎉 支持 Llama3.1 8B 和 70B 模型,以及工具调用功能
- \[2024/07\] 支持 [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/serving/api_server_tools.md)
- \[2024/07\] 支持 [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/llm/api_server_tools.md)
- \[2024/06\] PyTorch engine 支持了 DeepSeek-V2 和若干 VLM 模型推理, 比如 CogVLM2,Mini-InternVL,LlaVA-Next
- \[2024/05\] 在多 GPU 上部署 VLM 模型时,支持把视觉部分的模型均分到多卡上
- \[2024/05\] 支持InternVL v1.5, LLaVa, InternLMXComposer2 等 VLMs 模型的 4bit 权重量化和推理
@@ -39,8 +39,8 @@ ______________________________________________________________________
- \[2024/03\] 支持 DeepSeek-VL 的离线推理 pipeline 和推理服务
- \[2024/03\] 支持视觉-语言模型(VLM)的离线推理 pipeline 和推理服务
- \[2024/02\] 支持 Qwen 1.5、Gemma、Mistral、Mixtral、Deepseek-MOE 等模型
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) 发布,支持无缝接入[LMDeploy Serving Service](./docs/zh_cn/serving/api_server.md)
- \[2024/01\] 支持多模型、多机、多卡推理服务。使用方法请参考[此处](./docs/zh_cn/serving/proxy_server.md)
- \[2024/01\] [OpenAOE](https://github.com/InternLM/OpenAOE) 发布,支持无缝接入[LMDeploy Serving Service](docs/zh_cn/llm/api_server.md)
- \[2024/01\] 支持多模型、多机、多卡推理服务。使用方法请参考[此处](docs/zh_cn/llm/proxy_server.md)
- \[2024/01\] 增加 [PyTorch 推理引擎](./docs/zh_cn/inference/pytorch.md),作为 TurboMind 引擎的补充。帮助降低开发门槛,和快速实验新特性、新技术

</details>
@@ -168,19 +168,15 @@ LMDeploy 支持 2 种推理引擎: [TurboMind](./docs/zh_cn/inference/turbomin

## 安装

使用 pip ( python 3.8+) 安装 LMDeploy,或者[源码安装](./docs/zh_cn/build.md)
我们推荐在一个干净的conda环境下(python3.8 - 3.12),安装 lmdeploy:

```shell
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
```

自 v0.3.0 起,LMDeploy 预编译包默认基于 CUDA 12 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,请执行以下命令:

```shell
export LMDEPLOY_VERSION=0.5.3
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
自 v0.3.0 起,LMDeploy 预编译包默认基于 CUDA 12 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,或者源码安装 LMDeploy,请参考[安装文档](./docs/zh_cn/installation.md)

## 离线批处理

@@ -196,7 +192,7 @@ print(response)
>
> `export LMDEPLOY_USE_MODELSCOPE=True`

关于 pipeline 的更多推理参数说明,请参考[这里](./docs/zh_cn/inference/pipeline.md)
关于 pipeline 的更多推理参数说明,请参考[这里](docs/zh_cn/llm/pipeline.md)

# 用户教程

@@ -205,10 +201,10 @@ print(response)
为了帮助用户更进一步了解 LMDeploy,我们准备了用户指南和进阶指南,请阅读我们的[文档](https://lmdeploy.readthedocs.io/zh-cn/latest/):

- 用户指南
- [LLM 推理 pipeline](./docs/zh_cn/inference/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM 推理 pipeline](./docs/zh_cn/inference/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM 推理服务](./docs/zh_cn/serving/api_server.md)
- [VLM 推理服务](./docs/zh_cn/serving/api_server_vl.md)
- [LLM 推理 pipeline](docs/zh_cn/llm/pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Dh-YlSwg78ZO3AlleO441NF_QP2shs95#scrollTo=YALmXnwCG1pQ)
- [VLM 推理 pipeline](docs/zh_cn/multi_modal/vl_pipeline.md) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nKLfnPeDA3p-FMNw2NhI-KOpk7-nlNjF?usp=sharing)
- [LLM 推理服务](docs/zh_cn/llm/api_server.md)
- [VLM 推理服务](docs/zh_cn/multi_modal/api_server_vl.md)
- [模型量化](./docs/zh_cn/quantization)
- 进阶指南
- [推理引擎 - TurboMind](./docs/zh_cn/inference/turbomind.md)
@@ -217,7 +213,7 @@ print(response)
- [支持新模型](./docs/zh_cn/advance/pytorch_new_model.md)
- gemm tuning
- [长文本推理](./docs/zh_cn/advance/long_context.md)
- [多模型推理服务](./docs/zh_cn/serving/proxy_server.md)
- [多模型推理服务](docs/zh_cn/llm/proxy_server.md)

# 社区项目

2 changes: 1 addition & 1 deletion docs/en/advance/debug_turbomind.md
@@ -4,7 +4,7 @@ Turbomind is implemented in C++, which is not as easy to debug as Python. This d

## Prerequisite

First, complete the local compilation according to the commands in [Build in localhost](../build.md).
First, complete the local compilation according to the commands in [Install from source](../installation.md).

## Configure Python debug environment

6 changes: 1 addition & 5 deletions docs/en/benchmark/evaluate_with_opencompass.md
@@ -8,11 +8,7 @@ In this part, we are going to setup the environment for evaluation.

### Install lmdeploy

Install lmdeploy through pip (python 3.8+). If you want to install from source, you can refer to [build.md](../build.md).

```shell
pip install lmdeploy
```
Please follow the [installation guide](../installation.md) to install lmdeploy.
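
After installation, a quick sanity check (the `__version__` attribute is assumed here, as shipped in the pip wheel) confirms that lmdeploy is importable:

```python
# Sanity check: import lmdeploy and print its version string.
import lmdeploy

print(lmdeploy.__version__)
```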

### Install OpenCompass

2 changes: 1 addition & 1 deletion docs/en/benchmark/profile_api_server.md
@@ -41,7 +41,7 @@ In this section, we take [internlm/internlm-7b](https://huggingface.co/internlm/
lmdeploy serve api_server internlm/internlm-7b
```

If you would like to change the server's port or other parameters, such as inference engine, max batch size and etc., please run `lmdeploy serve api_server -h` or read [this](../serving/api_server.md) guide to get the detailed explanation.
If you would like to change the server's port or other parameters, such as inference engine, max batch size and etc., please run `lmdeploy serve api_server -h` or read [this](../llm/api_server.md) guide to get the detailed explanation.
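
Once the server is running, it can be exercised through its OpenAI-compatible endpoint. The sketch below is an illustrative example only: the port (23333) and the served model name are assumptions, so check the server's startup log for the actual values.

```python
# Hedged sketch of querying api_server via its OpenAI-compatible API.
# The base_url port and the model name are assumptions; adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="internlm/internlm-7b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```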

### Profile
