14B model + DeepSpeed pipeline parallelism
Coobiw committed Feb 23, 2024
1 parent fd451c2 commit 1fa02d0
Showing 13 changed files with 685 additions and 60 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# pycharm
.idea/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
64 changes: 57 additions & 7 deletions README.md
@@ -1,5 +1,6 @@

- [Minigpt4Qwen](#minigpt4qwen)
- [](#)
- [Auxiliary projects](#附属项目)
- [Introduction](#introduction)
- [Required compute resources](#所需计算资源)
@@ -13,23 +14,34 @@
- [Data preparation](#数据准备)
- [Writing the config file](#config文件的书写)
- [Running train.py](#运行trainpy)
- [DeepSpeed](#deepspeed)
- [DeepSpeed training](#deepspeed训练)
- [DeepSpeed inference](#deepspeed推理)
- [Training MiniGPT4Qwen-14B](#minigpt4qwen-14b的训练)
- [2× 3090 24GB + DeepSpeed pipeline parallelism](#2张3090-24gb--deepspeed流水线并行)
- [MiniGPT4Qwen-14B inference](#minigpt4qwen-14b的推理)
- [Weight conversion](#权重转换)
- [CPU inference](#cpu推理)
- [Minigpt4Qwen chat examples](#minigpt4qwen对话示例)
- [Command-line demo (cli_demo)](#命令行democli_demo)
- [webui demo](#webui-demo)
- [Acknowledgement](#acknowledgement)
- [FAQ](#faq)
- [Why the loss during reproduction is an order of magnitude larger than in the checkpoint's logs](#复现时比checkpoint中的log的loss大一个数量级的问题)
- [License](#license)

(This seems to have been reposted by 爱可可老师 🥹, thank you all for the attention! When time permits I will add a stronger LLM (trying 14B first) and more data, depending on what resources are available.)
(This seems to have been reposted by 爱可可老师 🥹, thank you all for the attention! When time permits I will add a stronger LLM (trying Qwen-14B first) and more data, depending on what resources are available.)


# Minigpt4Qwen

Zhihu blog: https://zhuanlan.zhihu.com/p/664612306

DeepSpeed is now supported!
**DeepSpeed pipeline-parallel training of the Qwen-14B model on 2× RTX 3090 24GB is now supported!**

![](./assets/maimai.png)
![](./assets/image-20240223030335224.png)

## Auxiliary projects

@@ -42,7 +54,8 @@
- deepspeed tutorials: https://github.com/Coobiw/MiniGPT4Qwen/tree/master/deepspeed_tutorials
- Zhihu: https://zhuanlan.zhihu.com/p/673359684

- DeepSpeed training is now supported (using the deepspeed runner)
- Supports DeepSpeed training (using the deepspeed runner)
- Supports DeepSpeed pipeline-parallel training of the Qwen-14B model on 2× RTX 3090 24GB

## Introduction

@@ -58,10 +71,10 @@

## TODO LIST

- [ ] Support training Qwen-14B-Chat
- [x] Support DeepSpeed pipeline parallelism
- [x] Support training Qwen-14B-Chat
- [ ] Support evaluation on the MME Benchmark
- [x] Support DeepSpeed
- [ ] Support PyTorch-native FSDP (may be shelved, since DeepSpeed is already implemented and FSDP is, in my view, less convenient to use)
- [x] Release the Gradio WebUI demo
- [X] Release the datasets and checkpoints used
- [X] Release the source code
@@ -111,9 +124,10 @@ wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BL
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
```

2. Download the Qwen7B-chat weights
2. Download the Qwen-7B-chat/Qwen-14B-Chat weights

[Qwen-7B-chat huggingface](https://huggingface.co/Qwen/Qwen-7B-Chat)
[Qwen-14B-chat huggingface](https://huggingface.co/Qwen/Qwen-14B-Chat)

3. Download this model's checkpoints (recommended to place them in `lavis/output/`)

@@ -242,6 +256,42 @@ python deepspeed2pth.py --ckpt_dir lavis/output/deepspeed/lr1e-4_4x3090/20231220

You can then use that `.pth` file with `cli_demo.py` or `webui_demo.py` to start chatting~

## Training MiniGPT4Qwen-14B
This project uses RTX 3090 GPUs with 24 GB of memory each. For a 14B-parameter model in 16-bit precision (fp16/bf16), the weights alone need at least 14 $\times$ 2 = 28 GB of memory, before counting any other overhead, which already exceeds what a single card offers.
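
The arithmetic, spelled out as a quick check (all numbers come from the sentence above):

```
# Python: weight memory of a 14B-parameter model in 16-bit precision.
# Each parameter takes 2 bytes, so 14e9 params * 2 B = 28e9 B = 28 GB.
n_params = 14e9        # parameter count of Qwen-14B (approximate)
bytes_per_param = 2    # fp16 / bf16
print(n_params * bytes_per_param / 1e9, "GB")  # 28.0 GB > 24 GB per RTX 3090
```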

**Solution: pipeline parallelism (the model is split at layer granularity: some layers sit on GPU 0 and the rest on GPU 1, executed sequentially; this is also a form of model parallelism).**
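
As an illustration, here is a minimal sketch of layer-granularity pipeline partitioning with DeepSpeed. `deepspeed.init_distributed`, `PipelineModule`, and `deepspeed.initialize` are real DeepSpeed APIs, but the stand-in layers and the `ds_config.json` path are assumptions for the sketch, not this repo's actual `train_pipeline.py`:

```
# Python: minimal DeepSpeed pipeline-parallel sketch (illustrative only).
import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()

# Stand-ins for the LLM's transformer blocks.
layers = [nn.Linear(5120, 5120) for _ in range(40)]

# num_stages=2 splits the layer list into 2 stages, one per GPU;
# micro-batches then flow through the two stages sequentially.
model = PipelineModule(layers=layers, num_stages=2, partition_method="parameters")

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",  # assumed config path
)
```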

### 2× 3090 24GB + DeepSpeed pipeline parallelism
P.S.: for now, parallelism is only supported across 2 GPUs.

Training command:
```
# num_stages is the number of parallel GPUs; only 2 is supported for now
python -m torch.distributed.run --nproc_per_node=2 train_pipeline.py --cfg-path lavis/projects/pp_qwen14b/train_pp.yaml --num-stages 2
```

## MiniGPT4Qwen-14B inference

### Weight conversion
Extract the parameters of the `llm_proj` layer and convert them into a `.pth` file:
```
python pipe_proj2pth.py --ckpt_dir xxx
```
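
Roughly, the conversion amounts to the following hypothetical sketch. The `layer_*-model_states.pt` shard layout is the standard DeepSpeed `PipelineModule` checkpoint format, but the key filtering and output structure here are assumptions, not the actual `pipe_proj2pth.py`:

```
# Python: hypothetical sketch of extracting llm_proj weights from a
# DeepSpeed pipeline checkpoint into a single .pth file.
import glob
import torch

merged = {}
for shard in sorted(glob.glob("ckpt_dir/layer_*-model_states.pt")):
    state = torch.load(shard, map_location="cpu")
    # keep only the parameters that belong to the llm_proj layer
    merged.update({k: v for k, v in state.items() if "llm_proj" in k})

torch.save({"model": merged}, "llm_proj.pth")
```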

### CPU inference
Since a 3090 cannot hold the 14B model, inference here runs on the CPU.

Command-line demo:
```
python cli_demo.py --model-type qwen14b_chat -c xxx.pth --cpu-only
```

Gradio WebUI demo:
```
python webui_demo.py --model-type qwen14b_chat -c xxx.pth --cpu-only
```
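
In terms of the loader, `--cpu-only` boils down to something like the sketch below, mirroring the `load_model_and_preprocess` call in `cli_demo.py` shown later in this diff; the checkpoint path is a placeholder:

```
# Python: sketch of CPU-only loading, as the demos do internally.
from lavis.models import load_model_and_preprocess

model, vis_processors, _ = load_model_and_preprocess(
    "minigpt4qwen", "qwen14b_chat", is_eval=True, device="cpu"
)
model.load_checkpoint("xxx.pth")  # placeholder, as in the commands above
```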


## Minigpt4Qwen chat examples

### Command-line demo (cli_demo)
Binary file added assets/image-20240223030335224.png
3 changes: 2 additions & 1 deletion cli_demo.py
@@ -51,7 +51,7 @@ def _load_model_processor(args):
    global load_model_and_preprocess
    load_model_and_preprocess = partial(load_model_and_preprocess,is_eval=True,device=device_map)

    model, vis_processors, _ = load_model_and_preprocess("minigpt4qwen", "qwen7b_chat")
    model, vis_processors, _ = load_model_and_preprocess("minigpt4qwen", args.model_type)
    model.load_checkpoint(args.checkpoint_path)

    generation_config = {
@@ -128,6 +128,7 @@ def _get_image_input():
def main():
    parser = argparse.ArgumentParser(
        description='QWen-Chat command-line interactive chat demo.')
    parser.add_argument("--model-type",type=str,default='qwen7b_chat',choices=['qwen7b_chat','qwen14b_chat'])
    parser.add_argument("-c", "--checkpoint-path", type=str,
                        help="Checkpoint name or path, default to %(default)r")
    parser.add_argument("-s", "--seed", type=int, default=42, help="Random seed")
60 changes: 60 additions & 0 deletions lavis/configs/models/minigpt4qwen/minigpt4qwen-14b.yaml
@@ -0,0 +1,60 @@
# Copyright (c) 2022, salesforce.com, inc.
# All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

model:
  arch: minigpt4_qwen7b-chat
  load_finetuned: False
  load_pretrained: True

  # pretrained: "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/InstructBLIP/blip2_pretrained_flant5xxl.pth"
  pretrained: "ckpt/blip2/blip2_pretrained_flant5xxl.pth"
  finetuned: ""

  # vit encoder
  vit_model: "eva_clip_g"
  image_size: 224
  drop_path_rate: 0
  use_grad_checkpoint: False
  vit_precision: "fp16"
  freeze_vit: True
  unfreeze_pos_embed: False

  # Q-Former
  num_query_token: 32
  qformer_text_input: False
  freeze_qformer: True
  freeze_queries: True

  # projection
  freeze_proj: False

  # path to the LLM (Qwen) checkpoint
  llm_model: "ckpt/Qwen-14B-Chat"

  # lora config
  get_lora: False
  lora_alpha: 32
  lora_r: 8
  lora_dropout: 0.05

  # text length when training
  max_txt_len: 512


preprocess:
  vis_processor:
    train:
      name: "blip2_image_train"
      image_size: 224
    eval:
      name: "blip_image_eval"
      image_size: 224
  text_processor:
    train:
      name: "blip_caption"
      max_words: 100
    eval:
      name: "blip_caption"
      max_words: 100
5 changes: 4 additions & 1 deletion lavis/models/minigpt4qwen_models/minigpt4qwen.py
@@ -51,6 +51,7 @@ class Minigpt4Qwen(Blip2Base):

    PRETRAINED_MODEL_CONFIG_DICT = {
        "qwen7b_chat": "configs/models/minigpt4qwen/minigpt4qwen.yaml",
        "qwen14b_chat": "configs/models/minigpt4qwen/minigpt4qwen-14b.yaml",
    }

    def __init__(
@@ -142,8 +143,10 @@ def __init__(
            config=llm_config,
            cache_dir=registry.get_path("cache_root"),
            trust_remote_code=True,
            device_map='cuda',
            # device_map='cuda',
            device_map='cpu',
        )
        # self.llm_model.transformer.gradient_checkpointing = True  # enable gradient checkpointing for the LLM

        self.llm_tokenizer.pad_token_id = self.llm_tokenizer.eod_id
        self.replace_image_token_id = self.llm_tokenizer("<|extra_0|>").input_ids[0]
