
mplug-owl3-7b-chat fine-tuning document #1969

Open
Jintao-Huang opened this issue Sep 7, 2024 · 17 comments
Labels
good first issue Good for newcomers

Comments

@Jintao-Huang
Collaborator

Jintao-Huang commented Sep 7, 2024

Model:

Fine-tuning a multimodal large model usually involves a custom dataset. Here, we demonstrate a runnable demo.

Fine-tuned Dataset:

Before starting the fine-tuning, please ensure that your environment is properly prepared:

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .[llm]

pip install decord icecream
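A quick way to verify that the extra video-related dependencies from the command above installed correctly is to probe for them before launching training; this helper is just an illustrative sketch, not part of swift:

```python
import importlib.util

def check_deps(packages):
    """Return the subset of package names that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

missing = check_deps(["decord", "icecream"])
if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all extra dependencies are available")
```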

Inference

# ModelScope
CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type mplug-owl3-7b-chat \
  --model_id_or_path iic/mPLUG-Owl3-7B-240728

# HuggingFace
USE_HF=1 CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type mplug-owl3-7b-chat \
  --model_id_or_path mPLUG/mPLUG-Owl3-7B-240728

Results

<<< who are you
I am an AI language model, designed to assist with a variety of tasks such as answering questions and providing information. I do not have a physical form, but rather exist as a program running on a computer. Is there anything specific you would like me to help you with?
--------------------------------------------------
<<< <image>describe the image
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
[INFO:swift] Setting max_num_frames: 16. You can adjust this hyperparameter through the environment variable: `MAX_NUM_FRAMES`.
This is a very cute photo of a kitten! The kitten has beautiful blue eyes and a very fluffy coat. It's adorable to see how it looks at the camera. The colors in the photo are very natural and well-balanced, which adds to the overall cuteness of the image. Great job capturing this adorable moment!
--------------------------------------------------
<<< clear
<<< <video>describe the video
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
The video captures a young child's interest in reading and learning, as they are seen sitting on a bed and flipping through the pages of a book while wearing glasses. The child appears to be engaged and curious about the content of the book.

GPU Memory:
[screenshot: GPU memory usage, 2024-09-07]

@Jintao-Huang
Collaborator Author

image fine-tuning

The format of the custom dataset is as follows (single image, multiple images, and no image):

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": []}
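A `train.jsonl` in the three formats above can be produced with a few lines of Python; the sample queries, responses, and image paths here are hypothetical placeholders:

```python
import json

# Hypothetical samples covering the three supported formats:
# single image, multiple images, and text-only with history.
samples = [
    {"query": "<image>Describe this picture.", "response": "A cat.",
     "images": ["cat.png"]},
    {"query": "Compare <image> and <image>.", "response": "They differ in color.",
     "history": [], "images": ["a.png", "b.png"]},
    {"query": "Thanks!", "response": "You're welcome.",
     "history": [["Hello", "Hi there"]], "images": []},
]

# Write one JSON object per line, as swift's custom-dataset loader expects.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```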

Fine-tuning script:

# ModelScope
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type mplug-owl3-7b-chat \
  --model_id_or_path iic/mPLUG-Owl3-7B-240728 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2 \
  --output_dir output \
  --num_train_epochs 5

If you want to use a custom dataset, simply specify as follows:

  --dataset train.jsonl \
  --val_dataset val.jsonl \

Here is the inference script after fine-tuning; we perform inference on the automatically split validation set:

# If using HuggingFace, please add: `USE_HF=1`
# inference only
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/mplug-owl3-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --show_dataset_sample 10

# merge-lora & inference
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/mplug-owl3-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true --show_dataset_sample 10

video fine-tuning

The format of the custom dataset is as follows:

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "videos": []}

Fine-tuning script:

# ModelScope
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type mplug-owl3-7b-chat \
  --model_id_or_path iic/mPLUG-Owl3-7B-240728 \
  --sft_type lora \
  --dataset video-chatgpt \
  --deepspeed default-zero2 \
  --output_dir output \
  --num_train_epochs 5

@Jintao-Huang Jintao-Huang added the good first issue Good for newcomers label Sep 7, 2024
@Jintao-Huang Jintao-Huang changed the title mplug-owl3-7b-chat fine-tuning best practices. mplug-owl3-7b-chat fine-tuning document Sep 7, 2024
@ozhyo

ozhyo commented Sep 13, 2024

Hi, I tried the image fine-tuning example code and found that during training the model does not actually use the image and media_offset; the data_collator seems to ignore these two values, so the images are never used in the model's forward pass.

@Jintao-Huang
Collaborator Author

I'll fix it.

@Jintao-Huang Jintao-Huang added the bug Something isn't working label Sep 14, 2024
@Jintao-Huang Jintao-Huang mentioned this issue Sep 14, 2024
1 task
@Jintao-Huang Jintao-Huang removed the bug Something isn't working label Sep 17, 2024
@Jintao-Huang
Collaborator Author

Hi, I tried the image fine-tuning example code and found that during training the model does not actually use the image and media_offset; the data_collator seems to ignore these two values, so the images are never used in the model's forward pass.

fixed

@ozhyo

ozhyo commented Sep 19, 2024

Hi, thanks for the fix.
I tested it, and the model can now use image and media_offset for forward and training, but only with batch_size=1. It seems media_offset is not padded, so the lengths don't match and samples cannot be combined into a batch.

@goodstudent9

Indeed, I tried it as well; batch_size = 2 does not work.

[rank0]: Original Traceback (most recent call last):
[rank0]:   File "/home/project/tools/anaconda3/envs/owl3/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
[rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
[rank0]:   File "/home/project/tools/anaconda3/envs/owl3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
[rank0]:     return self.collate_fn(data)
[rank0]:   File "/home/project/ruohangxu/ms-swift/swift/llm/utils/template.py", line 3318, in data_collator
[rank0]:     res['media_offset'] = torch.concat(media_offset)
[rank0]: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 122 but got size 124 for tensor number 1 in the list.

@Jintao-Huang
Collaborator Author

Indeed, I tried it as well; batch_size = 2 does not work. (traceback quoted above)

Yes, batch_size=2 is not supported, and I don't know how to support it yet. I tried padding, but it throws an error inside the owl3 code.

@goodstudent9

goodstudent9 commented Sep 22, 2024 via email

@goodstudent9

goodstudent9 commented Sep 22, 2024 via email

@Jintao-Huang
Collaborator Author

I changed the code so it can support batching, but I'm not sure about the resulting quality. I'm currently verifying my code against the batch_size=1 results on the training set you provided; if it checks out, I'll follow up with you.

If the results look good, a PR would be welcome.

@goodstudent9

goodstudent9 commented Sep 23, 2024 via email

@ozhyo

ozhyo commented Sep 23, 2024

When an image is present, pad with [0, 0]; when there is no image, use all [0, -1000000] (any negative value works here). Is that right?
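The padding scheme proposed above (pad with [0, 0] for samples that contain media, and fill media-free samples entirely with a negative sentinel such as [0, -1000000]) can be sketched in plain Python; the list shapes and helper name are illustrative assumptions, not the actual swift collator, which works on torch tensors:

```python
def pad_media_offsets(batch, no_media=(0, -1000000)):
    """Right-pad per-sample media_offset lists to the batch max length.

    Samples that contain media are padded with [0, 0]; samples with no
    media are filled entirely with the negative sentinel, so every row
    in the batch ends up the same length and can be stacked.
    (Sketch only; the real collator operates on tensors.)
    """
    max_len = max((len(s) for s in batch), default=0)
    padded = []
    for sample in batch:
        if sample:  # has media: pad the tail with [0, 0]
            padded.append(sample + [[0, 0]] * (max_len - len(sample)))
        else:       # no media at all: fill with the sentinel
            padded.append([list(no_media)] * max_len)
    return padded

batch = [
    [[0, 1], [2, 3]],  # sample with two media offsets
    [[0, 1]],          # shorter sample -> one [0, 0] pad entry
    [],                # no media -> all sentinel entries
]
print(pad_media_offsets(batch))
```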

@goodstudent9

goodstudent9 commented Sep 23, 2024 via email

@Jintao-Huang
Collaborator Author

PR: #2100

@goodstudent9

Found some bugs when doing full-parameter fine-tuning.
mPLUG-Owl3 means a lot to me; thank you so much for your valuable help!

#2158

@goodstudent9

Found some bugs when doing full-parameter fine-tuning. mPLUG-Owl3 means a lot to me; thank you so much for your valuable help!

#2158

solved!

@goodstudent9

#2172 (comment)
An error occurs when running inference with images.
