PaddlePaddle · westfish · Jun 6, 2024 · May 15, 2024 · May 20, 2024 · May 20, 2024
diff --git a/ppdiffusers/examples/Open-Sora/README.md b/ppdiffusers/examples/Open-Sora/README.md
@@ -0,0 +1,121 @@
+# hpcAI/Open-Sora Training and Inference Support
+## 1. Introduction
+
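+[hpcAI/Open-Sora](https://github.com/hpcaitech/Open-Sora) is one of the open-source reproductions of Sora. It supports video generation at different durations and resolutions, and provides model training as well as multi-task inference, including image-to-video generation, video connecting, and video editing.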
+
+## 2. Environment Setup
+
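+Use `git clone` to fetch the PaddleMIX source code, then install ppdiffusers and the required dependencies. Make sure your PaddlePaddle version is 2.6.0 or later; for installation instructions, see [PaddlePaddle website - Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html).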
+
+```bash
+# Clone the PaddleMIX repository
+git clone https://github.com/PaddlePaddle/PaddleMIX
+
+# Install paddlepaddle-gpu 2.6.1; the CUDA 12.0 build is used here. See https://www.paddlepaddle.org.cn/ to find the build that matches your environment
+python -m pip install paddlepaddle-gpu==2.6.1.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+
+# Enter the ppdiffusers directory
+cd PaddleMIX/ppdiffusers
+
+# Install ppdiffusers; if you get a permission error, append the --user option
+pip install -e .
+
+# Enter the Open-Sora directory
+cd examples/Open-Sora/
+
+# Install the remaining dependencies; if you get a permission error, append the --user option
+pip install -r requirements.txt
+```
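After installation, a quick check of the framework version can save a failed run later. The minimal Python sketch below is not part of PaddleMIX: `parse_version` and `meets_min_version` are hypothetical helpers that compare `paddle.__version__` against the 2.6.0 minimum stated above, and the script also reports whether the installed wheel was built with CUDA.

```python
import re

# Minimum PaddlePaddle version required by this example (from the README).
MIN_PADDLE_VERSION = "2.6.0"


def parse_version(version: str) -> tuple:
    """Turn '2.6.1' or '2.6.1.post120' into (2, 6, 1), ignoring any suffix."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '2.6'
    return tuple(parts)


def meets_min_version(version: str, minimum: str = MIN_PADDLE_VERSION) -> bool:
    """True if `version` is at least `minimum`, comparing major.minor.patch."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("PaddlePaddle is not installed in this environment")
    else:
        status = "ok" if meets_min_version(paddle.__version__) else "too old"
        print(f"paddle {paddle.__version__}: {status}")
        print("compiled with CUDA:", paddle.device.is_compiled_with_cuda())
```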
+
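+## 3. Model Training
+### 3.1 Training Data Preparation
+This example provides a small set of training samples for running the Open-Sora training pipeline end to end. Download them as shown below, or prepare your own training data by following [hpcaitech/Open-Sora](https://github.com/hpcaitech/Open-Sora/blob/main/docs/data_processing.md).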
+```bash
+# Download the Open-Sora training samples
+wget https://bj.bcebos.com/paddlenlp/models/community/tsaiyue/OpenSoraData/OpenSoraData.tar.gz
+
+# Extract the archive
+tar -xzvf OpenSoraData.tar.gz
+```
+
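+### 3.2 Single-Node Multi-GPU Training
+The training script is built on paddlenlp.trainer. Use the `--gpus` argument of `paddle.distributed.launch` to choose which GPU IDs to train on; in multi-GPU environments, you can enable grouped sharding with `--sharding` to reduce GPU memory usage.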
+```bash
+# Set this to the path of your local PaddleMIX/ppdiffusers checkout
+ppdiffusers_path=PaddleMIX/ppdiffusers
+export PYTHONPATH=$ppdiffusers_path:$PYTHONPATH
+python -u -m paddle.distributed.launch --gpus "0,1,2,3" scripts/trainer_opensora.py \
+    --do_train \
+    --output_dir ./exp_output \
+    --save_strategy 'steps' \
+    --save_total_limit 2 \
+    --save_steps 2000 \
+    --per_device_train_batch_size 1 \
+    --gradient_accumulation_steps 1 \
+    --learning_rate 2.0e-5 \
+    --max_steps 30000 \
+    --seed 42 \
+    --sharding "stage1" \
+    --report_to all \
+    --fp16 True \
+    --fp16_opt_level O1
+```
+For training-related arguments, see [paddlenlp.trainer](https://github.com/PaddlePaddle/PaddleNLP/blob/a5f69e4543a5371ceb28106b7aa2ea93208620b9/paddlenlp/trainer/training_args.py); for model- and data-related arguments, see `trainer/trainer_args.py`. You can train with the default arguments or adjust them as needed.
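To sanity-check these settings before launching, the arithmetic below estimates the global batch size, the number of samples seen, and checkpoint retention. It is a rough sketch that assumes all four GPUs act as data-parallel ranks (stage1 sharding partitions optimizer states across ranks, but each rank still consumes its own micro-batch) and that `save_total_limit` keeps only the most recent checkpoints; `training_budget` is a hypothetical helper, not a paddlenlp API.

```python
def training_budget(num_gpus, per_device_batch_size, grad_accum_steps,
                    max_steps, save_steps, save_total_limit):
    """Back-of-the-envelope numbers for a step-based training run.

    Assumes every GPU is a data-parallel rank, so the global batch
    scales with the number of GPUs.
    """
    global_batch = num_gpus * per_device_batch_size * grad_accum_steps
    written = max_steps // save_steps  # checkpoints saved every save_steps
    return {
        "global_batch_size": global_batch,
        "samples_seen": global_batch * max_steps,
        "checkpoints_written": written,
        "checkpoints_kept": min(written, save_total_limit),
    }


if __name__ == "__main__":
    # Values taken from the launch command above.
    print(training_budget(num_gpus=4, per_device_batch_size=1, grad_accum_steps=1,
                          max_steps=30000, save_steps=2000, save_total_limit=2))
```

With the command above this gives a global batch of 4, 120,000 samples over 30,000 steps, and 15 step checkpoints written, of which 2 are kept. Raising `gradient_accumulation_steps` grows the global batch without increasing the per-step micro-batch on each GPU.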
+
+## 4. Inference
+### 4.1 Text to video
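+Run the following command, specifying the number of frames, the resolution, and the prompt of the video to generate (see `./utils/config_utils.py` for the inference parameters). The prompts used in the examples below are available in `assets/texts`; given enough compute, you can generate longer videos at higher resolutions: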
+```bash
+ppdiffusers_path=PaddleMIX/ppdiffusers
+export PYTHONPATH=$ppdiffusers_path:$PYTHONPATH
+python scripts/inference.py --prompt "A beautiful sunset over the city" --num-frames 16 --image-size 256 256
+```
+Sample results:
+| **16×280×280**     | **16×224×400**        | **16×400×224**      |
+| ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/e2730235-e09e-4a65-bf27-604b13535dbd) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/c51c54a9-63a0-4708-99da-ee7fcd017762) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/ab6e32fc-d7e6-448d-bd4a-9b9e72b0b1f0) |
+| A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow [...]        | The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals. [...]    | A majestic beauty of a waterfall cascading down a cliff into a serene lake. The waterfall, with its powerful flow [...]       |
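To render every prompt in `assets/texts/t2v_samples.txt` rather than a single one, a simple approach is to launch `scripts/inference.py` once per line. The sketch below builds those commands using only the flags shown above; `build_t2v_commands` is hypothetical, it defaults to a dry run that only prints the commands, and each launched process reloads the model, so it trades speed for simplicity.

```python
import shlex
from pathlib import Path

PROMPT_FILE = Path("assets/texts/t2v_samples.txt")


def build_t2v_commands(prompt_file, num_frames=16, image_size=(256, 256)):
    """Build one scripts/inference.py command per non-empty prompt line.

    Only flags already shown in this README are used; image_size is passed
    to --image-size in the same order as the example above.
    """
    text = Path(prompt_file).read_text(encoding="utf-8")
    prompts = [line.strip() for line in text.splitlines() if line.strip()]
    return [
        ["python", "scripts/inference.py",
         "--prompt", prompt,
         "--num-frames", str(num_frames),
         "--image-size", *(str(s) for s in image_size)]
        for prompt in prompts
    ]


if __name__ == "__main__":
    import subprocess

    DRY_RUN = True  # set to False to actually launch the runs
    if PROMPT_FILE.exists():
        for cmd in build_t2v_commands(PROMPT_FILE):
            print(shlex.join(cmd))
            if not DRY_RUN:
                subprocess.run(cmd, check=True)
    else:
        print(f"{PROMPT_FILE} not found; run this from examples/Open-Sora")
```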
+
+### 4.2 Image as condition
+
+Run the following command to generate a video conditioned on an image:
+```bash
+ppdiffusers_path=PaddleMIX/ppdiffusers
+export PYTHONPATH=$ppdiffusers_path:$PYTHONPATH
+python scripts/inference-long.py --num-frames 20 --image-size 224 300 --sample-name image-cond --prompt 'A breathtaking sunrise scene.{"reference_path": "assets/images/condition/wave.png","mask_strategy": "0"}'
+```
+Sample results:
+| **Prompts**     | **Image as condition**        | **20×224×300**      |
+| ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| A breathtaking sunrise scene. | ![demo](./assets/images/condition/wave.png) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/9094d9f5-b70d-4f41-91e2-10f37d1c96ba) |
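The conditional prompts in these examples are plain text followed immediately by a JSON object. Building that string in code avoids quoting mistakes; the hypothetical helper below joins multiple references and mask strategies with `;` as the examples in this README do, and uses `separators=(",", ": ")` so its output matches their spacing exactly.

```python
import json


def build_conditioned_prompt(text, reference_paths, mask_strategies):
    """Append the JSON condition used in this README's examples to a prompt.

    Multiple references and mask strategies are joined with ';'.
    separators=(",", ": ") reproduces the spacing of the examples.
    """
    condition = {
        "reference_path": ";".join(reference_paths),
        "mask_strategy": ";".join(mask_strategies),
    }
    return text + json.dumps(condition, separators=(",", ": "))


if __name__ == "__main__":
    prompt = build_conditioned_prompt(
        "A breathtaking sunrise scene.",
        ["assets/images/condition/wave.png"],
        ["0"],
    )
    # Passing an argument list (no shell) avoids quoting the JSON by hand.
    cmd = ["python", "scripts/inference-long.py", "--num-frames", "20",
           "--image-size", "224", "300", "--sample-name", "image-cond",
           "--prompt", prompt]
    print(cmd)
```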
+
+### 4.3 Video connecting
+
+Run the following command to connect a first-frame image and a last-frame image into a video:
+```bash
+ppdiffusers_path=PaddleMIX/ppdiffusers
+export PYTHONPATH=$ppdiffusers_path:$PYTHONPATH
+python scripts/inference-long.py --num-frames 18 --image-size 224 300 --sample-name connect --prompt 'A breathtaking sunrise scene.{"reference_path": "assets/images/condition/sunset1.png;assets/images/condition/sunset2.png","mask_strategy": "0;0,1,0,-1,1"}'
+```
+Sample results:
+| **Prompts**     | **First frame**        | **Last frame**        | **18×224×300**      |
+| ------------------------------------------------------------------------------------------------------------------------------------------------------ |------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| A breathtaking sunrise scene. | ![demo](./assets/images/condition/sunset1.png) | ![demo](./assets/images/condition/sunset2.png) |![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/86fe5d88-6622-424e-bea4-95cdacf0888f) |
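The `mask_strategy` strings in these commands pack several numbers into one field. The sketch below parses them under an assumption taken from the upstream Open-Sora advanced inference docs (linked in the note at the end of Section 4), not from this port's code: groups are separated by `;`, each group lists up to six comma-separated fields `loop_id, ref_id, ref_start, target_start, length, edit_ratio`, and omitted trailing fields default to `0,0,0,0,1,0`. `parse_mask_strategy` and `MaskStrategy` are hypothetical names.

```python
from typing import List, NamedTuple

# Assumed defaults for omitted trailing fields (per the upstream Open-Sora docs).
MASK_DEFAULTS = ("0", "0", "0", "0", "1", "0")


class MaskStrategy(NamedTuple):
    loop_id: int       # which generation loop the condition applies to
    ref_id: int        # index into the ';'-separated reference_path list
    ref_start: int     # first frame taken from the reference
    target_start: int  # where it lands in the output (assumed: negative counts from the end)
    length: int        # number of frames to condition on
    edit_ratio: float  # 0 in the conditioning examples; 0.4 in the editing example


def parse_mask_strategy(spec: str) -> List[MaskStrategy]:
    """Parse a string such as "0;0,1,0,-1,1" into MaskStrategy tuples."""
    if not spec:
        return []
    strategies = []
    for group in spec.split(";"):
        fields = [field.strip() for field in group.split(",")]
        if len(fields) > 6:
            raise ValueError(f"too many fields in mask strategy group {group!r}")
        fields += MASK_DEFAULTS[len(fields):]  # fill omitted trailing fields
        strategies.append(MaskStrategy(*(int(f) for f in fields[:5]), float(fields[5])))
    return strategies
```

Under that reading, `"0;0,1,0,-1,1"` pins reference 0 (`sunset1.png`) to the first frame and reference 1 (`sunset2.png`) to the last frame, since a `target_start` of -1 counts from the end.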
+
+
+### 4.4 Video extending and editing
+Video generation can also be conditioned on a video, which enables video extending and video editing. Run the scripts below:
+```bash
+ppdiffusers_path=PaddleMIX/ppdiffusers
+export PYTHONPATH=$ppdiffusers_path:$PYTHONPATH
+# video extending
+python scripts/inference-long.py --num-frames 12 --image-size 240 240 --sample-name video_extend --prompt 'A car driving on the ocean.{"reference_path": "./assets/videos/d0_proc.mp4","mask_strategy": "0,0,0,-6,6"}'
+
+# video editing
+python scripts/inference-long.py --num-frames 16 --image-size 256 256 --sample-name edit --prompt 'A cyberpunk-style car at New York city.{"reference_path": "./assets/videos/d0_proc.mp4","mask_strategy": "0,0,0,0,16,0.4"}'
+```
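To see which output frames a condition covers, the hypothetical helper below resolves a `target_start`/`length` pair against `--num-frames`. It assumes that a negative `target_start` counts back from the end of the clip, like a Python index, and it works in output-frame units, ignoring any temporal compression the VAE may apply internally.

```python
def conditioned_frames(target_start: int, length: int, num_frames: int) -> range:
    """Output-frame indices covered by one condition (hypothetical helper).

    Assumes a negative target_start counts back from the end of the clip,
    like a Python index, and clips the span at num_frames.
    """
    start = target_start + num_frames if target_start < 0 else target_start
    if not 0 <= start < num_frames:
        raise ValueError(f"target_start {target_start} is outside a {num_frames}-frame clip")
    return range(start, min(start + length, num_frames))


if __name__ == "__main__":
    # Fields taken from the two commands above: (target_start, length, --num-frames).
    print("extend:", list(conditioned_frames(-6, 6, 12)))
    print("edit:", list(conditioned_frames(0, 16, 16)))
```

Under these assumptions, the extending command pins frames from the start of `d0_proc.mp4` to the last 6 of its 12 output frames and generates the first 6, while the editing command spans all 16 frames and relies on an `edit_ratio` of 0.4 to let them depart from the reference.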
+
+
+**___Note: for the configuration and underlying principles of multi-task inference, see [hpcAI/Open-Sora](https://github.com/hpcaitech/Open-Sora/blob/main/docs/config.md#advanced-inference-config).___**
+
+## 5. References
+- [Open-Sora](https://github.com/hpcaitech/Open-Sora)
diff --git a/ppdiffusers/examples/Open-Sora/assets/images/condition/sunset1.png b/ppdiffusers/examples/Open-Sora/assets/images/condition/sunset1.png
diff --git a/ppdiffusers/examples/Open-Sora/assets/images/condition/sunset2.png b/ppdiffusers/examples/Open-Sora/assets/images/condition/sunset2.png
diff --git a/ppdiffusers/examples/Open-Sora/assets/images/condition/wave.png b/ppdiffusers/examples/Open-Sora/assets/images/condition/wave.png
diff --git a/ppdiffusers/examples/Open-Sora/assets/texts/t2v_samples.txt b/ppdiffusers/examples/Open-Sora/assets/texts/t2v_samples.txt
@@ -0,0 +1,10 @@
+A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.
+A majestic beauty of a waterfall cascading down a cliff into a serene lake. The waterfall, with its powerful flow, is the central focus of the video. The surrounding landscape is lush and green, with trees and foliage adding to the natural beauty of the scene. The camera angle provides a bird's eye view of the waterfall, allowing viewers to appreciate the full height and grandeur of the waterfall. The video is a stunning representation of nature's power and beauty.
+A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life.
+The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside.
+A vibrant scene of a snowy mountain landscape. The sky is filled with a multitude of colorful hot air balloons, each floating at different heights, creating a dynamic and lively atmosphere. The balloons are scattered across the sky, some closer to the viewer, others further away, adding depth to the scene.  Below, the mountainous terrain is blanketed in a thick layer of snow, with a few patches of bare earth visible here and there. The snow-covered mountains provide a stark contrast to the colorful balloons, enhancing the visual appeal of the scene.  In the foreground, a few cars can be seen driving along a winding road that cuts through the mountains. The cars are small compared to the vastness of the landscape, emphasizing the grandeur of the surroundings.  The overall style of the video is a mix of adventure and tranquility, with the hot air balloons adding a touch of whimsy to the otherwise serene mountain landscape. The video is likely shot during the day, as the lighting is bright and even, casting soft shadows on the snow-covered mountains.
+A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world.
+A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. The scene is a blur of motion, with cars speeding by and pedestrians navigating the crosswalks. The cityscape is a mix of towering buildings and illuminated signs, creating a vibrant and dynamic atmosphere. The perspective of the video is from a high angle, providing a bird's eye view of the street and its surroundings. The overall style of the video is dynamic and energetic, capturing the essence of urban life at night.
+A snowy forest landscape with a dirt road running through it. The road is flanked by trees covered in snow, and the ground is also covered in snow. The sun is shining, creating a bright and serene atmosphere. The road appears to be empty, and there are no people or animals visible in the video. The style of the video is a natural landscape shot, with a focus on the beauty of the snowy forest and the peacefulness of the road.
+The dynamic movement of tall, wispy grasses swaying in the wind. The sky above is filled with clouds, creating a dramatic backdrop. The sunlight pierces through the clouds, casting a warm glow on the scene. The grasses are a mix of green and brown, indicating a change in seasons. The overall style of the video is naturalistic, capturing the beauty of the landscape in a realistic manner. The focus is on the grasses and their movement, with the sky serving as a secondary element. The video does not contain any human or animal elements.
+A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest.
diff --git a/ppdiffusers/examples/Open-Sora/assets/videos/d0_proc.mp4 b/ppdiffusers/examples/Open-Sora/assets/videos/d0_proc.mp4
diff --git a/ppdiffusers/examples/Open-Sora/dataset/__init__.py b/ppdiffusers/examples/Open-Sora/dataset/__init__.py
@@ -0,0 +1,16 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .datasets import IMG_FPS, VariableVideoTextDataset, VideoTextDataset
+from .utils import get_transforms_image, get_transforms_video, save_sample