Hunyuan video: inference and train #831

Open · wants to merge 137 commits into master
Conversation

@wtomin (Collaborator) commented Jan 23, 2025

What does this PR do?

HunyuanVideo:

  • support HunyuanVideo inference with text embedding;
  • support HunyuanVideo text encoder inference and text embedding cache (see the sketch below);
  • support HunyuanVideo t2v training with ZeRO-3 and data parallelism;
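
As an illustration of the text-embedding cache mentioned above, here is a minimal sketch of the idea: run the text encoders once offline, write each prompt's embedding to disk, and have the t2v training dataset read the cached arrays instead of re-running the text encoders on device. The function and file names below are hypothetical and not the actual API introduced by this PR.

```python
# Minimal sketch of a text-embedding cache (illustrative only; not this PR's actual API).
import hashlib
import os

import numpy as np


def cache_key(prompt: str) -> str:
    """Derive a stable file name from the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def save_text_embedding(prompt: str, embedding: np.ndarray, cache_dir: str = "text_embed_cache") -> str:
    """Save a precomputed text embedding so training never has to call the text encoder."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{cache_key(prompt)}.npz")
    np.savez(path, embedding=embedding)
    return path


def load_text_embedding(prompt: str, cache_dir: str = "text_embed_cache") -> np.ndarray:
    """Load the cached embedding for a prompt (raises if it was not precomputed)."""
    path = os.path.join(cache_dir, f"{cache_key(prompt)}.npz")
    return np.load(path)["embedding"]
```

Separating text-encoder inference from training this way also means the ZeRO-3 training job does not need to keep the large text encoders in device memory.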

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (preferably recorded for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@wtomin changed the title from "[Draft] Hunyuan video: inference and train" to "Hunyuan video: inference and train" on Jan 24, 2025
SamitHuang and others added 17 commits January 25, 2025 00:11
* fix config names

* add mem monitor

* update

* update

* debug tae attn

* update

* useless change

* continue work from llama3_movie_pr_20241029

* add parallel test case for scheduler and fix some minor bug

* add train script

* move config file outside the folder

* temp save

* change some ops to mint

* add init for text projector

* fix mint

* fix type

* encoder ok

* add image support to OS data loader

* update convert script

* add recompute support in PyNative

* add dataloader

* update train script

* add OSv1.2 VAE

* fixes

* reconstruct tested

* update readme

* discard spurious frames

* rename

* add train

* add train config

* rename

* rename

* add dataset

* trainable

* add inference

* fix opl loss

* z 16

* fix linear-quadratic sampling
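
(For context, "linear-quadratic sampling" here refers to a timestep schedule that is linear over the first steps and quadratic afterwards. The sketch below is illustrative only and may differ from the schedule actually implemented in this PR.)

```python
# Rough sketch of a linear-quadratic timestep schedule (illustrative; may differ from the PR).
import numpy as np


def linear_quadratic_schedule(num_steps: int = 50, linear_steps: int = 25, total_t: float = 1000.0) -> np.ndarray:
    """First `linear_steps` timesteps are spaced linearly, the remainder quadratically up to `total_t`."""
    linear_part = np.arange(linear_steps, dtype=np.float64)  # 0, 1, 2, ...
    remaining = num_steps - linear_steps
    # Quadratic spacing that grows from the end of the linear part up to total_t.
    quad = np.linspace(0.0, 1.0, remaining + 1)[1:] ** 2
    quad_part = linear_steps + quad * (total_t - linear_steps)
    return np.concatenate([linear_part, quad_part])


print(linear_quadratic_schedule(num_steps=10, linear_steps=5, total_t=100.0))
```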

* text encoders inference

* allow loading sd3.5 vae pretrained weights

* update convert script

* add sd3 vae

* add modules for sd3 vae

* update configs

* temporal median init, 1p train psnr ok

* add files

* fix rt id

* set image and crop size

* add train step mode

* replace interpolate for bf16 support

* add validation support

* add ReduceLROnPlateau

* save top K checkpoints

* add drop text conditioning for training
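
(The text-conditioning dropout above is presumably classifier-free-guidance-style dropout, where a fraction of training samples get a null text embedding. A minimal sketch with hypothetical names:)

```python
# Sketch of classifier-free-guidance text dropout (illustrative; names are hypothetical).
import numpy as np


def maybe_drop_text(text_emb: np.ndarray, null_emb: np.ndarray, drop_prob: float = 0.1) -> np.ndarray:
    """With probability `drop_prob`, replace the prompt embedding with the null embedding."""
    if np.random.rand() < drop_prob:
        return null_emb
    return text_emb
```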

* fix eval loss calculation

* add model parallel

* hack for model parallel

* fix hack

* small fixes

* add temporal tile

* rm comments

* clean code

* draft readme and update decode

* add config

* add readme draft

* add TAE to Movie Gen

* add buckets and dynamic graph support

* fix dynamic shape: default manual pad for conv1d same pad

* fix save callback and TAE scaling

* Revert "fix hack"

This reverts commit bf505d4.

* Revert "hack for model parallel"

This reverts commit 8af7437.

* revert it later

* small fixes

* refactoring

* linting

* add docs

* refactor TAE
add latents generation
other small changes

* fix training with TAE latents

* revert changes to OpenSora

* merge with PR mindspore-lab#778

* small fix

* PR fixes:
- remove forced dynamic memory allocation for data transformations
- purge Model Parallel functionality until it's fully tested

* Update docs

* Update docs

* update docs and small fixes

* fix TAE encoding

* PR fixes:
- remove unrelated code changes
- update docs

* small inference fix

* enable `lazy_inline`
enable jit_level `O2` support
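
(Both switches are MindSpore compile-time knobs. A minimal sketch of how they are typically enabled, assuming MindSpore >= 2.3; verify the exact API against your installed version:)

```python
# Sketch of the two compile-time knobs named above (assumes MindSpore >= 2.3; check your version's docs).
import mindspore as ms
from mindspore import lazy_inline, nn


class Block(nn.Cell):
    @lazy_inline  # delay inlining of this cell's graph to cut compile time for repeated blocks
    def __init__(self):
        super().__init__()
        self.dense = nn.Dense(16, 16)

    def construct(self, x):
        return self.dense(x)


# jit_level "O2" requests the most aggressive graph-compilation optimization level in graph mode.
ms.set_context(mode=ms.GRAPH_MODE, jit_config={"jit_level": "O2"})
```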

* small fix

* small fix

* enable flexible recompute

* enable flexible recompute

* - add train resume feature
- preserve image / video orientation in transformations

* ResizeCrop fix

* update docs

* support SP and change rms to ops.rms

* Gradio demo for MovieGen (#6)

* update docs and add stage 3 configs

* add ZeRO-3 support to Movie Gen

* add Model Parallel

* add technical report

* update technical report

* linting

* add inference without TAE and stand-alone decoding

* Drop Model Parallel

* improve SP support

* fix checkpoint saving

* fix checkpoint saving

* align with PR#778

* update README

* resolve comments

* fix SP

* small fixes and update README.md

* add TAE download link

* update README.md

* fix imports and update README.md

* update README.md

* fix dynamic graph training with TAE

* update configs

* update configs

* turn off EMA until it's fixed

---------

Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: Mike Cheung <zhtmike@gmail.com>
Co-authored-by: Nguyen Truong Hai <47595486+itruonghai@users.noreply.github.com>
@SamitHuang (Collaborator) commented Feb 7, 2025

Suggestions for better code style and clarity:

  1. Move executable scripts (run_xx.py, eval/xx.py) from the src folder to the root folder or the script folder.
  2. Allow parsing hyper-parameters from YAML in training (see the sketch below).
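
A minimal sketch of suggestion 2, assuming PyYAML plus argparse; the config path and keys are made up for illustration:

```python
# Sketch: merge YAML-defined hyper-parameters with command-line overrides (illustrative only).
import argparse

import yaml


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="configs/train.yaml", help="path to a YAML config")
    parser.add_argument("--learning_rate", type=float, default=None, help="overrides the YAML value if set")
    args = parser.parse_args()

    with open(args.config, "r") as f:
        cfg = yaml.safe_load(f)  # e.g. {"learning_rate": 1e-4, "batch_size": 1}

    # Command-line values take precedence over the YAML file.
    if args.learning_rate is not None:
        cfg["learning_rate"] = args.learning_rate
    return cfg


if __name__ == "__main__":
    print(parse_args())
```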
