Hunyuan video: inference and train #831

Open · wants to merge 137 commits into master
Conversation

@wtomin (Collaborator) commented Jan 23, 2025

What does this PR do?

HunyuanVideo:

  • support HunyuanVideo inference with text embedding;
  • support HunyuanVideo text encoder inference and text embedding cache (see the sketch below);
  • support HunyuanVideo t2v training with ZeRO-3 and data parallelism;
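
As an illustration of the text-embedding cache mentioned above, here is a minimal sketch of the idea: run the text encoders once offline, write each prompt's embedding to disk, and have the t2v training dataset read the cached arrays instead of re-running the text encoders on device. The function and file names below are hypothetical and not the actual API introduced by this PR.

```python
# Minimal sketch of a text-embedding cache (illustrative only; not this PR's actual API).
import hashlib
import os

import numpy as np


def cache_key(prompt: str) -> str:
    """Derive a stable file name from the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def save_text_embedding(prompt: str, embedding: np.ndarray, cache_dir: str = "text_embed_cache") -> str:
    """Save a precomputed text embedding so training never has to call the text encoder."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{cache_key(prompt)}.npz")
    np.savez(path, embedding=embedding)
    return path


def load_text_embedding(prompt: str, cache_dir: str = "text_embed_cache") -> np.ndarray:
    """Load the cached embedding for a prompt (raises if it was not precomputed)."""
    path = os.path.join(cache_dir, f"{cache_key(prompt)}.npz")
    return np.load(path)["embedding"]
```

Separating text-encoder inference from training this way also means the ZeRO-3 training job does not need to keep the large text encoders in device memory.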

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (preferably recorded for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@wtomin changed the title from "[Draft] Hunyuan video: inference and train" to "Hunyuan video: inference and train" on Jan 24, 2025
SamitHuang and others added 17 commits January 25, 2025 00:11
* fix config names

* add mem monitor

* update

* update

* debug tae attn

* update

* useless change

* continue work from llama3_movie_pr_20241029

* add parallel test case for scheduler and fix some minor bug

* add train script

* move config file outside the folder

* temp save

* change some ops to mint

* add init for text projector

* fix mint

* fix type

* encoder ok

* add image support to OS data loader

* update convert script

* add recompute support in PyNative

* add dataloader

* update train script

* add OSv1.2 VAE

* fixes

* reconstruct tested

* update readme

* discard spurious frames

* rename

* add train

* add train config

* rename

* rename

* add dataset

* trainable

* add inference

* fix opl loss

* z 16

* fix linear-quadratic sampling
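
(For context, "linear-quadratic sampling" here refers to a timestep schedule that is linear over the first steps and quadratic afterwards. The sketch below is illustrative only and may differ from the schedule actually implemented in this PR.)

```python
# Rough sketch of a linear-quadratic timestep schedule (illustrative; may differ from the PR).
import numpy as np


def linear_quadratic_schedule(num_steps: int = 50, linear_steps: int = 25, total_t: float = 1000.0) -> np.ndarray:
    """First `linear_steps` timesteps are spaced linearly, the remainder quadratically up to `total_t`."""
    linear_part = np.arange(linear_steps, dtype=np.float64)  # 0, 1, 2, ...
    remaining = num_steps - linear_steps
    # Quadratic spacing that grows from the end of the linear part up to total_t.
    quad = np.linspace(0.0, 1.0, remaining + 1)[1:] ** 2
    quad_part = linear_steps + quad * (total_t - linear_steps)
    return np.concatenate([linear_part, quad_part])


print(linear_quadratic_schedule(num_steps=10, linear_steps=5, total_t=100.0))
```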

* text encoders inference

* allow loading sd3.5 vae pretrained weights

* update convert script

* add sd3 vae

* add modules for sd3 vae

* update configs

* temporal median init, 1p train psnr ok

* add files

* fix rt id

* set image and crop size

* add train step mode

* replace interpolate for bf16 support

* add validation support

* add ReduceLROnPlateau

* save top K checkpoints

* add drop text conditioning for training
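
(The text-conditioning dropout above is presumably classifier-free-guidance-style dropout, where a fraction of training samples get a null text embedding. A minimal sketch with hypothetical names:)

```python
# Sketch of classifier-free-guidance text dropout (illustrative; names are hypothetical).
import numpy as np


def maybe_drop_text(text_emb: np.ndarray, null_emb: np.ndarray, drop_prob: float = 0.1) -> np.ndarray:
    """With probability `drop_prob`, replace the prompt embedding with the null embedding."""
    if np.random.rand() < drop_prob:
        return null_emb
    return text_emb
```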

* fix eval loss calculation

* add model parallel

* hack for model parallel

* fix hack

* small fixes

* add temporal tile

* rm comments

* clean code

* draft readme and update decode

* add config

* add readme draft

* add TAE to Movie Gen

* add buckets and dynamic graph support

* fix dynamic shape: default manual pad for conv1d same pad

* fix save callback and TAE scaling

* Revert "fix hack"

This reverts commit bf505d4.

* Revert "hack for model parallel"

This reverts commit 8af7437.

* revert it later

* small fixes

* refactoring

* linting

* add docs

* refactor TAE
add latents generation
other small changes

* fix training with TAE latents

* revert changes to OpenSora

* merge with PR mindspore-lab#778

* small fix

* PR fixes:
- remove forced dynamic memory allocation for data transformations
- purge Model Parallel functionality until it's fully tested

* Update docs

* Update docs

* update docs and small fixes

* fix TAE encoding

* PR fixes:
- remove unrelated code changes
- update docs

* small inference fix

* enable `lazy_inline`
enable jit_level `O2` support
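
(Both switches are MindSpore compile-time knobs. A minimal sketch of how they are typically enabled, assuming MindSpore >= 2.3; verify the exact API against your installed version:)

```python
# Sketch of the two compile-time knobs named above (assumes MindSpore >= 2.3; check your version's docs).
import mindspore as ms
from mindspore import lazy_inline, nn


class Block(nn.Cell):
    @lazy_inline  # delay inlining of this cell's graph to cut compile time for repeated blocks
    def __init__(self):
        super().__init__()
        self.dense = nn.Dense(16, 16)

    def construct(self, x):
        return self.dense(x)


# jit_level "O2" requests the most aggressive graph-compilation optimization level in graph mode.
ms.set_context(mode=ms.GRAPH_MODE, jit_config={"jit_level": "O2"})
```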

* small fix

* small fix

* enable flexible recompute

* enable flexible recompute

* - add train resume feature
- preserve image / video orientation in transformations

* ResizeCrop fix

* update docs

* support SP and change rms to ops.rms

* Gradio demo for MovieGen (#6)

* update docs and add stage 3 configs

* add ZeRO-3 support to Movie Gen

* add Model Parallel

* add technical report

* update technical report

* linting

* add inference without TAE and stand-alone decoding

* Drop Model Parallel

* improve SP support

* fix checkpoint saving

* fix checkpoint saving

* align with PR#778

* update README

* resolve comments

* fix SP

* small fixes and update README.md

* add TAE download link

* update README.md

* fix imports and update README.md

* update README.md

* fix dynamic graph training with TAE

* update configs

* update configs

* turn off EMA until it's fixed

---------

Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: Mike Cheung <zhtmike@gmail.com>
Co-authored-by: Nguyen Truong Hai <47595486+itruonghai@users.noreply.github.com>
@SamitHuang (Collaborator) commented Feb 7, 2025

Suggestions for better code style and clarity:

  1. Move executable scripts (run_xx.py, eval/xx.py) from the src folder to the root folder or the script folder.
  2. Allow parsing hyper-parameters from YAML in training (see the sketch below).
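
A minimal sketch of suggestion 2, assuming PyYAML plus argparse; the config path and keys are made up for illustration:

```python
# Sketch: merge YAML-defined hyper-parameters with command-line overrides (illustrative only).
import argparse

import yaml


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="configs/train.yaml", help="path to a YAML config")
    parser.add_argument("--learning_rate", type=float, default=None, help="overrides the YAML value if set")
    args = parser.parse_args()

    with open(args.config, "r") as f:
        cfg = yaml.safe_load(f)  # e.g. {"learning_rate": 1e-4, "batch_size": 1}

    # Command-line values take precedence over the YAML file.
    if args.learning_rate is not None:
        cfg["learning_rate"] = args.learning_rate
    return cfg


if __name__ == "__main__":
    print(parse_args())
```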
