[DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratios) #9708
Conversation
Looking forward to this @lawrence-cj!
Some minor comments. But to make progress on this PR:
- We need to try to eliminate unnecessary dependencies like `omegaconf`.
- Follow how we implement Autoencoders in `diffusers`. Example: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl.py
- Try to reuse the blocks as done in https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl.py.
- If we cannot reuse blocks, it's okay to define them in the modeling file. However, we need to keep things to native `torch` only, for now.
- All major model classes like `AutoencoderKL` should inherit from `ModelMixin`.
Left two comments.
# Conflicts:
#   src/diffusers/models/normalization.py
Thanks for your work.
I have left some comments, but let's wait for @yiyixuxu's comments as well before making any changes.
Yiyi, this autoencoder is going to be crucial to support efficient models like SANA: https://arxiv.org/abs/2410.10629 (which will land after this PR).
thanks for the PR!
I left some comments to start with, let me know if you have any questions:)
@lawrence-cj @a-r-r-o-w @yiyixuxu @sayakpaul I have double-checked this PR and made some minor modifications, and I will upload the converted weights soon. Could you please check whether the modifications still meet the requirements of diffusers? If so, I think this PR is ready to merge. Thank you all for the efforts!
@chenjy2003 Thanks, the changes look great and the outputs are still the same! I simplified those branches since none of the current checkpoints seemed to use them, but still good to have. Will merge this PR once you give us the go regarding the diffusers-format checkpoints.
Hi @a-r-r-o-w, all the converted weights are uploaded. Thanks!
@@ -92,6 +97,7 @@
        "double_blocks.0.img_attn.norm.key_norm.scale",
        "model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale",
    ],
    "autoencoder_dc": "decoder.stages.0.op_list.0.main.conv.conv.weight",
We would need to infer the model repo type using this key, right? That still has to be added.
Oh sorry, missed it. Adding now, but not sure how this worked before then 🤔
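A minimal sketch of what that inference looks like (hypothetical helper name, and a trimmed-down mapping; the real single-file utilities hold many more entries):

```python
# Hypothetical, trimmed-down version of the checkpoint-key table used to
# infer the model repo type during single-file loading. The DC-AE entry
# matches the key added in the diff above.
CHECKPOINT_KEY_NAMES = {
    "autoencoder_dc": "decoder.stages.0.op_list.0.main.conv.conv.weight",
}

def infer_model_type(state_dict: dict):
    """Return the model type whose marker key is present in the checkpoint."""
    for model_type, key in CHECKPOINT_KEY_NAMES.items():
        if key in state_dict:
            return model_type
    return None
```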
@@ -2198,3 +2204,250 @@ def swap_scale_shift(weight):
    )

    return converted_state_dict


def create_autoencoder_dc_config_from_original(original_config, checkpoint, **kwargs):
I think for new single file models let's not rely on the original configs anymore. This was for legacy support for the SD1.5/XL models with yaml configs. It's better to infer the diffusers config from the checkpoint and use that for loading.
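As a rough illustration of that approach (a hypothetical sketch, not the actual implementation): config fields can be read off tensor shapes in the checkpoint itself, here using the DC-AE decoder conv key that appears elsewhere in this PR and assuming its input channels equal the latent width.

```python
def infer_autoencoder_dc_config(state_dict: dict) -> dict:
    # Hypothetical sketch: derive config fields from tensor shapes rather
    # than from an original-format config file. We assume the decoder's
    # first conv takes the latent as input, so its in_channels (dim 1 of
    # the conv weight) is the number of latent channels.
    conv_in_weight = state_dict["decoder.stages.0.op_list.0.main.conv.conv.weight"]
    return {"latent_channels": conv_in_weight.shape[1]}
```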
This might be a little difficult here, so please lmk if you have any suggestions on what to do.
Some DC-AE checkpoints have the exact same structure and configuration, except for `scaling_factor`. For example, `dc-ae-f128c512-in-1.0-diffusers` and `dc-ae-f128c512-mix-1.0-diffusers` only differ in their scaling factor.
I'm unsure how we would determine this just by the model structure. Do we rely on the user passing it as a config correctly, and document this info somewhere?
I think that's fine, since in the snippet in the docs we're doing the same thing, just with `original_config` instead of `config`, right?
Updated usage to `config` now and verified that it works. Thank you for the fixes and suggestions!
Thanks for adding docs!
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@lawrence-cj @chenjy2003 We have removed support for loading the original-format autoencoder because of some complications in this PR. @DN6 will take it up soon to add support correctly. Sorry for the delay! Just doing some final cleanup and will merge after.
What does this PR do?
This PR adds the official DC-AE (Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models) to the `diffusers` library. DC-AE is the first autoencoder able to compress images into a 32x, 64x, or even 128x latent space without performance degradation. It is also the autoencoder used by the powerful T2I base model SANA.

Paper: https://arxiv.org/abs/2410.10733v1
Original code repo: https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae
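For reference, the latent shapes implied by those compression ratios can be worked out from the `f<ratio>c<channels>` checkpoint naming (e.g. `dc-ae-f128c512` above). The helper below is ours, not part of the library; the (f, c) pairs are read off the released checkpoint names.

```python
# f = spatial compression factor per side, c = latent channels, taken from
# the checkpoint names: dc-ae-f32c32, dc-ae-f64c128, dc-ae-f128c512.
VARIANTS = {"f32c32": (32, 32), "f64c128": (64, 128), "f128c512": (128, 512)}

def latent_shape(height: int, width: int, variant: str) -> tuple:
    """Latent (channels, H, W) for an RGB image of the given size."""
    f, c = VARIANTS[variant]
    assert height % f == 0 and width % f == 0, "input must be divisible by f"
    return (c, height // f, width // f)

# A 1024x1024 image under 128x spatial compression becomes an 8x8 latent.
print(latent_shape(1024, 1024, "f128c512"))  # (512, 8, 8)
```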
Core contributor of DC-AE: working with @chenjy2003
Core library:
We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul