
Dango/timesteps fix #1768

Merged
merged 3 commits into kohya-ss:sd3 on Nov 7, 2024

Conversation

Dango233 commented Nov 7, 2024

  • Remove the diffusers dependency in the timestep & sigma calculation
  • Support a shift setting
  • Support a timestep range setting
  • Add a uniform distribution option
  • Default to the uniform distribution and shift 1 (sketched below)

* Remove diffusers dependency in ts & sigma calc
* support Shift setting
* Add uniform distribution
* Default to Uniform distribution and shift 1
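
A minimal sketch of the uniform sampling over a configurable timestep range described above (function and argument names are illustrative, not the actual API added by this PR):

import torch

def sample_uniform_timesteps(batch_size, t_min=0, t_max=1000, device="cpu"):
    # Draw u ~ U[0, 1) and map it linearly onto the configured range,
    # so every timestep in [t_min, t_max) is hit with equal probability.
    u = torch.rand(batch_size, device=device)
    indices = (u * (t_max - t_min) + t_min).long()
    return indices

With the defaults, this reduces to plain uniform sampling over all 1000 training timesteps.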
Dango233 (Author) commented Nov 7, 2024

With the default settings, training should pick up patterns and details much more quickly and reduce overfitting on early/mid timesteps.
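
A rough illustration of the claim (not code from this PR): compare how uniform sampling and the sigmoid-of-a-normal sampling discussed further down distribute draws across the timestep range.

import torch

torch.manual_seed(0)
n = 100_000
u_uniform = torch.rand(n)                  # uniform in [0, 1)
u_sigmoid = torch.sigmoid(torch.randn(n))  # sigmoid of a standard normal draw

# Fraction of samples landing in the early, middle, and late thirds of the range.
for name, u in [("uniform", u_uniform), ("sigmoid", u_sigmoid)]:
    early = (u < 1 / 3).float().mean().item()
    mid = ((u >= 1 / 3) & (u <= 2 / 3)).float().mean().item()
    late = (u > 2 / 3).float().mean().item()
    print(f"{name:7s} early={early:.2f} mid={mid:.2f} late={late:.2f}")

The sigmoid draw puts roughly half of the samples in the middle third, while uniform spreads them evenly, so early and late timesteps see proportionally more training signal.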

@kohya-ss kohya-ss merged commit 588ea9e into kohya-ss:sd3 Nov 7, 2024
1 check failed
kohya-ss (Owner) commented Nov 7, 2024

Thank you for this!

kohya-ss mentioned this pull request Nov 7, 2024
indices = (u * (t_max-t_min) + t_min).long()
timesteps = indices.to(device=device, dtype=dtype)

# sigmas according to dlowmatching

flowmatching*

kohya-ss (Owner):
It seems to be fixed in bafd10d :)
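
For context on the corrected comment, a minimal self-contained sketch of a flow-matching sigma calculation with a shift parameter; the exact formula used in the repository is not shown in this excerpt, so treat the shift reparameterization below as an assumption.

import torch

def timesteps_to_sigmas(timesteps, num_train_timesteps=1000, shift=1.0):
    # Flow matching: map integer timesteps linearly to sigmas in (0, 1].
    sigmas = timesteps.float() / num_train_timesteps
    # Common "shift" reparameterization for rectified-flow models;
    # with shift = 1.0 (the default in this PR) the sigmas are unchanged.
    sigmas = shift * sigmas / (1.0 + (shift - 1.0) * sigmas)
    return sigmas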

bghira commented Nov 7, 2024

@Dango233 are you guys seeing better results than with the normal flux schedule of sigmoid sampling?

Dango233 (Author) commented Nov 7, 2024 via email

bghira commented Nov 7, 2024

Does this scale with batch size, such that around batch size 2048 we really want weighted sampling, or is uniform alright then as well?

The explanation about early structure during pretraining does make sense.

Dango233 (Author) commented Nov 7, 2024 via email

dsienra commented Nov 8, 2024

I'm having problems setting the LR for the TE and the unet independently; it trains the TE with the same LR I set for the unet.

This is in my config file:

"learning_rate": 1e-06,
"learning_rate_te": 4e-06,
"learning_rate_te1": 4e-06,
"learning_rate_te2": 4e-06,

Additional parameters: --fused_backward_pass --use_t5xxl_cache_only --train_text_encoder

I think the problem started after the last update, but I'm not sure.
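
For reference, per-module learning rates like these normally end up as separate optimizer parameter groups. A generic PyTorch sketch (hypothetical stand-in modules, not sd-scripts internals):

import torch

unet = torch.nn.Linear(8, 8)          # stand-in for the U-Net / DiT
text_encoder = torch.nn.Linear(8, 8)  # stand-in for the text encoder

optimizer = torch.optim.AdamW([
    {"params": unet.parameters(), "lr": 1e-6},          # learning_rate
    {"params": text_encoder.parameters(), "lr": 4e-6},  # learning_rate_te
])

If the text-encoder parameters land in the first group, or the second group's lr is ignored, both modules train at the U-Net rate, which would match the symptom described above.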

kohya-ss (Owner) commented Nov 9, 2024

"I think the problem started after the last update, but I'm not sure."

This PR is not related to the learning rates. I will check it soon.
