I'm not particularly knowledgeable on this, but I read here that it's relatively bad to train with a guidance scale: during distillation the model was trained to expect the teacher model's CFG'd noise predictions, so fine-tuning with a guidance scale destroys that functionality and now you have to use CFG at inference. (I could very much be subtly or majorly wrong in my understanding here.) Training against `guidance_scale=1`, while probably outside the range used during distillation, at least has the effect of removing CFG, and (I think) the teacher model, from the equation, so it seemingly makes more semantic sense for our training setup without Flux Pro. I do find it interesting that the author of that Medium article claims that training against an "undistilled-via-training" Flux dev produces better results for LoRAs. I wonder if anyone could corroborate; he does provide the model he trained against.
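To make the "embedded guidance" point concrete, here is roughly how `guidance_scale` reaches Flux dev at inference via diffusers' `FluxPipeline` (model id and values are just illustrative; this is my reading of the pipeline, not anything authoritative):

```python
# For the distilled dev model, guidance_scale is NOT classifier-free guidance:
# there is no second, unconditional forward pass. The value is embedded as a
# conditioning input the model learned to respond to during distillation,
# imitating the teacher's CFG'd predictions at that scale.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a photo of a cat",
    guidance_scale=3.5,       # embedded (distilled) guidance, not true CFG
    num_inference_steps=28,
).images[0]
image.save("cat.png")
```

So "training with a guidance scale" here means feeding some embedded guidance value during fine-tuning, which is presumably how it can drift the model away from its distilled behavior.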
I suspect that Flux dev probably underwent a process similar to the distillation in latent consistency models, so it should have been trained within a certain guidance scale range. This is likely the original intention behind SimpleTuner providing the `random-range` mode. However, when using the `constant` mode, a specific `flux_guidance_value` needs to be chosen. The author suggests in the explanation: "Using a value of 1.0 seems to preserve the CFG distillation for the Dev model." Since Flux dev is a distilled version, it actually only has conditional generation capabilities within a certain range, so what does "preserve the CFG distillation" mean here? Maybe I got something wrong with the guidance distillation stuff. Anyone care to ELI5?
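For what it's worth, here is a rough sketch of what I understand the two modes to mean for a training batch. This is my guess at the semantics, not SimpleTuner's actual code, and the option names (`flux_guidance_value`, the min/max bounds) are taken from this discussion or assumed:

```python
# Sketch (assumptions, not SimpleTuner source): how per-sample guidance
# values might be produced for the transformer's guidance embedding.
import torch

def sample_guidance(mode: str, batch_size: int,
                    flux_guidance_value: float = 1.0,
                    flux_guidance_min: float = 1.0,
                    flux_guidance_max: float = 4.0) -> torch.Tensor:
    """Return the guidance values fed to the model for one training batch."""
    if mode == "constant":
        # Every sample trains at one fixed embedded-guidance value;
        # 1.0 is the value the docs say preserves the distillation.
        return torch.full((batch_size,), flux_guidance_value)
    elif mode == "random-range":
        # Each sample draws a value from the range the distilled model
        # was presumably trained on, covering more of that range.
        return torch.empty(batch_size).uniform_(flux_guidance_min,
                                                flux_guidance_max)
    raise ValueError(f"unknown mode: {mode}")

guidance = sample_guidance("constant", batch_size=4, flux_guidance_value=1.0)
# model_pred = transformer(..., guidance=guidance, ...)
```

Under that reading, `constant` with `flux_guidance_value=1.0` pins training to the "no guidance" embedding, while `random-range` spreads training across the guidance range the distilled model presumably saw.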