[WIP] Integrates JorgeKFAC with diffusion model training script #3

Draft
wants to merge 5 commits into base: jorge
Conversation

@keshprad keshprad (Collaborator) commented Jan 7, 2025

No description provided.

@keshprad keshprad (Collaborator, Author) commented Jan 7, 2025

  1. I am still having issues with sampling, as I shared on Slack. This happens with both Adam and Jorge.
  2. I'm not sure what to look for to verify correctness of the implementation.
    - The general trend is that the loss/MSE decreases, but loss_sampled stays around 0.9-1 throughout training.
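
For context: in standard K-FAC, curvature is estimated from targets sampled from the model's own predictive distribution rather than from the real targets. If y_sampled here is model_output plus unit-variance Gaussian noise (an assumption about this PR, not something the diff confirms), then loss_sampled measures distance to that sampled target and will hover near 1 by construction, regardless of training progress, so a flat value around 0.9-1 is not by itself a sign of a bug. A minimal sketch under that assumption:

import torch

# Hypothetical illustration, not the PR's exact code: draw a target from the
# model's predictive Gaussian with unit variance, as "true Fisher" estimation
# in K-FAC does with sampled labels.
model_output = torch.randn(4, 3, 32, 32)                    # stand-in for the predicted noise
y_sampled = model_output + torch.randn_like(model_output)   # y ~ N(model_output, I)

# The sampled loss compares against the sampled target, so its expectation is
# the sampling variance (about 1), independent of how good model_output is.
loss_sampled = ((y_sampled - model_output) ** 2).mean()
print(loss_sampled.item())  # roughly 1.0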

terms["loss_sampled"] = mean_flat((y_sampled - model_output) ** 2)
if "vb" in terms:
    terms["loss"] = terms["mse"] + terms["vb"]
    # TODO: Should terms["vb"] be added to terms["loss_sampled"]?

keshprad (Collaborator, Author):
Should terms["vb"] be added to terms["loss_sampled"]?

Collaborator:

Most likely "vb" shouldn't be in terms. Can you check and let me know?
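
For background on how the proxy loss is consumed: common K-FAC implementations for PyTorch follow roughly the pattern sketched below (a classification example; the optimizer name and the acc_stats flag are assumptions based on typical K-FAC code, not this repo's API). The sampled-target loss is backpropagated only so the optimizer can accumulate Fisher statistics, and its gradients are cleared before the true loss drives the actual update, which is why loss_sampled is usually kept as a plain sampled-target loss.

# Rough sketch of a typical K-FAC training step (assumed names and pattern,
# not this repo's exact code): the sampled-label loss only feeds the Fisher
# statistics; the real loss drives the parameter update.
import torch
import torch.nn.functional as F

def kfac_step(model, optimizer, x, y):
    optimizer.zero_grad()
    logits = model(x)

    # Accumulate curvature statistics from a loss against labels sampled
    # from the model's own predictive distribution.
    optimizer.acc_stats = True
    with torch.no_grad():
        y_sampled = torch.multinomial(F.softmax(logits, dim=1), 1).squeeze(1)
    loss_sampled = F.cross_entropy(logits, y_sampled)
    loss_sampled.backward(retain_graph=True)

    # Stop accumulating, clear the proxy gradients, then step on the true loss.
    optimizer.acc_stats = False
    optimizer.zero_grad()
    loss = F.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    return loss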

improved_diffusion/train_util_jorge.py
loss_sampled = (losses["loss_sampled"] * weights).mean()
loss_sampled.backward(retain_graph=True)
self.opt.acc_stats = False
self.opt.zero_grad()  # clear the gradient for computing true-fisher

if isinstance(self.schedule_sampler, LossAwareSampler):
    self.schedule_sampler.update_with_local_losses(

@keshprad keshprad (Collaborator, Author) commented Jan 7, 2025

I don't understand what L233 (self.schedule_sampler.update_with_local_losses) does. Should anything here be modified to do something similar for losses["loss_sampled"]?

Collaborator:

I don't think we need to use the proxy loss (i.e. loss_sampled) anywhere in this function. This can remain untouched.
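
On the question above: as far as I can tell from the upstream improved_diffusion training loop, update_with_local_losses only gathers each rank's per-timestep losses so a LossAwareSampler can importance-sample timesteps based on their recent loss values; it does not touch gradients. The upstream call (from memory, worth verifying against the repo) passes the true per-timestep losses, detached, which is consistent with leaving the proxy loss out of it:

# Upstream usage as I recall it (verify against improved_diffusion/train_util.py):
# the sampler only needs the true per-timestep losses, detached from the graph.
if isinstance(self.schedule_sampler, LossAwareSampler):
    self.schedule_sampler.update_with_local_losses(
        t, losses["loss"].detach()
    )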

@keshprad keshprad marked this pull request as draft January 8, 2025 02:20