With the new-style training, I think this should basically just work.
m16 = f16(m32) makes a low-precision copy of the model; you can use that to compute the gradient g16, and then update!(opt_state, m32, g16) will apply the change to the original model.
However, not all operations support Float16, e.g. I'm not sure about convolutions. There may also be other unanticipated problems.
It would be super-nice to have an example of this, e.g. a model zoo page which uses it.
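A rough sketch of what such an example might look like, following the recipe above. The toy model, data, and loss are made up for illustration, and it assumes a recent Flux with f16 and the new-style Flux.setup / Flux.update! API; it has not been checked against layers that may lack Float16 support.

```julia
using Flux

# Float32 "master" copy of the model; optimiser state is built against it.
m32 = Chain(Dense(10 => 32, relu), Dense(32 => 1))
opt_state = Flux.setup(Adam(), m32)

# Dummy data, just for illustration.
x, y = randn(Float32, 10, 64), randn(Float32, 1, 64)

for step in 1:100
    m16 = f16(m32)              # low-precision copy of the model
    x16, y16 = f16(x), f16(y)   # cast the batch to match
    # Gradient is taken w.r.t. the Float16 copy...
    g16 = gradient(m -> Flux.Losses.mse(m(x16), y16), m16)[1]
    # ...but the update is applied to the Float32 master weights.
    Flux.update!(opt_state, m32, g16)
end
```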
Motivation and description
Just wondering if there is a way to do mixed-precision training in Flux?
Possible Implementation
No response