[feat] OSS: Support NVIDIA's LARC #81
Comments
SGTM. But if you are using LARC to wrap Adam, then you may want to consider using the fused implementation of LAMB available in apex. It is implemented as a fused multi-tensor CUDA kernel, so it will run orders of magnitude faster than LARC, which is implemented in Python and wraps Adam. If you are wrapping something other than Adam, you may still obtain a noticeable speedup by implementing a fused multi-tensor kernel. Something like this could be added to fairscale.
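For reference, a minimal sketch of the suggested apex alternative, assuming apex is installed with its CUDA extensions; the model and hyperparameters are placeholders:

```python
import torch
from apex.optimizers import FusedLAMB

model = torch.nn.Linear(128, 10).cuda()

# FusedLAMB applies the layer-wise adaptive scaling inside a fused
# multi-tensor CUDA kernel, avoiding the per-parameter Python loop
# that a LARC wrapper around Adam would perform.
optimizer = FusedLAMB(model.parameters(), lr=1e-3, weight_decay=0.01)

loss = model(torch.randn(4, 128, device="cuda")).sum()
loss.backward()
optimizer.step()
```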
The need for now (classy/vissl) is mostly around SGD, using LARC as a wrapper, which breaks with the closure being passed to step().
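For context, a minimal sketch of this classy/vissl-style use case, with LARC wrapping plain SGD; the model and hyperparameters are placeholders:

```python
import torch
from apex.parallel.LARC import LARC

model = torch.nn.Linear(128, 10)

base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# LARC is a thin Python wrapper: it rescales each parameter's update by a
# layer-wise trust ratio before delegating to the base optimizer's step().
optimizer = LARC(base_optimizer, trust_coefficient=0.001)

loss = model(torch.randn(4, 128)).sum()
loss.backward()
optimizer.step()  # LARC.step() does not accept a closure argument
```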
Just to add a data point: I trained a RegNetY 128GF model on 8 nodes using FusedSGD and didn't notice any significant speedup.
🚀 Feature
Make it possible to support LARC with OSS
Motivation
LARC is a must-have for large-batch jobs. Right now OSS breaks on LARC because of the closure() being passed to the wrapped optimizer's step().
Pitch
Should be doable to gracefully handle optimizers which do not support closures in step().
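A minimal sketch of what graceful handling could look like, assuming a wrapper class that keeps the sharded optimizer in a `self.optim` attribute; this is not fairscale's actual implementation:

```python
import inspect

def step(self, closure=None, **kwargs):
    # Only forward the closure if the wrapped optimizer's step() accepts one.
    accepts_closure = "closure" in inspect.signature(self.optim.step).parameters

    if closure is not None and accepts_closure:
        loss = self.optim.step(closure=closure, **kwargs)
    else:
        # Evaluate the closure locally (if any), then step without it,
        # so optimizers like LARC whose step() takes no closure still work.
        loss = closure() if closure is not None else None
        self.optim.step(**kwargs)

    # ... broadcasting the updated shards across ranks would follow here ...
    return loss
```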
Alternatives
Not supporting LARC would significantly reduce interest in OSS.
Additional context
cc @mannatsingh @prigoyal @msbaines