discuss whether it worked or didn't work #1
- https://crumbly.medium.com/small-scale-home-evaluation-of-googles-new-optimizer-lion-77115ba8a1a4
- This doesn't look great: https://github.com/sinpcw/showcase-optimizer
- Negative result from RL: https://twitter.com/kaixhin/status/1626772629796564992
- Slightly worse than Adam: https://twitter.com/kyo_takano/status/1627147339143200768
- Better than Adam at extreme batch sizes (16k) with OpenCLIP: mlfoundations/open_clip#432
- I'm currently investigating Lion for an object detection problem I'm working on. Unfortunately, I'm running into a lot of problems finding a good learning rate that works throughout the whole run. For whatever reason, it seems prone to NaNing with our model/dataset when scaled up to what we used for training production models. I'm currently in the process of confirming whether some other changes are the cause, and I'll post updates here. For reference, here's the model and a brief description of the dataset (which is proprietary):
- Positive result from someone really prominent in the text-to-image field: large batch size (1024), learning rate divided by 10, weight decay kept the same.
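  A minimal sketch of that "learning rate / 10, same weight decay" swap, assuming the Lion implementation from this repo (`lion_pytorch`); the model and the baseline AdamW values below are placeholders, not the commenter's actual setup:

  ```python
  import torch
  from lion_pytorch import Lion  # assuming this repo's package

  model = torch.nn.Linear(512, 512)  # stand-in for the real model

  # AdamW baseline with illustrative hyperparameters
  adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.1)

  # Lion: divide the AdamW learning rate by 10, keep the weight decay unchanged
  lion = Lion(model.parameters(), lr=3e-5, betas=(0.9, 0.99), weight_decay=0.1)
  ```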
- Tried Lion with a proprietary OCR model, roughly similar to PP-OCRv3 (https://arxiv.org/abs/2206.03001). Compared to AdamW, I got much slower convergence to a lower accuracy with learning rate / 3 and weight decay * 3. With the same learning rate and weight decay I get NaNs. Batch size 2048.
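  For context on why the reports above scale the learning rate down (and, in the OCR case, the weight decay up), here is a minimal sketch of a single Lion update step in plain PyTorch, written from the paper's description rather than this repo's code: the sign() gives every coordinate a step of magnitude lr, so the effective step is larger than AdamW's and lr needs to shrink, while the decoupled weight decay is applied as lr * wd, so wd is typically raised to compensate.

  ```python
  import torch

  @torch.no_grad()
  def lion_step(p, grad, exp_avg, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
      """One Lion update for a single parameter tensor (illustrative sketch)."""
      beta1, beta2 = betas
      # decoupled weight decay, applied with strength lr * weight_decay (as in AdamW)
      p.mul_(1 - lr * weight_decay)
      # interpolate momentum and current gradient, then keep only the sign
      update = exp_avg.mul(beta1).add(grad, alpha=1 - beta1).sign_()
      p.add_(update, alpha=-lr)
      # momentum update uses a second, larger beta
      exp_avg.mul_(beta2).add_(grad, alpha=1 - beta2)
  ```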
- Negative result for LM: https://twitter.com/VHellendoorn/status/1630737104975085568. Edit: now a positive result! https://twitter.com/VHellendoorn/status/1631349009473478656?s=20
- Positive result, 3x faster vision transformer fine-tuning: https://twitter.com/Haoxiang__Wang/status/1631355469439590412?s=20
- Yet another positive result for training an LLM from a really good researcher; however, he also told me fine-tuning was not as good (albeit the base model was trained with Adam, not sure if that makes any difference).
- Saw this, could be relevant to Lion: https://openreview.net/pdf?id=a65YK0cqH8g
- Hello, I used Megatron to train GPT-2 with 16B parameters, but when the learning rate increased to 3e-6, gradient overflow led to a failure to converge. May I ask why? My hyperparameters:
- Seeing much better results in my GPT-2 distillation benchmark. Results: https://huggingface.co/lapp0/distily_bench_gpt2_optim_extended2/tensorboard
- Share positive or negative results here. Please leave the batch size and other hyperparameters, and also note the learning rate scheduler, if one was used.
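  For anyone posting results, here is a hypothetical example of the kind of setup worth reporting (all values are placeholders, and the Lion import assumes this repo's package):

  ```python
  import torch
  from lion_pytorch import Lion  # assuming this repo's package

  model = torch.nn.Linear(512, 512)   # stand-in model
  batch_size = 1024                   # report the batch size
  optimizer = Lion(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=0.1)
  # report the scheduler as well, e.g. cosine decay over the planned number of steps
  scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100_000)
  ```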