
LRFinder w/ Gradient Accumulation #8

Closed
rsomani95 opened this issue Nov 13, 2019 · 11 comments

@rsomani95

Great package! Thank you for sharing :)

  1. I was wondering if you plan on adding gradient accumulation support, so that LRFinder can be used with a larger effective batch size (a sketch of the idea follows below).
  2. Will you be adding mixed precision support?
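
A generic PyTorch sketch of the gradient accumulation idea, for context (model, optimizer, criterion, train_loader, and accumulation_steps are placeholders, not part of this package):

# Accumulate gradients over several small batches and only step the
# optimizer every `accumulation_steps` batches; this emulates training
# with an effective batch size that is `accumulation_steps` times larger.
accumulation_steps = 4
optimizer.zero_grad()
for i, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()  # gradients add up in .grad across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
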
@davidtvs
Owner

Right now I'm not planning on developing the package further, but I'll gladly review PRs.

@NaleRaphael
Contributor

Hi @rsomani95, I've implemented a version of LRFinder with gradient accumulation (see also PR #9).

Until the PR is merged, you can install that version from my forked repository to run any tests you want:

$ git clone -b grad_acc_amp --single-branch https://github.com/NaleRaphael/pytorch-lr-finder
$ cd pytorch-lr-finder
$ pip install .
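
A rough usage sketch (assuming an accumulation_steps argument controls how many mini-batches are accumulated per learning-rate step; the exact signature in the branch may differ, and model, optimizer, criterion, and train_loader are your own objects):

from torch_lr_finder import AccumulationLRFinder  # import path assumed for this branch

lr_finder = AccumulationLRFinder(
    model, optimizer, criterion, device="cuda",
    accumulation_steps=4,  # assumed argument: mini-batches accumulated per step
)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()   # inspect the loss vs. learning rate curve
lr_finder.reset()  # restore model and optimizer to their initial state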

And please feel free to let me know if there is anything that needs to be improved!

@rsomani95
Author

@NaleRaphael this looks great! Give me a few days to get back to you with some feedback.

NaleRaphael added a commit to NaleRaphael/pytorch-lr-finder that referenced this issue Dec 1, 2019
@NaleRaphael
Contributor

Hi, @rsomani95 !

apex has been integrated for mixed precision training, and thanks to the new version of the apex API, nothing has to be changed when calling LRFinder.
To use LRFinder for mixed precision training, we just need to set things up with amp.initialize(...).
An example of the usage has been added in examples/lrfinder_mnist_amp.ipynb.
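
A minimal sketch of that setup (based on the standard apex.amp workflow; model, optimizer, criterion, and train_loader are placeholders, and the notebook above is the actual reference):

from apex import amp
from torch_lr_finder import LRFinder

device = "cuda"
model = model.to(device)
# Wrap the model and optimizer with apex; after this, LRFinder is used
# exactly as in FP32 training.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

lr_finder = LRFinder(model, optimizer, criterion, device=device)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()
lr_finder.reset()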

However, I ran into a small problem: mixed precision training takes longer than FP32 training.
After searching some articles and GitHub issues, I found this post: NVIDIA/apex - Mixed precision training slower than FP32 training.
It seems likely that this is the cause, since I'm using a GTX 1660 Ti, which has no tensor cores...

I'll try to validate it in the next few days. ;)

@NaleRaphael
Contributor

NaleRaphael commented Dec 10, 2019

I've run more tests on my machine, and I found that setting the flag torch.backends.cudnn.benchmark to True improves performance.
But training in mixed precision (opt_level="O1") still takes a bit longer than training in pure FP32, and that seems to be a limitation of the GTX 1660 Ti.
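
For reference, a minimal sketch of toggling the flag around a timed run (model, optimizer, criterion, and train_loader are placeholders; the gist below contains the full benchmark script):

import time
import torch
from torch_lr_finder import LRFinder

# cudnn.benchmark lets cuDNN auto-tune convolution algorithms for a fixed
# input size; it has to be set before the first forward pass.
torch.backends.cudnn.benchmark = True

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
start = time.time()
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
print("range_test took %.4f seconds" % (time.time() - start))
lr_finder.reset()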

And I wrote a script for testing the performance of LRFinder with apex.amp:
https://gist.github.com/NaleRaphael/eda9d3f90aa57cf1f6b2ccdfe4217814

Here is a table of the results after running that script under different conditions:

case                        | time (seconds) | time (seconds; with torch.backends.cudnn.benchmark = True)
normal (FP32)               | 3.6152         | 3.5799
amp (FP16, opt_level="O1")  | 18.7100        | 4.0301
amp (FP16, opt_level="O2")  | 18.0375        | 3.2404

Besides, I've created a notebook on Colab. The GPU used by Colab is a K80, which has no tensor cores, so the performance doesn't benefit from them; still, LRFinder seems stable enough to run with apex.amp.
https://colab.research.google.com/drive/1BhWYtLFOa24wisNckt9i6rQhBKurVWWV

@rsomani95
Author

Hello @NaleRaphael.

This is great work!! Thank you for sharing it.

AccumulationLRFinder works smoothly and does what it's supposed to do. I appreciate how easy your PR makes it.

In my experiments, I used an RTX 2080 Ti, so I expected performance gains with FP16 (opt_level="O1").

With FP32, it took ~18:47, whereas with FP16 it took ~15:00.
Strangely, setting torch.backends.cudnn.benchmark = True was detrimental to performance: the ETA was ~25:00 (I didn't see it through, for obvious reasons).

Thank you again for your time and effort!
@davidtvs In my opinion, this PR should be merged (if not, I will be using @NaleRaphael's fork anyway).

@NaleRaphael
Contributor

Hi @rsomani95 .
Many thanks for your help and feedback, and I'm glad the implementation helped!

It's quite weird that it takes longer to run with torch.backends.cudnn.benchmark = True. As far as I know, that flag should speed up training when the input size is fixed across iterations.

However, it seems fine to set the torch.backends.cudnn.benchmark question aside for now, since it's not directly related to LRFinder and its use is up to the user. Though, I'll keep it in mind!

Besides, it seems that apex is going to be integrated as a built-in component of PyTorch in the future (NVIDIA/apex#659). I will keep tracking this, too.

@davidtvs Before this PR is merged, I would like to add some code so that users can install apex optionally. I'll leave a comment here when it's done.

Thank you, guys!

@davidtvs
Owner

Sounds good, I'll wait for your changes and then merge. Thanks

@NaleRaphael
Contributor

Hi @davidtvs .
Changes to the installation scripts are done, and I've tested them with the following commands on both Ubuntu and Windows; everything worked fine!

$ git clone -b grad_acc_amp --single-branch https://github.com/NaleRaphael/pytorch-lr-finder
$ cd pytorch-lr-finder
$ pip install -v --global-option="amp" ./

Once the latest version of this package is published on PyPI, the command pip install torch-lr-finder -v --global-option="amp" should work too.
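
For reference, one way this kind of optional install switch can be wired up in setup.py (a sketch only, under the assumption that the script checks sys.argv for the custom option; the actual code in the branch may differ):

# setup.py (sketch): pick up a custom "amp" option passed via
#   pip install -v --global-option="amp" ./
# and only then handle the extra steps needed for apex, which is not on PyPI.
import sys
from setuptools import setup, find_packages

USE_AMP = "amp" in sys.argv
if USE_AMP:
    sys.argv.remove("amp")  # keep setuptools from rejecting the unknown option

setup(
    name="torch-lr-finder",
    packages=find_packages(),
    install_requires=["torch", "numpy", "matplotlib"],  # abridged dependency list
    # when USE_AMP is set, the real script would go on to build/install apex here
)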

Thanks a lot for your review!

davidtvs pushed a commit that referenced this issue Dec 23, 2019
* UPDATE: implement a new LRFinder with the support of gradient accumulation

`AccumulationLRFinder` is a learning rate finder implemented with the
mechanism of gradient accumulation.

Besides, the iterator used for getting batches of data for training is
now replaced by `DataLoaderIterWrapper` to simplify the code and make
the implementation of `AccumulationLRFinder` easier.

The input parameters of `LRFinder._train_batch()` are also modified
for the same reason.

* UPDATE: add support for mixed precision training (#8)

* UPDATE: add requirements for mixed precision training and update README

* MAINT: improve the compatibility for Python 2 and some minor fixes

- add `next = __next__` in the class `DataLoaderIterWrapper`

- call `logging.basicConfig()` before getting a logger, see also:
  https://docs.python.org/2.7/library/logging.html#logging.log

- Add more information about installing this package for users
  who need to use it with mixed precision training
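
For reference, the Python 2 compatibility point above comes down to aliasing the iterator method; a rough sketch of such a wrapper (not the exact class from the commit):

class DataLoaderIterWrapper(object):
    """Sketch: thin wrapper around a DataLoader that returns one
    (inputs, labels) batch per call to next()."""

    def __init__(self, data_loader):
        self.data_loader = data_loader
        self._iterator = iter(data_loader)

    def __iter__(self):
        return self

    def __next__(self):
        batch = next(self._iterator)
        # Keep only the first two items so loaders that yield extra fields
        # (e.g. sample indices) still work.
        inputs, labels = batch[0], batch[1]
        return inputs, labels

    # Python 2 looks for next() instead of __next__(), so alias it.
    next = __next__
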
@davidtvs
Owner

The PR from @NaleRaphael is merged. Thanks @rsomani95 for raising the issue.

@davidtvs
Owner

davidtvs commented Jan 5, 2020

I'm considering changing the API for gradient accumulation; please have a look at PR #13 and give your feedback.
