
LRFinder w/ Gradient Accumulation #8

Closed
rsomani95 opened this issue Nov 13, 2019 · 11 comments

@rsomani95

Great package! Thank you for sharing :)

  1. I was wondering if you plan on adding gradient accumulation support, so that LRFinder can be used with a larger effective batch size (a sketch of the idea follows below).
  2. Will you be adding mixed precision support?
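
A generic PyTorch sketch of the gradient accumulation idea, for context (model, optimizer, criterion, train_loader, and accumulation_steps are placeholders, not part of this package):

# Accumulate gradients over several small batches and only step the
# optimizer every `accumulation_steps` batches; this emulates training
# with an effective batch size that is `accumulation_steps` times larger.
accumulation_steps = 4
optimizer.zero_grad()
for i, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()  # gradients add up in .grad across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
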
@davidtvs
Owner

Right now I'm not planning on developing the package further, but I'll gladly review PRs.

@NaleRaphael
Contributor

Hi @rsomani95, I've implemented a version of LRFinder with gradient accumulation (see also PR #9).

Until the PR is merged, you can install that version from my forked repository to run any tests you want:

$ git clone -b grad_acc_amp --single-branch https://github.com/NaleRaphael/pytorch-lr-finder
$ cd pytorch-lr-finder
$ pip install .
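
A rough usage sketch (assuming an accumulation_steps argument controls how many mini-batches are accumulated per learning-rate step; the exact signature in the branch may differ, and model, optimizer, criterion, and train_loader are your own objects):

from torch_lr_finder import AccumulationLRFinder  # import path assumed for this branch

lr_finder = AccumulationLRFinder(
    model, optimizer, criterion, device="cuda",
    accumulation_steps=4,  # assumed argument: mini-batches accumulated per step
)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()   # inspect the loss vs. learning rate curve
lr_finder.reset()  # restore model and optimizer to their initial state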

And please feel free to let me know if there is anything that needs to be improved!

@rsomani95
Author

@NaleRaphael this looks great! Give me a few days to get back to you with some feedback.

NaleRaphael added a commit to NaleRaphael/pytorch-lr-finder that referenced this issue Dec 1, 2019
@NaleRaphael
Contributor

Hi, @rsomani95 !

apex has been integrated for mixed precision training, and thanks to the new version of the apex API, nothing has to be changed when calling LRFinder.
To use LRFinder for mixed precision training, we just need to set things up with amp.initialize(...).
An example of the usage has been added in examples/lrfinder_mnist_amp.ipynb.
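
A minimal sketch of that setup (based on the standard apex.amp workflow; model, optimizer, criterion, and train_loader are placeholders, and the notebook above is the actual reference):

from apex import amp
from torch_lr_finder import LRFinder

device = "cuda"
model = model.to(device)
# Wrap the model and optimizer with apex; after this, LRFinder is used
# exactly as in FP32 training.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

lr_finder = LRFinder(model, optimizer, criterion, device=device)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()
lr_finder.reset()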

However, I ran into a small problem: mixed precision training takes longer than FP32 training.
After searching some articles and GitHub issues, I found this post: NVIDIA/apex - Mixed precision training slower than FP32 training.
It seems likely that this is the cause, since I'm using a GTX 1660 Ti, which has no tensor cores...

I'll try to validate it in the next few days. ;)

@NaleRaphael
Contributor

NaleRaphael commented Dec 10, 2019

I've run more tests on my machine, and I found that setting the flag torch.backends.cudnn.benchmark to True improves performance.
But training in mixed precision (opt_level="O1") still takes a bit longer than training in pure FP32, and that seems to be a limitation of the GTX 1660 Ti.
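
For reference, a minimal sketch of toggling the flag around a timed run (model, optimizer, criterion, and train_loader are placeholders; the gist below contains the full benchmark script):

import time
import torch
from torch_lr_finder import LRFinder

# cudnn.benchmark lets cuDNN auto-tune convolution algorithms for a fixed
# input size; it has to be set before the first forward pass.
torch.backends.cudnn.benchmark = True

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
start = time.time()
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
print("range_test took %.4f seconds" % (time.time() - start))
lr_finder.reset()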

And I wrote a script for testing the performance of LRFinder with apex.amp:
https://gist.github.com/NaleRaphael/eda9d3f90aa57cf1f6b2ccdfe4217814

Here is a table of the results after running that script under different conditions:

case                        | time (seconds) | time (seconds; with torch.backends.cudnn.benchmark = True)
normal (FP32)               | 3.6152         | 3.5799
amp (FP16, opt_level="O1")  | 18.7100        | 4.0301
amp (FP16, opt_level="O2")  | 18.0375        | 3.2404

Besides, I've created a notebook on Colab. The GPU used by Colab is a K80, which has no tensor cores, so the performance doesn't benefit from them; still, LRFinder seems stable enough to run with apex.amp.
https://colab.research.google.com/drive/1BhWYtLFOa24wisNckt9i6rQhBKurVWWV

@rsomani95
Author

Hello @NaleRaphael.

This is great work!! Thank you for sharing it.

AccumulationLRFinder works smoothly and does what it's supposed to do. I appreciate how easy your PR makes it.

In my experiments, I used an RTX 2080 Ti, so I expected performance gains with FP16 (opt_level="O1").

With FP32, it took ~18:47, whereas with FP16 it took ~15:00.
Strangely, setting torch.backends.cudnn.benchmark = True was detrimental to performance: the ETA was ~25:00 (I didn't see it through, for obvious reasons).

Thank you again for your time and effort!
@davidtvs In my opinion, this PR should be merged (if not, I will be using @NaleRaphael's fork anyway).

@NaleRaphael
Contributor

Hi @rsomani95 .
Many thanks for your help and feedback, and I'm glad the implementation helped!

It's quite weird that it takes longer to run with torch.backends.cudnn.benchmark = True. As far as I know, that flag should speed up training when the input size is fixed across iterations.

However, it seems fine to set the torch.backends.cudnn.benchmark question aside for now, since it's not directly related to LRFinder and its use is up to the user. Though, I'll keep it in mind!

Besides, it seems that apex is going to be integrated as a built-in component of PyTorch in the future (NVIDIA/apex#659). I will keep tracking this, too.

@davidtvs Before this PR is merged, I would like to add some code so that users can install apex optionally. I'll leave a comment here when it's done.

Thank you, guys!

@davidtvs
Owner

Sounds good, I'll wait for your changes and then merge. Thanks

@NaleRaphael
Contributor

Hi @davidtvs .
Changes to the installation scripts are done, and I've tested them with the following commands on both Ubuntu and Windows; everything worked fine!

$ git clone -b grad_acc_amp --single-branch https://github.com/NaleRaphael/pytorch-lr-finder
$ cd pytorch-lr-finder
$ pip install -v --global-option="amp" ./

Once the latest version of this package is published on PyPI, the command pip install torch-lr-finder -v --global-option="amp" should work too.
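
For reference, one way this kind of optional install switch can be wired up in setup.py (a sketch only, under the assumption that the script checks sys.argv for the custom option; the actual code in the branch may differ):

# setup.py (sketch): pick up a custom "amp" option passed via
#   pip install -v --global-option="amp" ./
# and only then handle the extra steps needed for apex, which is not on PyPI.
import sys
from setuptools import setup, find_packages

USE_AMP = "amp" in sys.argv
if USE_AMP:
    sys.argv.remove("amp")  # keep setuptools from rejecting the unknown option

setup(
    name="torch-lr-finder",
    packages=find_packages(),
    install_requires=["torch", "numpy", "matplotlib"],  # abridged dependency list
    # when USE_AMP is set, the real script would go on to build/install apex here
)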

Thanks a lot for your review!

davidtvs pushed a commit that referenced this issue Dec 23, 2019
* UPDATE: implement a new LRFinder with the support of gradient accumulation

`AccumulationLRFinder` is a learning rate finder implemented with the
mechanism of gradient accumulation.

Besides, the iterator used for getting batches of data for training is
now replaced by `DataLoaderIterWrapper` to simplify the code and make
the implementation of `AccumulationLRFinder` easier.

The input parameters of `LRFinder._train_batch()` are also modified
for the same reason.

* UPDATE: add support for mixed precision training (#8)

* UPDATE: add requirements for mixed precision training and update README

* MAINT: improve the compatibility for Python 2 and some minor fixes

- add `next = __next__` in the class `DataLoaderIterWrapper`

- call `logging.basicConfig()` before getting a logger, see also:
  https://docs.python.org/2.7/library/logging.html#logging.log

- Add more information about installing this package for users
  who need to use it with mixed precision training
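
For reference, the Python 2 compatibility point above comes down to aliasing the iterator method; a rough sketch of such a wrapper (not the exact class from the commit):

class DataLoaderIterWrapper(object):
    """Sketch: thin wrapper around a DataLoader that returns one
    (inputs, labels) batch per call to next()."""

    def __init__(self, data_loader):
        self.data_loader = data_loader
        self._iterator = iter(data_loader)

    def __iter__(self):
        return self

    def __next__(self):
        batch = next(self._iterator)
        # Keep only the first two items so loaders that yield extra fields
        # (e.g. sample indices) still work.
        inputs, labels = batch[0], batch[1]
        return inputs, labels

    # Python 2 looks for next() instead of __next__(), so alias it.
    next = __next__
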
@davidtvs
Owner

The PR from @NaleRaphael is merged. Thanks @rsomani95 for raising the issue.

@davidtvs
Owner

davidtvs commented Jan 5, 2020

I'm considering changing the API for gradient accumulation; please have a look at PR #13 and give your feedback.
