
[WIP] Hypertuner class #3160

Closed · wants to merge 9 commits

Conversation

SkafteNicki
Member

What does this PR do?

This is a redo of PR #1998. The last PR was too big to land because it was a major refactoring, so I will try to split it into more manageable pieces.

This PR is basically how the core structure of the HyperTuner class should look; it does not implement any functionality yet. After this, I plan on three follow-up PRs, one for each of the features that the HyperTuner class will include (a rough sketch of the intended structure follows the list):

  • LR finder (deprecate in the trainer, add more tests, get it working in DDP mode, add DataModule support)
  • Batch size finder (deprecate in the trainer, somewhat stable at this point, add DataModule support)
  • New: num_workers searcher ("Let's add a suggested_num_workers() method?" #2196); I already have some code, it just needs polishing for the new API
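
For illustration only, here is a minimal sketch of what such a core structure could look like. It is not the code in this PR, and every name except suggested_num_workers (taken from #2196) is hypothetical:

```python
# Illustrative sketch only, not the code in this PR: a thin class that wraps
# a Trainer and exposes one entry point per planned feature. All names except
# suggested_num_workers (from #2196) are hypothetical.
class HyperTuner:

    def __init__(self, trainer, model):
        self.trainer = trainer
        self.model = model

    def lr_find(self, **kwargs):
        """Planned: learning-rate finder, moved out of the Trainer."""
        raise NotImplementedError

    def scale_batch_size(self, **kwargs):
        """Planned: batch-size finder, moved out of the Trainer."""
        raise NotImplementedError

    def suggested_num_workers(self, **kwargs):
        """Planned: DataLoader num_workers search (see #2196)."""
        raise NotImplementedError
```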

Tagging for input: @Borda, @awaelchli, @justusschock

Question: We have informed users that no more API changes would happen from v0.9 until v1.0. This is a somewhat big API change, so to keep that promise, I see two options:

  • Wait with this until after v1.0
  • Make the HyperTuner class a feature of bolts, and deprecate the LR finder and batch scaler in lightning

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team August 25, 2020 14:56
@williamFalcon
Contributor

williamFalcon commented Aug 25, 2020

@SkafteNicki please wait a week... we need to finish refactors.

We can decide then whether it goes in before or after 0.9, and whether it stays here or moves to bolts.

@SkafteNicki
Member Author

@williamFalcon completely fine with me, whatever makes the most sense for the project :]

@Borda Borda requested review from Borda and awaelchli August 25, 2020 15:50
@Borda Borda added this to the 0.9.x milestone Aug 25, 2020
@Borda Borda added the feature Is an improvement or enhancement label Aug 25, 2020
@rohitgr7
Contributor

@SkafteNicki does it support user-defined callbacks when calling lr_find or auto_scale_batch_size? As of now, it drops them.

@SkafteNicki
Member Author

Based on discussion, the tuner class is being dropped for now; instead, we will refactor the tuner methods into a trainer.tune() method. Closing this.
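
For context, a minimal sketch of that direction, assuming the auto_* Trainer flags and tune() entry point of that era (not code from this PR):

```python
# Sketch of the trainer.tune() direction (assumed API, not code from this PR):
# tuning routines are enabled via Trainer flags and run through a single
# tune() call before fitting.
from pytorch_lightning import Trainer

model = MyLightningModule()  # placeholder: any LightningModule

trainer = Trainer(auto_lr_find=True, auto_scale_batch_size=True)
trainer.tune(model)  # runs the enabled tuning routines
trainer.fit(model)
```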

@SkafteNicki SkafteNicki closed this Sep 1, 2020
@Borda
Member

Borda commented Sep 1, 2020

Based on discussion, the tuner class is being dropped for now; instead, we will refactor the tuner methods into a trainer.tune() method. Closing this.

Anyway, GREAT work on this PR! ❤️

@jcsagar

jcsagar commented Sep 7, 2020

So with the new datamodule, is there no way to tune batch_size? I was really excited by the HyperTuner class in this thread, so I am a little bummed this is closed for now.

For my current workflow, I tried calling trainer.tune(lightningmodule, datamodule) with auto_scale_batch_size. It tries to scale the batch size and (incorrectly) succeeds with every attempt until it reaches a batch_size of 2^26 (after 25 trials), which it then tries to fit with and fails. I looked into why this was happening and realized that the datamodule isn't actually consulted by the scale_batch_size function in training_tricks.py.
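
A reconstruction of that call pattern (assumed details, not the exact code):

```python
# Reconstruction of the reported workflow (assumed details): with
# auto_scale_batch_size enabled, tune() kept doubling the batch size because
# the scaler in training_tricks.py never consulted the passed datamodule.
from pytorch_lightning import Trainer

trainer = Trainer(auto_scale_batch_size=True)
trainer.tune(lightningmodule, datamodule)  # lightningmodule/datamodule as described above
```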

@awaelchli
Contributor

@jcsagar Fixed in #3266 and #3271

@jcsagar

jcsagar commented Sep 7, 2020

@awaelchli Updated to master and it works well. Great stuff, thanks!!!

Labels: feature (Is an improvement or enhancement), refactor
6 participants