[RLlib] - Add example for PyTorch lr schedulers. #47454
Conversation
…hms. Either a list of schedulers applied sequentially or a dictionary mapping module IDs to their respective lists of schedulers. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…dulers with RLlib. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
.experimental(
    # Add two learning rate schedulers to be applied in sequence.
    _torch_lr_scheduler_classes=[
        # Multiplies the learning rate by a factor of 0.1 for 10 iterations.
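For context, here is a minimal sketch of how the full chained configuration might look. This is a hedged reconstruction, not the PR's exact code: the `PPOConfig` base, the use of `functools.partial` to bind constructor arguments, and the concrete values are assumptions inferred from the review discussion below.

```python
import functools

import torch
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .experimental(
        # Add two learning rate schedulers to be applied in sequence.
        _torch_lr_scheduler_classes=[
            # Multiplies the learning rate by a factor of 0.1 for 10 iterations.
            functools.partial(
                torch.optim.lr_scheduler.ConstantLR, factor=0.1, total_iters=10
            ),
            # Thereafter, decays the learning rate by a factor of 0.3 per step.
            functools.partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.3),
        ]
    )
)
```

Per the commit message, a dictionary mapping module IDs to such lists can be passed instead, giving each RLModule its own schedule.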
As a user of this script (who is too lazy to read through all the torch docs :D) I have a few questions that we should briefly answer here:

- What is the actual resulting total schedule here if the user configured `config.training(lr=L)`? I'm assuming: for `lr_const_iters` iterations, use `L * 0.1`; after that, jump back up to `L`, then decay `L` by `lr_exp_decay` each iter, so `L *= 0.3` per iter?
- What is an `iter` here? It's not necessarily the same as an RLlib algorithm iteration, correct, but actually refers to `Learner.update_from...` calls, correct?
Great questions! I'll try to answer them in the following:

- Any list of learning rate schedulers will be chained, i.e. we apply the first and then the second in each iteration. Your assumption is (almost) correct: the first scheduler multiplies the actual learning rate `L` by `0.1` until we have stepped 10 times (after that, correct, we stay at `L`), i.e. `L * 0.1`. The second scheduler then takes this rate (`L * 0.1`) and decays it at a rate of `0.3`, i.e. `L = (L * 0.1) * 0.3`. Because of this latter assignment, the actual learning rate in the second iteration is `(L * 0.1) * 0.3`.
- After the second step we will accordingly have `(L * 0.1) * 0.3^2`, after the third `(L * 0.1) * 0.3^3`, and so on.
- At the 10th step, however, the `ConstantLR` multiplication switches off and the learning rate AT THIS POINT is multiplied by the inverse factor `1/0.1`, i.e. it becomes `(L * 0.1) * 0.3^10 * 1/0.1`.

Yes, it is complex, but this is what we want to offer users. How `torch.optim.lr_scheduler` instances work together is a torch thing and users have to figure it out; we just apply them.
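To make the resulting schedule concrete, here is a small standalone sketch (independent of RLlib; the base rate `L = 1.0` and the use of `ChainedScheduler` as a stand-in for RLlib's sequential application are illustrative assumptions) that chains the same two schedulers and prints the effective learning rate per step:

```python
import torch
from torch.optim.lr_scheduler import ChainedScheduler, ConstantLR, ExponentialLR

# A dummy parameter and optimizer; the base learning rate plays the role of L.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=1.0)

chained = ChainedScheduler([
    # Multiplies the learning rate by 0.1 for the first 10 steps.
    ConstantLR(optimizer, factor=0.1, total_iters=10),
    # Decays the learning rate by a factor of 0.3 on every step.
    ExponentialLR(optimizer, gamma=0.3),
])

for step in range(12):
    optimizer.step()
    chained.step()
    # For the first 9 steps this prints L * 0.1 * 0.3^(step + 1); at the 10th
    # step the ConstantLR factor is removed again, so the rate jumps by 1/0.1.
    print(step, optimizer.param_groups[0]["lr"])
```

Note how the rate never returns to `L` itself: once `ConstantLR` switches off, the accumulated exponential decay still applies.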
Just some questions, nits, and comment requests.
Awesome PR @simonsays1980. Thanks for the example, this helped a lot in visualizing how this would look in action.
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
This PR adds an example to the `rllib/examples/learners/` folder that shows how to use PyTorch's learning rate schedulers to assemble a complex learning rate schedule for RL training.

Related issue number

#47453
Checks

- I've signed off every commit (by using the -s flag, i.e. `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.