Example: Simple RL example using DQN/Lightning #1232

Merged: 36 commits, Mar 28, 2020

@djbyrne djbyrne commented Mar 25, 2020

DQN RL agent using Lightning. The model uses an IterableDataset to wrap the ReplayBuffer, which provides mini-batches of past experiences to train on in each training step. During each training step, the agent also takes one step in the environment and adds the resulting experience to the ReplayBuffer inside the Dataset.
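
The buffer-inside-a-dataset pattern described above can be sketched roughly as follows. This is a dependency-free illustration, not the code from this PR: `Experience`, `ReplayBuffer`, and `ReplayDataset` are names assumed here for the sketch, and in a PyTorch setting the dataset class would subclass `torch.utils.data.IterableDataset` so that a `DataLoader` can consume it.

```python
import random
from collections import deque, namedtuple
from typing import Iterator, List

# Hypothetical record type for one environment transition.
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "new_state"])


class ReplayBuffer:
    """Fixed-capacity store of past transitions; the oldest are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.buffer = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self.buffer)

    def append(self, experience: Experience) -> None:
        self.buffer.append(experience)

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform sampling without replacement, capped at the current size.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


class ReplayDataset:
    """Iterable view over the buffer.

    Assumption: with PyTorch this would subclass
    torch.utils.data.IterableDataset; it is a plain class here so the
    sketch runs without any third-party packages.
    """

    def __init__(self, buffer: ReplayBuffer, sample_size: int) -> None:
        self.buffer = buffer
        self.sample_size = sample_size

    def __iter__(self) -> Iterator[Experience]:
        # Sampling happens at iteration time, so transitions appended
        # between passes are visible to the next pass.
        yield from self.buffer.sample(self.sample_size)


buffer = ReplayBuffer(capacity=100)
dataset = ReplayDataset(buffer, sample_size=4)

# Simulate an agent that takes one step and stores it before each pass.
for step in range(6):
    buffer.append(Experience(state=step, action=0, reward=1.0, done=False, new_state=step + 1))
    batch = list(dataset)
```

Because sampling is deferred to `__iter__`, the dataset never needs to know its length up front, which is what makes an iterable-style dataset a reasonable fit for data that grows during training.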

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #713
Provides a basic domain example of using Lightning for Reinforcement Learning

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Donal Byrne added 2 commits March 22, 2020 16:13
* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset
@pep8speaks

pep8speaks commented Mar 25, 2020

Hello @djbyrne! Thanks for updating this PR.

Line 270:68: W504 line break after binary operator

Comment last updated at 2020-03-28 09:34:00 UTC
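
For readers unfamiliar with the W504 warning reported above, here is a small illustration of what the check flags. The variable names are made up for this sketch and are not taken from line 270 of the PR.

```python
first_value = 2
second_value = 3

# Flagged by W504: the operator is the last token before the line break.
total_after = (first_value +
               second_value)

# Not flagged by W504: the break comes before the operator
# (the complementary check W503 flags this style instead).
total_before = (first_value
                + second_value)
```

Both forms evaluate to the same result; the two checks only disagree about where the operator should sit, so a project normally enables at most one of them.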

@Borda Borda changed the title from "Example: Simple RL example using DQN/Lightning" to "[blocked by #990] Example: Simple RL example using DQN/Lightning" Mar 25, 2020
@Borda Borda requested a review from a team March 25, 2020 08:13
@Borda Borda added the feature and example labels Mar 25, 2020
@Borda Borda added this to the 0.7.2 milestone Mar 25, 2020
Member

@Borda Borda left a comment

LGTM 🚀

  • add note to changelog
  • add docstring with some description

pl_examples/domain_templates/dqn.py (outdated, resolved)
class DQN(nn.Module):
    """Simple MLP network"""

    def __init__(self, obs_size, n_actions, hidden_size=128):
Member

pls add types

Contributor Author

Done

pl_examples/domain_templates/dqn.py (outdated, resolved)
djbyrne and others added 3 commits March 25, 2020 10:45
simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import
@williamFalcon
Contributor

merged #990
can you also transfer this to a colab?

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog
@djbyrne
Contributor Author

djbyrne commented Mar 25, 2020

@williamFalcon how do I go about adding the example to the colab notebook?

@Borda Borda changed the title from "[blocked by #990] Example: Simple RL example using DQN/Lightning" to "Example: Simple RL example using DQN/Lightning" Mar 25, 2020
@Borda
Member

Borda commented Mar 25, 2020

@williamFalcon how do I go about adding the example to the colab notebook?

We had a discussion with @ethanwharris and @MattPainter01 some time ago, and we agreed to keep it as a notebook in this repo that is connected to Colab on request and also used as an example in the docs, right?

@djbyrne
Contributor Author

djbyrne commented Mar 26, 2020

The circleci tests seem to be failing due to using typing OrderedDict. Is there any reason why this should be failing?
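
One possible explanation, stated as an assumption because the CI logs are not included in this thread: `typing.OrderedDict` is only available from Python 3.7.2 onward, so importing it on an older interpreter raises `ImportError`. A minimal sketch of a guarded import follows; `build_log` is a hypothetical helper, not a function from this PR.

```python
from collections import OrderedDict
from typing import Dict

try:
    # Only present on Python 3.7.2 and later.
    from typing import OrderedDict as OrderedDictType
    HAS_TYPING_ORDERED_DICT = True
except ImportError:
    HAS_TYPING_ORDERED_DICT = False


def build_log(loss: float, reward: float) -> Dict[str, float]:
    """Annotate with Dict, but construct an OrderedDict at runtime."""
    return OrderedDict([("loss", loss), ("reward", reward)])


log = build_log(0.5, 10.0)
```

Annotating with `typing.Dict` while building a `collections.OrderedDict` keeps the ordering behaviour and sidesteps the version-dependent import entirely.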

CHANGELOG.md (outdated, resolved)
Borda and others added 3 commits March 26, 2020 23:44
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
@Borda Borda requested review from neggert, tullie, williamFalcon and a team March 26, 2020 22:53
@Borda
Member

Borda commented Mar 26, 2020

@djbyrne could you rebase master, seems that you are missing the recent test/example split

bmartinn and others added 22 commits March 27, 2020 08:32
…ng-AI#1229)

* Fix requirement-extra use released Trains package

* Update README.md add Trains and links to the external Visualization section

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
…code (Lightning-AI#1240)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* system info

* update big info

* test script

* update config

* rename script

* import path
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
@codecov

codecov bot commented Mar 28, 2020

Codecov Report

Merging #1232 into master will increase coverage by 1%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1232   +/-   ##
======================================
+ Coverage      91%     92%   +1%     
======================================
  Files          61      61           
  Lines        3121    3153   +32     
======================================
+ Hits         2833    2886   +53     
+ Misses        288     267   -21

@djbyrne
Contributor Author

djbyrne commented Mar 28, 2020

@djbyrne could you rebase master, seems that you are missing the recent test/example split

I rebased onto master, but it seems that the ubuntu and osx tests fail when uploading the pytest results. Any ideas why?

@williamFalcon
Contributor

@djbyrne just restarted jobs. working to merge this ASAP :)

@williamFalcon williamFalcon merged commit dab3b96 into Lightning-AI:master Mar 28, 2020
@Borda
Member

Borda commented Mar 28, 2020

@djbyrne please rebase next time; it looks like you merged instead, since the PR now shows 67 changed files...
but GREAT job, thanks for the RL example ⚡

alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
* Example: Simple RL example using DQN/Lightning

* DQN RL Agent using Lightning

* Uses Iterable Dataset for Replay Buffer

* Buffer is populated by agent as training is carried out, updating the
dataset

* Applied autopep8 fixes

* * Updated line length from 120 to 110

* Update pl_examples/domain_templates/dqn.py

simplify get_device method

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/domain_templates/dqn.py

Re-ordered imports

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* CI: split tests-examples (Lightning-AI#990)

* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lover pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import

* Clean up

* added module docstring

* renamed variables to be more descriptive

* Added missing docstrings and type annotations

* Added gym to example requirements

* Added note to changelog

* updated example image

* update types

* rename script

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* another rename

* Disable validation when val_percent_check=0 (Lightning-AI#1251)

* fix disable validation

* add test

* update changelog

* update docs for val_percent_check

* make "fast training" docs consistent

* calling self.forward() -> self() (Lightning-AI#1211)

* self.forward() -> self()

* update changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fix requirements-extra.txt Trains package to release version (Lightning-AI#1229)

* Fix requirement-extra use released Trains package

* Update README.md add Trains and links to the external Visualization section

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove unnecessary parameters to super() in documentation and source code (Lightning-AI#1240)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update deprecation warning (Lightning-AI#1258)

* update docs for progress bat values (Lightning-AI#1253)

* lower timeouts for inactive issues (Lightning-AI#1250)

* update contrib list (Lightning-AI#1241)

Co-authored-by: William Falcon <waf2107@columbia.edu>

* Fix outdated docs (Lightning-AI#1227)

* Fix typo (Lightning-AI#1224)

* drop unused Tox (Lightning-AI#1242)

* system info (Lightning-AI#1234)

* system info

* update big info

* test script

* update config

* rename script

* import path

* Changed smoothing in tqdm to decrease variability of time remaining between training / eval (Lightning-AI#1194)


Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
Co-authored-by: Tyler Yep <tyep@stanford.edu>
Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com>
Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>
@edenlightning edenlightning modified the milestones: 0.7.2, 1.0.x Nov 4, 2020
Labels
example, feature
Development

Successfully merging this pull request may close these issues.

Support for non-static data for reinforcement learning
10 participants