Example: Simple RL example using DQN/Lightning #1232

djbyrne · 2020-03-25T07:25:01Z

A DQN RL agent implemented with Lightning. The model uses an IterableDataset to wrap the ReplayBuffer, which supplies mini-batches of past experiences for each training step. During each training step, the agent also takes one step through the environment and adds the resulting experience to the ReplayBuffer inside the Dataset.
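
The buffer-backed dataset described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `Experience`, `ReplayBuffer`, and `RLDataset`, their signatures, and the default `sample_size` are assumptions, and the `ImportError` fallback exists only so the sketch runs without PyTorch installed.

```python
import random
from collections import deque, namedtuple

try:  # use the real base class when PyTorch is available
    from torch.utils.data import IterableDataset
except ImportError:  # assumption: plain-object fallback so the sketch still runs
    IterableDataset = object

# Hypothetical record type for one environment transition.
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "new_state"])


class ReplayBuffer:
    """Fixed-size store of past experiences; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def append(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


class RLDataset(IterableDataset):
    """Iterable view over the buffer; every new iteration re-samples the buffer."""

    def __init__(self, buffer, sample_size=200):
        self.buffer = buffer
        self.sample_size = sample_size

    def __iter__(self):
        for experience in self.buffer.sample(self.sample_size):
            yield experience
```

Because the dataset holds a reference to the buffer rather than a copy, experiences appended by the agent during training become visible the next time the dataset is iterated.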

Before submitting

Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?
If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #713
Provides a basic domain example of using Lightning for Reinforcement Learning

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

pep8speaks · 2020-03-25T07:25:04Z

Hello @djbyrne! Thanks for updating this PR.

In the file pl_examples/domain_templates/reinforse_learn_Qnet.py:

Line 270:68: W504 line break after binary operator

Comment last updated at 2020-03-28 09:34:00 UTC
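
For readers unfamiliar with W504, the snippet below shows the pattern it reports and one way to avoid it. The variable names and values are made up for illustration; the actual line 270 of the file is not shown in this thread.

```python
# Hypothetical values, only here so the snippet runs.
reward = 1.0
gamma = 0.99
next_state_value = 2.0

# Operator left at the end of the line: this is the shape W504 reports.
target_flagged = (reward +
                  gamma * next_state_value)

# Operator moved to the start of the continuation line: no W504.
target_clean = (reward
                + gamma * next_state_value)
```

Both forms compute the same value. Note that pycodestyle also defines W503 for the opposite style (break before the operator), so a project normally enables only one of the two checks.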

Borda

LGTM 🚀

add note to changelog
add docstring with some description

Borda · 2020-03-25T08:15:49Z

pl_examples/domain_templates/dqn.py

+class DQN(nn.Module):
+    """ Simple MLP network"""
+
+    def __init__(self, obs_size, n_actions, hidden_size=128):


pls add types
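
One way the requested annotations might look is sketched below. Only the class name, the docstring, and the `__init__` parameters come from the diff; the layer layout and the `forward` body are assumptions added to make the sketch runnable.

```python
import torch
from torch import nn


class DQN(nn.Module):
    """Simple MLP network"""

    def __init__(self, obs_size: int, n_actions: int, hidden_size: int = 128) -> None:
        super().__init__()
        # Assumed architecture: one hidden layer mapping observations to Q-values.
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns one Q-value per action for each observation in the batch.
        return self.net(x.float())
```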

williamFalcon · 2020-03-25T11:47:03Z

merged #990
can you also transfer this to a colab?

djbyrne · 2020-03-25T21:01:34Z

@williamFalcon how do I go about adding the example to the colab notebook?

Borda · 2020-03-25T22:15:23Z

> @williamFalcon how do I go about adding the example to the colab notebook?

We had a discussion with @ethanwharris and @MattPainter01 some time ago and agreed to keep it as a notebook in this repo, which is connected to Colab on request and also used as an example in the docs, right?

djbyrne · 2020-03-26T07:28:39Z

The CircleCI tests seem to be failing because of the use of `typing.OrderedDict`. Is there any reason why this should fail?
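
A possible explanation, not confirmed in this thread: `typing.OrderedDict` was only added in Python 3.7.2, so importing it raises `ImportError` on older interpreters. If any CI job runs such a version, a version-independent alternative is to annotate with `typing.Dict` and build the value with `collections.OrderedDict`. The function name and keys below are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


def build_step_output(loss: float, total_reward: float) -> Dict[str, float]:
    """Return an ordered mapping while annotating with typing.Dict,
    which is available wherever the typing module is."""
    return OrderedDict([("loss", loss), ("total_reward", total_reward)])
```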

Borda · 2020-03-26T23:30:57Z

@djbyrne could you rebase onto master? It seems you are missing the recent test/example split.

codecov · 2020-03-28T09:53:54Z

Codecov Report

Merging #1232 into master will increase coverage by 1%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1232   +/-   ##
======================================
+ Coverage      91%     92%   +1%     
======================================
  Files          61      61           
  Lines        3121    3153   +32     
======================================
+ Hits         2833    2886   +53     
+ Misses        288     267   -21
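
The rounded percentages in the report can be reproduced from the Hits and Lines rows. The helper name `coverage_pct` is hypothetical.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of all tracked lines."""
    return 100.0 * hits / lines


before = coverage_pct(2833, 3121)  # master
after = coverage_pct(2886, 3153)   # with this PR merged
print(f"{before:.2f}% -> {after:.2f}% (change {after - before:+.2f} points)")
```

Unrounded, coverage moves from roughly 90.8% to 91.5%, so the reported +1% is partly an effect of rounding each figure to a whole percent.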

djbyrne · 2020-03-28T11:14:33Z

> @djbyrne could you rebase master, seems that you are missing the recent test/example split

I rebased onto master, but the Ubuntu and OSX tests seem to fail when uploading the pytest results. Any ideas why?

williamFalcon · 2020-03-28T19:52:32Z

@djbyrne just restarted jobs. working to merge this ASAP :)

Borda · 2020-03-28T22:55:25Z

@djbyrne next time please rebase instead; it now looks like you merged, since the PR shows 67 changed files...
but GREAT job, thanks for the RL example ⚡

* Example: Simple RL example using DQN/Lightning
  * DQN RL Agent using Lightning
  * Uses Iterable Dataset for Replay Buffer
  * Buffer is populated by agent as training is carried out, updating the dataset
* Applied autopep8 fixes
* Updated line length from 120 to 110
* Update pl_examples/domain_templates/dqn.py
  * simplify get_device method
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update pl_examples/domain_templates/dqn.py
  * Re-ordered imports
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* CI: split tests-examples (Lightning-AI#990)
  * CI: split tests-examples
  * tests without template
  * comment depends
  * CircleCI typo
  * add doctest
  * update test req.
  * CI tests
  * setup macOS
  * longer train
  * lover pred acc
  * fix model
  * rename default model
  * lower tests acc
  * typo
  * imports
  * fix test optimizer
  * update calls
  * fix Win
  * lower Drone image
  * fix call
  * pytorch image
  * fix test
  * add dev image
  * add dev image
  * update image
  * drone volume
  * lint
  * update test notes
  * rename tests/models >> tests/base
  * group models
  * conftest
  * optim imports
  * typos
  * fix import
  * fix tests
  * install AMP
  * tests
  * fix import
* Clean up
  * added module docstring
  * renamed variables to be more descriptive
  * Added missing docstrings and type annotations
  * Added gym to example requirements
  * Added note to changelog
* updated example image
* update types
* rename script
* Update CHANGELOG.md
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* another rename
* Disable validation when val_percent_check=0 (Lightning-AI#1251)
  * fix disable validation
  * add test
  * update changelog
  * update docs for val_percent_check
  * make "fast training" docs consistent
* calling self.forward() -> self() (Lightning-AI#1211)
  * self.forward() -> self()
  * update changelog
  * Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Fix requirements-extra.txt Trains package to release version (Lightning-AI#1229)
  * Fix requirement-extra use released Trains package
  * Update README.md add Trains and links to the external Visualization section
  * Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Remove unnecessary parameters to super() in documentation and source code (Lightning-AI#1240)
  * Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update deprecation warning (Lightning-AI#1258)
* update docs for progress bat values (Lightning-AI#1253)
* lower timeouts for inactive issues (Lightning-AI#1250)
* update contrib list (Lightning-AI#1241)
  * Co-authored-by: William Falcon <waf2107@columbia.edu>
* Fix outdated docs (Lightning-AI#1227)
* Fix typo (Lightning-AI#1224)
* drop unused Tox (Lightning-AI#1242)
* system info (Lightning-AI#1234)
  * system info
  * update big info
  * test script
  * update config
  * rename script
  * import path
* Changed smoothing in tqdm to decrease variability of time remaining between training / eval (Lightning-AI#1194)
* Example: Simple RL example using DQN/Lightning
  * DQN RL Agent using Lightning
  * Uses Iterable Dataset for Replay Buffer
  * Buffer is populated by agent as training is carried out, updating the dataset
* Applied autopep8 fixes
* Updated line length from 120 to 110
* Update pl_examples/domain_templates/dqn.py
  * simplify get_device method
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update pl_examples/domain_templates/dqn.py
  * Re-ordered imports
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Clean up
  * added module docstring
  * renamed variables to be more descriptive
  * Added missing docstrings and type annotations
  * Added gym to example requirements
  * Added note to changelog
* update types
* rename script
* Update CHANGELOG.md
  * Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* another rename

Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
Co-authored-by: Tyler Yep <tyep@stanford.edu>
Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com>
Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>

Donal Byrne added 2 commits March 22, 2020 16:13

Example: Simple RL example using DQN/Lightning

07a4e7d

* DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset

Applied autopep8 fixes

05cf5ac

* Updated line length from 120 to 110

fc9f31d

Borda changed the title ~~Example: Simple RL example using DQN/Lightning~~ [blocked by #990] Example: Simple RL example using DQN/Lightning Mar 25, 2020

Borda requested a review from a team March 25, 2020 08:13

Borda added the feature and example labels Mar 25, 2020

Borda added this to the 0.7.2 milestone Mar 25, 2020

Borda approved these changes Mar 25, 2020

djbyrne and others added 3 commits March 25, 2020 10:45

Update pl_examples/domain_templates/dqn.py

cafea47

simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

Update pl_examples/domain_templates/dqn.py

606f1f2

Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

Clean up

31ef2eb

* added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog

Borda changed the title ~~[blocked by #990] Example: Simple RL example using DQN/Lightning~~ Example: Simple RL example using DQN/Lightning Mar 25, 2020

williamFalcon and others added 2 commits March 26, 2020 09:28

updated example image

e86e6b2

update types

d2ef4fa

Borda reviewed Mar 26, 2020

CHANGELOG.md (outdated)

Borda and others added 3 commits March 26, 2020 23:44

rename script

b4b8dd7

Update CHANGELOG.md

3255539

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

another rename

8b2c9e2

Borda requested review from neggert, tullie, williamFalcon and a team March 26, 2020 22:53

bmartinn and others added 22 commits March 27, 2020 08:32

Fix requirements-extra.txt Trains package to release version (Lightni…

6a0b171

…ng-AI#1229) * Fix requirement-extra use released Trains package * Update README.md add Trains and links to the external Visualization section Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Remove unnecessary parameters to super() in documentation and source …

6772e0c

…code (Lightning-AI#1240) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

update deprecation warning (Lightning-AI#1258)

593bf50

update docs for progress bat values (Lightning-AI#1253)

bec43c9

lower timeouts for inactive issues (Lightning-AI#1250)

9bb2e00

update contrib list (Lightning-AI#1241)

da18534

Co-authored-by: William Falcon <waf2107@columbia.edu>

Fix outdated docs (Lightning-AI#1227)

3a93aaf

Fix typo (Lightning-AI#1224)

ac6692d

drop unused Tox (Lightning-AI#1242)

1a9719c

system info (Lightning-AI#1234)

61177cd

* system info * update big info * test script * update config * rename script * import path

Changed smoothing in tqdm to decrease variability of time remaining b…

12b39a7

…etween training / eval (Lightning-AI#1194)

Example: Simple RL example using DQN/Lightning

582fe4c

* DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset

Applied autopep8 fixes

3ed2739

* Updated line length from 120 to 110

dac522e

Update pl_examples/domain_templates/dqn.py

ec85171

simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

Update pl_examples/domain_templates/dqn.py

eb72022

Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

Clean up

e03e015

* added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog

update types

0e9ca89

rename script

42838b3

Update CHANGELOG.md

9cf4915

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

another rename

f9be8b0

Merge branch 'dqn_example' of https://github.com/djbyrne/pytorch-ligh…

a44c90c

…tning into dqn_example

williamFalcon merged commit dab3b96 into Lightning-AI:master Mar 28, 2020

edenlightning modified the milestones: 0.7.2, 1.0.x Nov 4, 2020
