Port TorchRL tutorial for DDPG Loss to pytorch.org #2413
Conversation
Hi @nairbv! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted. In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with "CLA signed". If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
✅ Deploy Preview for pytorch-tutorials-preview ready!
To edit notification comments on pull requests, go to your Netlify site settings.
A few editorial suggestions
intermediate_source/coding_ddpg.py (outdated)
# environment and reset it when required.
# Data collectors are designed to help developers have a tight control
# on the number of frames per batch of data, on the (a)sync nature of this
# collection and on the resources allocated to the data collection (e.g. GPU,
Suggested change:
- # collection and on the resources allocated to the data collection (e.g. GPU,
+ # collection and on the resources allocated to the data collection (for example, GPU,
intermediate_source/coding_ddpg.py (outdated)
# Data collectors are designed to help developers have a tight control
# on the number of frames per batch of data, on the (a)sync nature of this
# collection and on the resources allocated to the data collection (e.g. GPU,
# number of workers etc).
Suggested change:
- # number of workers etc).
+ # number of workers, and so on).
intermediate_source/coding_ddpg.py (outdated)
# - the policy,
# - the total number of frames before the collector is considered empty,
# - the maximum number of frames per trajectory (useful for non-terminating
#   environments, like dm_control ones).
Suggested change:
- #   environments, like dm_control ones).
+ #   environments, like the ``dm_control`` ones).
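The passage under review lists the collector's main arguments; a minimal sketch of such a collector instantiation, assuming the helper names defined elsewhere in this tutorial (``parallel_env_constructor``, ``actor_model_explore``, ``transform_state_dict``) and the argument names of the TorchRL release used here, which may differ in later versions:

from torchrl.collectors import SyncDataCollector

collector = SyncDataCollector(
    create_env_fn=parallel_env_constructor(
        env_per_collector=1,
        transform_state_dict=transform_state_dict,
    ),
    policy=actor_model_explore,
    frames_per_batch=1000,     # tight control on the batch size, in frames
    total_frames=10_000,       # collector is considered empty after this many frames
    max_frames_per_traj=1000,  # cap trajectories in non-terminating envs
)
for batch in collector:        # each ``batch`` is a TensorDict of transitions
    ...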
intermediate_source/coding_ddpg.py (outdated)
#
# .. note::
#   As already mentioned above, to get a more reasonable performance,
#   use a greater value for ``total_frames`` e.g. 1M.
Suggested change:
- #   use a greater value for ``total_frames`` e.g. 1M.
+ #   use a greater value for ``total_frames`` for example, 1M.
intermediate_source/coding_ddpg.py (outdated)
# Conclusion
# ----------
#
# In this tutorial, we have learnt how to code a loss module in TorchRL given
Suggested change:
- # In this tutorial, we have learnt how to code a loss module in TorchRL given
+ # In this tutorial, we have learned how to code a loss module in TorchRL given
@nairbv the PR looks great overall! Can you please address the editorial suggestions, sign the CLA, and fix the spellcheck?
Port existing tutorial from torchrl to pytorch.org. pytorch#2351
Please rebase to run against the latest GH workflow.
Did a rebase and made some of the stylistic changes; still waiting internally to be added to the CLA.
The pyspelling spell checker seems to be having an issue with acronyms like DDPG, common terms like cuda, and variable names mentioned within code comments. What's the recommended way of addressing these kinds of CI errors?
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
@nairbv Check https://github.com/pytorch/tutorials/blob/main/en-wordlist.txt for the spelling of CUDA and some other words; for new acronyms, you can propose an update to that file. All code references (names of functions, classes, API methods, etc.) should be enclosed in double backticks.
Hmm... that works for terms like DDPG, I guess, but...
This seems unconventional for single-line code comments. E.g., in cases like the
It is a hack, yes. But Pyspelling can't differentiate between a one-line comment and a multiline comment.
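For illustration, the workaround amounts to something like the following hypothetical one-line comment (genuine acronyms such as DDPG or CUDA would instead be added to en-wordlist.txt):

# Flagged by pyspelling: move the policy to cuda and tune frames_per_batch.
# Accepted: move the policy to ``cuda`` and tune ``frames_per_batch``.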
Happy with this. See my comment about next steps
#   loss component;
# - How to use (or not) a target network, and how to update its parameters;
# - How to create an optimizer associated with a loss module.
#
Some ideas for next steps:
- using @dispatch for loss modules (see [Feature] Dispatch IQL loss module rl#1230 for instance)
- flexible keys (see https://github.com/pytorch/rl/pulls?page=2&q=is%3Apr+is%3Aclosed)
These features don't necessarily need to be part of the tutorial, but it's worth mentioning that the current implementation could be improved with them.
Is this comment to document potential future tutorial updates, or are you suggesting we should add a "next steps" section to the content of the tutorial?
We should add a "next steps" section or something like that.
The point is that most losses have more features than this "toy example", and it could be weird to read this tutorial from top to bottom without mentioning some nice features such as customization of the tensordict keys, or using the loss without a tensordict at all.
For now we don't have a tutorial about these features, but I would mention that these things are possible.
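Concretely, a minimal sketch of those two features on this tutorial's loss, assuming the ``DDPGLoss`` API from recent TorchRL releases (``actor_network`` and ``value_network`` stand in for the tutorial's networks; accepted key names may differ):

from torchrl.objectives import DDPGLoss

loss_fn = DDPGLoss(actor_network, value_network)

# Flexible keys: read entries from non-default tensordict keys.
loss_fn.set_keys(reward="my_reward")

# ``@dispatch``: losses decorated with it can be called without a
# tensordict, passing the entries as keyword arguments instead, e.g.
# loss_vals = loss_fn(observation=obs, action=act, next_reward=reward, ...)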
Added, but not sure if that's exactly how you wanted it. Your link above for flexible keys just goes to a long list of issues; you might have meant to paste something else?
should have been pytorch/rl#1175
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Regarding https://github.com/pytorch/tutorials/actions/runs/5214703804/jobs/9414299217?pr=2413#step:8:14384
This should fix our bug
intermediate_source/coding_ddpg.py (outdated)
create_env_fn=[
    parallel_env,
]
I would suggest not using a parallel env here, but just the async collector + a regular env.
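A minimal sketch of that suggestion, assuming the tutorial's helpers (``env_constructor``, ``actor_model_explore``, ``transform_state_dict``) and the ``MultiaSyncDataCollector`` API; details may differ across TorchRL versions:

from torchrl.collectors import MultiaSyncDataCollector

collector = MultiaSyncDataCollector(
    create_env_fn=[
        env_constructor(transform_state_dict),  # one regular env per worker,
        env_constructor(transform_state_dict),  # instead of a ParallelEnv
    ],
    policy=actor_model_explore,
    frames_per_batch=1000,
    total_frames=10_000,
)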
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
intermediate_source/coding_ddpg.py (outdated)

-def parallel_env_constructor(
-    env_per_collector,
-    transform_state_dict,
-):
-    if env_per_collector == 1:
-
-        def make_t_env():
-            env = make_transformed_env(make_env())
-            env.transform[2].init_stats(3)
-            env.transform[2].loc.copy_(transform_state_dict["loc"])
-            env.transform[2].scale.copy_(transform_state_dict["scale"])
-            return env
-
-        env_creator = EnvCreator(make_t_env)
-        return env_creator
-
-    parallel_env = ParallelEnv(
-        num_workers=env_per_collector,
-        create_env_fn=EnvCreator(lambda: make_env()),
-        create_env_kwargs=None,
-        pin_memory=False,
-    )
-    env = make_transformed_env(parallel_env)
-    # we call `init_stats` for a limited number of steps, just to instantiate
-    # the lazy buffers.
-    env.transform[2].init_stats(3, cat_dim=1, reduce_dim=[0, 1])
-    env.transform[2].load_state_dict(transform_state_dict)
-    return env
+def env_constructor(
+    transform_state_dict,
+):
+    def make_t_env():
+        env = make_transformed_env(make_env())
+        env.transform[2].init_stats(3)
+        env.transform[2].loc.copy_(transform_state_dict["loc"])
+        env.transform[2].scale.copy_(transform_state_dict["scale"])
+        return env
+
+    env_creator = EnvCreator(make_t_env)
+    return env_creator
@svekars we can revert this and use the old parallel_env_constructor
def parallel_env_constructor(
    env_per_collector,
    transform_state_dict,
):
    if env_per_collector == 1:

        def make_t_env():
            env = make_transformed_env(make_env())
            env.transform[2].init_stats(3)
            env.transform[2].loc.copy_(transform_state_dict["loc"])
            env.transform[2].scale.copy_(transform_state_dict["scale"])
            return env

        env_creator = EnvCreator(make_t_env)
        return env_creator

    parallel_env = ParallelEnv(
        num_workers=env_per_collector,
        create_env_fn=EnvCreator(lambda: make_env()),
        create_env_kwargs=None,
        pin_memory=False,
    )
    env = make_transformed_env(parallel_env)
    # we call `init_stats` for a limited number of steps, just to instantiate
    # the lazy buffers.
    env.transform[2].init_stats(3, cat_dim=1, reduce_dim=[0, 1])
    env.transform[2].load_state_dict(transform_state_dict)
    return env
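For reference, a call site for this helper might look like the following sketch (``transform_state_dict`` is assumed to come from the earlier stats-gathering step of the tutorial):

# env_per_collector == 1 returns an EnvCreator over a single transformed env;
# larger values return a transformed ParallelEnv instead.
create_env_fn = parallel_env_constructor(
    env_per_collector=2,
    transform_state_dict=transform_state_dict,
)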
LGTM in general. Need to make sure to add the image from https://github.com/pytorch/rl/blob/main/docs/source/_static/img/replaybuffer_traj.png. This also feels like an advanced tutorial rather than an intermediate one, so we might want to move it to advanced_source. We can do that in a follow-up PR.
Port existing tutorial from torchrl to pytorch.org.
Fixes #2351
Description
Copy of https://pytorch.org/rl/tutorials/coding_ddpg.html.
The tutorial uses half_cheetah, so that was used for the image in the index.
cc @vmoens
Checklist