Incorporate SerialEnv and introduce multistep policy logic #26
Thanks! Very useful. Don't hesitate to ping me on Discord if you need insights to pass the unit tests ;)
In the description, could you add a minimal code snippet to run the tests, so that we can set breakpoints easily and better understand the code?
Let's iterate one more time and then merge this very cool feature!
Could you add the time that you saved by adding SerialEnv?
Not sure we have time for this, but ideally we should train a diffusion policy on PushT to confirm that we can reproduce the results obtained before this PR, and post them in the PR description.
Thanks!
…multistep_policy_and_serial_env
You mean the PR description? I'm confused, as the way to run the tests is the same as for all other tests. I'll link the relevant issue.
Left some comments ;) Almost ready to merge
@Cadene btw, can you please check the previously unresolved comments as well?
@Cadene 🏓. I removed abstractmethods as you suggested. Also, please take a look at what I did re. online training. There's definitely a way to do it properly but I'm short-circuiting for now to get this PR through.
This PR has two unrelated features but it was more sensible to make sure they work together.
- Incorporate `SerialEnv`. For rollout, this runs the policy in batch mode, but the environments in the batch are still stepped sequentially. For PushT this still provides a 10x speedup on a machine with an RTX 3090 and 16 CPU threads. The goal is to swap this out with `ParallelEnv`.
- Create a base class for policies, thereby enabling a base forward method that handles multi-step policies. This is for correct handling of the step outputs (reward, terminated, truncated, etc.). For example, see: https://github.com/users/Cadene/projects/1?pane=issue&itemId=56225488
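
To make the two points above easier to discuss, here is a minimal, dependency-free sketch of how they might fit together. It is not the code in this PR: `MultiStepPolicyBase`, `ConstantActionPolicy`, `CounterEnv`, `rollout` and the `n_plans` counter are all hypothetical names made up for illustration, and plain Python lists stand in for batched tensors. The assumed idea is that the base `forward` queues the actions planned by a subclass and hands out one batch per environment step, so reward and termination flags are still recorded at every step.

```python
from collections import deque


class MultiStepPolicyBase:
    """Hypothetical base class: queues multi-step plans, returns one step at a time."""

    def __init__(self, n_action_steps):
        self.n_action_steps = n_action_steps
        self._action_queue = deque()
        self.n_plans = 0  # illustration only: counts calls to select_actions

    def select_actions(self, observations):
        # Subclasses return a list of n_action_steps action batches,
        # each batch holding one action per environment.
        raise NotImplementedError

    def forward(self, observations):
        # Re-plan only when every previously planned step has been consumed.
        if not self._action_queue:
            self.n_plans += 1
            self._action_queue.extend(self.select_actions(observations))
        return self._action_queue.popleft()


class ConstantActionPolicy(MultiStepPolicyBase):
    """Toy policy: always plans the action 1 for every environment."""

    def select_actions(self, observations):
        return [[1 for _ in observations] for _ in range(self.n_action_steps)]


class CounterEnv:
    """Toy environment: the state accumulates actions and terminates at a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        terminated = self.state >= self.limit
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated


def rollout(policy, envs, max_steps):
    """One batched policy call per step; environments are stepped one by one."""
    observations = [env.reset() for env in envs]
    history = []
    for _ in range(max_steps):
        actions = policy.forward(observations)
        results = [env.step(action) for env, action in zip(envs, actions)]
        observations = [obs for obs, _, _ in results]
        # Step outputs are kept per environment step, not per plan.
        history.append([(reward, terminated) for _, reward, terminated in results])
    return history


policy = ConstantActionPolicy(n_action_steps=2)
history = rollout(policy, [CounterEnv(limit=2), CounterEnv(limit=3)], max_steps=4)
```

In this sketch the policy plans only twice over four environment steps, while `history` still holds one `(reward, terminated)` pair per environment for each step. Swapping the list comprehension that steps the environments for a parallel implementation would not change the policy side.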