Incorporate SerialEnv and introduce multistep policy logic #26

alexander-soare · 2024-03-14T15:30:07Z

This PR has two unrelated features but it was more sensible to make sure they work together.

Incorporate SerialEnv. During rollout, this runs the policy in batch mode, while the environments are still stepped sequentially. For PushT, this still gives a 10x speedup on a machine with an RTX 3090 and 16 CPU threads. The goal is to eventually replace it with ParallelEnv.
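The SerialEnv idea can be illustrated with a minimal, framework-free sketch. It does not reproduce the actual `SerialEnv` class; `CounterEnv`, `SerialBatchEnv`, and `batched_policy` are hypothetical names used only for illustration. The point it shows is that the policy is called once per step on the whole batch of observations, while the environments behind the batch interface are still stepped one after another.

```python
from typing import Callable, List, Tuple


class CounterEnv:
    """Hypothetical toy env: the observation is the step count; the episode ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return float(self.t)

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.t += 1
        reward = action  # toy reward: echo the action back
        done = self.t >= self.horizon
        return float(self.t), reward, done


class SerialBatchEnv:
    """Hypothetical stand-in for a serial batched env: a batch API over envs stepped in a Python loop."""

    def __init__(self, env_fns: List[Callable[[], CounterEnv]]) -> None:
        self.envs = [fn() for fn in env_fns]

    def reset(self) -> List[float]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[float]) -> Tuple[List[float], List[float], List[bool]]:
        # The envs are stepped one after another, not in parallel processes.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = (list(x) for x in zip(*results))
        return obs, rewards, dones


def batched_policy(obs_batch: List[float]) -> List[float]:
    # A single call covers the whole batch; with a neural policy on a GPU,
    # this batching is where the speedup would come from.
    return [0.5 * o + 1.0 for o in obs_batch]


env = SerialBatchEnv([lambda h=h: CounterEnv(h) for h in (3, 5)])
obs = env.reset()
for _ in range(3):
    actions = batched_policy(obs)
    obs, rewards, dones = env.step(actions)
print(obs, rewards, dones)  # [3.0, 3.0] [2.0, 2.0] [True, False]
```

Because `step` is a plain loop, its wall-clock time still grows with the number of environments; running that part concurrently is what a ParallelEnv-style wrapper would address.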
Create a base class for policies, enabling a shared forward method that handles multi-step policies. This ensures the step outputs (reward, terminated, truncated, etc.) are handled correctly. For example, see: https://github.com/users/Cadene/projects/1?pane=issue&itemId=56225488
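One way to picture the multi-step base class is the sketch below. It assumes that a multi-step policy predicts a chunk of `n_action_steps` actions per model call and that the base `forward` serves them one per environment step, so every env step produces its own reward and done signal. `MultiStepPolicy`, `select_actions`, `ConstantChunkPolicy`, and `rollout` are hypothetical names, not the PR's actual code, and the toy rollout only models termination (no truncation). The rollout also stops crediting reward to an environment once it has finished, since in a batched setting finished environments keep being stepped until the whole batch is done.

```python
from collections import deque
from typing import Deque, List


class MultiStepPolicy:
    """Hypothetical base class: subclasses predict `n_action_steps` action batches per
    model call, and forward() hands out exactly one action batch per environment step."""

    def __init__(self, n_action_steps: int) -> None:
        self.n_action_steps = n_action_steps
        self._queue: Deque[List[float]] = deque()

    def reset(self) -> None:
        # Clear leftover actions at the start of each episode.
        self._queue.clear()

    def select_actions(self, obs_batch: List[float]) -> List[List[float]]:
        """Return `n_action_steps` action batches, one per upcoming env step."""
        raise NotImplementedError

    def forward(self, obs_batch: List[float]) -> List[float]:
        if not self._queue:
            # Query the model only once the previous chunk has been used up.
            self._queue.extend(self.select_actions(obs_batch))
        return self._queue.popleft()


class ConstantChunkPolicy(MultiStepPolicy):
    """Toy subclass: always predicts action 1.0 and counts model calls."""

    def __init__(self, n_action_steps: int) -> None:
        super().__init__(n_action_steps)
        self.n_model_calls = 0

    def select_actions(self, obs_batch: List[float]) -> List[List[float]]:
        self.n_model_calls += 1
        return [[1.0] * len(obs_batch) for _ in range(self.n_action_steps)]


def rollout(horizons: List[int], policy: MultiStepPolicy, max_steps: int) -> List[float]:
    """Toy batched rollout: env i terminates after horizons[i] steps; reward = action."""
    n = len(horizons)
    t = [0] * n
    obs = [0.0] * n
    policy.reset()
    total_reward = [0.0] * n
    done = [False] * n
    for _ in range(max_steps):
        actions = policy.forward(obs)
        for i in range(n):
            t[i] += 1  # every env is stepped, even after it finished (as in a batched env)
            if not done[i]:
                total_reward[i] += actions[i]  # count reward only until the episode ends
            done[i] = done[i] or t[i] >= horizons[i]
        obs = [float(x) for x in t]
        if all(done):
            break
    return total_reward


policy = ConstantChunkPolicy(n_action_steps=4)
totals = rollout([3, 6], policy, max_steps=10)
print(totals, policy.n_model_calls)  # [3.0, 6.0] 2
```

Without the `if not done[i]` guard, the first environment would be credited 6.0 instead of 3.0, because it keeps being stepped after its episode has ended.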


Cadene · 2024-03-14T23:42:36Z

Thanks! Very useful. Don't hesitate to ping me on Discord if you need insights to pass the unit tests ;)
Best

lerobot/common/policies/abstract.py

Cadene · 2024-03-15T10:36:52Z

In the description, could you add a minimal code snippet to run the tests, so that we can easily set breakpoints and better understand the code?
Also, the main issue we are trying to address here is probably the miscalculation of the reward, no? Maybe we should add this to the description!
Thanks for your contribution ;)

lerobot/common/policies/act/policy.py

lerobot/common/policies/tdmpc/policy.py

lerobot/configs/default.yaml

lerobot/scripts/eval.py

tests/test_policies.py

Cadene

Let's iterate one more time, and then let's merge this very cool feature!
Could you add how much time SerialEnv saves?
Not sure we have time for this, but ideally we should confirm that training a diffusion policy on PushT reproduces the results obtained before this PR, and post them in the PR description.
Thanks!


alexander-soare · 2024-03-18T18:58:42Z

> In the description could you add a minimal code to run the tests so that we can breakpoint easily to better understand the code Also, the main issue that we are trying to address here is probably the miscalculation of the reward, no? Maybe we should add this in the description ! Thanks for your contribution ;)

You mean the PR description? I'm confused, as the way to run the tests is the same as for all other tests.

I'll link the relevant issue.

tests/test_policies.py


Cadene

Left some comments ;) Almost ready to merge

lerobot/common/envs/aloha/env.py

lerobot/common/envs/pusht/env.py

lerobot/common/policies/abstract.py

lerobot/scripts/eval.py

pyproject.toml

tests/test_policies.py

alexander-soare · 2024-03-20T09:46:51Z

@Cadene btw, can you please check the previously unresolved comments as well?

alexander-soare · 2024-03-20T14:50:49Z

@Cadene 🏓. I removed the abstract methods as you suggested. Also, please take a look at what I did regarding online training. There's definitely a proper way to do it, but I'm taking a shortcut for now to get this PR through.

alexander-soare added 6 commits March 11, 2024 13:34

early training loss as expected

2a01487

Merge remote-tracking branch 'origin/main' into train_pusht

304355c

wip - still need to verify full training run

87fcc53

Merge branch 'main' into user/alexander-soare/train_pusht

9512d1d

ready for review

98484ac

wip: still needs batch logic for act and tdmp

ba91976

alexander-soare requested a review from Cadene March 14, 2024 15:30

alexander-soare added 3 commits March 14, 2024 16:04

Merge branch 'main' into user/alexander-soare/multistep_policy_and_serial_env

4822d63

Merge branch 'main' into user/alexander-soare/train_pusht

736bc96

Merge branch 'user/alexander-soare/train_pusht' into user/alexander-soare/multistep_policy_and_serial_env

a222c88