Bug fix: missing attention mask in VAE encoder in ACT policy #279

Merged: alexander-soare merged 4 commits into main from thomwolf_2024_06_17_encoder_masking_bug on Jun 19, 2024.
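The PR title says the VAE encoder was missing an attention mask. As a rough illustration only (not LeRobot's actual code), the NumPy sketch below shows what a key padding mask does in scaled dot-product attention: without it, padded tokens leak into every output; with it, their attention weights are exactly zero. The function, shapes, and sequence lengths are assumptions for the demo.

```python
import numpy as np

def attention(q, k, v, key_padding_mask=None):
    # q, k, v: (seq, dim); key_padding_mask: (seq,) bool, True = padded key
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if key_padding_mask is not None:
        # Padded keys get -inf score, hence exactly zero softmax weight.
        scores[:, key_padding_mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
mask = np.array([False, False, True, True])  # last two tokens are padding
out_masked = attention(q, k, v, key_padding_mask=mask)
out_unmasked = attention(q, k, v)
# With the mask, the result is identical to attending over only the
# two real tokens; without it, the padding tokens contaminate the output.
```

In PyTorch this corresponds to the `key_padding_mask` argument of `nn.MultiheadAttention` / `nn.TransformerEncoder`; forgetting to pass it is exactly the class of bug this PR fixes.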
New config file added (83 lines):

# @package _global_

seed: 1000
dataset_repo_id: lerobot/aloha_sim_insertion_human

override_dataset_stats:
  observation.images.top:
    # stats from imagenet, since we use a pretrained vision model
    mean: [[[0.485]], [[0.456]], [[0.406]]]  # (c,1,1)
    std: [[[0.229]], [[0.224]], [[0.225]]]  # (c,1,1)

training:
  offline_steps: 80000
  online_steps: 0
  eval_freq: 10000
  save_freq: 100000
  log_freq: 250
  save_checkpoint: true

  batch_size: 8
  lr: 1e-5
  lr_backbone: 1e-5
  weight_decay: 1e-4
  grad_clip_norm: 10
  online_steps_between_rollouts: 1

  delta_timestamps:
    action: "[i / ${fps} for i in range(${policy.chunk_size})]"

eval:
  n_episodes: 50
  batch_size: 50

# See `configuration_act.py` for more details.
policy:
  name: act

  # Input / output structure.
  n_obs_steps: 1
  chunk_size: 1000
  n_action_steps: 1000

  input_shapes:
    # TODO(rcadene, alexander-soare): add variables for height and width from the dataset/env?
    observation.images.top: [3, 480, 640]
    observation.state: ["${env.state_dim}"]
  output_shapes:
    action: ["${env.action_dim}"]

  # Normalization / Unnormalization
  input_normalization_modes:
    observation.images.top: mean_std
    observation.state: mean_std
  output_normalization_modes:
    action: mean_std

  # Architecture.
  # Vision backbone.
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false
  # Transformer layers.
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  # Note: Although the original ACT implementation has 7 for `n_decoder_layers`, there is a bug in the code
  # that means only the first layer is used. Here we match the original implementation by setting this to 1.
  # See this issue https://github.com/tonyzhaozh/act/issues/25#issue-2258740521.
  n_decoder_layers: 1
  # VAE.
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Inference.
  temporal_ensemble_momentum: null

  # Training and loss computation.
  dropout: 0.1
  kl_weight: 10.0
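The `delta_timestamps` entry above is a Python list comprehension as a string; the config machinery substitutes the `${fps}` and `${policy.chunk_size}` interpolations and evaluates it to per-step action timestamps in seconds. A small sketch of what it expands to, assuming fps=50 purely for illustration (the actual value comes from the environment config):

```python
# What the delta_timestamps expression evaluates to after interpolation.
fps = 50          # illustrative; supplied by ${fps} in the real config
chunk_size = 1000 # matches policy.chunk_size in this file
delta_timestamps = [i / fps for i in range(chunk_size)]
# delta_timestamps[0] is 0.0 (the current frame); each subsequent entry
# is one frame period (1/fps seconds) later, covering the whole chunk.
```

So with a 1000-step chunk at 50 fps, the policy is supervised on 20 seconds of future actions per observation.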
New test artifacts (Git LFS pointer files, contents not shown; 3 additions and 0 deletions each):

tests/data/save_policy_to_safetensors/aloha_act_1000_actions/actions.safetensors
tests/data/save_policy_to_safetensors/aloha_act_1000_actions/grad_stats.safetensors
tests/data/save_policy_to_safetensors/aloha_act_1000_actions/output_dict.safetensors
tests/data/save_policy_to_safetensors/aloha_act_1000_actions/param_stats.safetensors
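These artifacts hold reference tensors; the regression test loads them and compares freshly computed policy outputs against the stored values. A minimal sketch of that comparison step, using NumPy dicts as a stand-in for the loaded safetensors contents (function name and tolerances are assumptions, not the repo's actual test code):

```python
import numpy as np

def assert_tensors_match(expected, actual, rtol=1e-5, atol=1e-8):
    """Compare two dicts of named arrays, as an artifact-based regression
    test would after loading a reference .safetensors file."""
    mismatched_keys = set(expected) ^ set(actual)
    if mismatched_keys:
        raise AssertionError(f"key mismatch: {mismatched_keys}")
    for name, exp in expected.items():
        if not np.allclose(exp, actual[name], rtol=rtol, atol=atol):
            raise AssertionError(f"tensor {name!r} differs from reference")

# Illustrative shapes: 1000 chunked actions of dim 14 (two 7-DoF Aloha arms).
expected = {"actions": np.zeros((1000, 14)), "loss": np.array(1.25)}
assert_tensors_match(expected, {k: v.copy() for k, v in expected.items()})
```

In the real test the `expected` dict would come from `safetensors` loading the LFS files above.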
Review conversation:

Reviewer: As you mentioned in your PR message, I think we can avoid adding a whole config file by doing [code suggestion not shown] (and in the test). Feels like a no-brainer to me, but maybe I'm missing something?

Author: Yes, in this case we still need an act_1000_actions config file...

Author: Maybe @aliberts you have an opinion on this question around tests and config files. See the "How it was tested" section in this PR's description.

Reviewer: Oh sorry, I meant also changing to using just the act.json file. Also happy to hear what @aliberts says.

Author: Oh yes, this also doesn't work, because the safetensors files override themselves (cf. my comment in the PR description, which is maybe a bit short) unless we either: [options not shown]

Reviewer: Got it, thanks. My stance on this is that it's better to add test artifacts than to add config files to the source code. That can even mean just moving the yaml file to the artifacts directory. Also happy with both of your proposals. Waiting on @aliberts.

@aliberts: IMO both are fine (the naming trick you went with is okay; although it adds artifacts, they're relatively small). I was concerned with speed too, but from what I'm seeing in the CI it doesn't increase test duration (cf. this branch vs. main).

Long-term, though, I think it will be much better to have more fine-grained tests that exercise individual components of the policies and do away with these artifacts, similar to what's done in transformers. My motivations for this are: [list not shown]

For now it's okay, because this granularity allows us to iterate faster while still having guardrails. Happy to hear your opinions on this as well.
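The fine-grained component tests argued for above would assert properties of individual policy pieces directly instead of diffing against stored artifacts. A hedged sketch of what one such test could look like; the `chunk_actions` component and its behavior are hypothetical, invented only to illustrate the testing style:

```python
import numpy as np

def chunk_actions(actions, n_action_steps):
    """Hypothetical component: split a predicted action sequence into
    fixed-size execution chunks, padding the tail by repeating the
    last action."""
    pad = (-len(actions)) % n_action_steps
    padded = np.concatenate([actions, np.repeat(actions[-1:], pad, axis=0)])
    return padded.reshape(-1, n_action_steps, actions.shape[-1])

def test_chunk_actions_shapes():
    # 1000 actions of dim 14 split into 10 chunks of 100, no padding needed.
    out = chunk_actions(np.zeros((1000, 14)), n_action_steps=100)
    assert out.shape == (10, 100, 14)

test_chunk_actions_shapes()
```

Tests like this pin down each component's contract (shapes, padding, masking) without any binary reference files to regenerate when implementations change.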