Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta
ICLR 2024 (oral)
ASID is a generic pipeline for Sim2Real transfer that solves dynamic tasks zero-shot!
This repository contains our implementation of the Fisher Information objective presented in the paper.
We use MuJoCo as our simulator and stable-baselines3 for policy training.

- Create the conda environment from file:

  ```bash
  conda env create -f environment.yaml
  ```

- Activate the conda environment:

  ```bash
  conda activate asid
  ```

- Login to wandb (optional).
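  To authenticate, you can use the standard wandb CLI (assuming the `wandb` package is installed by the environment file):

  ```bash
  wandb login
  ```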
You're all set!
Code tested with Python 3.9 and CUDA versions 11.8 and 12.3.
To train a policy with the default configuration (inertia parameter), run

```bash
python train.py
```

We use hydra to manage the configurations. You can find the configuration files in `configs/`.
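All of the options described below are hydra command-line overrides. To inspect the composed configuration without starting a run, you can use hydra's standard `--cfg` flag (assuming hydra >= 1.0):

```bash
python train.py --cfg job
```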
This repository supports the physics parameters `inertia` and `friction` for the rod. Pass `asid=inertia` or `asid=friction` to `python train.py` to change the parameters.
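For example, to identify the rod's friction instead of the default inertia:

```bash
python train.py asid=friction
```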
By default, the action space is 2DoF, i.e., end-effector x,y delta positions. You can change this by passing `robot=sim_3_dof` (xyz) or `robot=sim_6_dof` (xyz,rpy), but training requires more samples and possibly a longer horizon, which you can set via `robot.max_path_length=30`.
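For example, to train with the 6DoF action space and a longer horizon:

```bash
python train.py robot=sim_6_dof robot.max_path_length=30
```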
To speed up training, we spin up multiple environments in parallel for training and evaluation. Set `num_workers=8` and `num_workers_eval=4` according to your system specifications. Warning: since we initialize an additional environment per instance for the gradient computation, start with lower numbers and work your way up. We found `num_workers=32` and `num_workers_eval=4` to work best on a machine with 64GB RAM.
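For example, on a machine like the one above:

```bash
python train.py num_workers=32 num_workers_eval=4
```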
The default setup logs visualizations and metrics to tensorboard. Launch tensorboard like:

```bash
tensorboard --logdir logdir --host 0.0.0.0 --port 6006
```
You can change the output to stdout or Weights & Biases by passing `log=stdout` or `log=wandb` to `python train.py`. When using Weights & Biases, don't forget to log in and pass your username via `log.entity=USERNAME`.
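For example (replace USERNAME with your Weights & Biases entity):

```bash
python train.py log=wandb log.entity=USERNAME
```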
Here's an example of the training loss (friction, inertia) and policy behavior (friction) after 250k steps with the default parameters:
Tip: In the inertia setting, the policy might learn to push the rod off the table to maximize reward. Pass `env.safety_penalty=1000` to penalize the rod falling off.
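For example:

```bash
python train.py asid=inertia env.safety_penalty=1000
```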
```bibtex
@article{memmel2024asid,
  title={ASID: Active Exploration for System Identification in Robotic Manipulation},
  author={Memmel, Marius and Wagenmaker, Andrew and Zhu, Chuning and Yin, Patrick and Fox, Dieter and Gupta, Abhishek},
  journal={arXiv preprint arXiv:2404.12308},
  year={2024}
}
```