Active Exploration for System Identification in Robotic Manipulation

Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta

ICLR 2024 (oral)

paper | arxiv | website

ASID is a generic pipeline for Sim2Real transfer that solves dynamic tasks zero-shot!

This repository contains our implementation of the Fisher Information objective presented in the paper.
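For intuition about what a Fisher Information objective rewards, here is a minimal sketch on a toy problem. It is not the repository's code. The 1-D block dynamics, the `rollout` and `fisher_information` names, the Gaussian observation noise `sigma`, and the finite-difference step `eps` are all assumptions made for illustration. Under Gaussian noise, the Fisher information of a scalar parameter is the summed squared sensitivity of the observations to that parameter, divided by the noise variance.

```python
def rollout(theta, actions, dt=0.1):
    """Simulate a toy 1-D block with viscous friction `theta` (hypothetical model).

    Returns the list of positions visited under the given force sequence.
    """
    x, v = 0.0, 0.0
    positions = []
    for a in actions:
        v += dt * (a - theta * v)  # friction opposes the current velocity
        x += dt * v
        positions.append(x)
    return positions


def fisher_information(theta, actions, sigma=0.01, eps=1e-4):
    """Fisher information of `theta` for positions observed with N(0, sigma^2) noise.

    The sensitivity d(position)/d(theta) is estimated with central finite
    differences, which needs two extra simulator rollouts.
    """
    plus = rollout(theta + eps, actions)
    minus = rollout(theta - eps, actions)
    grads = [(p - m) / (2 * eps) for p, m in zip(plus, minus)]
    return sum(g * g for g in grads) / sigma**2


idle = [0.0] * 20  # never touches the block
push = [1.0] * 20  # pushes the block the whole time

print(fisher_information(0.5, idle))  # 0.0: an unmoved block reveals nothing about friction
print(fisher_information(0.5, push) > 0.0)  # True: moving the block makes friction observable
```

An exploration policy trained on such an objective is pushed toward actions whose outcomes depend strongly on the unknown parameter.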

Setup

  1. Create the conda environment from file:
    conda env create -f environment.yaml
    
    We use mujoco as our simulator and stable-baselines3 for policy training.
  2. Activate the conda environment:
    conda activate asid
    
  3. Log in to wandb (optional)

You're all set!

Code tested with Python 3.9 and CUDA versions 11.8 and 12.3.

Training

To train a policy with the default configuration (inertia parameter), run

python train.py

We use hydra to manage the configurations. You can find the configuration files in configs/.

This repository supports two physics parameters for the rod: inertia and friction. Pass asid=inertia or asid=friction to python train.py to select the parameter.

By default, the action space is 2-DoF, i.e., end-effector x,y delta positions. You can change this by passing robot=sim_3_dof (xyz) or robot=sim_6_dof (xyz, rpy). Training with these larger action spaces requires more samples and possibly a longer horizon, which you can set with robot.max_path_length=30.

To speed up training, we spin up multiple environments in parallel for training and evaluation. Set the number of workers according to your system specifications, e.g., num_workers=8 num_workers_eval=4. Warning: since we initialize an additional environment per instance for the gradient computation, start with lower numbers and work your way up. We found num_workers=32 and num_workers_eval=4 to work best on a machine with 64GB RAM.
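The overrides described above can be combined in a single call. The sketch below assembles such a command line from a dictionary; `build_train_command` is a hypothetical helper, not part of this repository, and the specific combination of values is only an example.

```python
import shlex


def build_train_command(overrides):
    """Return the argv list for `python train.py` plus Hydra-style key=value overrides.

    Hypothetical helper for illustration; it only formats strings and does not
    check that the keys exist in configs/.
    """
    args = ["python", "train.py"]
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


overrides = {
    "asid": "friction",           # physics parameter to identify
    "robot": "sim_3_dof",         # xyz action space
    "robot.max_path_length": 30,  # longer horizon for the larger action space
    "num_workers": 8,             # parallel training environments
    "num_workers_eval": 4,        # parallel evaluation environments
}
command = build_train_command(overrides)
print(shlex.join(command))
# python train.py asid=friction robot=sim_3_dof robot.max_path_length=30 num_workers=8 num_workers_eval=4
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command)` from the repository root.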

Visualization

The default setup logs visualizations and metrics to tensorboard. Launch tensorboard with:

tensorboard --logdir logdir --host 0.0.0.0 --port 6006

You can change the output to stdout or Weights & Biases by passing log=stdout or log=wandb to python train.py.

When using Weights & Biases, don't forget to log in and pass your username, e.g., log.entity=USERNAME.

Example

Here's an example of the training loss (friction, inertia) and policy behavior (friction) after 250k steps with the default parameters:

Tip: In the inertia setting, the policy might learn to push the rod off the table to maximize reward. Pass env.safety_penalty=1000 to penalize the rod falling off.
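To illustrate the idea behind such a penalty, here is a minimal sketch of reward shaping. It assumes the penalty is subtracted from the exploration reward whenever the rod leaves a rectangular table area; the function name, the table dimensions, and the off-table check are hypothetical and may differ from what the environment actually does.

```python
def penalized_reward(exploration_reward, rod_xy,
                     table_half_extents=(0.5, 0.5), safety_penalty=1000.0):
    """Subtract `safety_penalty` when the rod center is outside the table.

    Assumed convention: the table is centered at the origin and
    `table_half_extents` gives its half-width along x and y.
    """
    x, y = rod_xy
    half_x, half_y = table_half_extents
    off_table = abs(x) > half_x or abs(y) > half_y
    if off_table:
        return exploration_reward - safety_penalty
    return exploration_reward


print(penalized_reward(5.0, (0.1, 0.2)))  # 5.0: rod is on the table, reward unchanged
print(penalized_reward(5.0, (0.9, 0.0)))  # -995.0: rod fell off, penalty dominates
```

With a penalty this large relative to the exploration reward, pushing the rod off the table is never worthwhile for the policy.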

Citation

@article{memmel2024asid,
  title={ASID: Active Exploration for System Identification in Robotic Manipulation},
  author={Memmel, Marius and Wagenmaker, Andrew and Zhu, Chuning and Yin, Patrick and Fox, Dieter and Gupta, Abhishek},
  journal={arXiv preprint arXiv:2404.12308},
  year={2024}
}