A Reproducibility study of Behavior Transformers: Cloning k modes with one stone
This repository extends the original code repository of Behavior Transformers: Cloning k modes with one stone to serve as an in-depth reproducibility assessment of the paper.
It contains scripts to reproduce each of the figures and tables in the paper in addition to scripts to run additional experiments.
It is structured as follows:
Clone the repository with its submodules.
We track Relay Policy Learning repo as a submodule for the
Franka kitchen environment.
It uses git-lfs
. Make sure you have it installed.
git clone --recurse-submodules
If you didn't clone the repo with --recurse-submodules
, you can clone the submodules with:
git submodule update --init
The datasets are stored in the data
folder and are not tracked by git
.
-
Download the datasets here.
wget https://osf.io/download/4g53p/ -O ./data/bet_data_release.tar.gz
-
Extract the datasets into the
data/
folder.tar -xvf data/bet_data_release.tar.gz -C data
The contents of the data
folder should look like this:
data/bet_data_release.tar.gz
: The archive just downloaded.data/bet_data_release
: contains the datasets released by the paper authors.data/README.md
: A placeholder.
We provide installation methods to meet different systems. The methods aim to insure ease of use, portability, and reproducibility thanks to Docker. It is hard to cover all systems, so we focused on the main ones.
- A: amd64 with CUDA: for machines with Nvidia GPUs with Intel CPUs.
- B: amd 64 CPU-only: for machines with Intel CPUs.
- C: arm64 with MPS: to leverage the M1 GPU of Apple machines.
This installation method is adapted from the Cresset initiative. Refer to the Cresset repository for more details.
Steps prefixed with [CUDA] are only required for the CUDA option.
Prerequisites:
To check if you have each of them run <command-name> --version
or <command-name> version
in the terminal.
make
.docker
. (v20.10+)docker compose
(V2)- [CUDA] Nvidia CUDA Driver (Only the driver. No CUDA toolkit, etc)
- [CUDA]
nvidia-docker
(the NVIDIA Container Toolkit).
Installation
cd installation/amd64
Run
make env
The .env
file will be created in the installation directory. You need to edit it according to the following needs:
[CUDA] If you are using an old Nvidia GPU (i.e. capability) < 3.7) you need to compile PyTorch from source. Find the compute capability for your GPU and edit it below.
BUILD_MODE=include # Whether to build PyTorch from source.
CCA=3.5 # Compute capability.
[CUDA] If your Nvidia drivers are also old you may need to change the CUDA Toolkit version. See the compatibility matrix for compatible versions of the CUDA driver and CUDA Toolkit
CUDA_VERSION=11.3.1 # Must be compatible with hardware and CUDA driver.
CUDNN_VERSION=8 # Only major version specifications are available.
Build the docker image by running the following.
Set SERVICE=cuda
or SERVICE=cpu
for your desired option. The default is SERVICE=cuda
.
make build SERVICE=<cpu|cuda>
Then to use the environment, run
make exec SERVICE=<cpu|cuda>
and you'll be inside the container.
To run multiple instances of the container you can use
make run SERVICE=<cpu|cuda>
We provide pre-built AMD64 CUDA images on Docker Hub for easier reproducibility. Visit https://hub.docker.com/r/mlrc2022bet/bet-reproduction/tags to find a suitable Docker image.
To download and use the CUDA 11.2.2 image, follow these steps.
- Run
docker pull mlrc2022bet/bet-reproduction:cuda112-py38
on the host. - Run
docker tag mlrc2022bet/bet-reproduction:cuda112-py38 bet-user:cuda
to change the name of the image. This part is necessary due to the naming conventions in thedocker-compose.yaml
file. - Edit the
.env
file to set theUID, GID, USR, GRP, and IMAGE_NAME
values generated by themake env
command.
An issue with the Docker Hub images is that their User ID (UID) and Group ID (GID)
values have been fixed to the default values of UID=1000, GID=1000, USR=user, GRP=user
.
The .env
file must be edited to use the downloaded images.
An example of the resulting .env
file would be as follows:
# Make sure that `IMAGE_NAME` matches `bet-user:cuda` instead of the default value generated by `make env`.
IMAGE_NAME=bet-user # Do not add `:cuda` to the name.
PROJECT_ROOT=/opt/project
UID=1000
GID=1000
USR=user
GRP=user
# W&B configurations must be included manually.
WANDB_MODE=online
WANDB_API_KEY=<your_key>
- Run
make up
to start the container. - Run
make exec
to enter the container. - Run
sudo chown -R $(id -u):$(id -g) /opt/project
inside the container to change ownership of the project root. - After finishing training, run
sudo chown -R $(id -u):$(id -g) bet-reproduction
from the host to restore ownership to the host user.
As the MPS backend isn't supported by PyTorch on Docker, this methods relies on a local installation of conda
, thus
unfortunately limiting portability and reproducibility.
We provide an environment.yml
file adapted from the BeT's author's repo to be compatible with the M1 system.
Prerequisites:
conda
: which we recommend installing with miniforge.
Installation
conda env create --file=installation/osx-arm64/environment.yml
conda activate behavior-transformer
Set environment variables.
export PYTHONPATH=$PYTHONPATH:$(pwd)/relay-policy-learning/adept_envs
export ASSET_PATH=$(pwd)
export PYTHONDONTWRITEBYTECODE=1
export HYDRA_FULL_ERROR=1
We track our experiments with Weights and Biases. To use it, either
-
[Docker] Add your
wandb
API key to the.env
fileWANDB_MODE=online WANDB_API_KEY=<your_key>
then
make up SERVICE=<cpu|cuda>
. -
Or, run
export WANDB_MODE=online && wandb login
in the docker container, or in your custom environment.
Test your setup by running the default training and evaluation scripts in each of the environments.
Environment values (<env>
below) can be blockpush
, kitchen
, pointmass1
, or pointmass2
.
Training.
python train.py env=<env> experiment.num_prior_epochs=1
Evaluation.
Find the model you just trained in train_runs/train_<env>/<date>/<job_id>
.
Plug it in the command below.
python run_on_env.py env=<env> experiment.num_eval_eps=1 \
model.load_dir=$(pwd)/train_runs/train_<env>/<date>/<job_id>
We provide scripts to reproduce our training and evaluation results. For example,
to reproduce the Blockpush results with the configurations from the paper, run
sh reproducibility_scripts/paper_params/blockpush.sh
.
This will run a cross-validation training for 3 independent runs with evaluations for each of the runs.
Other ablations are also available in the reproducibility_scripts
directory.
Note that the directories in train_runs
and eval_runs
directories corresponding to each experiment should not exist before starting to prevent directory name clashes.
For example, the Blockpush experiments using the paper parameters will create
train_runs/train_blockpush/reproduction/paper_params
and
eval_runs/eval_blockpush/reproduction/paper_params
subdirectories.
The subdirectories inside reproduction
must be deleted before the corresponding run can be launched.
This may be necessary if the previous run was terminated before completion.
We provide model weights, their rollouts, and their evaluation metrics for all the experiments we ran. You can use these to reproduce our results at any stage of the pipeline. In addition, we share our Weights and Biases runs in this W&B project.
The scripts used to generate the models, their rollouts, and compute their evaluation metrics can be found in reproducibility_scripts/
Obtain the model weights, the rollouts, and the logs of the evaluations by downloading this file. Move it to the machine where you're running the code then extract it with
tar -xvf weights_rollouts_and_metrics.tar.gz
The configurations are stored in the configs/
directory and are subdivided into categories.
They are managed by Hydra
You can experiment with different configurations by passing the relevant flags.
You can get examples on how to do so in the reproducibility_scripts/
directory.