This repository provides the official implementation of the Human Guided Exploration (HuGE) algorithm, as proposed in Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback. The manuscript is available on arXiv; see also the project page.
If you use this codebase, please cite:
Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta. Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback.
@misc{torne2023breadcrumbs,
title={Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback},
author={Marcel Torne and Max Balsells and Zihan Wang and Samedh Desai and Tao Chen and Pulkit Agrawal and Abhishek Gupta},
year={2023},
eprint={2307.11049},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Download the MuJoCo binaries for Linux and extract the downloaded mujoco200 directory into ~/.mujoco/mujoco200.
If you want to install MuJoCo in a nonstandard location, point the environment variable MUJOCO_PY_MUJOCO_PATH to that location.
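Before installing the Python packages, it can help to confirm that MuJoCo is where mujoco-py will look for it. The following is a minimal stdlib-only sketch, assuming the standard archive layout (a bin/ subdirectory inside mujoco200). The helper names are hypothetical and are not part of this repository.

```python
import os
from pathlib import Path

# Default location used in the install step above.
DEFAULT_MUJOCO_PATH = Path("~/.mujoco/mujoco200").expanduser()


def resolve_mujoco_path() -> Path:
    """Return the MuJoCo directory, honoring MUJOCO_PY_MUJOCO_PATH when it is set."""
    override = os.environ.get("MUJOCO_PY_MUJOCO_PATH")
    return Path(override).expanduser() if override else DEFAULT_MUJOCO_PATH


def check_mujoco_install() -> bool:
    """Return True if the resolved directory contains a bin/ subdirectory (assumed layout)."""
    path = resolve_mujoco_path()
    ok = (path / "bin").is_dir()
    print(f"MuJoCo path: {path} ({'found' if ok else 'missing bin/ directory'})")
    return ok


if __name__ == "__main__":
    check_mujoco_install()
```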
git clone git@github.com:Improbable-AI/human-guided-exploration.git
cd human-guided-exploration
conda env create -f environment.yml
conda activate huge
conda develop dependencies
conda develop dependencies/lexa_benchmark
conda develop dependencies/ravens
See the Troubleshooting section below if you run into any issues.
python launch_main.py --env_name pointmass_rooms --method huge
Available methods (--method):
- huge: official implementation using synthetic human feedback, which is generated from reward functions (useful for analysis). To run HuGE with real human feedback, see the section on training policies from human feedback below.
- oracle: the same algorithm as HuGE, but it queries the reward function directly to select the closest goal instead of learning a goal selector from human feedback.
- gcsl: implementation of the Goal-Conditioned Supervised Learning (GCSL) baseline [1].
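The huge method replaces human annotators with synthetic feedback derived from reward functions. The following is a minimal sketch of such a synthetic labeler. It assumes the reward is the negative Euclidean distance to the goal and that each label names the state closer to the goal. The function names, the distance-based reward, and the optional label-noise parameter are all hypothetical and are not taken from the HuGE code.

```python
import math
import random
from typing import Optional, Sequence


def distance_reward(state: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical reward: negative Euclidean distance between state and goal."""
    return -math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))


def synthetic_label(
    state_a: Sequence[float],
    state_b: Sequence[float],
    goal: Sequence[float],
    noise: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Answer "which state is closer to the goal?" the way a simulated annotator would.

    Returns 0 if state_a has the higher (or equal) reward, 1 otherwise. With
    probability `noise` the answer is flipped to mimic imperfect human labels.
    """
    rng = rng or random.Random()
    label = 0 if distance_reward(state_a, goal) >= distance_reward(state_b, goal) else 1
    if rng.random() < noise:
        label = 1 - label
    return label


if __name__ == "__main__":
    print(synthetic_label((0.9, 0.9), (0.0, 0.0), goal=(1.0, 1.0)))  # 0: state_a is closer
```

The oracle method skips this labeling step and uses the reward directly, as described above.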
Available environments (--env_name):
- bandu: object assembly task in which a UR5 arm with a suction gripper must assemble a very specific castle-like structure. Simulated with PyBullet; code inspired by the Ravens benchmark [2].
- block_stacking: object assembly task in which a UR5 arm with a suction gripper must stack three blocks. Simulated with PyBullet; code inspired by the Ravens benchmark [2].
- kitchenSeq: long-horizon arm manipulation task in which a Sawyer arm must open the slider, the microwave, and the cabinet sequentially to succeed. Simulated with MuJoCo; code inspired by lexa-benchmark [3].
- pusher_hard: object manipulation task in which a Sawyer arm must move a puck around walls to reach a goal. Simulated with MuJoCo; code inspired by GCSL [1].
- complex_maze: long-horizon 2D navigation task. Simulated with MuJoCo; code inspired by GCSL [1].
- pointmass_rooms: simple 2D navigation task. Simulated with MuJoCo; code inspired by GCSL [1].
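To run several configurations, you can generate one launch_main.py command per environment and method from the two lists above. The following is a minimal sketch that assumes every method runs on every environment with default settings. The README does not state this, so check config.yaml first. By default the script only prints the commands.

```python
import shlex
import subprocess
from itertools import product

# --method and --env_name values listed in this README.
METHODS = ["huge", "oracle", "gcsl"]
ENVS = ["bandu", "block_stacking", "kitchenSeq", "pusher_hard",
        "complex_maze", "pointmass_rooms"]


def build_commands(envs=ENVS, methods=METHODS):
    """Return one launch_main.py argv list per (environment, method) pair."""
    return [
        ["python", "launch_main.py", "--env_name", env, "--method", method]
        for env, method in product(envs, methods)
    ]


def run_sweep(dry_run=True):
    """Print every command; when dry_run is False, also run them one after another."""
    for cmd in build_commands():
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sweep(dry_run=True)
```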
We designed an interface to collect labels from humans and integrated it with our HuGE algorithm. Next, we explain how to launch the interface and train policies from human feedback using HuGE.
First, launch the backend, which is built with FastAPI. HuGE runs in this process and listens for human feedback sent from the interface.
ENV_NAME=${env_name} uvicorn launch_huge_human:app --host 0.0.0.0
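By default, uvicorn serves on port 8000. This matches the http://localhost:8000 backend URL that the frontend uses. Before you start labeling, you can run the small stdlib-only check below to confirm that the backend accepts connections. It only tests whether a TCP port is open. It does not check that the HuGE app behind it is healthy. The function name is hypothetical.

```python
import socket


def backend_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, or timeout
        return False


if __name__ == "__main__":
    status = "up" if backend_is_listening() else "not reachable"
    print(f"Backend on port 8000: {status}")
```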
Second, launch the frontend, an interface written in ReactJS. It repeatedly shows the user two images of states achieved during training and asks which of the two is closer to achieving the target goal. The interface sends each answer to the backend, which asynchronously trains the goal selector as more labels arrive. We provide a Docker container that builds and runs the interface. To launch the frontend:
cd interface/frontend
make
make run
The interface is served on port 80 of the machine running it, for example http://localhost:80.
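The frontend/backend split described above depends on the backend training the goal selector asynchronously as labels arrive. The following is a minimal stdlib sketch of that producer/consumer pattern. Request handlers push labels onto a queue, and a background thread consumes them. The label format, class name, and placeholder training step are hypothetical. This is not how launch_huge_human.py is actually structured, where labels arrive through FastAPI request handlers.

```python
import queue
import threading
from typing import List, NamedTuple, Tuple


class Label(NamedTuple):
    """One answer from the interface (hypothetical format)."""
    state_a: Tuple[float, ...]
    state_b: Tuple[float, ...]
    goal: Tuple[float, ...]
    choice: int  # 0: state_a is closer to the goal, 1: state_b is closer


class AsyncGoalSelectorTrainer:
    """Receives labels from request handlers and consumes them on a background thread."""

    def __init__(self) -> None:
        self.labels = queue.Queue()  # filled by the web handlers
        self.dataset: List[Label] = []  # every label consumed so far
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def submit(self, label: Label) -> None:
        """Called once per answer; returns immediately so the UI never waits on training."""
        self.labels.put(label)

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                label = self.labels.get(timeout=0.1)
            except queue.Empty:
                continue  # no new answers yet; check the stop flag again
            self.dataset.append(label)
            self._train_step()
            self.labels.task_done()

    def _train_step(self) -> None:
        # Placeholder: a real goal selector would take a gradient step on self.dataset here.
        pass

    def stop(self) -> None:
        self._stop.set()
        self._worker.join()
```

The queue decouples how fast annotators answer from how fast training runs. `trainer.labels.join()` blocks until every submitted label has been consumed.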
By default, everything runs on localhost. You can also run crowdsourcing experiments with annotators anywhere in the world, without giving them direct access to your physical machine. The steps are as follows.
First, change the backend URL in interface/frontend/src/App.js (line 129). In the line
const base = "http://localhost:8000"
replace localhost with the public IP address of the machine running the backend.
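Editing App.js by hand is enough. If you redeploy often, the following is a minimal sketch of a helper that rewrites that line. It assumes the line appears exactly once, in the default form shown above, and keeps port 8000 (uvicorn's default). The script name and function are hypothetical. The helper refuses to edit the file if the expected line is not found.

```python
import sys
from pathlib import Path

APP_JS = Path("interface/frontend/src/App.js")
DEFAULT_BASE = 'const base = "http://localhost:8000"'


def set_backend_host(source: str, public_ip: str, port: int = 8000) -> str:
    """Return App.js source with the default localhost backend URL pointed at public_ip."""
    count = source.count(DEFAULT_BASE)
    if count != 1:
        # Do not guess if the file no longer contains the expected line.
        raise ValueError(f"expected exactly one default backend URL, found {count}")
    return source.replace(DEFAULT_BASE, f'const base = "http://{public_ip}:{port}"')


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python set_backend_host.py <PUBLIC_IP>
    APP_JS.write_text(set_backend_host(APP_JS.read_text(), sys.argv[1]))
```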
Then build and run the frontend as before:
cd interface/frontend
make
make run
The interface is then reachable on port 80 of the machine running it: http://${IP_ADDRESS_INTERFACE}:80
To use your own environment, wrap it in the GymGoalEnvWrapper class, defined in huge/envs/gymenv_wrapper.py.
We provide an example of a simple environment wrapped in this class in huge/envs/simple_example.py.
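The following toy environment illustrates the pieces a goal-conditioned environment typically needs: a reset, a step function, a goal sampler, and a distance/success measure between a state and a goal. It is a standalone sketch. It does not subclass GymGoalEnvWrapper, and its method names are hypothetical. The methods you actually have to implement are the ones declared in huge/envs/gymenv_wrapper.py, with huge/envs/simple_example.py as the reference example.

```python
import math
import random
from typing import Optional, Tuple

State = Tuple[float, float]
# Discrete actions: 0 = right, 1 = left, 2 = up, 3 = down.
MOVES = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


class ToyPointGoalEnv:
    """Hypothetical 2D point environment with a goal-conditioned interface."""

    def __init__(self, size: float = 1.0, step_size: float = 0.1, seed: Optional[int] = None):
        self.size = size
        self.step_size = step_size
        self.rng = random.Random(seed)
        self.state: State = (0.0, 0.0)

    def reset(self) -> State:
        self.state = (0.0, 0.0)
        return self.state

    def sample_goal(self) -> State:
        """Goal drawn uniformly from the square [-size, size]^2."""
        return (self.rng.uniform(-self.size, self.size),
                self.rng.uniform(-self.size, self.size))

    def _clip(self, v: float) -> float:
        return max(-self.size, min(self.size, v))

    def step(self, action: int) -> State:
        """Move one step in the chosen direction, staying inside the square."""
        dx, dy = MOVES[action]
        x, y = self.state
        self.state = (self._clip(x + dx * self.step_size), self._clip(y + dy * self.step_size))
        return self.state

    def goal_distance(self, state: State, goal: State) -> float:
        return math.hypot(state[0] - goal[0], state[1] - goal[1])

    def is_success(self, state: State, goal: State, threshold: float = 0.05) -> bool:
        return self.goal_distance(state, goal) < threshold
```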
Next, name your environment and add it to the create_env function in huge/envs/__init__.py.
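Registering an environment amounts to mapping a name, presumably the string you later pass as --env_name, to code that constructs the wrapped environment. The following sketch shows that idea as a small registry. The registry, the decorator, and make_env are hypothetical stand-ins for the repository's environment-creation function, which may simply use if/elif branches on the name.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an environment name to a zero-argument factory.
ENV_FACTORIES: Dict[str, Callable[[], object]] = {}


def register_env(name: str):
    """Decorator that registers a factory under `name`; duplicate names are rejected."""
    def decorator(factory: Callable[[], object]) -> Callable[[], object]:
        if name in ENV_FACTORIES:
            raise ValueError(f"environment {name!r} is already registered")
        ENV_FACTORIES[name] = factory
        return factory
    return decorator


def make_env(env_name: str):
    """Build the environment registered under env_name, or fail with the known names."""
    try:
        factory = ENV_FACTORIES[env_name]
    except KeyError:
        known = ", ".join(sorted(ENV_FACTORIES))
        raise ValueError(f"unknown environment {env_name!r}; known: {known}") from None
    return factory()


@register_env("my_custom_env")
def make_my_custom_env():
    # Stand-in: return your environment wrapped in GymGoalEnvWrapper here.
    return {"name": "my_custom_env"}
```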
Finally, add an entry for your new environment to the config.yaml file to specify any parameters that should differ from the defaults.
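A per-environment entry that "differs from the defaults" is usually resolved by layering the environment's values over the default ones. The following sketch shows that override logic on an already-parsed config, for example as produced by PyYAML's yaml.safe_load. The section names ("defaults", "my_custom_env") and all keys and values are placeholders, not the repository's real config.yaml schema.

```python
from copy import deepcopy

# Placeholder structure and values, for illustration only.
CONFIG = {
    "defaults": {"batch_size": 256, "num_epochs": 100, "policy": {"hidden_sizes": [400, 300]}},
    "my_custom_env": {"num_epochs": 20, "policy": {"hidden_sizes": [64, 64]}},
}


def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def params_for(env_name, config=CONFIG):
    """Defaults, overridden by the environment's own entry if it has one."""
    return deep_merge(config["defaults"], config.get(env_name, {}))
```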
If you get an error like the following:
ImportError: $CONDA_PATH/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /lib/x86_64-linux-gnu/libOSMesa.so.8)
delete the conda environment's copy of libstdc++.so.6 (the path below resolves to $CONDA_PATH/lib/libstdc++.so.6):
rm $CONDA_PATH/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6
If you get the following error:
ImportError: cannot import name 'ParamSpec'
do the following:
pip uninstall typing_extensions
pip uninstall fastapi
pip install --no-cache fastapi
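This error most likely means that the installed typing_extensions is too old to export ParamSpec. The following stdlib-only check can confirm the diagnosis before or after running the commands above. The function name is hypothetical.

```python
import importlib


def has_paramspec() -> bool:
    """True if the installed typing_extensions exports ParamSpec."""
    try:
        module = importlib.import_module("typing_extensions")
    except ImportError:
        return False  # typing_extensions is not installed at all
    return hasattr(module, "ParamSpec")


if __name__ == "__main__":
    if has_paramspec():
        print("typing_extensions provides ParamSpec.")
    else:
        print("ParamSpec missing: reinstall typing_extensions/fastapi as described above.")
```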
The directory structure currently looks like this:
- huge (contains all code)
  - envs (contains all environment files and wrappers)
  - algo (contains all HuGE code)
    - huge.py (implements the high-level algorithm logic, e.g. data collection, policy update, evaluation, saving data)
    - buffer.py (the replay buffer used to relabel and sample (s,g,a,h) tuples)
    - networks.py (implements neural network policies)
    - variants.py (contains relevant hyperparameters for HuGE)
  - baselines (contains implementations of the baselines presented in the paper)
- doodad (we require this old version of doodad)
- dependencies (contains other libraries like rlkit, rlutil, room_world, multiworld, etc.)
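The replay buffer relabels and samples (s,g,a,h) tuples. The following is a minimal sketch of hindsight relabeling in the style of GCSL [1]. It assumes a trajectory is stored as parallel lists of states and actions. A state the agent actually reached later becomes the goal for an earlier action, and h is the number of steps between them. The function names and storage format are hypothetical. The HuGE buffer may sample differently, for example within a fixed maximum horizon.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    state: object    # s: state at time t
    goal: object     # g: state actually reached later, at time t_goal
    action: object   # a: action taken at time t
    horizon: int     # h: t_goal - t


def relabel_sample(states: Sequence, actions: Sequence, rng: random.Random) -> Sample:
    """Draw one hindsight-relabeled (s, g, a, h) tuple from a single trajectory.

    Expects len(states) == len(actions) + 1, where actions[t] moves states[t] to states[t + 1].
    """
    T = len(actions)
    if T < 1 or len(states) != T + 1:
        raise ValueError("expected len(states) == len(actions) + 1 >= 2")
    t = rng.randrange(T)            # 0 <= t < T
    t_goal = rng.randint(t + 1, T)  # t < t_goal <= T (randint is inclusive)
    return Sample(states[t], states[t_goal], actions[t], t_goal - t)


def relabel_batch(trajectories: List[Tuple[Sequence, Sequence]], batch_size: int,
                  seed: int = 0) -> List[Sample]:
    """Sample batch_size relabeled tuples, choosing a trajectory uniformly each time."""
    rng = random.Random(seed)
    return [relabel_sample(*rng.choice(trajectories), rng) for _ in range(batch_size)]
```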
Please file an issue if you have trouble running this code.
[1] D. Ghosh, A. Gupta, J. Fu, A. Reddy, C. Devin, B. Eysenbach, and S. Levine. Learning to reach goals without reinforcement learning. CoRR, abs/1912.06088, 2019.
[2] A. Zeng, P. Florence, J. Tompson, S. Welker, J. Chien, M. Attarian, T. Armstrong, I. Krasin, D. Duong, V. Sindhwani, and J. Lee. Transporter networks: Rearranging the visual world for robotic manipulation. Conference on Robot Learning (CoRL), 2020.
[3] R. Mendonca, O. Rybkin, K. Daniilidis, D. Hafner, and D. Pathak. Discovering and achieving goals via world models. In M. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 24379–24391, 2021.