The codebase for the paper INQUIRE: INteractive Querying for User-Aware Informative REasoning.
INQUIRE is the first Interactive Robot Learning algorithm to implement (and optimize over) a generalized representation of information gain across multiple interaction types. It dynamically selects its interaction type (and the corresponding optimal query) based on its current learning status and the robot's state in the world, and users can bias its selection of interaction types via customizable cost metrics.
See the paper for more details.
Fitzgerald, T., Koppol, P., Callaghan, P., Wong, R., Kroemer, O., Simmons, R., and Admoni, H. “INQUIRE: INteractive Querying for User-Aware Informative REasoning.” In Proceedings of the 6th Conference on Robot Learning (CoRL), PMLR, 2022.
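To make the query-selection idea above concrete, here is a minimal sketch of one way to estimate information gain and pick an interaction type. It assumes a Boltzmann-rational response model and Monte-Carlo samples from the weight belief; all names (info_gain, select_query, the cost mapping) are illustrative and are not the repo's actual API.

import numpy as np

def info_gain(choice_features, W, beta=1.0):
    # Monte-Carlo estimate of how much a query's response would reveal
    # about the reward weights, under an assumed Boltzmann-rational
    # choice model (a sketch, not the repo's implementation).
    # choice_features: (C, d) array, one feature vector per possible response.
    # W:               (M, d) array of samples from the current weight belief.
    logits = beta * W @ choice_features.T          # (M, C) choice utilities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)              # P[i, j] = p(choice j | w_i)
    M = W.shape[0]
    # I(choice; w) ~ (1/M) * sum_ij P_ij * log(M * P_ij / sum_k P_kj)
    return float(np.mean(np.sum(P * np.log(M * P / P.sum(axis=0)), axis=1)))

def select_query(queries_by_type, W, cost):
    # Score every candidate query of every interaction type by its
    # information gain minus a user-defined, per-type cost, then return
    # the best (interaction type, query) pair.
    scored = [(info_gain(Q, W) - cost[t], t, Q)
              for t, queries in queries_by_type.items() for Q in queries]
    best_score, best_type, best_query = max(scored, key=lambda s: s[0])
    return best_type, best_query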
Note: This repo was built using Python 3.10.2 and a conda virtual environment.
Install Conda by following its installation instructions for your machine.
From INQUIRE's top-level directory, run:
conda deactivate
conda env create -n inquire --file environment.yml
conda activate inquire
pip install -e .
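To sanity-check the install (assuming the package is importable as inquire, which matches the repo layout but is an assumption), you can try:
python -c "import inquire"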
If the install fails (commonly due to the swig dependency needed to build Box2D for the lander domain), run from INQUIRE's top-level directory:
conda install -c conda-forge swig
pip install -e .
To reproduce the paper's experiments, run from the top-level directory:
python scripts/corl22.py
To run a specific domain:
python scripts/corl22.py --domain <environment name>
e.g.,
python scripts/corl22.py --domain lander
From the top-level directory, run:
bash scripts/run_inquire.sh lander
Substitute for other domains such as linear_combo, linear_system, or pizza.
Note: this script assumes you have already run the cache script (described below).
For a quick test, run:
python scripts/corl22.py --domain lander --queries 2 --runs 1 --tests 1
See the argument parser in scripts/corl22.py for the full list of command-line options, including experiment parameters, domain and agent specification, and other algorithmic nuances.
To visualize the behavior induced by a learned weight vector, run, e.g.:
python viz/viz_weights.py --domain lander --weights "0.55, 0.55, 0.41, 0.48" -I 1000
To visualize a stored lunar lander trajectory:
python viz/visualize_lunar_lander_control.py </relative/path/to/stored/trajectory.csv>
For faster testing, you can cache a set of trajectory samples ahead of time. For example:
python scripts/cache_trajectories.py --domain lander -Z 1 -X 250 -N 1000
This will cache 1000 trajectories for each of 250 states for 1 task (i.e., 1 ground-truth weight vector). Make sure that X >= [(number of runs * number of queries) + number of test states] and that N >= the number of trajectory samples used during evaluations.
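For example (with illustrative numbers, not the script's defaults): 10 runs of 20 queries each plus 1 test state per task requires X >= (10 * 20) + 1 = 201, so -X 250 is sufficient.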
You can then use this cache by adding the --use_cache command-line argument.
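For example, to rerun the lander experiment against the cache created above:
python scripts/corl22.py --domain lander --use_cache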