
Iterative Vision-and-Language Navigation (IVLN part)

Jacob Krantz*, Shurjo Banerjee*, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, and Jesse Thomason

[Paper] [Project Page] [GitHub (IVLN part)] [GitHub (IVLN-CE part)]

This is the official implementation of Iterative Vision-and-Language Navigation (IVLN) in discrete environments, a paradigm for evaluating language-guided agents that navigate a persistent environment over time. Existing Vision-and-Language Navigation (VLN) benchmarks erase the agent's memory at the beginning of every episode, testing the ability to perform cold-start navigation with no prior information. However, deployed robots occupy the same environment for long periods of time. The IVLN paradigm addresses this disparity by training and evaluating VLN agents that maintain memory across tours of scenes consisting of up to 100 ordered instruction-following Room-to-Room (R2R) episodes, each defined by an individual language instruction and a target path. This repository implements the Iterative Room-to-Room (IR2R) benchmark.
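To make the paradigm concrete, here is a minimal illustrative sketch (in Python-style pseudocode) of the tour-based evaluation loop. It is not the repository's actual API; the agent and environment interfaces below (reset_memory, load_episode, act, step) are hypothetical names used only to show where memory persists and where it is cleared.

def evaluate_tours(agent, env, tours):
    # Each tour is an ordered list of R2R episodes set in one scene,
    # and each episode pairs a language instruction with a target path.
    for tour in tours:
        agent.reset_memory()                 # memory is cleared only between tours
        for episode in tour:                 # up to ~100 ordered episodes per tour
            obs = env.load_episode(episode)
            done = False
            while not done:
                action = agent.act(obs, episode["instruction"])
                obs, done = env.step(action)
            # Unlike standard VLN, the agent keeps the scene memory it has
            # accumulated and carries it into the next episode of the tour.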

Installation

  1. Install requirements:
conda create --name vlnhamt python=3.8.5
conda activate vlnhamt
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
  2. Download the R2R data from Dropbox, including processed annotations, features, and pretrained models. Put the data under the datasets directory.

  3. Download the Matterport3D adjacency maps and angle features from Dropbox. Put the files under the datasets directory.

  4. Download the tour files for the original VLN-R2R data and the Prevalent augmented data from GDrive. Put the files under the iterative-vln directory. The resulting directory structure should look like the following (a quick sanity check is sketched after the tree):

iterative-vln
|___finetune_src
|___datasets
|   |___R2R
|   |___total_adj_list.json
|   |___angle_feature.npy
|   |___...
|___tours_iVLN.json
|___tours_iVLN_prevalent.json
|___...
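Optionally, the short Python check below (not part of the repository) can confirm that the expected files are in place and that the GPU build of PyTorch installed in step 1 is usable. It assumes it is run from the iterative-vln directory and that the tour files are plain JSON.

import json
import os

import torch

# Files and directories the README expects after the downloads above.
expected = [
    "datasets/R2R",
    "datasets/total_adj_list.json",
    "datasets/angle_feature.npy",
    "tours_iVLN.json",
    "tours_iVLN_prevalent.json",
]
for path in expected:
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")

# PyTorch 1.7.1 with CUDA 10.1 wheels were installed in step 1.
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# The tour files are JSON; report how many top-level entries one defines.
with open("tours_iVLN.json") as f:
    print("tours_iVLN.json entries:", len(json.load(f)))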

Run the HAMT baseline

HAMT Teacher-forcing IL

cd finetune_src
bash scripts/run_r2r_il.sh

Run the TourHAMT model

cd finetune_src
bash scripts/iter_train_sep_hist_weight.sh

Model architecture: [IVLN TourHAMT architecture figure]

TourHAMT Variations

cd finetune_src
bash scripts/iter_train.sh                  # extended memory with previous episodes only
bash scripts/iter_train_hist.sh             # also train the history encoder
bash scripts/iter_train_sep_hist.sh         # with a previous-history identifier and a trained history encoder

Citation

If you find this work useful, please consider citing:

@article{krantz2022iterative,
    title     = {Iterative Vision-and-Language Navigation},
    author    = {Krantz, Jacob and Banerjee, Shurjo and Zhu, Wang and Corso, Jason and Anderson, Peter and Lee, Stefan and Thomason, Jesse},
    year      = {2022},
    publisher = {arXiv},
    url       = {https://arxiv.org/abs/2210.03087},
}

Acknowledgement

Parts of the code are built upon HAMT, pytorch-image-models, UNITER, and Recurrent-VLN-BERT. Thanks to the authors for their great work!

License

This codebase is MIT licensed. Trained models and task datasets are considered data derived from the Matterport3D (MP3D) scene dataset; Matterport3D-based task datasets and trained models are distributed under the Matterport3D Terms of Use and the CC BY-NC-SA 3.0 US license.
