
Breaking the Frame: Visual Place Recognition by Overlap Prediction

Updates

2024.06. Available on arXiv.

2024.10. Accepted at WACV 2025.

Summary

The proposed method identifies overlapping image regions without requiring expensive feature detection and matching. It extracts patch-level embeddings with a DINOv2 backbone, establishes patch-to-patch correspondences, and uses a voting mechanism to compute overlap scores for candidate database images, providing a more nuanced image retrieval metric in challenging scenarios.

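For intuition, here is a minimal sketch of the voting idea, assuming L2-normalized patch embeddings; function and variable names are illustrative, not the repo's actual API.

```python
import numpy as np

def overlap_score(query_patches, db_patches, radius=0.5):
    """Vote-based overlap score between two images (illustrative only).

    query_patches: (Nq, D) L2-normalized patch embeddings of the query.
    db_patches:    (Nd, D) L2-normalized patch embeddings of a database image.
    radius:        similarity threshold below which a vote is rejected.
    """
    # Cosine similarity between every query patch and every database patch.
    sim = query_patches @ db_patches.T          # (Nq, Nd)
    # Each query patch votes for its most similar database patch,
    # but only if the similarity exceeds the radius threshold.
    best = sim.max(axis=1)                      # (Nq,)
    return int((best > radius).sum())

def rank_database(query_patches, db_patches_list, radius=0.5, k=5):
    """Rank database images by their overlap score with the query."""
    scores = [overlap_score(query_patches, p, radius) for p in db_patches_list]
    order = np.argsort(scores)[::-1]            # highest vote count first
    return order[:k], [scores[i] for i in order[:k]]
```

In the actual pipeline the patch embeddings come from the trained encoder on top of DINOv2, and the votes can be TF-IDF weighted (see the `--weighted` flag below); the sketch omits both.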

Installation

torch == 2.3.1
Python == 3.10.13
OpenCV == 4.10.0.84
OmegaConf == 2.3.0
h5py == 3.11.0
tqdm == 4.66.4
faiss-gpu == 1.7.2
lightglue
hloc

Try the proposed VOP on one example image pair and visualize their matched patches.

Evaluation

Step 1. Preprocess the test data and GT information (e.g., camera parameters R, K) if available: load the images, run the frozen DINOv2 backbone on them, and save the [CLS] tokens and patch embeddings.
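A minimal sketch of what this dump step produces, assuming the torch.hub DINOv2 ViT-L/14 model and illustrative file names; the repo's dump_data.py handles the dataset-specific details.

```python
import h5py
import torch
import torchvision.transforms.functional as TF
from PIL import Image

# Frozen DINOv2 ViT-L/14 backbone (1024-dim tokens, matching input_dim in the config).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval().cuda()

@torch.no_grad()
def dump_image(path, out_group):
    img = Image.open(path).convert("RGB")
    # Resize so both sides are multiples of the 14-pixel patch size (illustrative choice).
    img = TF.resize(img, (518, 518))
    x = TF.normalize(TF.to_tensor(img), [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    feats = model.forward_features(x[None].cuda())
    # Save the [CLS] token (used for the shortlist) and the patch tokens (used for voting).
    out_group.create_dataset("cls", data=feats["x_norm_clstoken"].cpu().numpy())
    out_group.create_dataset("patches", data=feats["x_norm_patchtokens"].cpu().numpy())

with h5py.File("features.h5", "w") as f:
    dump_image("example.jpg", f.create_group("example.jpg"))
```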

Step 2. Load the best checkpoint and run the trained encoder on the test set. Perform retrieval and save a list of images with high overlap scores.
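The -pre / --pre_filter shortlist can be thought of as a nearest-neighbour search on the [CLS] tokens before the patch-level voting. A hedged sketch with faiss, using illustrative names rather than the repo's API:

```python
import faiss
import numpy as np

def cls_shortlist(query_cls, db_cls, pre_filter=20):
    """Shortlist database images by [CLS]-token similarity before patch voting.

    query_cls: (D,) query [CLS] token.
    db_cls:    (N, D) database [CLS] tokens.
    """
    db = np.ascontiguousarray(db_cls.astype("float32"))
    faiss.normalize_L2(db)
    index = faiss.IndexFlatIP(db.shape[1])     # inner product == cosine after L2 norm
    index.add(db)
    q = np.ascontiguousarray(query_cls[None].astype("float32"))
    faiss.normalize_L2(q)
    _, idx = index.search(q, pre_filter)       # indices of the pre_filter best candidates
    return idx[0]
```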

Step 3. Evaluate the retrieval results by running relative pose estimation or localization.
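The relative-pose evaluation follows the standard essential-matrix pipeline; below is a hedged sketch with OpenCV, assuming matched keypoints and known intrinsics are already available (relative_pose.py is the actual entry point).

```python
import cv2
import numpy as np

def relative_pose(pts0, pts1, K0, K1, ransac_thresh=1.0):
    """Estimate the relative pose (R, t up to scale) from 2D-2D matches with RANSAC.

    pts0, pts1: (N, 2) matched pixel coordinates in the two images.
    K0, K1:     (3, 3) camera intrinsics.
    """
    # Normalize the matches with the intrinsics so a single essential matrix suffices.
    p0 = cv2.undistortPoints(pts0.reshape(-1, 1, 2).astype(np.float64), K0, None).reshape(-1, 2)
    p1 = cv2.undistortPoints(pts1.reshape(-1, 1, 2).astype(np.float64), K1, None).reshape(-1, 2)
    # Express the pixel threshold in normalized coordinates via the mean focal length.
    thresh = ransac_thresh / np.mean([K0[0, 0], K1[0, 0]])
    E, inliers = cv2.findEssentialMat(p0, p1, np.eye(3), method=cv2.RANSAC,
                                      prob=0.9999, threshold=thresh)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, np.eye(3), mask=inliers)
    return R, t, inliers
```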

Here are the instructions for the test sets used in the paper. The best checkpoint is downloaded automatically.

💥 Important: before data preprocessing, create or update the source data directory for the specific dataset in dump_datasets/data_dirs.yaml:

dataset_dirs:
  inloc: <src_path>
[MegaDepth]
  1. Download the data from glue-factory, including images and scene_info.

  2. Data preprocessing and top-1/5/10 retrieval.

python dump_data.py -ds megadepth
python register.py -k 5 -m best -pre 20 -ds megadepth
  3. Relative pose estimation using RANSAC.
python relative_pose.py -k 5 -m best -pre 20 -ds megadepth
[ETH3D]
  1. Download ETH3D (5.6 GB).
  2. Data preprocessing and top-1/5/10 retrieval.
python dump_data.py -ds eth3d
python register.py -k 5 -m best -pre 20 -ds eth3d
  3. Relative pose estimation using RANSAC.
python relative_pose.py -k 5 -m best -pre 20 -ds eth3d
[InLoc]
  1. Download the DB images and format the data into database/cutouts/; download the queries into query/iphone7/.
  2. Data preprocessing and top-40 retrieval.
python dump_data.py -ds inloc
python retrieve.py -ds inloc -k 40 -m best -pre 100
  3. Install and run hloc for localization.
python inloc_localization.py --loc_pairs outputs/inloc/best/cls_100/top40_overlap_pairs.txt -m best -ds inloc -out output_local
  4. Submit the result poses to the long-term visual localization benchmark.
[Customized data]
  1. Add the path of the custom data to data_dirs.yaml and create a dump script to load the images and, if needed and available, the GT pose information (see the sketch after this list).

  2. Run retrieve.py to find overlapping DB images for the queries, or register.py to search for overlapping images for each image in the pool.

  3. Run the evaluation of relative pose estimation or localization, or use the saved retrieved pairs elsewhere as you wish.
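A rough sketch of how the GT part of such a dump script could look, assuming features are dumped as above; the pose-file format, group names, and dataset keys here are purely illustrative, so adapt them to how the existing dump scripts store data.

```python
import h5py
import numpy as np

def dump_custom_gt(pose_file, out_path):
    """Store per-image GT camera parameters next to the dumped features (illustrative format).

    pose_file is assumed to contain one line per image:
    name qw qx qy qz tx ty tz fx fy cx cy
    """
    with h5py.File(out_path, "a") as f:
        for line in open(pose_file):
            name, *vals = line.split()
            vals = np.array(vals, dtype=np.float64)
            g = f.require_group(name)
            g.create_dataset("qvec", data=vals[:4])    # rotation as a quaternion
            g.create_dataset("tvec", data=vals[4:7])   # translation
            K = np.array([[vals[7], 0.0, vals[9]],
                          [0.0, vals[8], vals[10]],
                          [0.0, 0.0, 1.0]])
            g.create_dataset("K", data=K)              # intrinsics
```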

Training

Step 1. Download the GT depths of MegaDepth for training supervision from here.

Step 2. Customize the configs and start training based on glue-factory. We provide a default config with fixed, pre-saved positive/negative image pairs (fast), and a config with random positive/negative pairs (slow).

python -m gluefactory.train best_easy_retrain --conf train_configs/best_easy.yaml

Note that the easy version requires prepared labels; please download them from the train and validation links.

Important configs:

data:
    data_dir: ""
    info_dir: ""
    # choose the data augmentation type: 'flip', 'dark', 'lightglue'
    photometric: {
        "name": "flip",
        "p": 0.95,
        # 'difficulty': 1.0,  # currently unused
    }
    gt_label_path: ""


model:
    matcher:
        name: overlap_predictor # our model
        input_dim: 1024 # the dimension of the pretrained DINOv2 features
        embedding_dim: 256 # projected embedding dim
        dropout_prob: 0.5    # dropout probability
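To make the dimensions concrete: the config above projects 1024-dim DINOv2 tokens into 256-dim embeddings with dropout. A minimal sketch of such a projection head, not necessarily the repo's overlap_predictor architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchProjector(nn.Module):
    """Projects frozen DINOv2 patch tokens into a smaller retrieval embedding space."""

    def __init__(self, input_dim=1024, embedding_dim=256, dropout_prob=0.5):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(input_dim, embedding_dim),
            nn.ReLU(),
            nn.Dropout(dropout_prob),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, patch_tokens):        # (B, N, 1024) DINOv2 patch tokens
        emb = self.proj(patch_tokens)       # (B, N, 256) projected embeddings
        return F.normalize(emb, dim=-1)     # unit norm for cosine-similarity voting
```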

Notes

[Useful configs]
--model, name of the loaded model.
--k, top-k retrievals.
--radius, default=-1, compute the median similarity over 100 random samples as the radius threshold.
--cls, default=True, action True, whether the CLS-token prefilter is used.
--pre_filter, default=20, shortlist length.
--weighted, default=True, action True, whether to use TF-IDF weights for voting.
--overwrite, default=False, action True.
--conf, config path used for training.
[Acknowledgement]

glue-factory

long-term visual localization benchmark

pre-commit

[Contact] Contact me at weitongln@gmail.com or weitong@fel.cvut.cz.
