2024.06. Available on arXiv.
2024.10. Accepted at WACV 2025.
The proposed method identifies the visible sections of an image without requiring expensive feature detection and matching. It obtains patch-level embeddings from a DINOv2 backbone, establishes patch-to-patch correspondences, and uses a voting mechanism to assess overlap scores for candidate database images, thereby providing a nuanced image retrieval metric in challenging scenarios.
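To make the voting idea concrete, here is a minimal sketch in plain Python. It assumes that a query patch "votes" for a database image when its best cosine similarity to any patch of that image reaches a radius threshold, and that the overlap score is the fraction of voting patches. The function names (`cosine`, `overlap_score`) and the toy 2-D vectors are hypothetical; the actual VOP implementation works on learned embeddings and may weight the votes differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def overlap_score(query_patches, db_patches, radius):
    """Fraction of query patches whose best db match has similarity >= radius."""
    if not query_patches or not db_patches:
        return 0.0
    votes = 0
    for q in query_patches:
        best = max(cosine(q, d) for d in db_patches)
        if best >= radius:
            votes += 1  # this query patch is considered visible in the db image
    return votes / len(query_patches)


# Toy example: three query patches, two candidate database images.
query = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
db_a = [[0.9, 0.1], [0.1, 0.9]]
db_b = [[-0.9, -0.1]]
scores = {
    "db_a": overlap_score(query, db_a, radius=0.8),
    "db_b": overlap_score(query, db_b, radius=0.8),
}
print(scores)
```

In this toy case `db_a` receives votes from two of the three query patches and `db_b` from one, so `db_a` would be ranked first.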
torch == 2.3.1
Python == 3.10.13
OpenCV == 4.10.0.84
OmegaConf == 2.3.0
h5py == 3.11.0
tqdm == 4.66.4
faiss-gpu == 1.7.2
lightglue
hloc
Try the proposed VOP on one example image pair and visualize their matched patches.
Step 1. Preprocess the test data and GT information (e.g., camera parameters R, K) if available. Load the images, run the frozen DINOv2 on them, then save the [CLS] tokens and patch embeddings.
Step 2. Load the best checkpoint and run the trained encoder on the test set. Run retrieval and save the list of images with high overlap.
Step 3. Evaluate the retrieval results by running relative pose estimation or localization.
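The retrieval in Step 2 can be pictured as a two-stage ranking: a shortlist built from global ([CLS]) similarity, followed by reranking with the patch-level overlap score. The sketch below is an illustration under that assumption only; `top_k`, `retrieve`, and the toy scores are hypothetical names and values, not the repository's API.

```python
def top_k(scores, k):
    """Return the k keys with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def retrieve(cls_sim, overlap_fn, pre_filter, k):
    """Shortlist by global similarity, then rerank the shortlist by overlap."""
    shortlist = top_k(cls_sim, pre_filter)
    overlap = {name: overlap_fn(name) for name in shortlist}
    return top_k(overlap, k)


# Toy scores for four database images against a single query.
cls_sim = {"db1": 0.91, "db2": 0.85, "db3": 0.40, "db4": 0.88}
patch_overlap = {"db1": 0.30, "db2": 0.75, "db3": 0.95, "db4": 0.60}

result = retrieve(cls_sim, patch_overlap.get, pre_filter=3, k=2)
print(result)
```

Note that `db3` has the highest overlap but never reaches the reranking stage because the prefilter drops it, which is the trade-off a short shortlist makes for speed.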
Here are the instructions for the test sets used in the paper. The best checkpoint is downloaded automatically.
💥 Important: before data preprocessing, create or update the original data directory for the specific dataset in dump_datasets/data_dirs.yaml.
dataset_dirs:
    inloc: <src_path>
[Megadepth]
- Download the data from glue-factory, including images and scene_info.
- Data preprocess and top-1/5/10 retrieval.
python dump_data.py -ds megadepth
python register.py -k 5 -m best -pre 20 -ds megadepth
- Relative pose estimation using RANSAC.
python relative_pose.py -k 5 -m best -pre 20 -ds megadepth
[ETH3D]
- Download ETH3D (5.6G).
- Data preprocess and top-1/5/10 retrieval.
python dump_data.py -ds eth3d
python register.py -k 5 -m best -pre 20 -ds eth3d
- Relative pose estimation using RANSAC.
python relative_pose.py -k 5 -m best -pre 20 -ds eth3d
[Inloc]
- Download the DB images and format the data to database/cutouts/; download the queries into query/iphone7/.
- Data preprocess and top-40 retrieval.
python dump_data.py -ds inloc
python retrieve.py -ds inloc -k 40 -m best -pre 100
- Install and run hloc for localization.
python inloc_localization.py --loc_pairs outputs/inloc/best/cls_100/top40_overlap_pairs.txt -m best -ds inloc -out output_local
- Submit the result poses to the long-term visual localization benchmark.
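If you want to inspect or reuse the retrieved pairs outside of hloc, a small parser is enough. The sketch below assumes the pairs file follows the hloc convention of one whitespace-separated `query_name db_name` pair per line; this format is an assumption here, so check the file written by retrieve.py before relying on it. The function name `parse_pairs` and the example file names are hypothetical.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Group 'query db' lines into {query: [db, ...]}, keeping file order."""
    pairs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        query, db = parts
        pairs[query].append(db)
    return dict(pairs)


# Hypothetical file content, used here instead of reading from disk.
example = [
    "query/iphone7/q1.JPG database/cutouts/a.jpg",
    "query/iphone7/q1.JPG database/cutouts/b.jpg",
    "",
    "query/iphone7/q2.JPG database/cutouts/c.jpg",
]
pairs = parse_pairs(example)
print(pairs)
```

With a real file, pass an open file handle instead of the list, e.g. `parse_pairs(open(path))`.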
[Customized data]
- Add the path of the custom data to data_dirs.yaml, and create a dump script here to load the images and, if needed and available, the GT pose information.
- Run retrieve.py to find overlapping DB images for the queries, or register.py to search for overlapping images for each image in the pool.
- Run the evaluation of relative pose estimation or localization, or use the saved retrieved pairs elsewhere as needed.
Step 1. Download the GT depths of Megadepth for training supervision from here.
Step 2. Customize the configs and start training based on glue-factory. We provide a default config that uses fixed, pre-saved positive/negative image pairs (fast) and this config, which samples random positive/negative pairs (slow).
python -m gluefactory.train best_easy_retrain --conf train_configs/best_easy.yaml
Note that the easy version requires prepared labels; please download them from train and validation.
Important configs:
data:
    data_dir: ""
    info_dir: ""
    # choose the data augmentation type: 'flip, dark, lighglue'
    photometric: {
        "name": "flip",
        "p": 0.95,
        # 'difficulty': 1.0, # currently unused
    }
    gt_label_path: ""
model:
    matcher:
        name: overlap_predictor # our model
        input_dim: 1024 # the dimension of the pretrained DINOv2 features
        embedding_dim: 256 # projected embedding dim
        dropout_prob: 0.5 # dropout probability
[Useful configs]
--model, name of the loaded model.
--k, top-k retrievals.
--radius, default=-1, radius threshold; with the default of -1, it is computed as the median similarity over 100 random samples.
--cls, default=True, action True, whether CLS tokens are used for prefiltering.
--pre_filter, default=20, shortlist length.
--weighted, default=True, action True, whether to use TF-IDF weights for voting.
--overwrite, default=False, action True.
--conf, config path used for training.
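As an illustration of the --radius default, the sketch below takes the median of up to 100 randomly drawn similarity values as the threshold. What exactly is sampled in the repository (patch pairs, images, etc.) is not stated here, so the input list, the function name `estimate_radius`, and the `seed` parameter are assumptions made for this example.

```python
import random
import statistics


def estimate_radius(similarities, num_samples=100, seed=0):
    """Median of up to num_samples randomly drawn similarity values."""
    if not similarities:
        raise ValueError("need at least one similarity value")
    rng = random.Random(seed)  # fixed seed keeps the threshold reproducible
    n = min(num_samples, len(similarities))
    return statistics.median(rng.sample(similarities, n))


# Toy similarity values in [0, 1); real values would come from patch embeddings.
sims = [i / 1000 for i in range(1000)]
radius = estimate_radius(sims)
print(radius)
```

A data-driven threshold like this adapts to the similarity distribution of each dataset, at the cost of some variance from the random draw.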