T-PAMI, 2024
Jianzong Wu* · Xiangtai Li* · Shilin Xu* · Haobo Yuan* · Henghui Ding · Yibo Yang · Xia Li · Jiangning Zhang · Yunhai Tong · Xudong Jiang · Bernard Ghanem · Dacheng Tao
This repo records, tracks, and benchmarks recent open vocabulary methods as a supplement to our survey.
If you find any work missing or have any suggestions (papers, implementations, and other resources), feel free to open a pull request.
We will add the missing papers to this repo as soon as possible.
[-] You are welcome to open an issue or PR for your open vocabulary learning work!
[-] Note: Due to the huge number of papers on arXiv, we are sorry that we cannot cover all of them in our survey. You can open a PR in this repo directly, and we will record it for the next version of our survey.
[-] Our survey will be updated in 2024.3.
[-] Our work has been accepted by T-PAMI! 🔥🔥🔥
[-] We updated the GitHub repo to record the papers available as of 2024/1/10.
[-] We updated the GitHub repo to record the papers available as of 2023/7/20.
[1] The first survey for open vocabulary learning, including open vocabulary detection/segmentation/tracking.
[2] It also contains several related domains, including foundation model tuning and open-world detection.
[3] We list detailed results for the most representative works and give a fairer and clearer comparison of different approaches.
This survey is the first detailed review of open vocabulary tasks, including open-vocabulary object detection, open-vocabulary segmentation, and 3D/video open-vocabulary tasks.
- Introduction
- Summary of Contents
- Methods: A Survey
- Related Domains and Beyond
- Acknowledgement
- Contact
Keywords
- cap.: Use caption as auxiliary training data
- vlm.: Use pretrained VLMs like CLIP
- pl.: Generate pseudo labels
- w/o ps.: Training without pixel-level supervision
- pre.: Vision-language pretraining
- diff.: Use diffusion models
- unify: Unify several tasks (semantic segmentation, instance segmentation, and panoptic segmentation)
- sam: Use SAM (Segment Anything Model)
- open.: Demonstrated with open-set capability (only for Video Understanding)
- audio.: With audio modality
- bench: Propose a benchmark
- other: Other methods that cannot be grouped into the above ones
- no-train: Does not need training
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | unify, vlm. | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code |
2023 | CVPR | unify, vlm. | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | Code |
2023 | arXiv | unify, vlm. | OpenSD: Unified Open-Vocabulary Segmentation and Detection | Code |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm. | Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation | Code |
2022 | CVPR | cap., pl., vlm. | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling | Code |
2023 | CVPR | vlm., cap., w/o ps. | Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations | Code |
2023 | arXiv | cap. | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code |
2023 | arXiv | cap. | Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation | N/A |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | unify, vlm. | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code |
2022 | arXiv | vlm. | Open-Vocabulary Panoptic Segmentation with MaskCLIP | N/A |
2023 | CVPR | diff., vlm. | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code |
2023 | arXiv | vlm. | Open-vocabulary Panoptic Segmentation with Embedding Modulation | N/A |
2023 | arXiv | vlm., unify | Hierarchical Open-vocabulary Universal Image Segmentation | Code |
2023 | arXiv | vlm. | Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation | N/A |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm., open. | OVTrack: Open-Vocabulary Multiple Object Tracking | Project |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | ICCV | vlm., open. | Towards Open-Vocabulary Video Instance Segmentation | Code |
2023 | arXiv | vlm., open. | OpenVIS: Open-vocabulary Video Instance Segmentation | N/A |
2023 | arXiv | vlm., open. | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | Code |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | CVPR | vlm. | PointCLIP: Point Cloud Understanding by CLIP | Code |
2023 | CVPR | vlm. | ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding | Code |
2023 | ICCV | vlm. | PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning | Code |
2023 | ICCV | vlm. | CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training | Code |
2023 | ICML | vlm. | Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining | Code |
2024 | WACV | vlm. | LidarCLIP or: How I Learned to Talk to Point Clouds | Code |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | arXiv | vlm. | Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning | N/A |
2023 | CVPR | vlm. | Open-Vocabulary Point-Cloud Object Detection without 3D Annotation | Code |
2023 | NeurIPS | vlm. | CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Project |
2023 | arXiv | vlm. | Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection | N/A |
2023 | arXiv | vlm. | FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection | N/A |
2023 | arXiv | vlm. | OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | N/A |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2023 | CVPR | vlm. | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Code |
2023 | CVPR | vlm. | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | Code |
2023 | CVPR | vlm. | OpenScene: 3D Scene Understanding with Open Vocabularies | Project |
2023 | ICCVW | vlm. | CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP | N/A |
2023 | NeurIPS | vlm. | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | Project |
2023 | arXiv | vlm. | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Project |
2023 | arXiv | vlm. | Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance | Project |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2022 | RA-L | - | Learning Open-World Object Proposals without Learning to Classify | Code |
2021 | ICCV | - | Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | Project |
2022 | CVPR | - | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity | Project |
2022 | ECCV | - | Class-agnostic object detection with multi-modal transformer | Code |
2022 | TPAMI | - | Open World Entity Segmentation | Project |
2022 | arXiv | - | Fine-Grained Entity Segmentation | Project |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2015 | CVPR | - | Towards Open World Recognition | N/A |
2021 | CVPR | - | Towards Open World Object Detection | Code |
2022 | CVPR | - | OW-DETR: Open-world Detection Transformer | Code |
2022 | ECCV | - | UC-OWOD: Unknown-Classified Open World Object Detection | Code |
2022 | arXiv | - | Revisiting Open World Object Detection | Code |
2022 | arXiv | - | Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation | N/A |
2022 | arXiv | - | Open World DETR: Transformer based Open World Object Detection | N/A |
2023 | CVPR | - | PROB: Probabilistic Objectness for Open World Object Detection | Code |
2023 | arXiv | - | Open World Object Detection in the Era of Foundation Models | Code |
2023 | arXiv | - | Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection | N/A |
Year | Venue | Keywords | Paper Title | Code/Project |
---|---|---|---|---|
2021 | CVPR | - | Exemplar-Based Open-Set Panoptic Segmentation Network | Project |
2022 | arXiv | - | Dual Decision Improves Open-Set Panoptic Segmentation | Code |
If you find our survey and repository useful for your research project, please consider citing our paper:
@article{wu2023open,
title={Towards Open Vocabulary Learning: A Survey},
author={Jianzong Wu and Xiangtai Li and Shilin Xu and Haobo Yuan and Henghui Ding and Yibo Yang and Xia Li and Jiangning Zhang and Yunhai Tong and Xudong Jiang and Bernard Ghanem and Dacheng Tao},
year={2024},
journal={T-PAMI},
}
jzwu@stu.pku.edu.cn
lxtpku@pku.edu.cn or xiangtai94@gmail.com