
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Junyi Li¹* · Junfeng Wu¹* · Weizhi Zhao¹ · Song Bai² · Xiang Bai¹†

¹Huazhong University of Science and Technology   ²Bytedance Inc.

*Equal Contribution  †Corresponding Author

Paper PDF · Project Page

Highlights:

  • PartGLEE is accepted by ECCV2024!
  • PartGLEE is a part-level foundation model for locating and identifying both objects and parts in images.
  • PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios.
  • PartGLEE achieves SOTA performance across various part-level tasks and obtains competitive results on traditional object-level tasks.

We will release the following for PartGLEE:

  • [ ] Demo Code

  • [√] Model Zoo

  • [√] Comprehensive User Guide

  • [√] Training Code and Scripts

  • [√] Evaluation Code and Scripts

Getting started

  1. Installation: Please refer to INSTALL.md for more details.
  2. Data preparation: Please refer to DATA.md for more details.
  3. Training: Please refer to TRAIN.md for more details.
  4. Testing: Please refer to TEST.md for more details.
  5. Model zoo: Please refer to MODEL_ZOO.md for more details.

Introduction

We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts.

(Figure: data visualization)

PartGLEE comprises an image encoder, a Q-Former, two independent decoders, and a text encoder. The Q-Former establishes the hierarchical relationship between objects and parts: a set of parsing queries is initialized in the Q-Former to interact with each object query, parsing objects into their corresponding parts. The Q-Former thus functions as a decomposer, extracting and representing parts from object queries. Hence, by training jointly on extensive object-level datasets and limited hierarchical datasets that contain object-part correspondences, our Q-Former acquires strong generalization ability, parsing any novel object into its constituent parts.

(Figure: PartGLEE pipeline)
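To make the parsing mechanism concrete, below is a minimal PyTorch sketch of the Q-Former idea: a set of learnable parsing queries cross-attends to each object query and is decoded into part-level queries. All names, dimensions, layer counts, and the number of parsing queries here are illustrative assumptions, not the released configuration:

import torch
import torch.nn as nn

class PartQFormer(nn.Module):
    """Illustrative sketch (not the official implementation): learnable
    parsing queries cross-attend to each object query, decomposing it
    into part-level queries."""

    def __init__(self, dim=256, num_parsing_queries=10, num_layers=3, num_heads=8):
        super().__init__()
        # One shared set of parsing queries, reused for every object query.
        self.parsing_queries = nn.Embedding(num_parsing_queries, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, object_queries):
        # object_queries: (batch, num_objects, dim) from the object-level decoder.
        B, N, D = object_queries.shape
        # Parse each object independently: every object query becomes the
        # (length-1) memory that its own copy of the parsing queries attends to.
        memory = object_queries.reshape(B * N, 1, D)
        parts = self.parsing_queries.weight.unsqueeze(0).repeat(B * N, 1, 1)
        part_queries = self.decoder(tgt=parts, memory=memory)
        return part_queries.reshape(B, N, -1, D)  # (batch, objects, parts, dim)

# Example: 2 images with 5 object queries each yield 10 part queries per object.
qformer = PartQFormer()
part_queries = qformer(torch.randn(2, 5, 256))  # -> (2, 5, 10, 256)

In the full model, these part-level queries would then be consumed by a decoder independent of the object-level one, in line with the two-decoder design described above.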

Dataset Unification

To facilitate training our Q-Former, we augment the original part-level datasets with object-level annotations. Specifically, we add object-level annotations to Pascal Part, PartImageNet, Pascal-Part-116, and ADE-Part-234 to establish the hierarchical correspondence between objects and parts. We further introduce a subset of the open-world instance segmentation dataset SA-1B and augment it into a hierarchical dataset, which further improves the generalization capability of our model.

(Figures: hierarchical dataset annotations; augmented SA-1B examples)
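As a hedged illustration of this unification step (the paper's exact procedure may differ), the sketch below synthesizes an object-level annotation from a part-only dataset by taking the union of one instance's part masks and deriving the enclosing box from it:

import numpy as np

def object_from_parts(part_masks: np.ndarray):
    """Synthesize an object-level annotation from part-level ones.
    part_masks: (num_parts, H, W) boolean masks of one instance's parts;
    assumes at least one foreground pixel. Illustrative only."""
    obj_mask = np.any(part_masks, axis=0)  # union of all part masks
    ys, xs = np.nonzero(obj_mask)          # foreground pixel coordinates
    box_xyxy = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    return obj_mask, box_xyxy

Pairing each synthesized object with its source parts yields the kind of object-part correspondences the Q-Former is trained on.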

Results

Hierarchical Cognitive Performance

(Figure: results on hierarchical tasks)

Traditional Object-level Tasks

(Figure: results on traditional object-level tasks)

Generalization Performance

Cross-Category Generalization Performance

(Figure: cross-category generalization results)

Cross-Dataset Generalization Performance

(Figure: cross-dataset generalization results)

Visualization Results

Comparison with SAM

(Figure: qualitative comparison with SAM)

Visualization of Generalization Capability

(Figure: visualization of generalization capability)

Citing PartGLEE

@article{li2024partglee,
  title={PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects},
  author={Li, Junyi and Wu, Junfeng and Zhao, Weizhi and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2407.16696},
  year={2024}
}

Acknowledgments

  • Thanks to GLEE for the implementation of multi-dataset training and data processing.

  • Thanks to MaskDINO for providing a powerful detector and segmenter.