[ICRA 2024] Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Official code for the ICRA 2024 paper "Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation".
Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexity of 3D object shapes, the wide variety of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method for 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding of 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. Extensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% in mIoU over the baselines. Furthermore, it offers real-time inference, which makes it well suited for robotic manipulation applications.
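For intuition, the text-point correlation boils down to scoring each per-point feature against the text embedding of every candidate label. Below is a minimal sketch of that idea, assuming per-point features from a 3D backbone and pre-computed label embeddings; the function name, shapes, and temperature are illustrative, not the paper's exact implementation.

```python
# Minimal sketch of text-point correlation (illustrative, not the paper's exact code).
# Per-point features from a 3D backbone are correlated with text embeddings of
# open-vocabulary affordance labels via cosine similarity.
import torch
import torch.nn.functional as F

def text_point_logits(point_feats, text_feats, temperature=0.07):
    """point_feats: (B, N, D) per-point features; text_feats: (C, D) label embeddings."""
    point_feats = F.normalize(point_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    # Cosine similarity between every point and every label -> (B, N, C)
    return torch.einsum('bnd,cd->bnc', point_feats, text_feats) / temperature

# Example: 2048 points, 512-dim features, 5 candidate affordance labels
logits = text_point_logits(torch.randn(1, 2048, 512), torch.randn(5, 512))
pred = logits.argmax(dim=-1)  # per-point affordance label indices
```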
We strongly encourage you to create a separate conda environment:
```
conda create -n openad python=3.8
conda activate openad
conda install pytorch pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pip
pip install -r requirements.txt
```
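After installing, you can optionally verify that PyTorch sees your GPU (a quick sanity check, not part of the repo's scripts):

```python
# Sanity check: confirm PyTorch and CUDA are set up inside the openad environment.
import torch
print(torch.__version__)          # expected: a CUDA-enabled build
print(torch.cuda.is_available())  # expected: True if the GPU is visible
```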
Download data from this drive folder.
Currently, we support two models (OpenAD with PointNet++ and DGCNN backbones) and two settings (full-shape and partial-view).
Please train the model on a single GPU for the best performance. Below are the steps for training the model with the PointNet++ backbone on the full-shape setting; the steps for other combinations are similar.
- Download the teacher pre-trained weights here: https://drive.google.com/drive/folders/1doi0Ye1HPg5sALkcbkfyBeQA6f5NYNYq?usp=sharing
- In `config/openad_pn2/full_shape_cfg.py`, change the value of `data_root` to your downloaded data folder, and change the path to the class weights to the path of the file `full_shape_weights.npy` (contained in the data folder).
- Assuming you use GPU 0, run the following command to start training (a sketch of the distillation idea follows this list):

```
CUDA_VISIBLE_DEVICES=0 python3 train.py --config ./config/openad_pn2/full_shape_cfg.py --config_teacher './config/teacher/estimation_cfg.py' --checkpoint_teacher <path to your checkpoint teacher model> --work_dir ./log/openad_pn2/OPENAD_PN2_FULL_SHAPE_Release/ --gpu 0
```

where `<path to your checkpoint teacher model>` is the path to the pre-trained teacher model you downloaded.
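For reference, here is a hedged sketch of what feature-level distillation from a frozen teacher can look like; the loss choice (MSE), weighting, and shapes are assumptions for illustration, and `train.py` is the authoritative implementation.

```python
# Illustrative feature-level knowledge distillation (assumed formulation;
# see train.py for the actual objective used in this repo).
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats):
    """Align student per-point features (B, N, D) with a frozen teacher's features."""
    return F.mse_loss(student_feats, teacher_feats.detach())  # teacher gets no gradients

# Hypothetical total objective: task loss plus a weighted distillation term.
# total_loss = segmentation_loss + lambda_kd * distillation_loss(s_feats, t_feats)
```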
The following are the steps for open-vocabulary testing of a trained model with the PointNet++ backbone on the full-shape setting; the steps for other combinations are similar.
- Change the value of `data_root` in `config/openad_pn2/full_shape_open_vocab_cfg.py` to your downloaded data folder.
- Run the following command (a sketch of how open-vocabulary labels can be encoded follows this list):

```
CUDA_VISIBLE_DEVICES=0 python3 test_open_vocab.py --config ./config/openad_pn2/full_shape_open_vocab_cfg.py --checkpoint <path to your checkpoint model> --gpu 0
```

where `<path to your checkpoint model>` is the path to your trained model.
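Conceptually, open-vocabulary testing works because the label set is just text: any new affordance name can be embedded and correlated with point features. Below is a hedged sketch using the Hugging Face CLIP text encoder; the repo's actual text pipeline may use a different CLIP wrapper or prompt format.

```python
# Illustrative: embed arbitrary affordance labels with a CLIP text encoder.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

labels = ["grasp", "pour", "contain", "cut", "none"]  # any open-vocabulary labels
tokens = tokenizer(labels, padding=True, return_tensors="pt")
with torch.no_grad():
    text_feats = text_model(**tokens).pooler_output  # (num_labels, 512) embeddings
# text_feats can then be correlated with per-point features as in the sketch above.
```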
The code for visualization is available in `./visualization_scrips`.
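For a quick look at predictions outside those scripts, here is a minimal Open3D sketch that colors each point by its predicted label (all inputs below are random stand-ins):

```python
# Illustrative Open3D visualization: color points by per-point affordance label.
import numpy as np
import open3d as o3d

points = np.random.rand(2048, 3)           # stand-in for a real (N, 3) point cloud
pred = np.random.randint(0, 5, size=2048)  # stand-in per-point label indices
palette = np.random.rand(5, 3)             # one RGB color per label

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(palette[pred])
o3d.visualization.draw_geometries([pcd])
```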
If you find our work useful for your research, please cite:
```
@inproceedings{van2023open,
  title={Open-vocabulary affordance detection using knowledge distillation and text-point correlation},
  author={Van Vo, Tuan and Vu, Minh Nhat and Huang, Baoru and Nguyen, Toan and Le, Ngan and Vo, Thieu and Nguyen, Anh},
  booktitle={ICRA},
  year={2024}
}
```