OPDMulti: Openable Part Detection for Multiple Objects

Xiaohao Sun*, Hanxiao Jiang*, Manolis Savva, Angel Xuan Chang

Overview

This repository contains the implementation of the OPDFormer-based methods for the newly proposed OPDMulti task and its corresponding dataset. The code is based on Detectron2 and OPD, and the OPDFormer models are built on Mask2Former.

arXiv  Website  Demo

Content

  • Setup
  • Dataset
  • Training
  • Evaluation
  • Pretrained Models
  • Visualization
  • Citation
  • Acknowledgement

Setup

The implementation has been tested on Ubuntu 20.04, with Python 3.7, PyTorch 1.10.1, CUDA 11.1.1 and CUDNN 8.2.0.

  • Clone the repository
git clone git@github.com:3dlg-hcvc/OPDMulti.git
  • Set up the Python environment to train the model
conda create -n opdmulti python=3.7 
conda activate opdmulti

pip install -r requirements.txt

cd opdformer/mask2former/modeling/pixel_decoder/ops
python setup.py build install
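
Optionally, you can sanity-check the new environment before training. The one-liner below is not part of the official setup; it simply confirms that PyTorch imports and that CUDA is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"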

Dataset

Download our OPDMulti dataset (7.2G) and extract it into the ./dataset/ folder. Make sure the data follows this format. You can follow these steps if you want to convert your own data to the OPDMulti format. To try our model on the OPDSynth and OPDReal datasets, download the data from the OPD repository.
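
A minimal extraction sketch, assuming the download is a tar.gz archive named OPDMulti.tar.gz (adjust the archive name and format to what you actually downloaded):

mkdir -p dataset
tar -xzf OPDMulti.tar.gz -C dataset/
# the commands below expect paths such as OPDMulti/MotionDataset_h5 and OPDMulti/obj_info.json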

Training

To train from scratch, you can use the commands below. The output will include evaluation results on the val set.

cd opdformer
python train.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR> 
  • <MODEL_CONFIG>: the path to the config file for the desired model variant; see the "Model Name" column of the OPDMulti table below.

  • Dataset:

    • --data-path OPDMulti/MotionDataset_h5
    • --model_attr_path OPDMulti/obj_info.json
  • To initialize from model weights pretrained on the OPDReal dataset (we finetune these weights on OPDMulti), add the following option (see the example command after this list):

    --opts MODEL.WEIGHTS <PRETRAINED_MODEL>
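
For example, a complete training invocation with the dataset paths filled in might look like the following; <MODEL_CONFIG> and <PRETRAINED_MODEL> remain placeholders, and output/opdformer_rgb is just an illustrative output directory:

python train.py \
--config-file <MODEL_CONFIG> \
--output-dir output/opdformer_rgb \
--data-path OPDMulti/MotionDataset_h5 \
--input-format RGB \
--model_attr_path OPDMulti/obj_info.json \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL>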

Evaluation

To evaluate, use the following command:

python evaluate_on_log.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR> \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL>
  • To evaluate on the test set, append DATASETS.TEST to the options: --opts MODEL.WEIGHTS <PRETRAINED_MODEL> DATASETS.TEST "('MotionNet_test',)" (a combined example is shown after this list).
  • To evaluate directly on a pre-saved inference file, pass its path with --inference-file <PATH_TO_INFERENCE_FILE>.
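
Put together, evaluating a downloaded checkpoint on the test split might look like this (paths are illustrative placeholders, as above):

python evaluate_on_log.py \
--config-file <MODEL_CONFIG> \
--output-dir output/eval_test \
--data-path OPDMulti/MotionDataset_h5 \
--input-format RGB \
--model_attr_path OPDMulti/obj_info.json \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL> DATASETS.TEST "('MotionNet_test',)"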

Pretrained Models

You can download our pretrained model weights (for both OPDReal and OPDMulti) for the different input formats (RGB, RGB-D, depth) from the table below.

For model evaluation, download the pretrained weights from the OPDMulti column. To finetune on your own data, use the pretrained weights from the OPDReal column; these are the same weights we finetuned to obtain the OPDMulti results.

How to read the table

The "Model Name" column contains a link to the config file. "PSeg" is the part segmentation score, "+M" adds motion type prediction, "+MA" includes axis prediction, and "+MAO" further incorporates origin prediction.

To train/evaluate the different model variants, change the --config-file /path/to/config/name.yaml in the training/evaluation command.

OPDMulti

| Model Name  | Input | PSeg | +M   | +MA  | +MAO | OPDMulti Model | OPDReal Model |
|-------------|-------|------|------|------|------|----------------|---------------|
| OPDFormer-C | RGB   | 29.1 | 28.0 | 13.5 | 12.3 | model (169M)   | model (169M)  |
| OPDFormer-O | RGB   | 27.8 | 26.3 | 5.0  | 1.5  | model (175M)   | model (175M)  |
| OPDFormer-P | RGB   | 31.4 | 30.4 | 18.9 | 15.1 | model (169M)   | model (169M)  |
| OPDFormer-C | depth | 20.9 | 18.9 | 11.4 | 10.1 | model (169M)   | model (169M)  |
| OPDFormer-O | depth | 23.4 | 21.5 | 5.9  | 1.9  | model (175M)   | model (175M)  |
| OPDFormer-P | depth | 21.7 | 19.8 | 15.4 | 13.5 | model (169M)   | model (169M)  |
| OPDFormer-C | RGBD  | 24.2 | 22.7 | 14.1 | 13.4 | model (169M)   | model (169M)  |
| OPDFormer-O | RGBD  | 23.1 | 21.2 | 6.7  | 2.6  | model (175M)   | model (175M)  |
| OPDFormer-P | RGBD  | 27.4 | 25.5 | 18.1 | 16.7 | model (169M)   | model (169M)  |

Visualization

The visualization code is based on the OPD repository. We only support visualization of the raw dataset format (download link (5.0G)).

The visualization uses the inference file, which is produced by the evaluation step.

  • Visualize the ground truth (GT) on 1000 random images from the val set
    cd opdformer
    python render_gt.py \
    --output-dir vis_output \
    --data-path <PATH_TO_DATASET> \
    --valid-image <IMAGE_LIST_FILE> \
    --is-real
  • Visualize the PREDICTION on 1000 random images from the val set
    cd opdformer
    python render_pred.py \
    --output-dir vis_output \
    --data-path <PATH_TO_DATASET> \
    --model_attr_path <PATH_TO_ATTR> \
    --valid-image <IMAGE_LIST_FILE> \
    --inference-file <PATH_TO_INFERENCE_FILE> \
    --score-threshold 0.8 \
    --update-all \
    --is-real
    • --data-path dataset/MotionDataset
    • --valid-image dataset/MotionDataset/valid_1000.json

Citation

If you find this code useful, please consider citing:

@article{sun2023opdmulti,
  title={OPDMulti: Openable Part Detection for Multiple Objects},
  author={Sun, Xiaohao and Jiang, Hanxiao and Savva, Manolis and Chang, Angel Xuan},
  journal={arXiv preprint arXiv:2303.14087},
  year={2023}
}

@article{mao2022multiscan,
  title={MultiScan: Scalable RGBD scanning for 3D environments with articulated objects},
  author={Mao, Yongsen and Zhang, Yiming and Jiang, Hanxiao and Chang, Angel and Savva, Manolis},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={9058--9071},
  year={2022}
}

@inproceedings{jiang2022opd,
  title={OPD: Single-view 3D openable part detection},
  author={Jiang, Hanxiao and Mao, Yongsen and Savva, Manolis and Chang, Angel X},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXIX},
  pages={410--426},
  year={2022},
  organization={Springer}
}

@inproceedings{cheng2022masked,
  title={Masked-attention mask transformer for universal image segmentation},
  author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G and Kirillov, Alexander and Girdhar, Rohit},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1290--1299},
  year={2022}
}

Acknowledgement

This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair and NSERC Discovery Grant, and enabled in part by support from WestGrid and Compute Canada. We thank Yongsen Mao for helping us with the data processing procedure. We also thank Jiayi Liu, Sonia Raychaudhuri, Ning Wang, and Yiming Zhang for feedback on paper drafts.