This is the official source code for the IROS 2023 oral paper: Depth-based 6DoF Object Pose Estimation using Swin Transformer (https://arxiv.org/abs/2303.02133).
- SwinDePose
Update: We have uploaded information about the robot that we integrated with our model for real-world object-grasping tests.
Before the representation learning stage of SwinDePose, we build a normal vector angles image generation module that generates normal vector angles images from depth images. In addition, depth images are lifted to point clouds using the camera intrinsic parameters K. The normal vector angles images and point clouds are then fed into image and point cloud feature extraction networks to learn representations. The learned embeddings from both inputs are passed to a 3D keypoint localization module and an instance segmentation module. Finally, least-squares fitting is applied to estimate the 6D pose.
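The two geometric steps in this pipeline, lifting depth to a point cloud with K and the final least-squares fit, can be sketched in NumPy. This is a minimal illustration under assumptions, not the code used in this repository: the function names `depth_to_points` and `fit_rigid_transform` are hypothetical, depth is assumed to be in metres, K is assumed to be a standard 3x3 pinhole matrix, and the fit uses the common SVD-based (Kabsch) solution.

```python
import numpy as np


def depth_to_points(depth, K):
    """Back-project a depth map (H, W) to an (H, W, 3) point map (pinhole model)."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst is approximately src @ R.T + t (SVD solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In a keypoint-based setting, `src` would hold the object's keypoints in the model frame and `dst` the predicted keypoints in the camera frame, so that `R, t` is the estimated 6D pose.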
If you find SwinDePose useful in your research, please consider citing:
@inproceedings{Li2023Depthbased6O,
title={Depth-based 6DoF Object Pose Estimation using Swin Transformer},
author={Zhujun Li and Ioannis Stamos},
year={2023}
}
- Install the conda environment from environment.yml (this might take a while; remember to change the prefix at the end of the environment.yml file)
conda env create -f swin_de_pose/environment.yml
- Activate our swin-pose conda environment
conda activate lab-swin
- Install mmseg within conda
pip install -r swin_de_pose/mmseg_install.txt
- Follow the normalSpeed instructions to install normalSpeed within conda
pip3 install "pybind11[global]"
git clone https://github.com/hfutcgncas/normalSpeed.git
cd normalSpeed
python3 setup.py install --user
- Install some necessary packages
cd models/RandLA
sh compile_op.sh
Because apex is not installed in this environment, you may have to delete all apex-related modules and functions.
- To infer poses for the ROS manipulation system:
sh scripts/test_single_lab.sh
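As an alternative to deleting apex-related code by hand, the import can be made optional. The sketch below is an assumption about how such a guard might look, not code from this repository: `HAS_APEX` and `maybe_initialize_amp` are hypothetical names, and the fallback simply returns the model and optimizer unchanged.

```python
# Treat apex as an optional dependency instead of a hard requirement.
try:
    from apex import amp
    HAS_APEX = True
except ImportError:
    amp = None
    HAS_APEX = False


def maybe_initialize_amp(model, optimizer, opt_level="O1"):
    """Wrap model/optimizer with apex AMP when available, otherwise pass through."""
    if HAS_APEX:
        return amp.initialize(model, optimizer, opt_level=opt_level)
    return model, optimizer
```

With this pattern, training code can call `maybe_initialize_amp(model, optimizer)` regardless of whether apex is installed.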
- Pull docker image from docker hub
docker pull zhujunli/swin-pose:latest
- Run our swin-pose docker
sudo nvidia-docker run --gpus all --ipc=host --shm-size 50G --ulimit memlock=-1 --name your_docker_environment_name -it --rm -v your_workspace_directory:/workspace zhujunli/swin-pose:latest
- Install mmseg within docker
pip install -r swin_de_pose/mmseg_install.txt
- Install some necessary packages
cd models/RandLA
sh compile_op.sh
- swin_de_pose
- swin_de_pose/apps
- swin_de_pose/apps/train_lm.py: Training & Evaluating code of SwinDePose models for the LineMOD dataset.
- swin_de_pose/apps/train_occlm.py: Training & Evaluating code of SwinDePose models for the Occ-LineMOD dataset.
- swin_de_pose/config
- swin_de_pose/config/common.py: Some network and datasets settings for experiments.
- swin_de_pose/config/options.py: Training and evaluating parameters settings for experiments.
- swin_de_pose/scripts
- swin_de_pose/scripts/train_lm.sh: Bash scripts to start the training on the LineMOD dataset.
- swin_de_pose/scripts/test_lm.sh: Bash scripts to start the testing on the LineMOD dataset.
- swin_de_pose/scripts/train_occlm.sh: Bash scripts to start the training on the Occ-LineMOD dataset.
- swin_de_pose/scripts/test_occlm.sh: Bash scripts to start the testing on the Occ-LineMOD dataset.
- swin_de_pose/datasets
- swin_de_pose/datasets/linemod/
- swin_de_pose/datasets/linemod/linemod_dataset.py: Data loader for LineMOD dataset.
- swin_de_pose/datasets/linemod/create_angle_npy.py: Generate normal vector angles images for the real-scene LineMOD dataset.
- swin_de_pose/datasets/occ_linemod
- swin_de_pose/datasets/occ_linemod/occ_dataset.py: Data loader for Occ-LineMOD dataset.
- swin_de_pose/datasets/occ_linemod/create_angle_npy.py: Generate normal vector angles images for the Occ-LineMOD dataset.
- swin_de_pose/mmsegmentation: packages of swin-transformer.
- swin_de_pose/models
- swin_de_pose/models/SwinDePose.py: Network architecture of the proposed SwinDePose.
- swin_de_pose/models/cnn
- swin_de_pose/models/cnn/extractors.py: Resnet backbones.
- swin_de_pose/models/cnn/pspnet.py: PSPNet decoder.
- swin_de_pose/models/cnn/ResNet_pretrained_mdl: ResNet pretrained model weights.
- swin_de_pose/models/loss.py: loss calculation for training of FFB6D model.
- swin_de_pose/models/pytorch_utils.py: pytorch basic network modules.
- swin_de_pose/models/RandLA/: pytorch version of RandLA-Net from RandLA-Net-pytorch
- swin_de_pose/utils
- swin_de_pose/utils/basic_utils.py: basic functions for data processing, visualization and so on.
- swin_de_pose/utils/meanshift_pytorch.py: pytorch version of meanshift algorithm for 3D center point and keypoints voting.
- swin_de_pose/utils/pvn3d_eval_utils_kpls.py: Object pose estimation from predicted center/keypoints offset and evaluation metrics.
- swin_de_pose/utils/ip_basic: Image Processing for Basic Depth Completion from ip_basic.
- swin_de_pose/utils/dataset_tools
- swin_de_pose/utils/dataset_tools/DSTOOL_README.md: README for dataset tools.
- swin_de_pose/utils/dataset_tools/requirement.txt: Python3 requirement for dataset tools.
- swin_de_pose/utils/dataset_tools/gen_obj_info.py: Generate object info, including SIFT-FPS 3d keypoints, radius etc.
- swin_de_pose/utils/dataset_tools/rgbd_rnder_sift_kp3ds.py: Render rgbd images from mesh and extract textured 3d keypoints (SIFT/ORB).
- swin_de_pose/utils/dataset_tools/utils.py: Basic utils for mesh, pose, image and system processing.
- swin_de_pose/utils/dataset_tools/fps: Furthest point sampling algorithm.
- swin_de_pose/utils/dataset_tools/example_mesh: Example mesh models.
- swin_de_pose/train_log
- swin_de_pose/train_log/
- swin_de_pose/train_log/{your experiment name}/checkpoints/: Storing trained checkpoints on your experiment.
- swin_de_pose/train_log/{your experiment name}/eval_results/: Storing evaluated results on your experiment.
- swin_de_pose/train_log/{your experiment name}/train_info/: Training log on your experiment.
- figs/: Images shown in README.
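The dataset tools listed above include a furthest point sampling routine, which is commonly used to select well-spread 3D keypoints on a mesh. The idea can be sketched as follows. This is a plain NumPy illustration with a hypothetical function name, not the implementation in swin_de_pose/utils/dataset_tools/fps.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Return k indices into points (N, 3), greedily maximizing spread."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))          # farthest from the chosen set
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(chosen)
```

Each iteration costs O(N), so selecting k keypoints costs O(kN), which is cheap for the small k typical of keypoint selection.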
- Download the preprocessed LineMOD dataset from the OneDrive link or the Google Drive link (provided by DenseFusion). Unzip it and link the unzipped Linemod_preprocessed/ to swin_de_pose/datasets/linemod/Linemod_preprocessed:
ln -s path_to_unzipped_Linemod_preprocessed swin_de_pose/datasets/linemod/
- For synthetic dataset:
- Generate rendered and fused data following raster_triangle.
- Open the raster_triangle folder. Replace its fuse.py with swin_de_pose/fuse.py and its rgbd_renderer.py with swin_de_pose/rgbd_renderer.py.
- Link the Linemod_preprocessed folder to the current folder:
ln -s path_to_Linemod_preprocessed ./Linemod_preprocessed
You don't have to do this every time.
- Render the renders_nrm/ data. For example, for the phone class:
python3 rgbd_renderer.py --cls phone --render_num 10000
- Render the fuse_nrm/ data. For example, for the phone class:
python3 fuse.py --cls phone --fuse_num 10000
- For the real dataset: go to swin_de_pose/datasets/linemod/ and run:
python -m create_angle_npy --cls_num your_cls_num --train_list 'train.txt' --test_list 'test.txt'
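To make the idea of a normal vector angles image concrete, the sketch below computes, for each pixel, the angles between an estimated surface normal and the camera x, y, and z axes. It is a simplified stand-in under stated assumptions, not what create_angle_npy.py does: the repository relies on normalSpeed for normal estimation, whereas here normals come from finite differences on an organized point map, the function name `normal_angle_image` is hypothetical, and orienting normals toward the camera (negative z) is an assumed convention.

```python
import numpy as np


def normal_angle_image(points):
    """points: (H, W, 3) organized point map.

    Returns an (H, W, 3) array of angles in degrees between the estimated
    surface normal and the camera x, y, z axes.
    """
    # Tangent vectors from finite differences along image columns and rows.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    normals = np.cross(du, dv)
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.clip(norm, 1e-12, None)
    # Assumed convention: flip normals so they face the camera (negative z).
    normals = np.where(normals[..., 2:3] > 0, -normals, normals)
    # For a unit normal, the angle to each axis is arccos of that component.
    return np.degrees(np.arccos(np.clip(normals, -1.0, 1.0)))
```

The resulting three-channel array can be stored with `np.save` and treated like an image by the feature extraction network.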
- Download the BOP Occ-LineMOD dataset from (https://bop.felk.cvut.cz/datasets/)
- For both the pbr_synthetic and real datasets: go to swin_de_pose/datasets/occ_linemod/ and run:
python -m create_angle_npy --cls_num your_cls_num --train_list 'train.txt' --test_list 'test.txt'
- Train the model for the target object.
sh scripts/train_lm.sh
The trained checkpoints are stored in experiment_name/train_log/linemod/checkpoints/{cls}/.
- Start evaluation by:
sh scripts/test_lm.sh
You can evaluate a different checkpoint by setting tst_mdl to the path of your target model.
- Pretrained model: We provide our pre-trained models for each object on OneDrive (link). Download them and move them to their corresponding folders. For example, move ape_best.pth.tar to train_log/linemod/checkpoints/ape/. Then set tst_mdl=train_log/linemod/checkpoints/ape/ape_best.pth.tar for testing.
- After training your models or downloading the pre-trained models, you can visualize the results:
sh scripts/test_lm_vis.sh
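Moving downloaded checkpoints into the per-class folders described above can be scripted. The helper below is a hypothetical convenience, not part of this repository; it assumes the train_log/linemod/checkpoints/{cls}/ layout mentioned in the pretrained-model step and uses only the Python standard library.

```python
import shutil
from pathlib import Path


def place_checkpoint(downloaded, cls, root="train_log/linemod/checkpoints"):
    """Move a downloaded checkpoint into <root>/<cls>/ and return its new path."""
    target_dir = Path(root) / cls
    target_dir.mkdir(parents=True, exist_ok=True)   # create the class folder if missing
    target = target_dir / Path(downloaded).name
    shutil.move(str(downloaded), str(target))
    return target
```

For example, `place_checkpoint("ape_best.pth.tar", "ape")` returns the path that tst_mdl would then be set to.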
- Train the model for the target object.
sh scripts/train_occlm.sh
The trained checkpoints are stored in experiment_name/train_log/occ_linemod/checkpoints/{cls}/.
- Start evaluation by:
sh scripts/test_occlm.sh
You can evaluate a different checkpoint by setting tst_mdl to the path of your target model.
- Pretrained model: We provide our pre-trained models for each object on OneDrive (link). Download them and move them to their corresponding folders.
- After training your models or downloading the pre-trained models, you can visualize the results:
sh scripts/test_occlm_vis.sh
- Train the model for the target object.
sh scripts/train_ycb.sh
The trained checkpoints are stored in experiment_name/train_log/ycb/checkpoints/ycb.pth.tar.
- Start evaluation by:
sh scripts/test_ycb.sh
You can evaluate a different checkpoint by setting tst_mdl to the path of your target model.
- Pretrained model: We provide our pre-trained models on OneDrive (link). Download them and move them to their corresponding folders.
- Evaluation on the LineMod Dataset
- Qualitative Results on the LineMod Dataset
- Evaluation on the Occlusion LineMod Dataset
- Qualitative Results on the Occlusion LineMod Dataset
- Evaluation on the YCBV Dataset
See Fetch Robot for details of the robot we integrated.
See Robot Grasping Video for a video of our Fetch robot, running our SwinDePose network, grasping texture-less objects.
SwinDePose is released under the MIT License (refer to the LICENSE file for details).