Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms
This is the official implementation of the paper "Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms", which you can download here.
Our implemented network, "SPVCNN with Point Transformer in the voxel branch", achieves state-of-the-art results on the Street3D dataset.
The baseline models are implemented according to Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution.
All code has been tested in the following environment:
- Linux (tested on Ubuntu 18.04)
- Python 3.9.7
- PyTorch 1.10
- CUDA 11.4
- Construct an Anaconda environment with Python 3.9.7
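For example (the environment name `pc_attention` below is just a placeholder; pick any name):
conda create -n pc_attention python=3.9.7
conda activate pc_attention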
- Install PyTorch 1.10 with
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
- Install torchsparse with
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
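Note: building torchsparse v1.4.0 typically requires the Google Sparse Hash library to be installed first; on Ubuntu it is usually obtained with (adjust the package name for your system):
sudo apt-get install libsparsehash-dev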
- For the k-NN, we use the operations as implemented in PointTransformer. Execute the lib/pointops/setup.py file, downloaded from PointTransformer, with
python3.9 setup.py install
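Optionally, check that the extension built correctly. To the best of our knowledge the compiled CUDA extension in the PointTransformer code is named `pointops_cuda` (this name is an assumption; adjust if your build differs), so the following import should succeed:
python3.9 -c "import pointops_cuda; print('pointops_cuda OK')"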
- Install h5py with
conda install h5py
- Install tqdm with
pip install tqdm
- Install ignite with
pip install pytorch-ignite
- Install numba with
pip install numba
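Optionally, run a quick sanity check that PyTorch sees the GPU and that torchsparse imports cleanly (these are generic checks, not part of the original setup):
python3.9 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python3.9 -c "import torchsparse; print('torchsparse OK')"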
- Please follow the instructions from here to download the SemanticKITTI dataset (both the KITTI Odometry dataset and the SemanticKITTI labels) and extract all the files in the sequences folder to `data/SemanticKITTI`. You should see 22 folders. Folders 00-10 should have subfolders named `velodyne` and `labels`. The remaining folders, 11-21, are used for online testing and should contain only the `velodyne` folder, with no `labels` folder.
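A quick way to verify the extraction is to count the sequence folders (this assumes the directory layout shown further below); the command should print 22:
ls data/SemanticKITTI/sequences | wc -l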
- Please follow the instructions from here to download the Street3D dataset. It is provided in `.txt` form. Place it in the `data/Street3D/txt` folder, where you should have two folders, `train` and `test`, with 60 and 20 `.txt` files, respectively.
- Next, execute the pre-processing scripts as follows:
python scripts/Streed3D/street3d_txt_to_h5.py
python scripts/Streed3D/street3d_partition_train.py
python scripts/Streed3D/street3d_partition_test.py
The first script converts the dataset to h5 format and places it in the `data/Street3D/h5` folder.
The following scripts split each scene into subscenes of around 80k points and save them in `.bin` format into the proper folders, `train_part_80k` and `test_part_80k`, respectively. The `train_part_80k` folder should contain 2458 files and the `test_part_80k` folder should contain 845 files. Training and testing are performed on these split subscenes of 80k points.
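Optionally, you can verify the pre-processing output. The commands below are a minimal sketch: the first two count the generated subscene files (2458 and 845, as noted above), and the third lists the datasets stored in one of the generated h5 files via h5py, without assuming any particular key names:
ls data/Street3D/train_part_80k | wc -l
ls data/Street3D/test_part_80k | wc -l
python3.9 -c "import h5py; f = h5py.File('data/Street3D/h5/train/5D4KVPBP.h5', 'r'); print(list(f.keys()))"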
The final structure for both datasets should look like this:
data/
- SemanticKITTI/
  - sequences/
    - 00/
      - poses.txt
      - labels/
        - 000000.label
        - ...
      - velodyne/
        - 000000.bin
        - ...
    - ...
    - 21/
      - poses.txt
      - velodyne/
        - 000000.bin
        - ...
- Street3D/
  - txt/
    - train/
      - 5D4KVPBP.txt
      - ...
    - test/
      - 5D4KVPG4.txt
      - ...
  - h5/
    - train/
      - 5D4KVPBP.h5
      - ...
    - test/
      - 5D4KVPG4.h5
      - ...
  - train_part_80k/
    - 5D4KVPBP0.bin
    - ...
  - test_part_80k/
    - 5D4KVPG40.bin
    - ...
To train the networks, check the following scripts for each dataset:
python scripts/SemanticKITTI/kitti_train_all.py
python scripts/Street3D/street3d_train_all.py
Inside each file, you can select the proper network to train, as well as training parameters.
To test the networks on the SemanticKITTI validation set or the Street3D test set, check the following scripts for each dataset:
python scripts/SemanticKITTI/kitti_inference_all.py
python scripts/Street3D/street3d_inference_all.py
Inside each file, you can select the proper network to run inference with, as well as the proper weights to load.
The pretrained weights used in our paper are provided here. The total size of the weights for all networks is around 4.8 GB.
Next, unzip the `pretrained_weights.zip` file in the main folder of the repository.
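For example, from the repository root:
unzip pretrained_weights.zip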
If you find this work useful in your research, please consider citing:
@article{vanian2022improving,
title={Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms},
author={Vanian, Vazgen and Zamanakos, Georgios and Pratikakis, Ioannis},
journal={Computers \& Graphics},
year={2022},
publisher={Elsevier}
}