diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 3affceb9d2..9b015021e5 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,53 +1 @@ -# Contributing to mmdetection - -All kinds of contributions are welcome, including but not limited to the following. - -- Fixes (typo, bugs) -- New features and components - -## Workflow - -1. fork and pull the latest mmdetection -2. checkout a new branch (do not use master branch for PRs) -3. commit your changes -4. create a PR - -Note -- If you plan to add some new features that involve large changes, it is encouraged to open an issue for discussion first. -- If you are the author of some papers and would like to include your method to mmdetection, -please contact Kai Chen (chenkaidev[at]gmail[dot]com) and Wenwei Zhang (zwwdev[at]gmail[dot]com). We will much appreciate your contribution. - -## Code style - -### Python -We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style. - -We use the following tools for linting and formatting: -- [flake8](http://flake8.pycqa.org/en/latest/): linter -- [yapf](https://github.com/google/yapf): formatter -- [isort](https://github.com/timothycrosley/isort): sort imports - -Style configurations of yapf and isort can be found in [setup.cfg](../setup.cfg). - -We use [pre-commit hook](https://pre-commit.com/) that checks and formats for `flake8`, `yapf`, `isort`, `trailing whitespaces`, - fixes `end-of-files`, sorts `requirments.txt` automatically on every commit. -The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commit-config.yaml). - -After you clone the repository, you will need to install initialize pre-commit hook. - -``` -pip install -U pre-commit -``` - -From the repository folder -``` -pre-commit install -``` - -After this on every commit check code linters and formatter will be enforced. - - ->Before you create a PR, make sure that your code lints and is formatted by yapf. - -### C++ and CUDA -We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). +We appreciate all contributions to improve MMDetection3D. Please refer to [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guideline. diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index d686226559..c5134c568c 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -7,7 +7,7 @@ on: [push, pull_request] jobs: lint: - runs-on: ubuntu-latest + runs-on: ubuntu-18.04 steps: - uses: actions/checkout@v2 - name: Set up Python 3.7 @@ -34,7 +34,7 @@ jobs: UBUNTU_VERSION: ubuntu1804 FORCE_CUDA: 1 CUDA_ARCH: ${{matrix.cuda_arch}} - runs-on: ubuntu-latest + runs-on: ubuntu-18.04 strategy: matrix: python-version: [3.6, 3.7] diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index ac5afb8a25..625f050797 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -4,7 +4,7 @@ on: push jobs: build-n-publish: - runs-on: ubuntu-latest + runs-on: ubuntu-18.04 if: startsWith(github.event.ref, 'refs/tags') steps: - uses: actions/checkout@v2 diff --git a/README.md b/README.md index df337ad78e..f93acb0590 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ [![license](https://img.shields.io/github/license/open-mmlab/mmdetection3d.svg)](https://github.com/open-mmlab/mmdetection3d/blob/master/LICENSE) -**News**: We released the codebase v0.10.0. +**News**: We released the codebase v0.11.0. 
In the recent [nuScenes 3D detection challenge](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any) of the 5th AI Driving Olympics in NeurIPS 2020, we obtained the best PKL award, finished as the second runner-up with a multi-modality entry, and achieved the best vision-only results. Code and models will be released soon!
@@ -16,7 +16,9 @@ Documentation: https://mmdetection3d.readthedocs.io/

## Introduction

-The master branch works with **PyTorch 1.3 to 1.6**.
+English | [简体中文](README_zh-CN.md)
+
+The master branch works with **PyTorch 1.3+**.

MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by [MMLab](http://mmlab.ie.cuhk.edu.hk/).
@@ -36,7 +38,7 @@ a part of the OpenMMLab project developed by [MMLab](http://mmlab.ie.cuhk.edu.hk
- **Natural integration with 2D detection**

-  All the about **50+ methods, 300+ models**, and modules supported in [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) can be trained or used in this codebase.
+  All the **300+ models, methods of 40+ papers**, and modules supported in [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) can be trained or used in this codebase.

- **High efficiency**

@@ -58,7 +60,7 @@ This project is released under the [Apache 2.0 license](LICENSE).

## Changelog

-v0.10.0 was released in 1/2/2021.
+v0.11.0 was released on 1/3/2021.
Please refer to [changelog.md](docs/changelog.md) for details and release history.

## Benchmark and model zoo
@@ -66,6 +68,25 @@ Please refer to [changelog.md](docs/changelog.md) for details and release histor
Supported methods and backbones are shown in the table below.
Results and models are available in the [model zoo](docs/model_zoo.md).

+Supported backbones:
+
+- [x] PointNet (CVPR'2017)
+- [x] PointNet++ (NeurIPS'2017)
+- [x] RegNet (CVPR'2020)
+
+Supported methods:
+
+- [x] [SECOND (Sensor'2018)](configs/second/README.md)
+- [x] [PointPillars (CVPR'2019)](configs/pointpillars/README.md)
+- [x] [FreeAnchor (NeurIPS'2019)](configs/free_anchor/README.md)
+- [x] [VoteNet (ICCV'2019)](configs/votenet/README.md)
+- [x] [H3DNet (ECCV'2020)](configs/h3dnet/README.md)
+- [x] [3DSSD (CVPR'2020)](configs/3dssd/README.md)
+- [x] [Part-A2 (TPAMI'2020)](configs/parta2/README.md)
+- [x] [MVXNet (ICRA'2019)](configs/mvxnet/README.md)
+- [x] [CenterPoint (arXiv'2020)](configs/centerpoint/README.md)
+- [x] [SSN (ECCV'2020)](configs/ssn/README.md)
+
|                    | ResNet | ResNeXt | SENet | PointNet++ | HRNet | RegNetX | Res2Net |
|--------------------|:--------:|:--------:|:--------:|:---------:|:-----:|:--------:|:-----:|
| SECOND             | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
@@ -80,9 +101,9 @@ Results and models are available in the [model zoo](docs/model_zoo.md).
| SSN                | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |

Other features
-- [x] [Dynamic Voxelization](configs/carafe/README.md)
+- [x] [Dynamic Voxelization](configs/dynamic_voxelization/README.md)

-**Note:** All the about **300 models, methods of 40+ papers** in 2D detection supported by [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) can be trained or used in this codebase.
+**Note:** All the **300+ models, methods of 40+ papers** in 2D detection supported by [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) can be trained or used in this codebase.
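As the config diffs further below show, this release also moves the top-level `train_cfg` and `test_cfg` dicts inside the `model` dict in all configs. A minimal sketch of the new layout, reusing the VoteNet settings from this diff (the `type` value is only illustrative):

```python
# New-style config: training and testing settings are nested inside the
# `model` dict instead of being standalone top-level train_cfg/test_cfg dicts.
model = dict(
    type='VoteNet',  # illustrative detector type
    # model training and testing settings
    train_cfg=dict(
        pos_distance_thr=0.3, neg_distance_thr=0.6, sample_mod='vote'),
    test_cfg=dict(
        sample_mod='seed',
        nms_thr=0.25,
        score_thr=0.05,
        per_class_proposal=True))
```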
## Installation
@@ -90,7 +111,7 @@ Please refer to [getting_started.md](docs/getting_started.md) for installation.

## Get Started

-Please see [getting_started.md](docs/getting_started.md) for the basic usage of MMDetection3D. We provide guidance for quick run [with existing dataset](docs/1_exist_data_model.md) and [with customized dataset](docs/2_new_data_model.md) for beginners. There are also tutorials for [learning configuration systems](docs/tutorials/config.md), [adding new dataset](docs/tutorials/customize_dataset.md), [designing data pipeline](docs/tutorials/data_pipeline.md), [customizing models](docs/tutorials/customize_models.md), [customizing runtime settings](docs/tutorials/customize_runtime.md) and [waymo dataset](docs/tutorials/waymo.md).
+Please see [getting_started.md](docs/getting_started.md) for the basic usage of MMDetection3D. We provide beginner guides for a quick run [with existing datasets](docs/1_exist_data_model.md) and [with customized datasets](docs/2_new_data_model.md). There are also tutorials for [learning configuration systems](docs/tutorials/config.md), [adding new datasets](docs/tutorials/customize_dataset.md), [designing data pipelines](docs/tutorials/data_pipeline.md), [customizing models](docs/tutorials/customize_models.md), [customizing runtime settings](docs/tutorials/customize_runtime.md) and the [Waymo dataset](docs/tutorials/waymo.md).

## Contributing
diff --git a/README_zh-CN.md b/README_zh-CN.md
new file mode 100644
index 0000000000..b0c4c4936e
--- /dev/null
+++ b/README_zh-CN.md
@@ -0,0 +1,121 @@
+
+ +
+
+[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmdetection3d.readthedocs.io/en/latest/)
+[![badge](https://github.com/open-mmlab/mmdetection3d/workflows/build/badge.svg)](https://github.com/open-mmlab/mmdetection3d/actions)
+[![codecov](https://codecov.io/gh/open-mmlab/mmdetection3d/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmdetection3d)
+[![license](https://img.shields.io/github/license/open-mmlab/mmdetection3d.svg)](https://github.com/open-mmlab/mmdetection3d/blob/master/LICENSE)
+
+
+**新闻**:我们发布了 v0.11.0 版本。
+
+在第三届 [nuScenes 3D 检测挑战赛](https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any)(第五届 AI Driving Olympics, NeurIPS 2020)中,我们获得了最佳 PKL 奖、第三名和最好的纯视觉结果,相关的代码和模型将会在不久后发布。
+
+文档: https://mmdetection3d.readthedocs.io/
+
+## 简介
+
+[English](README.md) | 简体中文
+
+主分支代码目前支持 PyTorch 1.3 以上的版本。
+
+MMDetection3D 是一个基于 PyTorch 的目标检测开源工具箱,是下一代面向 3D 检测的平台。它是 OpenMMLab 项目的一部分,这个项目由香港中文大学多媒体实验室和商汤科技联合发起。
+
+![demo image](resources/mmdet3d_outdoor_demo.gif)
+
+### 主要特性
+
+- **支持多模态/单模态的检测器**
+
+  支持多模态/单模态检测器,包括 MVXNet、VoteNet、PointPillars 等。
+
+- **支持户内/户外的数据集**
+
+  支持室内/室外的 3D 检测数据集,包括 ScanNet、SUNRGB-D、Waymo、nuScenes、Lyft、KITTI。
+
+  对于 nuScenes 数据集,我们也支持 [nuImages 数据集](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/nuimages)。
+
+- **与 2D 检测器的自然整合**
+
+  [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) 支持的 **300+ 个模型、40+ 篇论文的算法**和相关模块都可以在此代码库中训练或使用。
+
+- **性能高**
+
+  训练速度比其他代码库更快。下表可见主要的对比结果。更多的细节可见[基准测评文档](./docs/benchmarks.md)。我们对比了每秒训练的样本数(值越高越好)。其他代码库不支持的模型被标记为 `×`。
+
+  | Methods | MMDetection3D | [OpenPCDet](https://github.com/open-mmlab/OpenPCDet) | [votenet](https://github.com/facebookresearch/votenet) | [Det3D](https://github.com/poodarchu/Det3D) |
+  |:-------:|:-------------:|:---------:|:-----:|:-----:|
+  | VoteNet | 358 | × | 77 | × |
+  | PointPillars-car | 141 | × | × | 140 |
+  | PointPillars-3class | 107 | 44 | × | × |
+  | SECOND | 40 | 30 | × | × |
+  | Part-A2 | 17 | 14 | × | × |
+
+和 [MMDetection](https://github.com/open-mmlab/mmdetection)、[MMCV](https://github.com/open-mmlab/mmcv) 一样,MMDetection3D 也可以作为一个库,用来支持各式各样的项目。
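+下面是一个把 MMDetection3D 当作 Python 库导入、查询版本信息的最小示例(仅作示意,假设环境已按文档正确安装):
+
+```python
+# 将 MMDetection3D 作为库导入并打印版本信息(示意用法)
+import torch
+
+import mmdet3d
+
+print('PyTorch:', torch.__version__)          # 主分支要求 PyTorch 1.3+
+print('MMDetection3D:', mmdet3d.__version__)  # 本次发布对应 0.11.0
+```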
+
+## 开源许可证
+
+该项目采用 [Apache 2.0 开源许可证](LICENSE)。
+
+## 更新日志
+
+最新版本 v0.11.0 已于 2021.03.01 发布。
+如果想了解更多版本更新细节和历史信息,请阅读[更新日志](docs/changelog.md)。
+
+## 基准测试和模型库
+
+测试结果和模型可以在[模型库](docs/model_zoo.md)中找到。
+
+已支持的骨干网络:
+
+- [x] PointNet (CVPR'2017)
+- [x] PointNet++ (NeurIPS'2017)
+- [x] RegNet (CVPR'2020)
+
+已支持的算法:
+
+- [x] [SECOND (Sensor'2018)](configs/second/README.md)
+- [x] [PointPillars (CVPR'2019)](configs/pointpillars/README.md)
+- [x] [FreeAnchor (NeurIPS'2019)](configs/free_anchor/README.md)
+- [x] [VoteNet (ICCV'2019)](configs/votenet/README.md)
+- [x] [H3DNet (ECCV'2020)](configs/h3dnet/README.md)
+- [x] [3DSSD (CVPR'2020)](configs/3dssd/README.md)
+- [x] [Part-A2 (TPAMI'2020)](configs/parta2/README.md)
+- [x] [MVXNet (ICRA'2019)](configs/mvxnet/README.md)
+- [x] [CenterPoint (arXiv'2020)](configs/centerpoint/README.md)
+- [x] [SSN (ECCV'2020)](configs/ssn/README.md)
+
+| | ResNet | ResNeXt | SENet | PointNet++ | HRNet | RegNetX | Res2Net |
+|--------------------|:--------:|:--------:|:--------:|:---------:|:-----:|:--------:|:-----:|
+| SECOND | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| PointPillars | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| FreeAnchor | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| VoteNet | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
+| H3DNet | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
+| 3DSSD | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
+| Part-A2 | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| MVXNet | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| CenterPoint | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+| SSN | ☐ | ☐ | ☐ | ✗ | ☐ | ✓ | ☐ |
+
+其他特性
+- [x] [Dynamic Voxelization](configs/dynamic_voxelization/README.md)
+
+**注意:** [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/model_zoo.md) 支持的基于 2D 检测的 **300+ 个模型、40+ 篇论文的算法**在 MMDetection3D 中都可以被训练或使用。
+
+## 安装
+
+请参考[快速入门文档](docs/getting_started.md)进行安装。
+
+## 快速入门
+
+请参考[快速入门文档](docs/getting_started.md)学习 MMDetection3D 的基本使用。我们为新手提供了分别针对[已有数据集](docs/1_exist_data_model.md)和[新数据集](docs/2_new_data_model.md)的使用指南。我们也提供了一些进阶教程,内容覆盖了[学习配置文件](docs/tutorials/config.md)、[增加数据集支持](docs/tutorials/customize_dataset.md)、[设计新的数据预处理流程](docs/tutorials/data_pipeline.md)、[增加自定义模型](docs/tutorials/customize_models.md)、[增加自定义的运行时配置](docs/tutorials/customize_runtime.md)和 [Waymo 数据集](docs/tutorials/waymo.md)。
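+配置文件可以通过 `_base_` 继承已有配置,并只覆盖需要修改的字段。例如,本次发布中 CenterPoint 使用 circle NMS 的配置(摘自下文的配置变更):
+
+```python
+# 继承基础配置,仅覆盖测试阶段使用的 NMS 类型
+# (来自 configs/centerpoint/centerpoint_01voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py)
+_base_ = ['./centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py']
+model = dict(test_cfg=dict(pts=dict(nms_type='circle')))
+```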
+ +## 贡献指南 + +我们感谢所有的贡献者为改进和提升 MMDetection3D 所作出的努力。请参考[贡献指南](.github/CONTRIBUTING.md)来了解参与项目贡献的相关指引。 + +## 致谢 + +MMDetection3D 是一款由来自不同高校和企业的研发人员共同参与贡献的开源项目。我们感谢所有为项目提供算法复现和新功能支持的贡献者,以及提供宝贵反馈的用户。我们希望这个工具箱和基准测试可以为社区提供灵活的代码工具,供用户复现已有算法并开发自己的新的 3D 检测模型。 diff --git a/configs/_base_/models/3dssd.py b/configs/_base_/models/3dssd.py index 3de2e3d6ad..c3a08ddd65 100644 --- a/configs/_base_/models/3dssd.py +++ b/configs/_base_/models/3dssd.py @@ -65,17 +65,16 @@ type='SmoothL1Loss', reduction='sum', loss_weight=1.0), corner_loss=dict( type='SmoothL1Loss', reduction='sum', loss_weight=1.0), - vote_loss=dict(type='SmoothL1Loss', reduction='sum', loss_weight=1.0))) - -# model training and testing settings -train_cfg = dict( - sample_mod='spec', pos_distance_thr=10.0, expand_dims_length=0.05) -test_cfg = dict( - nms_cfg=dict(type='nms', iou_thr=0.1), - sample_mod='spec', - score_thr=0.0, - per_class_proposal=True, - max_output_num=100) + vote_loss=dict(type='SmoothL1Loss', reduction='sum', loss_weight=1.0)), + # model training and testing settings + train_cfg=dict( + sample_mod='spec', pos_distance_thr=10.0, expand_dims_length=0.05), + test_cfg=dict( + nms_cfg=dict(type='nms', iou_thr=0.1), + sample_mod='spec', + score_thr=0.0, + per_class_proposal=True, + max_output_num=100)) # optimizer # This schedule is mainly used by models on indoor dataset, diff --git a/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py b/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py index f90b78cef3..fb9e0a8f06 100644 --- a/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py +++ b/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py @@ -105,96 +105,96 @@ conv_out_channels=256, num_classes=80, loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)))) -# model training and testing settings -train_cfg = dict( - rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), - allowed_border=0, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_across_levels=False, - nms_pre=2000, - nms_post=2000, - max_num=2000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=[ - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, - pos_weight=-1, - debug=False), - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, - pos_weight=-1, - debug=False), - dict( + type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, - match_low_quality=False, + neg_iou_thr=0.3, + min_pos_iou=0.3, + match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', - num=512, - pos_fraction=0.25, + num=256, + pos_fraction=0.5, neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, + add_gt_as_proposals=False), + allowed_border=0, pos_weight=-1, - debug=False) - ]) -test_cfg = dict( - 
rpn=dict( - nms_across_levels=False, - nms_pre=1000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5)) + debug=False), + rpn_proposal=dict( + nms_across_levels=False, + nms_pre=2000, + nms_post=2000, + max_num=2000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=[ + dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5, + match_low_quality=False, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False), + dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.6, + neg_iou_thr=0.6, + min_pos_iou=0.6, + match_low_quality=False, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False), + dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.7, + min_pos_iou=0.7, + match_low_quality=False, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False) + ]), + test_cfg=dict( + rpn=dict( + nms_across_levels=False, + nms_pre=1000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=dict( + score_thr=0.05, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100, + mask_thr_binary=0.5))) diff --git a/configs/_base_/models/centerpoint_01voxel_second_secfpn_nus.py b/configs/_base_/models/centerpoint_01voxel_second_secfpn_nus.py index 280d4b52cc..efdce59c6d 100644 --- a/configs/_base_/models/centerpoint_01voxel_second_secfpn_nus.py +++ b/configs/_base_/models/centerpoint_01voxel_second_secfpn_nus.py @@ -52,32 +52,32 @@ out_size_factor=8, voxel_size=voxel_size[:2], code_size=9), - seperate_head=dict( + separate_head=dict( type='SeparateHead', init_bias=-2.19, final_kernel=3), loss_cls=dict(type='GaussianFocalLoss', reduction='mean'), loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25), - norm_bbox=True)) -# model training and testing settings -train_cfg = dict( - pts=dict( - grid_size=[1024, 1024, 40], - voxel_size=voxel_size, - out_size_factor=8, - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])) -test_cfg = dict( - pts=dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - score_threshold=0.1, - out_size_factor=8, - voxel_size=voxel_size[:2], - nms_type='rotate', - pre_max_size=1000, - post_max_size=83, - nms_thr=0.2)) + norm_bbox=True), + # model training and testing settings + train_cfg=dict( + pts=dict( + grid_size=[1024, 1024, 40], + voxel_size=voxel_size, + out_size_factor=8, + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])), + test_cfg=dict( + pts=dict( + post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], + max_per_img=500, + max_pool_nms=False, + min_radius=[4, 12, 10, 1, 0.85, 0.175], + score_threshold=0.1, + out_size_factor=8, + voxel_size=voxel_size[:2], + nms_type='rotate', + pre_max_size=1000, + post_max_size=83, + nms_thr=0.2))) diff --git 
a/configs/_base_/models/centerpoint_02pillar_second_secfpn_nus.py b/configs/_base_/models/centerpoint_02pillar_second_secfpn_nus.py index b5f23acd59..311d76373b 100644 --- a/configs/_base_/models/centerpoint_02pillar_second_secfpn_nus.py +++ b/configs/_base_/models/centerpoint_02pillar_second_secfpn_nus.py @@ -51,33 +51,33 @@ out_size_factor=4, voxel_size=voxel_size[:2], code_size=9), - seperate_head=dict( + separate_head=dict( type='SeparateHead', init_bias=-2.19, final_kernel=3), loss_cls=dict(type='GaussianFocalLoss', reduction='mean'), loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25), - norm_bbox=True)) -# model training and testing settings -train_cfg = dict( - pts=dict( - grid_size=[512, 512, 1], - voxel_size=voxel_size, - out_size_factor=4, - dense_reg=1, - gaussian_overlap=0.1, - max_objs=500, - min_radius=2, - code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])) -test_cfg = dict( - pts=dict( - post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], - max_per_img=500, - max_pool_nms=False, - min_radius=[4, 12, 10, 1, 0.85, 0.175], - score_threshold=0.1, - pc_range=[-51.2, -51.2], - out_size_factor=4, - voxel_size=voxel_size[:2], - nms_type='rotate', - pre_max_size=1000, - post_max_size=83, - nms_thr=0.2)) + norm_bbox=True), + # model training and testing settings + train_cfg=dict( + pts=dict( + grid_size=[512, 512, 1], + voxel_size=voxel_size, + out_size_factor=4, + dense_reg=1, + gaussian_overlap=0.1, + max_objs=500, + min_radius=2, + code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])), + test_cfg=dict( + pts=dict( + post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], + max_per_img=500, + max_pool_nms=False, + min_radius=[4, 12, 10, 1, 0.85, 0.175], + score_threshold=0.1, + pc_range=[-51.2, -51.2], + out_size_factor=4, + voxel_size=voxel_size[:2], + nms_type='rotate', + pre_max_size=1000, + post_max_size=83, + nms_thr=0.2))) diff --git a/configs/_base_/models/h3dnet.py b/configs/_base_/models/h3dnet.py index f2444054c3..760566744f 100644 --- a/configs/_base_/models/h3dnet.py +++ b/configs/_base_/models/h3dnet.py @@ -311,32 +311,31 @@ reduction='none', loss_weight=5.0), primitive_center_loss=dict( - type='MSELoss', reduction='none', loss_weight=1.0)))) - -# model training and testing settings -train_cfg = dict( - rpn=dict(pos_distance_thr=0.3, neg_distance_thr=0.6, sample_mod='vote'), - rpn_proposal=dict(use_nms=False), - rcnn=dict( - pos_distance_thr=0.3, - neg_distance_thr=0.6, - sample_mod='vote', - far_threshold=0.6, - near_threshold=0.3, - mask_surface_threshold=0.3, - label_surface_threshold=0.3, - mask_line_threshold=0.3, - label_line_threshold=0.3)) - -test_cfg = dict( - rpn=dict( - sample_mod='seed', - nms_thr=0.25, - score_thr=0.05, - per_class_proposal=True, - use_nms=False), - rcnn=dict( - sample_mod='seed', - nms_thr=0.25, - score_thr=0.05, - per_class_proposal=True)) + type='MSELoss', reduction='none', loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + pos_distance_thr=0.3, neg_distance_thr=0.6, sample_mod='vote'), + rpn_proposal=dict(use_nms=False), + rcnn=dict( + pos_distance_thr=0.3, + neg_distance_thr=0.6, + sample_mod='vote', + far_threshold=0.6, + near_threshold=0.3, + mask_surface_threshold=0.3, + label_surface_threshold=0.3, + mask_line_threshold=0.3, + label_line_threshold=0.3)), + test_cfg=dict( + rpn=dict( + sample_mod='seed', + nms_thr=0.25, + score_thr=0.05, + per_class_proposal=True, + use_nms=False), + rcnn=dict( + sample_mod='seed', + 
nms_thr=0.25, + score_thr=0.05, + per_class_proposal=True))) diff --git a/configs/_base_/models/hv_pointpillars_fpn_lyft.py b/configs/_base_/models/hv_pointpillars_fpn_lyft.py index f49fde4a13..87c7fe0c61 100644 --- a/configs/_base_/models/hv_pointpillars_fpn_lyft.py +++ b/configs/_base_/models/hv_pointpillars_fpn_lyft.py @@ -17,6 +17,6 @@ num_classes=9, anchor_generator=dict( ranges=[[-80, -80, -1.8, 80, 80, -1.8]], custom_values=[]), - bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7))) -# model training settings (based on nuScenes model settings) -train_cfg = dict(pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])) + bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7)), + # model training settings (based on nuScenes model settings) + train_cfg=dict(pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]))) diff --git a/configs/_base_/models/hv_pointpillars_fpn_nus.py b/configs/_base_/models/hv_pointpillars_fpn_nus.py index d76ef8cde0..e153f6c6e6 100644 --- a/configs/_base_/models/hv_pointpillars_fpn_nus.py +++ b/configs/_base_/models/hv_pointpillars_fpn_nus.py @@ -70,27 +70,27 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - pts=dict( - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2], - pos_weight=-1, - debug=False)) -test_cfg = dict( - pts=dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_pre=1000, - nms_thr=0.2, - score_thr=0.05, - min_bbox_size=0, - max_num=500)) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + pts=dict( + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1), + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2], + pos_weight=-1, + debug=False)), + test_cfg=dict( + pts=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_pre=1000, + nms_thr=0.2, + score_thr=0.05, + min_bbox_size=0, + max_num=500))) diff --git a/configs/_base_/models/hv_pointpillars_fpn_range100_lyft.py b/configs/_base_/models/hv_pointpillars_fpn_range100_lyft.py index f43c307dc2..9cd200f3e4 100644 --- a/configs/_base_/models/hv_pointpillars_fpn_range100_lyft.py +++ b/configs/_base_/models/hv_pointpillars_fpn_range100_lyft.py @@ -17,6 +17,6 @@ num_classes=9, anchor_generator=dict( ranges=[[-100, -100, -1.8, 100, 100, -1.8]], custom_values=[]), - bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7))) -# model training settings (based on nuScenes model settings) -train_cfg = dict(pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])) + bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=7)), + # model training settings (based on nuScenes model settings) + train_cfg=dict(pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]))) diff --git a/configs/_base_/models/hv_pointpillars_secfpn_kitti.py b/configs/_base_/models/hv_pointpillars_secfpn_kitti.py index 5466962003..824ac53c7f 100644 --- a/configs/_base_/models/hv_pointpillars_secfpn_kitti.py +++ b/configs/_base_/models/hv_pointpillars_secfpn_kitti.py @@ -52,40 +52,40 @@ loss_weight=1.0), 
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - ], - allowed_border=0, - pos_weight=-1, - debug=False) -test_cfg = dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + ], + allowed_border=0, + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50)) diff --git a/configs/_base_/models/hv_pointpillars_secfpn_waymo.py b/configs/_base_/models/hv_pointpillars_secfpn_waymo.py index 066a36ac56..14873ead47 100644 --- a/configs/_base_/models/hv_pointpillars_secfpn_waymo.py +++ b/configs/_base_/models/hv_pointpillars_secfpn_waymo.py @@ -66,44 +66,43 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - pts=dict( - assigner=[ - dict( # car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - dict( # pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - ], - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], - pos_weight=-1, - debug=False)) - -test_cfg = dict( - pts=dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_pre=4096, - nms_thr=0.25, - score_thr=0.1, - min_bbox_size=0, - max_num=500)) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + pts=dict( + assigner=[ + dict( # car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # cyclist + 
type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1), + dict( # pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1), + ], + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + pos_weight=-1, + debug=False)), + test_cfg=dict( + pts=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_pre=4096, + nms_thr=0.25, + score_thr=0.1, + min_bbox_size=0, + max_num=500))) diff --git a/configs/_base_/models/hv_second_secfpn_kitti.py b/configs/_base_/models/hv_second_secfpn_kitti.py index b566796099..2da46496c1 100644 --- a/configs/_base_/models/hv_second_secfpn_kitti.py +++ b/configs/_base_/models/hv_second_secfpn_kitti.py @@ -48,40 +48,40 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.35, - neg_iou_thr=0.2, - min_pos_iou=0.2, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.35, - neg_iou_thr=0.2, - min_pos_iou=0.2, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - ], - allowed_border=0, - pos_weight=-1, - debug=False) -test_cfg = dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.35, + neg_iou_thr=0.2, + min_pos_iou=0.2, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.35, + neg_iou_thr=0.2, + min_pos_iou=0.2, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + ], + allowed_border=0, + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50)) diff --git a/configs/_base_/models/hv_second_secfpn_waymo.py b/configs/_base_/models/hv_second_secfpn_waymo.py index c7c7ed71bc..eb9bd3ae5c 100644 --- a/configs/_base_/models/hv_second_secfpn_waymo.py +++ b/configs/_base_/models/hv_second_secfpn_waymo.py @@ -60,42 +60,41 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - assigner=[ - dict( # car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, 
- neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - dict( # cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1) - ], - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], - pos_weight=-1, - debug=False) - -test_cfg = dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_pre=4096, - nms_thr=0.25, - score_thr=0.1, - min_bbox_size=0, - max_num=500) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + assigner=[ + dict( # car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1), + dict( # cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1) + ], + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_pre=4096, + nms_thr=0.25, + score_thr=0.1, + min_bbox_size=0, + max_num=500)) diff --git a/configs/_base_/models/mask_rcnn_r50_fpn.py b/configs/_base_/models/mask_rcnn_r50_fpn.py index 4472bd0a80..c5d5e32b04 100644 --- a/configs/_base_/models/mask_rcnn_r50_fpn.py +++ b/configs/_base_/models/mask_rcnn_r50_fpn.py @@ -65,60 +65,60 @@ conv_out_channels=256, num_classes=80, loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)))) -# model training and testing settings -train_cfg = dict( - rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), - allowed_border=-1, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_across_levels=False, - nms_pre=2000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, - pos_weight=-1, - debug=False)) -test_cfg = dict( - rpn=dict( - nms_across_levels=False, - nms_pre=1000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5)) + type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.3, + min_pos_iou=0.3, + match_low_quality=True, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=256, + pos_fraction=0.5, + neg_pos_ub=-1, + add_gt_as_proposals=False), + allowed_border=-1, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_across_levels=False, + nms_pre=2000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + 
neg_iou_thr=0.5, + min_pos_iou=0.5, + match_low_quality=True, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False)), + test_cfg=dict( + rpn=dict( + nms_across_levels=False, + nms_pre=1000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=dict( + score_thr=0.05, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100, + mask_thr_binary=0.5))) diff --git a/configs/_base_/models/votenet.py b/configs/_base_/models/votenet.py index 3bddf30ace..129339dc9e 100644 --- a/configs/_base_/models/votenet.py +++ b/configs/_base_/models/votenet.py @@ -62,8 +62,12 @@ size_res_loss=dict( type='SmoothL1Loss', reduction='sum', loss_weight=10.0 / 3.0), semantic_loss=dict( - type='CrossEntropyLoss', reduction='sum', loss_weight=1.0))) -# model training and testing settings -train_cfg = dict(pos_distance_thr=0.3, neg_distance_thr=0.6, sample_mod='vote') -test_cfg = dict( - sample_mod='seed', nms_thr=0.25, score_thr=0.05, per_class_proposal=True) + type='CrossEntropyLoss', reduction='sum', loss_weight=1.0)), + # model training and testing settings + train_cfg=dict( + pos_distance_thr=0.3, neg_distance_thr=0.6, sample_mod='vote'), + test_cfg=dict( + sample_mod='seed', + nms_thr=0.25, + score_thr=0.05, + per_class_proposal=True)) diff --git a/configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py b/configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py index 7bd7624b4d..f905906d51 100644 --- a/configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py +++ b/configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py @@ -111,88 +111,94 @@ type='CrossEntropyLoss', use_sigmoid=True, reduction='sum', - loss_weight=1.0)))) -# model training and testing settings -train_cfg = dict( - rpn=dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1) - ], - allowed_border=0, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=9000, - nms_post=512, - max_num=512, - nms_thr=0.8, - score_thr=0, - use_rotate_nms=False), - rcnn=dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1) - ], - sampler=dict( - type='IoUNegPiecewiseSampler', - num=128, - pos_fraction=0.55, - neg_piece_fractions=[0.8, 0.2], - neg_iou_piece_thrs=[0.55, 0.1], - neg_pos_ub=-1, - add_gt_as_proposals=False, - return_iou=True), - cls_pos_thr=0.75, - cls_neg_thr=0.25)) -test_cfg = dict( - 
rpn=dict( - nms_pre=1024, - nms_post=100, - max_num=100, - nms_thr=0.7, - score_thr=0, - use_rotate_nms=True), - rcnn=dict( - use_rotate_nms=True, use_raw_score=True, nms_thr=0.01, score_thr=0.3)) + loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1) + ], + allowed_border=0, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_pre=9000, + nms_post=512, + max_num=512, + nms_thr=0.8, + score_thr=0, + use_rotate_nms=False), + rcnn=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1) + ], + sampler=dict( + type='IoUNegPiecewiseSampler', + num=128, + pos_fraction=0.55, + neg_piece_fractions=[0.8, 0.2], + neg_iou_piece_thrs=[0.55, 0.1], + neg_pos_ub=-1, + add_gt_as_proposals=False, + return_iou=True), + cls_pos_thr=0.75, + cls_neg_thr=0.25)), + test_cfg=dict( + rpn=dict( + nms_pre=1024, + nms_post=100, + max_num=100, + nms_thr=0.7, + score_thr=0, + use_rotate_nms=True), + rcnn=dict( + use_rotate_nms=True, + use_raw_score=True, + nms_thr=0.01, + score_thr=0.3))) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py b/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py index 4f4df6ce4f..1def914403 100644 --- a/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py +++ b/configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py @@ -50,27 +50,27 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - allowed_border=0, - pos_weight=-1, - debug=False) -test_cfg = dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + allowed_border=0, + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + 
nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50)) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py b/configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py index 9d12089f25..708754a6fb 100644 --- a/configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py +++ b/configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py @@ -63,43 +63,42 @@ loss_dir=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2), ), -) -# model training and testing settings -train_cfg = dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - ], - allowed_border=0, - pos_weight=-1, - debug=False) -test_cfg = dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50) + # model training and testing settings + train_cfg=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + ], + allowed_border=0, + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50)) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py b/configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py index 3c1fff703c..862e4f673f 100644 --- a/configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py +++ b/configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py @@ -52,43 +52,43 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - ], - allowed_border=0, - pos_weight=-1, - debug=False) -test_cfg = dict( - use_rotate_nms=True, - 
nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + ], + allowed_border=0, + pos_weight=-1, + debug=False), + test_cfg=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50)) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/centerpoint/README.md b/configs/centerpoint/README.md index 80c4feb073..03372ce610 100644 --- a/configs/centerpoint/README.md +++ b/configs/centerpoint/README.md @@ -47,10 +47,11 @@ For example, we change `centerpoint_0075voxel_second_secfpn_circlenms_4x8_cyclic _base_ = './centerpoint_0075voxel_second_secfpn_circlenms' \ '_4x8_cyclic_20e_nus.py' -test_cfg = dict( - pts=dict( - use_rotate_nms=True, - max_num=83)) +model = dict( + test_cfg=dict( + pts=dict( + use_rotate_nms=True, + max_num=83))) point_cloud_range = [-54, -54, -5.0, 54, 54, 3.0] file_client_args = dict(backend='disk') diff --git a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_4x8_cyclic_20e_nus.py index 887fe7e6a9..f17d98effd 100644 --- a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_4x8_cyclic_20e_nus.py @@ -16,16 +16,14 @@ pts_middle_encoder=dict(sparse_shape=[41, 1440, 1440]), pts_bbox_head=dict( bbox_coder=dict( - voxel_size=voxel_size[:2], pc_range=point_cloud_range[:2]))) - -train_cfg = dict( - pts=dict( - grid_size=[1440, 1440, 40], - voxel_size=voxel_size, - point_cloud_range=point_cloud_range)) - -test_cfg = dict( - pts=dict(voxel_size=voxel_size[:2], pc_range=point_cloud_range[:2])) + voxel_size=voxel_size[:2], pc_range=point_cloud_range[:2])), + train_cfg=dict( + pts=dict( + grid_size=[1440, 1440, 40], + voxel_size=voxel_size, + point_cloud_range=point_cloud_range)), + test_cfg=dict( + pts=dict(voxel_size=voxel_size[:2], pc_range=point_cloud_range[:2]))) dataset_type = 'NuScenesDataset' data_root = 'data/nuscenes/' @@ -62,6 +60,7 @@ traffic_cone=2), points_loader=dict( type='LoadPointsFromFile', + coord_type='LIDAR', load_dim=5, use_dim=[0, 1, 2, 3, 4], file_client_args=file_client_args)) diff --git a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py index 140d6bc857..1541a10240 100644 --- a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py @@ -1,3 +1,3 @@ _base_ = ['./centerpoint_0075voxel_second_secfpn_4x8_cyclic_20e_nus.py'] -test_cfg = dict(pts=dict(nms_type='circle')) +model = dict(test_cfg=dict(pts=dict(nms_type='circle'))) diff 
--git a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py index f5d8c58e70..e479650af4 100644 --- a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, diff --git a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py index dd5e67645b..1e7d14e26a 100644 --- a/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_0075voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, @@ -12,6 +12,5 @@ padding=1, groups=4), init_bias=-2.19, - final_kernel=3))) - -test_cfg = dict(pts=dict(nms_type='circle')) + final_kernel=3)), + test_cfg=dict(pts=dict(nms_type='circle'))) diff --git a/configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py index 1ff069763d..5d2d0128c5 100644 --- a/configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py @@ -15,10 +15,10 @@ model = dict( pts_voxel_layer=dict(point_cloud_range=point_cloud_range), - pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2]))) -# model training and testing settings -train_cfg = dict(pts=dict(point_cloud_range=point_cloud_range)) -test_cfg = dict(pts=dict(pc_range=point_cloud_range[:2])) + pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2])), + # model training and testing settings + train_cfg=dict(pts=dict(point_cloud_range=point_cloud_range)), + test_cfg=dict(pts=dict(pc_range=point_cloud_range[:2]))) dataset_type = 'NuScenesDataset' data_root = 'data/nuscenes/' @@ -55,6 +55,7 @@ traffic_cone=2), points_loader=dict( type='LoadPointsFromFile', + coord_type='LIDAR', load_dim=5, use_dim=[0, 1, 2, 3, 4], file_client_args=file_client_args)) diff --git a/configs/centerpoint/centerpoint_01voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_01voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py index 240f948837..ae560321cb 100644 --- a/configs/centerpoint/centerpoint_01voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_01voxel_second_secfpn_circlenms_4x8_cyclic_20e_nus.py @@ -1,3 +1,3 @@ _base_ = ['./centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py'] -test_cfg = dict(pts=dict(nms_type='circle')) +model = dict(test_cfg=dict(pts=dict(nms_type='circle'))) diff --git a/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py index ad7e8c75b1..5f31c44173 100644 --- a/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ 
model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, diff --git a/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py index 41770e713e..cc5488e0d5 100644 --- a/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_01voxel_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, @@ -12,6 +12,5 @@ padding=1, groups=4), init_bias=-2.19, - final_kernel=3))) - -test_cfg = dict(pts=dict(nms_type='circle')) + final_kernel=3)), + test_cfg=dict(pts=dict(nms_type='circle'))) diff --git a/configs/centerpoint/centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py index efd9061a59..691349f740 100644 --- a/configs/centerpoint/centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py @@ -16,10 +16,10 @@ model = dict( pts_voxel_layer=dict(point_cloud_range=point_cloud_range), pts_voxel_encoder=dict(point_cloud_range=point_cloud_range), - pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2]))) -# model training and testing settings -train_cfg = dict(pts=dict(point_cloud_range=point_cloud_range)) -test_cfg = dict(pts=dict(pc_range=point_cloud_range[:2])) + pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2])), + # model training and testing settings + train_cfg=dict(pts=dict(point_cloud_range=point_cloud_range)), + test_cfg=dict(pts=dict(pc_range=point_cloud_range[:2]))) dataset_type = 'NuScenesDataset' data_root = 'data/nuscenes/' @@ -56,6 +56,7 @@ traffic_cone=2), points_loader=dict( type='LoadPointsFromFile', + coord_type='LIDAR', load_dim=5, use_dim=[0, 1, 2, 3, 4], file_client_args=file_client_args)) diff --git a/configs/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.py index 9ae815c31d..67a1cf6e7f 100644 --- a/configs/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus.py @@ -1,3 +1,3 @@ _base_ = ['./centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py'] -test_cfg = dict(pts=dict(nms_type='circle')) +model = dict(test_cfg=dict(pts=dict(nms_type='circle'))) diff --git a/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py b/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py index 555e9543de..e69489215f 100644 --- a/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, diff --git a/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py 
b/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py index c2d64fb047..c62488dfe5 100644 --- a/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py +++ b/configs/centerpoint/centerpoint_02pillar_second_secfpn_dcn_circlenms_4x8_cyclic_20e_nus.py @@ -2,8 +2,8 @@ model = dict( pts_bbox_head=dict( - seperate_head=dict( - type='DCNSeperateHead', + separate_head=dict( + type='DCNSeparateHead', dcn_config=dict( type='DCN', in_channels=64, @@ -12,6 +12,5 @@ padding=1, groups=4), init_bias=-2.19, - final_kernel=3))) - -test_cfg = dict(pts=dict(nms_type='circle')) + final_kernel=3)), + test_cfg=dict(pts=dict(nms_type='circle'))) diff --git a/configs/free_anchor/README.md b/configs/free_anchor/README.md index 5e2d4bd03f..0ab73f9656 100644 --- a/configs/free_anchor/README.md +++ b/configs/free_anchor/README.md @@ -70,10 +70,10 @@ model = dict( loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.8), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25])) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg = dict( + pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25]))) ``` ## Results diff --git a/configs/free_anchor/hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py b/configs/free_anchor/hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py index 9834f90aaf..d0a989f121 100644 --- a/configs/free_anchor/hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py +++ b/configs/free_anchor/hv_pointpillars_fpn_sbn-all_free-anchor_4x8_2x_nus-3d.py @@ -42,7 +42,7 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.8), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25])) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + pts=dict(code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25]))) diff --git a/configs/imvotenet/imvotenet_img_pretrain_4x2_sunrgbd-3d-10class.py b/configs/imvotenet/imvotenet_img_pretrain_4x2_sunrgbd-3d-10class.py index 4b2f0c92a9..8b1f53c44e 100644 --- a/configs/imvotenet/imvotenet_img_pretrain_4x2_sunrgbd-3d-10class.py +++ b/configs/imvotenet/imvotenet_img_pretrain_4x2_sunrgbd-3d-10class.py @@ -65,73 +65,71 @@ type='PointSAModule', pool_mod='max', use_xyz=True, - normalize_xyz=True))) - -# model training and testing settings -train_cfg = dict( - img_rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), - allowed_border=-1, - pos_weight=-1, - debug=False), - img_rpn_proposal=dict( - nms_across_levels=False, - nms_pre=2000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - img_rcnn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - 
add_gt_as_proposals=True), - pos_weight=-1, - debug=False), - pos_distance_thr=0.3, - neg_distance_thr=0.6, - sample_mod='vote') - -test_cfg = dict( - img_rpn=dict( - nms_across_levels=False, - nms_pre=1000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - img_rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100), - # soft-nms is also supported for rcnn testing - # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) - pts=dict( - sample_mod='seed', - nms_thr=0.25, - score_thr=0.05, - per_class_proposal=True)) + normalize_xyz=True)), + # model training and testing settings + train_cfg=dict( + img_rpn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.3, + min_pos_iou=0.3, + match_low_quality=True, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=256, + pos_fraction=0.5, + neg_pos_ub=-1, + add_gt_as_proposals=False), + allowed_border=-1, + pos_weight=-1, + debug=False), + img_rpn_proposal=dict( + nms_across_levels=False, + nms_pre=2000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + img_rcnn=dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5, + match_low_quality=False, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + pos_weight=-1, + debug=False), + pos_distance_thr=0.3, + neg_distance_thr=0.6, + sample_mod='vote'), + test_cfg=dict( + img_rpn=dict( + nms_across_levels=False, + nms_pre=1000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + img_rcnn=dict( + score_thr=0.05, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100), + # soft-nms is also supported for rcnn testing + # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) + pts=dict( + sample_mod='seed', + nms_thr=0.25, + score_thr=0.05, + per_class_proposal=True))) dataset_type = 'SUNRGBDDataset' data_root = 'data/sunrgbd/' diff --git a/configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py b/configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py index 453662d8a4..4ea320cd3f 100644 --- a/configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py +++ b/configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py @@ -87,45 +87,45 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) -# model training and testing settings -train_cfg = dict( - pts=dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.35, - neg_iou_thr=0.2, - min_pos_iou=0.2, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.35, - neg_iou_thr=0.2, - min_pos_iou=0.2, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - ], - allowed_border=0, - pos_weight=-1, - debug=False)) -test_cfg = dict( - pts=dict( - use_rotate_nms=True, - nms_across_levels=False, - nms_thr=0.01, - score_thr=0.1, - min_bbox_size=0, - nms_pre=100, - max_num=50)) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + 
pts=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.35, + neg_iou_thr=0.2, + min_pos_iou=0.2, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.35, + neg_iou_thr=0.2, + min_pos_iou=0.2, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + ], + allowed_border=0, + pos_weight=-1, + debug=False)), + test_cfg=dict( + pts=dict( + use_rotate_nms=True, + nms_across_levels=False, + nms_thr=0.01, + score_thr=0.1, + min_bbox_size=0, + nms_pre=100, + max_num=50))) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/nuimages/htc_without_semantic_r50_fpn_1x_nuim.py b/configs/nuimages/htc_without_semantic_r50_fpn_1x_nuim.py index 67f2761792..257cfa3563 100644 --- a/configs/nuimages/htc_without_semantic_r50_fpn_1x_nuim.py +++ b/configs/nuimages/htc_without_semantic_r50_fpn_1x_nuim.py @@ -130,92 +130,92 @@ num_classes=10, loss_mask=dict( type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)) - ])) -# model training and testing settings -train_cfg = dict( - rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), - allowed_border=0, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_across_levels=False, - nms_pre=2000, - nms_post=2000, - max_num=2000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=[ - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, - pos_weight=-1, - debug=False), - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, - pos_weight=-1, - debug=False), - dict( + ]), + # model training and testing settings + train_cfg=dict( + rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, + neg_iou_thr=0.3, + min_pos_iou=0.3, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', - num=512, - pos_fraction=0.25, + num=256, + pos_fraction=0.5, neg_pos_ub=-1, - add_gt_as_proposals=True), - mask_size=28, + add_gt_as_proposals=False), + allowed_border=0, pos_weight=-1, - debug=False) - ]) -test_cfg = dict( - rpn=dict( - nms_across_levels=False, - nms_pre=1000, - nms_post=1000, - max_num=1000, - nms_thr=0.7, - min_bbox_size=0), - rcnn=dict( - score_thr=0.001, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5)) + debug=False), + rpn_proposal=dict( + nms_across_levels=False, + nms_pre=2000, + nms_post=2000, + max_num=2000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=[ + dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.5, + neg_iou_thr=0.5, + min_pos_iou=0.5, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False), + dict( + assigner=dict( + 
type='MaxIoUAssigner', + pos_iou_thr=0.6, + neg_iou_thr=0.6, + min_pos_iou=0.6, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False), + dict( + assigner=dict( + type='MaxIoUAssigner', + pos_iou_thr=0.7, + neg_iou_thr=0.7, + min_pos_iou=0.7, + ignore_iof_thr=-1), + sampler=dict( + type='RandomSampler', + num=512, + pos_fraction=0.25, + neg_pos_ub=-1, + add_gt_as_proposals=True), + mask_size=28, + pos_weight=-1, + debug=False) + ]), + test_cfg=dict( + rpn=dict( + nms_across_levels=False, + nms_pre=1000, + nms_post=1000, + max_num=1000, + nms_thr=0.7, + min_bbox_size=0), + rcnn=dict( + score_thr=0.001, + nms=dict(type='nms', iou_threshold=0.5), + max_per_img=100, + mask_thr_binary=0.5))) diff --git a/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py b/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py index 50b885cf36..e874121ca5 100644 --- a/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py +++ b/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py @@ -112,88 +112,94 @@ type='CrossEntropyLoss', use_sigmoid=True, reduction='sum', - loss_weight=1.0)))) -# model training and testing settings -train_cfg = dict( - rpn=dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1) - ], - allowed_border=0, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=9000, - nms_post=512, - max_num=512, - nms_thr=0.8, - score_thr=0, - use_rotate_nms=False), - rcnn=dict( - assigner=[ - dict( # for Pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1), - dict( # for Cyclist - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1), - dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1) - ], - sampler=dict( - type='IoUNegPiecewiseSampler', - num=128, - pos_fraction=0.55, - neg_piece_fractions=[0.8, 0.2], - neg_iou_piece_thrs=[0.55, 0.1], - neg_pos_ub=-1, - add_gt_as_proposals=False, - return_iou=True), - cls_pos_thr=0.75, - cls_neg_thr=0.25)) -test_cfg = dict( - rpn=dict( - nms_pre=1024, - nms_post=100, - max_num=100, - nms_thr=0.7, - score_thr=0, - use_rotate_nms=True), - rcnn=dict( - use_rotate_nms=True, use_raw_score=True, nms_thr=0.01, score_thr=0.1)) + loss_weight=1.0))), + # model training and testing settings + train_cfg=dict( + rpn=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + 
pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1) + ], + allowed_border=0, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_pre=9000, + nms_post=512, + max_num=512, + nms_thr=0.8, + score_thr=0, + use_rotate_nms=False), + rcnn=dict( + assigner=[ + dict( # for Pedestrian + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1), + dict( # for Cyclist + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1), + dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict( + type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1) + ], + sampler=dict( + type='IoUNegPiecewiseSampler', + num=128, + pos_fraction=0.55, + neg_piece_fractions=[0.8, 0.2], + neg_iou_piece_thrs=[0.55, 0.1], + neg_pos_ub=-1, + add_gt_as_proposals=False, + return_iou=True), + cls_pos_thr=0.75, + cls_neg_thr=0.25)), + test_cfg=dict( + rpn=dict( + nms_pre=1024, + nms_post=100, + max_num=100, + nms_thr=0.7, + score_thr=0, + use_rotate_nms=True), + rcnn=dict( + use_rotate_nms=True, + use_raw_score=True, + nms_thr=0.01, + score_thr=0.1))) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-car.py b/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-car.py index 10100d4bcd..a7a5c122bd 100644 --- a/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-car.py +++ b/configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-car.py @@ -17,57 +17,60 @@ roi_head=dict( num_classes=1, semantic_head=dict(num_classes=1), - bbox_head=dict(num_classes=1))) -# model training and testing settings -train_cfg = dict( - _delete_=True, - rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - allowed_border=0, - pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=9000, - nms_post=512, - max_num=512, - nms_thr=0.8, - score_thr=0, - use_rotate_nms=False), - rcnn=dict( - assigner=dict( # for Car - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), - pos_iou_thr=0.55, - neg_iou_thr=0.55, - min_pos_iou=0.55, - ignore_iof_thr=-1), - sampler=dict( - type='IoUNegPiecewiseSampler', - num=128, - pos_fraction=0.55, - neg_piece_fractions=[0.8, 0.2], - neg_iou_piece_thrs=[0.55, 0.1], - neg_pos_ub=-1, - add_gt_as_proposals=False, - return_iou=True), - cls_pos_thr=0.75, - cls_neg_thr=0.25)) -test_cfg = dict( - rpn=dict( - nms_pre=1024, - nms_post=100, - max_num=100, - nms_thr=0.7, - score_thr=0, - use_rotate_nms=True), - rcnn=dict( - use_rotate_nms=True, use_raw_score=True, nms_thr=0.01, score_thr=0.1)) + bbox_head=dict(num_classes=1)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + rpn=dict( + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + allowed_border=0, + pos_weight=-1, + debug=False), + rpn_proposal=dict( + nms_pre=9000, + nms_post=512, + 
max_num=512, + nms_thr=0.8, + score_thr=0, + use_rotate_nms=False), + rcnn=dict( + assigner=dict( # for Car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'), + pos_iou_thr=0.55, + neg_iou_thr=0.55, + min_pos_iou=0.55, + ignore_iof_thr=-1), + sampler=dict( + type='IoUNegPiecewiseSampler', + num=128, + pos_fraction=0.55, + neg_piece_fractions=[0.8, 0.2], + neg_iou_piece_thrs=[0.55, 0.1], + neg_pos_ub=-1, + add_gt_as_proposals=False, + return_iou=True), + cls_pos_thr=0.75, + cls_neg_thr=0.25)), + test_cfg=dict( + rpn=dict( + nms_pre=1024, + nms_post=100, + max_num=100, + nms_thr=0.7, + score_thr=0, + use_rotate_nms=True), + rcnn=dict( + use_rotate_nms=True, + use_raw_score=True, + nms_thr=0.01, + score_thr=0.1))) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py b/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py index b3d03d471c..1e0f0faf9b 100644 --- a/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py +++ b/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py @@ -12,20 +12,20 @@ ranges=[[0, -39.68, -1.78, 69.12, 39.68, -1.78]], sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.57], - reshape_out=True))) -# model training and testing settings -train_cfg = dict( - _delete_=True, - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - allowed_border=0, - pos_weight=-1, - debug=False) + reshape_out=True)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + allowed_border=0, + pos_weight=-1, + debug=False)) # dataset settings dataset_type = 'KittiDataset' diff --git a/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymo-3d-car.py b/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymo-3d-car.py index 023d8b4ae8..aeac750d9e 100644 --- a/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymo-3d-car.py +++ b/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymo-3d-car.py @@ -19,20 +19,19 @@ ranges=[[-74.88, -74.88, -0.0345, 74.88, 74.88, -0.0345]], sizes=[[2.08, 4.73, 1.77]], rotations=[0, 1.57], - reshape_out=True))) - -# model training and testing settings -train_cfg = dict( - _delete_=True, - pts=dict( - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], - pos_weight=-1, - debug=False)) + reshape_out=True)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + pts=dict( + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + pos_weight=-1, + debug=False))) diff --git a/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-car.py b/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-car.py index 21e267d0c3..1fe32fd404 100644 --- a/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-car.py +++ 
b/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-car.py @@ -16,20 +16,19 @@ ranges=[[-74.88, -74.88, -0.0345, 74.88, 74.88, -0.0345]], sizes=[[2.08, 4.73, 1.77]], rotations=[0, 1.57], - reshape_out=True))) - -# model training and testing settings -train_cfg = dict( - _delete_=True, - pts=dict( - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], - pos_weight=-1, - debug=False)) + reshape_out=True)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + pts=dict( + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + pos_weight=-1, + debug=False))) diff --git a/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py b/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py index a43fb8707b..c4f2ffd51a 100644 --- a/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py +++ b/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py @@ -14,17 +14,17 @@ ranges=[[0, -40.0, -1.78, 70.4, 40.0, -1.78]], sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.57], - reshape_out=True))) -# model training and testing settings -train_cfg = dict( - _delete_=True, - assigner=dict( - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - allowed_border=0, - pos_weight=-1, - debug=False) + reshape_out=True)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + assigner=dict( + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + allowed_border=0, + pos_weight=-1, + debug=False)) diff --git a/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d.py b/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d.py index f191b99ce7..18b658b0c3 100644 --- a/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d.py +++ b/configs/ssn/hv_ssn_secfpn_sbn-all_2x16_2x_nus-3d.py @@ -155,85 +155,84 @@ loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0), loss_dir=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2))) - -# model training and testing settings -train_cfg = dict( - _delete_=True, - pts=dict( - assigner=[ - dict( # bicycle - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # motorcycle - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - dict( # pedestrian - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # traffic cone - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # barrier - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # car - type='MaxIoUAssigner', - 
iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.6, - neg_iou_thr=0.45, - min_pos_iou=0.45, - ignore_iof_thr=-1), - dict( # truck - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # trailer - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1), - dict( # bus - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.55, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - dict( # construction vehicle - type='MaxIoUAssigner', - iou_calculator=dict(type='BboxOverlapsNearest3D'), - pos_iou_thr=0.5, - neg_iou_thr=0.35, - min_pos_iou=0.35, - ignore_iof_thr=-1) - ], - allowed_border=0, - code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2], - pos_weight=-1, - debug=False)) + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)), + # model training and testing settings + train_cfg=dict( + _delete_=True, + pts=dict( + assigner=[ + dict( # bicycle + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # motorcycle + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.3, + min_pos_iou=0.3, + ignore_iof_thr=-1), + dict( # pedestrian + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # traffic cone + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # barrier + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # car + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.6, + neg_iou_thr=0.45, + min_pos_iou=0.45, + ignore_iof_thr=-1), + dict( # truck + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # trailer + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1), + dict( # bus + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.55, + neg_iou_thr=0.4, + min_pos_iou=0.4, + ignore_iof_thr=-1), + dict( # construction vehicle + type='MaxIoUAssigner', + iou_calculator=dict(type='BboxOverlapsNearest3D'), + pos_iou_thr=0.5, + neg_iou_thr=0.35, + min_pos_iou=0.35, + ignore_iof_thr=-1) + ], + allowed_border=0, + code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2], + pos_weight=-1, + debug=False))) diff --git a/docs/changelog.md b/docs/changelog.md index 9942affa6b..7077e5809b 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -1,5 +1,31 @@ ## Changelog +### v0.11.0 (1/3/2021) + +#### Highlights + +- Support more friendly visualization interfaces based on open3d +- Support a faster and more memory-efficient implementation of DynamicScatter +- Refactor unit tests and details of configs + +#### Bug Fixes + +- Fix an unsupported bias setting in the unit test for centerpoint head (#304) +- Fix errors due to typos in 
the centerpoint head (#308)
+- Fix a minor bug in [points_in_boxes.py](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py) when tensors are not on the same device (#317)
+- Fix warnings of deprecated usage of `nonzero` during training with PyTorch 1.6 (#330)
+
+#### New Features
+
+- Support new visualization methods based on open3d (#284, #323)
+
+#### Improvements
+
+- Refactor unit tests (#303)
+- Move the keys `train_cfg` and `test_cfg` into the model configs (#307)
+- Update [README](https://github.com/open-mmlab/mmdetection3d/blob/master/README.md) with a [Chinese version](https://github.com/open-mmlab/mmdetection3d/blob/master/README_zh-CN.md) and [instructions for getting started](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md) (#310, #316)
+- Support a faster and more memory-efficient implementation of DynamicScatter (#318, #326)
+
 ### v0.10.0 (1/2/2021)
 
 #### Highlights
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 0000000000..bb0479c93b
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,18 @@
+# FAQ
+
+We list some common problems encountered by users and developers, along with their corresponding solutions. Feel free to enrich the list if you find frequent issues and have solutions to contribute. If you have any trouble with environment configuration, model training, etc., please create an issue using the [provided templates](https://github.com/open-mmlab/mmdetection3d/blob/master/.github/ISSUE_TEMPLATE/error-report.md) and fill in all required information in the template.
+
+## MMCV/MMDet/MMDet3D Installation
+
+- If you face the error shown below when importing open3d:
+
+  ``OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found``
+
+  please downgrade open3d to 0.9.0.0, because the latest open3d requires 'GLIBC_2.27', which is available in Ubuntu 18.04 but not in Ubuntu 16.04.
+
+- If you face an error when importing pycocotools, it is because nuscenes-devkit installs pycocotools while mmdet relies on mmpycocotools. The current workaround is shown below. We will migrate to pycocotools in the future.
+
+  ```shell
+  pip uninstall pycocotools mmpycocotools
+  pip install mmpycocotools
+  ```
diff --git a/docs/getting_started.md b/docs/getting_started.md
index dad0149df8..6af420baea 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -5,7 +5,21 @@
 - PyTorch 1.3+
 - CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
 - GCC 5+
-- [mmcv](https://github.com/open-mmlab/mmcv)
+- [MMCV](https://mmcv.readthedocs.io/en/latest/#installation)
+
+The required versions of MMCV and MMDetection for different versions of MMDetection3D are listed below. Please install the correct versions of MMCV and MMDetection to avoid installation issues.
+
+| MMDetection3D version | MMDetection version | MMCV version |
+|:-------------------:|:-------------------:|:-------------------:|
+| master | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
+| 0.11.0 | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
+| 0.10.0 | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
+| 0.9.0 | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
+| 0.8.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3|
+| 0.7.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3|
+| 0.6.0 | mmdet>=2.4.0 | mmcv-full>=1.1.3, <=1.2|
+| 0.5.0 | 2.3.0 | mmcv-full==1.0.5|
 
 # Installation
@@ -45,18 +59,34 @@
 If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.
 
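Since the prebuilt *mmcv-full* wheels in the next step are indexed by CUDA and PyTorch version, it helps to check both before picking a download URL. A minimal sketch, assuming PyTorch is already installed from the previous step:

```python
# Print the values that fill the {cu_version} and {torch_version}
# placeholders in the mmcv-full install command shown below.
import torch

print('PyTorch:', torch.__version__)  # e.g. '1.7.0' -> torch1.7.0
print('CUDA:', torch.version.cuda)    # e.g. '11.0' -> cu110; None for CPU-only builds
```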
**c. Install [MMCV](https://mmcv.readthedocs.io/en/latest/).**
 
-*mmcv-full* is necessary since MMDetection3D relies on MMDetection, CUDA ops in *mmcv-full* are required.
+*mmcv-full* is necessary since MMDetection3D relies on MMDetection, and the CUDA ops in *mmcv-full* are required. For example, the pre-built *mmcv-full* can be installed by running (available versions are listed [here](https://mmcv.readthedocs.io/en/latest/#install-with-pip)):
+
+```shell
+pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
+```
+
+Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest `mmcv-full` with `CUDA 11` and `PyTorch 1.7.0`, use the following command:
+
 ```shell
-pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html
+pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
 ```
+See [here](https://github.com/open-mmlab/mmcv#install-with-pip) for the versions of MMCV compatible with different PyTorch and CUDA versions.
 
 Optionally, you could also build the full version from source:
 
 ```shell
-pip install mmcv-full  # need a long time
+git clone https://github.com/open-mmlab/mmcv.git
+cd mmcv
+MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
+cd ..
+```
+
+Or directly run:
+
+```shell
+pip install mmcv-full
 ```
 
 **d. Install [MMDetection](https://github.com/open-mmlab/mmdetection).**
@@ -74,20 +104,6 @@
 pip install -r requirements/build.txt
 pip install -v -e .  # or "python setup.py develop"
 ```
 
-**Important**:
-
-1. The required versions of MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMCV and MMDetection to avoid installation issues.
-
-| MMDetection3D version | MMDetection version | MMCV version |
-|:-------------------:|:-------------------:|:-------------------:|
-| master | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
-| 0.10.0 | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
-| 0.9.0 | mmdet>=2.5.0 | mmcv-full>=1.2.4, <=1.3|
-| 0.8.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3|
-| 0.7.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3|
-| 0.6.0 | mmdet>=2.4.0 | mmcv-full>=1.1.3, <=1.2|
-| 0.5.0 | 2.3.0 | mmcv-full==1.0.5|
-
 **e. Clone the MMDetection3D repository.**
 
 ```shell
@@ -177,7 +193,7 @@ PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
 
 ### Point cloud demo
 
-We provide a demo script to test a single sample.
+We provide a demo script to test a single sample. Pre-trained models can be downloaded from the [model zoo](model_zoo.md).
 
 ```shell
 python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}]
diff --git a/docs/index.rst b/docs/index.rst
index 6344f4110d..985f99a62c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -8,7 +8,7 @@ Welcome to MMDetection3D's documentation!
    getting_started.md
    model_zoo.md
    data_preparation.md
-   
+
 .. toctree::
    :maxdepth: 2
    :caption: Quick Run
@@ -33,6 +33,7 @@ Welcome to MMDetection3D's documentation!
    :caption: Notes
 
    benchmarks.md
+   faq.md
 
 .. toctree::
    :caption: API Reference
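The config tutorial below documents the `train_cfg`/`test_cfg` migration. For tooling that must read configs written in either style, a fallback mirroring the `config.get('test_cfg')` pattern used in `mmdet3d/apis/inference.py` later in this patch can help. A minimal sketch with a hypothetical helper name:

```python
# Minimal sketch (hypothetical helper): read train_cfg/test_cfg from a config
# written in either the deprecated top-level style or the new model-level style.
from mmcv import Config


def get_train_test_cfg(cfg):
    # Prefer the new location inside the model config and fall back to the
    # deprecated top-level keys for backward compatibility.
    train_cfg = cfg.model.get('train_cfg', cfg.get('train_cfg'))
    test_cfg = cfg.model.get('test_cfg', cfg.get('test_cfg'))
    return train_cfg, test_cfg


cfg = Config.fromfile('configs/votenet/votenet_8x8_scannet-3d-18class.py')  # example path
train_cfg, test_cfg = get_train_test_cfg(cfg)
```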
diff --git a/docs/tutorials/config.md b/docs/tutorials/config.md
index 41cb1a4e12..1d00128098 100644
--- a/docs/tutorials/config.md
+++ b/docs/tutorials/config.md
@@ -44,6 +44,32 @@
 For `1x`/`2x`, initial learning rate decays by a factor of 10 at the 8/16th and 11/22th epochs.
 For `20e`, initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
 - `{dataset}`: dataset like `nus-3d`, `kitti-3d`, `lyft-3d`, `scannet-3d`, `sunrgbd-3d`. We also indicate the number of classes we are using if there exist multiple settings, e.g., `kitti-3d-3class` and `kitti-3d-car` mean training on the KITTI dataset with 3 classes and a single class, respectively.
 
+## Deprecated train_cfg/test_cfg
+
+Following MMDetection, `train_cfg` and `test_cfg` are deprecated as top-level keys in the config file; please specify them inside the model config instead. The original config structure is shown below.
+
+```python
+# deprecated
+model = dict(
+   type=...,
+   ...
+)
+train_cfg=dict(...)
+test_cfg=dict(...)
+```
+
+The migrated version is shown below.
+
+```python
+# recommended
+model = dict(
+   type=...,
+   ...
+   train_cfg=dict(...),
+   test_cfg=dict(...)
+)
+```
+
 ## An example of VoteNet
 
 ```python
@@ -144,16 +170,16 @@ model = dict(
         semantic_loss=dict(  # Config to semantic loss
             type='CrossEntropyLoss',  # Type of loss
             reduction='sum',  # Specifies the reduction to apply to the output
-            loss_weight=1.0)))  # Loss weight of the semantic loss
-train_cfg = dict(  # Config of training hyperparameters for votenet
-    pos_distance_thr=0.3,  # distance >= threshold 0.3 will be taken as positive samples
-    neg_distance_thr=0.6,  # distance < threshold 0.6 will be taken as positive samples
-    sample_mod='vote')  # Mode of the sampling method
-test_cfg = dict(  # Config of testing hyperparameters for votenet
-    sample_mod='seed',  # Mode of the sampling method
-    nms_thr=0.25,  # The threshold to be used during NMS
-    score_thr=0.8,  # Threshold to filter out boxes
-    per_class_proposal=False)  # Whether to use per_class_proposal
+            loss_weight=1.0)),  # Loss weight of the semantic loss
+    train_cfg=dict(  # Config of training hyperparameters for votenet
+        pos_distance_thr=0.3,  # distance >= threshold 0.3 will be taken as positive samples
+        neg_distance_thr=0.6,  # distance < threshold 0.6 will be taken as negative samples
+        sample_mod='vote'),  # Mode of the sampling method
+    test_cfg=dict(  # Config of testing hyperparameters for votenet
+        sample_mod='seed',  # Mode of the sampling method
+        nms_thr=0.25,  # The threshold to be used during NMS
+        score_thr=0.8,  # Threshold to filter out boxes
+        per_class_proposal=False))  # Whether to use per_class_proposal
 dataset_type = 'ScanNetDataset'  # Type of the dataset
 data_root = './data/scannet/'  # Root path of the data
 class_names = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
diff --git a/docs/useful_tools.md b/docs/useful_tools.md
index 82dd339b5b..57dbcd9520 100644
--- a/docs/useful_tools.md
+++ b/docs/useful_tools.md
@@ -63,9 +63,16 @@ To see the points, detection results and ground truth of SUNRGBD, ScanNet or KIT
 ```bash
 python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --options 'show=True' 'out_dir=${SHOW_DIR}'
 ```
-After running this command, you will obtain ***_points.ob, ***_pred.ply files and ***_gt.ply in `${SHOW_DIR}`.
+After running this command, you will obtain ***_points.obj, ***_pred.ply and ***_gt.ply files in `${SHOW_DIR}`. When `show` is enabled, [Open3D](http://www.open3d.org/) will be used to visualize the results online. You need to set `show=False` when running the test on a remote server without a GUI.
-You can use 3D visualization software such as the [MeshLab](http://www.meshlab.net/) to open the these files under `${SHOW_DIR}` to see the 3D detection output. Specifically, open `***_points.obj` to see the input point cloud and open `***_pred.ply` to see the predicted 3D bounding boxes. This allows the inference and results generation be done in remote server and the users can open them on their host with GUI.
+As for offline visualization, you have two options.
+To visualize the results with the `Open3D` backend, you can run the following command:
+```bash
+python tools/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
+```
+![Open3D_visualization](../resources/open3d_visual.gif)
+
+Or you can use 3D visualization software such as [MeshLab](http://www.meshlab.net/) to open these files under `${SHOW_DIR}` to see the 3D detection output. Specifically, open `***_points.obj` to see the input point cloud and open `***_pred.ply` to see the predicted 3D bounding boxes. This allows inference and result generation to be done on a remote server, while users open the files on their host machine with a GUI.
 
 **Notice**: The visualization API is a little unstable since we plan to refactor these parts together with MMDetection in the future.
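Beyond the command-line tools above, the `Visualizer` class added in `mmdet3d/core/visualizer/open3d_vis.py` (see its diff further below) can be driven directly from Python. A minimal sketch with made-up point and box values; per its docstring, boxes are given as `(x, y, z, dx, dy, dz, yaw)` in depth mode:

```python
# Minimal sketch of the new Open3D-based online visualization interface.
# The point cloud and the single box below are dummy values for illustration.
import numpy as np
from mmdet3d.core.visualizer.open3d_vis import Visualizer

points = np.random.rand(1000, 3).astype(np.float32)  # [N, 3] point cloud
vis = Visualizer(points)

# One box (x, y, z, dx, dy, dz, yaw); points falling inside it are recolored.
bboxes = np.array([[0.5, 0.5, 0.5, 0.2, 0.2, 0.2, 0.0]], dtype=np.float32)
vis.add_bboxes(bbox3d=bboxes)

vis.show()  # opens an Open3D window; pass save_path to capture a screenshot
```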
diff --git a/mmdet3d/apis/inference.py b/mmdet3d/apis/inference.py
index 6c1f9a3043..dd46c11f87 100644
--- a/mmdet3d/apis/inference.py
+++ b/mmdet3d/apis/inference.py
@@ -30,7 +30,8 @@ def init_detector(config, checkpoint=None, device='cuda:0'):
         raise TypeError('config must be a filename or Config object, '
                         f'but got {type(config)}')
     config.model.pretrained = None
-    model = build_detector(config.model, test_cfg=config.test_cfg)
+    config.model.train_cfg = None
+    model = build_detector(config.model, test_cfg=config.get('test_cfg'))
     if checkpoint is not None:
         checkpoint = load_checkpoint(model, checkpoint)
         if 'CLASSES' in checkpoint['meta']:
diff --git a/mmdet3d/core/visualizer/open3d_vis.py b/mmdet3d/core/visualizer/open3d_vis.py
new file mode 100644
index 0000000000..d0fc8a1a66
--- /dev/null
+++ b/mmdet3d/core/visualizer/open3d_vis.py
@@ -0,0 +1,510 @@
+import cv2
+import numpy as np
+import torch
+from matplotlib import pyplot as plt
+
+try:
+    import open3d as o3d
+    from open3d import geometry
+except ImportError:
+    raise ImportError(
+        'Please run "pip install open3d" to install open3d first.')
+
+
+def _draw_points(points,
+                 vis,
+                 points_size=2,
+                 point_color=(0.5, 0.5, 0.5),
+                 mode='xyz'):
+    """Draw points on visualizer.
+
+    Args:
+        points (numpy.array | torch.tensor, shape=[N, 3+C]):
+            points to visualize.
+        vis (:obj:`open3d.visualization.Visualizer`): open3d visualizer.
+        points_size (int): the size of points to show on visualizer.
+            Default: 2.
+        point_color (tuple[float]): the color of points.
+            Default: (0.5, 0.5, 0.5).
+        mode (str): indicate type of the input points, available mode
+            ['xyz', 'xyzrgb']. Default: 'xyz'.
+
+    Returns:
+        tuple: points, color of each point.
+ """ + vis.get_render_option().point_size = points_size # set points size + if isinstance(points, torch.Tensor): + points = points.cpu().numpy() + + points = points.copy() + pcd = geometry.PointCloud() + if mode == 'xyz': + pcd.points = o3d.utility.Vector3dVector(points[:, :3]) + points_colors = np.tile(np.array(point_color), (points.shape[0], 1)) + elif mode == 'xyzrgb': + pcd.points = o3d.utility.Vector3dVector(points[:, :3]) + points_colors = points[:, 3:6] + else: + raise NotImplementedError + + pcd.colors = o3d.utility.Vector3dVector(points_colors) + vis.add_geometry(pcd) + + return pcd, points_colors + + +def _draw_bboxes(bbox3d, + vis, + points_colors, + pcd=None, + bbox_color=(0, 1, 0), + points_in_box_color=(1, 0, 0), + rot_axis=2, + center_mode='lidar_bottom', + mode='xyz'): + """Draw bbox on visualizer and change the color of points inside bbox3d. + + Args: + bbox3d (numpy.array | torch.tensor, shape=[M, 7]): + 3d bbox (x, y, z, dx, dy, dz, yaw) to visualize. + vis (:obj:`open3d.visualization.Visualizer`): open3d visualizer. + points_colors (numpy.array): color of each points. + pcd (:obj:`open3d.geometry.PointCloud`): point cloud. Default: None. + bbox_color (tuple[float]): the color of bbox. Default: (0, 1, 0). + points_in_box_color (tuple[float]): + the color of points inside bbox3d. Default: (1, 0, 0). + rot_axis (int): rotation axis of bbox. Default: 2. + center_mode (bool): indicate the center of bbox is bottom center + or gravity center. avaliable mode + ['lidar_bottom', 'camera_bottom']. Default: 'lidar_bottom'. + mode (str): indicate type of the input points, avaliable mode + ['xyz', 'xyzrgb']. Default: 'xyz'. + """ + if isinstance(bbox3d, torch.Tensor): + bbox3d = bbox3d.cpu().numpy() + bbox3d = bbox3d.copy() + + in_box_color = np.array(points_in_box_color) + for i in range(len(bbox3d)): + center = bbox3d[i, 0:3] + dim = bbox3d[i, 3:6] + yaw = np.zeros(3) + yaw[rot_axis] = -bbox3d[i, 6] + rot_mat = geometry.get_rotation_matrix_from_xyz(yaw) + + if center_mode == 'lidar_bottom': + center[rot_axis] += dim[ + rot_axis] / 2 # bottom center to gravity center + elif center_mode == 'camera_bottom': + center[rot_axis] -= dim[ + rot_axis] / 2 # bottom center to gravity center + box3d = geometry.OrientedBoundingBox(center, rot_mat, dim) + + line_set = geometry.LineSet.create_from_oriented_bounding_box(box3d) + line_set.paint_uniform_color(bbox_color) + # draw bboxes on visualizer + vis.add_geometry(line_set) + + # change the color of points which are in box + if pcd is not None and mode == 'xyz': + indices = box3d.get_point_indices_within_bounding_box(pcd.points) + points_colors[indices] = in_box_color + + # update points colors + if pcd is not None: + pcd.colors = o3d.utility.Vector3dVector(points_colors) + vis.update_geometry(pcd) + + +def show_pts_boxes(points, + bbox3d=None, + show=True, + save_path=None, + points_size=2, + point_color=(0.5, 0.5, 0.5), + bbox_color=(0, 1, 0), + points_in_box_color=(1, 0, 0), + rot_axis=2, + center_mode='lidar_bottom', + mode='xyz'): + """Draw bbox and points on visualizer. + + Args: + points (numpy.array | torch.tensor, shape=[N, 3+C]): + points to visualize. + bbox3d (numpy.array | torch.tensor, shape=[M, 7]): + 3d bbox (x, y, z, dx, dy, dz, yaw) to visualize. Default: None. + show (bool): whether to show the visualization results. Default: True. + save_path (str): path to save visualized results. Default: None. + points_size (int): the size of points to show on visualizer. + Default: 2. + point_color (tuple[float]): the color of points. 
+ Default: (0.5, 0.5, 0.5). + bbox_color (tuple[float]): the color of bbox. Default: (0, 1, 0). + points_in_box_color (tuple[float]): + the color of points which are in bbox3d. Default: (1, 0, 0). + rot_axis (int): rotation axis of bbox. Default: 2. + center_mode (bool): indicate the center of bbox is bottom center + or gravity center. avaliable mode + ['lidar_bottom', 'camera_bottom']. Default: 'lidar_bottom'. + mode (str): indicate type of the input points, avaliable mode + ['xyz', 'xyzrgb']. Default: 'xyz'. + """ + # TODO: support score and class info + assert 0 <= rot_axis <= 2 + + # init visualizer + vis = o3d.visualization.Visualizer() + vis.create_window() + mesh_frame = geometry.TriangleMesh.create_coordinate_frame( + size=1, origin=[0, 0, 0]) # create coordinate frame + vis.add_geometry(mesh_frame) + + # draw points + pcd, points_colors = _draw_points(points, vis, points_size, point_color, + mode) + + # draw boxes + if bbox3d is not None: + _draw_bboxes(bbox3d, vis, points_colors, pcd, bbox_color, + points_in_box_color, rot_axis, center_mode, mode) + + if show: + vis.run() + + if save_path is not None: + vis.capture_screen_image(save_path) + + vis.destroy_window() + + +def _draw_bboxes_ind(bbox3d, + vis, + indices, + points_colors, + pcd=None, + bbox_color=(0, 1, 0), + points_in_box_color=(1, 0, 0), + rot_axis=2, + center_mode='lidar_bottom', + mode='xyz'): + """Draw bbox on visualizer and change the color or points inside bbox3d + with indices. + + Args: + bbox3d (numpy.array | torch.tensor, shape=[M, 7]): + 3d bbox (x, y, z, dx, dy, dz, yaw) to visualize. + vis (:obj:`open3d.visualization.Visualizer`): open3d visualizer. + indices (numpy.array | torch.tensor, shape=[N, M]): + indicate which bbox3d that each point lies in. + points_colors (numpy.array): color of each points. + pcd (:obj:`open3d.geometry.PointCloud`): point cloud. Default: None. + bbox_color (tuple[float]): the color of bbox. Default: (0, 1, 0). + points_in_box_color (tuple[float]): + the color of points which are in bbox3d. Default: (1, 0, 0). + rot_axis (int): rotation axis of bbox. Default: 2. + center_mode (bool): indicate the center of bbox is bottom center + or gravity center. avaliable mode + ['lidar_bottom', 'camera_bottom']. Default: 'lidar_bottom'. + mode (str): indicate type of the input points, avaliable mode + ['xyz', 'xyzrgb']. Default: 'xyz'. 
+ """ + if isinstance(bbox3d, torch.Tensor): + bbox3d = bbox3d.cpu().numpy() + if isinstance(indices, torch.Tensor): + indices = indices.cpu().numpy() + bbox3d = bbox3d.copy() + + in_box_color = np.array(points_in_box_color) + for i in range(len(bbox3d)): + center = bbox3d[i, 0:3] + dim = bbox3d[i, 3:6] + yaw = np.zeros(3) + # TODO: fix problem of current coordinate system + # dim[0], dim[1] = dim[1], dim[0] # for current coordinate + # yaw[rot_axis] = -(bbox3d[i, 6] - 0.5 * np.pi) + yaw[rot_axis] = -bbox3d[i, 6] + rot_mat = geometry.get_rotation_matrix_from_xyz(yaw) + if center_mode == 'lidar_bottom': + center[rot_axis] += dim[ + rot_axis] / 2 # bottom center to gravity center + elif center_mode == 'camera_bottom': + center[rot_axis] -= dim[ + rot_axis] / 2 # bottom center to gravity center + box3d = geometry.OrientedBoundingBox(center, rot_mat, dim) + + line_set = geometry.LineSet.create_from_oriented_bounding_box(box3d) + line_set.paint_uniform_color(bbox_color) + # draw bboxes on visualizer + vis.add_geometry(line_set) + + # change the color of points which are in box + if pcd is not None and mode == 'xyz': + points_colors[indices[:, i].astype(np.bool)] = in_box_color + + # update points colors + if pcd is not None: + pcd.colors = o3d.utility.Vector3dVector(points_colors) + vis.update_geometry(pcd) + + +def show_pts_index_boxes(points, + bbox3d=None, + show=True, + indices=None, + save_path=None, + points_size=2, + point_color=(0.5, 0.5, 0.5), + bbox_color=(0, 1, 0), + points_in_box_color=(1, 0, 0), + rot_axis=2, + center_mode='lidar_bottom', + mode='xyz'): + """Draw bbox and points on visualizer with indices that indicate which + bbox3d that each point lies in. + + Args: + points (numpy.array | torch.tensor, shape=[N, 3+C]): + points to visualize. + bbox3d (numpy.array | torch.tensor, shape=[M, 7]): + 3d bbox (x, y, z, dx, dy, dz, yaw) to visualize. Default: None. + show (bool): whether to show the visualization results. Default: True. + indices (numpy.array | torch.tensor, shape=[N, M]): + indicate which bbox3d that each point lies in. Default: None. + save_path (str): path to save visualized results. Default: None. + points_size (int): the size of points to show on visualizer. + Default: 2. + point_color (tuple[float]): the color of points. + Default: (0.5, 0.5, 0.5). + bbox_color (tuple[float]): the color of bbox. Default: (0, 1, 0). + points_in_box_color (tuple[float]): + the color of points which are in bbox3d. Default: (1, 0, 0). + rot_axis (int): rotation axis of bbox. Default: 2. + center_mode (bool): indicate the center of bbox is bottom center + or gravity center. avaliable mode + ['lidar_bottom', 'camera_bottom']. Default: 'lidar_bottom'. + mode (str): indicate type of the input points, avaliable mode + ['xyz', 'xyzrgb']. Default: 'xyz'. 
+ """ + # TODO: support score and class info + assert 0 <= rot_axis <= 2 + + # init visualizer + vis = o3d.visualization.Visualizer() + vis.create_window() + mesh_frame = geometry.TriangleMesh.create_coordinate_frame( + size=1, origin=[0, 0, 0]) # create coordinate frame + vis.add_geometry(mesh_frame) + + # draw points + pcd, points_colors = _draw_points(points, vis, points_size, point_color, + mode) + + # draw boxes + if bbox3d is not None: + _draw_bboxes_ind(bbox3d, vis, indices, points_colors, pcd, bbox_color, + points_in_box_color, rot_axis, center_mode, mode) + + if show: + vis.run() + + if save_path is not None: + vis.capture_screen_image(save_path) + + vis.destroy_window() + + +def project_pts_on_img(points, + raw_img, + lidar2img_rt, + max_distance=70, + thickness=-1): + """Project the 3D points cloud on 2D image. + + Args: + points (numpy.array): 3D points cloud (x, y, z) to visualize. + raw_img (numpy.array): The numpy array of image. + lidar2img_rt (numpy.array, shape=[4, 4]): The projection matrix + according to the camera intrinsic parameters. + max_distance (float): the max distance of the points cloud. + Default: 70. + thickness (int, optional): The thickness of 2D points. Default: -1. + """ + img = raw_img.copy() + num_points = points.shape[0] + pts_4d = np.concatenate([points[:, :3], np.ones((num_points, 1))], axis=-1) + pts_2d = pts_4d @ lidar2img_rt.T + + # cam_points is Tensor of Nx4 whose last column is 1 + # transform camera coordinate to image coordinate + pts_2d[:, 2] = np.clip(pts_2d[:, 2], a_min=1e-5, a_max=99999) + pts_2d[:, 0] /= pts_2d[:, 2] + pts_2d[:, 1] /= pts_2d[:, 2] + + fov_inds = ((pts_2d[:, 0] < img.shape[1]) + & (pts_2d[:, 0] >= 0) + & (pts_2d[:, 1] < img.shape[0]) + & (pts_2d[:, 1] >= 0)) + + imgfov_pts_2d = pts_2d[fov_inds, :3] # u, v, d + + cmap = plt.cm.get_cmap('hsv', 256) + cmap = np.array([cmap(i) for i in range(256)])[:, :3] * 255 + for i in range(imgfov_pts_2d.shape[0]): + depth = imgfov_pts_2d[i, 2] + color = cmap[np.clip(int(max_distance * 10 / depth), 0, 255), :] + cv2.circle( + img, + center=(int(np.round(imgfov_pts_2d[i, 0])), + int(np.round(imgfov_pts_2d[i, 1]))), + radius=1, + color=tuple(color), + thickness=thickness, + ) + cv2.imshow('project_pts_img', img) + cv2.waitKey(100) + + +def project_bbox3d_on_img(bboxes3d, + raw_img, + lidar2img_rt, + color=(0, 255, 0), + thickness=1): + """Project the 3D bbox on 2D image. + + Args: + bboxes3d (numpy.array, shape=[M, 7]): + 3d bbox (x, y, z, dx, dy, dz, yaw) to visualize. + raw_img (numpy.array): The numpy array of image. + lidar2img_rt (numpy.array, shape=[4, 4]): The projection matrix + according to the camera intrinsic parameters. + color (tuple[int]): the color to draw bboxes. Default: (0, 255, 0). + thickness (int, optional): The thickness of bboxes. Default: 1. 
+ """ + img = raw_img.copy() + corners_3d = bboxes3d.corners + num_bbox = corners_3d.shape[0] + pts_4d = np.concatenate( + [corners_3d.reshape(-1, 3), + np.ones((num_bbox * 8, 1))], axis=-1) + pts_2d = pts_4d @ lidar2img_rt.T + + pts_2d[:, 2] = np.clip(pts_2d[:, 2], a_min=1e-5, a_max=1e5) + pts_2d[:, 0] /= pts_2d[:, 2] + pts_2d[:, 1] /= pts_2d[:, 2] + imgfov_pts_2d = pts_2d[..., :2].reshape(num_bbox, 8, 2) + + line_indices = ((0, 1), (0, 3), (0, 4), (1, 2), (1, 5), (3, 2), (3, 7), + (4, 5), (4, 7), (2, 6), (5, 6), (6, 7)) + for i in range(num_bbox): + corners = imgfov_pts_2d[i].astype(np.int) + for start, end in line_indices: + cv2.line(img, (corners[start, 0], corners[start, 1]), + (corners[end, 0], corners[end, 1]), color, thickness, + cv2.LINE_AA) + + cv2.imshow('project_bbox3d_img', img) + cv2.waitKey(0) + + +class Visualizer(object): + r"""Online visualizer implemented with Open3d. + + Args: + points (numpy.array, shape=[N, 3+C]): Points to visualize. The Points + cloud is in mode of Coord3DMode.DEPTH (please refer to + core.structures.coord_3d_mode). + bbox3d (numpy.array, shape=[M, 7]): 3d bbox (x, y, z, dx, dy, dz, yaw) + to visualize. The 3d bbox is in mode of Box3DMode.DEPTH with + gravity_center (please refer to core.structures.box_3d_mode). + Default: None. + save_path (str): path to save visualized results. Default: None. + points_size (int): the size of points to show on visualizer. + Default: 2. + point_color (tuple[float]): the color of points. + Default: (0.5, 0.5, 0.5). + bbox_color (tuple[float]): the color of bbox. Default: (0, 1, 0). + points_in_box_color (tuple[float]): + the color of points which are in bbox3d. Default: (1, 0, 0). + rot_axis (int): rotation axis of bbox. Default: 2. + center_mode (bool): indicate the center of bbox is bottom center + or gravity center. avaliable mode + ['lidar_bottom', 'camera_bottom']. Default: 'lidar_bottom'. + mode (str): indicate type of the input points, avaliable mode + ['xyz', 'xyzrgb']. Default: 'xyz'. + """ + + def __init__(self, + points, + bbox3d=None, + save_path=None, + points_size=2, + point_color=(0.5, 0.5, 0.5), + bbox_color=(0, 1, 0), + points_in_box_color=(1, 0, 0), + rot_axis=2, + center_mode='lidar_bottom', + mode='xyz'): + super(Visualizer, self).__init__() + assert 0 <= rot_axis <= 2 + + # init visualizer + self.o3d_visualizer = o3d.visualization.Visualizer() + self.o3d_visualizer.create_window() + mesh_frame = geometry.TriangleMesh.create_coordinate_frame( + size=1, origin=[0, 0, 0]) # create coordinate frame + self.o3d_visualizer.add_geometry(mesh_frame) + + self.points_size = points_size + self.point_color = point_color + self.bbox_color = bbox_color + self.points_in_box_color = points_in_box_color + self.rot_axis = rot_axis + self.center_mode = center_mode + self.mode = mode + + # draw points + if points is not None: + self.pcd, self.points_colors = _draw_points( + points, self.o3d_visualizer, points_size, point_color, mode) + + # draw boxes + if bbox3d is not None: + _draw_bboxes(bbox3d, self.o3d_visualizer, self.points_colors, + self.pcd, bbox_color, points_in_box_color, rot_axis, + center_mode, mode) + + def add_bboxes(self, bbox3d, bbox_color=None, points_in_box_color=None): + """Add bounding box to visualizer. + + Args: + bbox3d (numpy.array, shape=[M, 7]): + 3D bbox (x, y, z, dx, dy, dz, yaw) to be visualized. + The 3d bbox is in mode of Box3DMode.DEPTH with + gravity_center (please refer to core.structures.box_3d_mode). + bbox_color (tuple[float]): the color of bbox. Defaule: None. 
+ points_in_box_color (tuple[float]): the color of points which + are in bbox3d. Defaule: None. + """ + if bbox_color is None: + bbox_color = self.bbox_color + if points_in_box_color is None: + points_in_box_color = self.points_in_box_color + _draw_bboxes(bbox3d, self.o3d_visualizer, self.points_colors, self.pcd, + bbox_color, points_in_box_color, self.rot_axis, + self.center_mode, self.mode) + + def show(self, save_path=None): + """Visualize the points cloud. + + Args: + save_path (str): path to save image. Default: None. + """ + + self.o3d_visualizer.run() + + if save_path is not None: + self.o3d_visualizer.capture_screen_image(save_path) + + self.o3d_visualizer.destroy_window() + return diff --git a/mmdet3d/core/visualizer/show_result.py b/mmdet3d/core/visualizer/show_result.py index a21502840b..522604ad79 100644 --- a/mmdet3d/core/visualizer/show_result.py +++ b/mmdet3d/core/visualizer/show_result.py @@ -68,7 +68,7 @@ def convert_oriented_box_to_trimesh_fmt(box): return -def show_result(points, gt_bboxes, pred_bboxes, out_dir, filename): +def show_result(points, gt_bboxes, pred_bboxes, out_dir, filename, show=True): """Convert results into format that is directly readable for meshlab. Args: @@ -77,18 +77,36 @@ def show_result(points, gt_bboxes, pred_bboxes, out_dir, filename): pred_bboxes (np.ndarray): Predicted boxes. out_dir (str): Path of output directory filename (str): Filename of the current frame. + show (bool): Visualize the results online. """ + from .open3d_vis import Visualizer + + if show: + vis = Visualizer(points) + if pred_bboxes is not None: + vis.add_bboxes(bbox3d=pred_bboxes) + if gt_bboxes is not None: + vis.add_bboxes(bbox3d=gt_bboxes, bbox_color=(0, 0, 1)) + vis.show() + result_path = osp.join(out_dir, filename) mmcv.mkdir_or_exist(result_path) + if points is not None: + _write_ply(points, osp.join(result_path, f'{filename}_points.obj')) + if gt_bboxes is not None: + # bottom center to gravity center + gt_bboxes[..., 2] += gt_bboxes[..., 5] / 2 + # the positive direction for yaw in meshlab is clockwise gt_bboxes[:, 6] *= -1 _write_oriented_bbox(gt_bboxes, osp.join(result_path, f'{filename}_gt.ply')) - if points is not None: - _write_ply(points, osp.join(result_path, f'{filename}_points.obj')) if pred_bboxes is not None: + # bottom center to gravity center + pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2 + # the positive direction for yaw in meshlab is clockwise pred_bboxes[:, 6] *= -1 _write_oriented_bbox(pred_bboxes, osp.join(result_path, f'{filename}_pred.ply')) diff --git a/mmdet3d/datasets/kitti_dataset.py b/mmdet3d/datasets/kitti_dataset.py index d1e9db250f..79bd978ff8 100644 --- a/mmdet3d/datasets/kitti_dataset.py +++ b/mmdet3d/datasets/kitti_dataset.py @@ -9,7 +9,8 @@ from mmdet.datasets import DATASETS from ..core import show_result -from ..core.bbox import Box3DMode, CameraInstance3DBoxes, points_cam2img +from ..core.bbox import (Box3DMode, CameraInstance3DBoxes, Coord3DMode, + points_cam2img) from .custom_3d import Custom3DDataset @@ -669,12 +670,13 @@ def convert_valid_bboxes(self, box_dict, info): sample_idx=sample_idx, ) - def show(self, results, out_dir): + def show(self, results, out_dir, show=True): """Results visualization. Args: results (list[dict]): List of bounding boxes results. out_dir (str): Output directory of visualization result. + show (bool): Visualize the results online. """ assert out_dir is not None, 'Expect out_dir, got none.' 
for i, result in enumerate(results): @@ -684,14 +686,13 @@ def show(self, results, out_dir): file_name = osp.split(pts_path)[-1].split('.')[0] # for now we convert points into depth mode points = example['points'][0]._data.numpy() - points = points[..., [1, 0, 2]] - points[..., 0] *= -1 + points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR, + Coord3DMode.DEPTH) gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor gt_bboxes = Box3DMode.convert(gt_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - gt_bboxes[..., 2] += gt_bboxes[..., 5] / 2 pred_bboxes = result['boxes_3d'].tensor.numpy() pred_bboxes = Box3DMode.convert(pred_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2 - show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name) + show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name, + show) diff --git a/mmdet3d/datasets/lyft_dataset.py b/mmdet3d/datasets/lyft_dataset.py index c69e027506..f565ab713b 100644 --- a/mmdet3d/datasets/lyft_dataset.py +++ b/mmdet3d/datasets/lyft_dataset.py @@ -10,7 +10,7 @@ from mmdet3d.core.evaluation.lyft_eval import lyft_eval from mmdet.datasets import DATASETS from ..core import show_result -from ..core.bbox import Box3DMode, LiDARInstance3DBoxes +from ..core.bbox import Box3DMode, Coord3DMode, LiDARInstance3DBoxes from .custom_3d import Custom3DDataset @@ -412,17 +412,15 @@ def show(self, results, out_dir): pts_path = data_info['lidar_path'] file_name = osp.split(pts_path)[-1].split('.')[0] # for now we convert points into depth mode - points = points[..., [1, 0, 2]] - points[..., 0] *= -1 + points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR, + Coord3DMode.DEPTH) inds = result['pts_bbox']['scores_3d'] > 0.1 gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor gt_bboxes = Box3DMode.convert(gt_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - gt_bboxes[..., 2] += gt_bboxes[..., 5] / 2 pred_bboxes = result['pts_bbox']['boxes_3d'][inds].tensor.numpy() pred_bboxes = Box3DMode.convert(pred_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2 show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name) def json2csv(self, json_path, csv_savepath): diff --git a/mmdet3d/datasets/nuscenes_dataset.py b/mmdet3d/datasets/nuscenes_dataset.py index 1f3a6b3654..021f188384 100644 --- a/mmdet3d/datasets/nuscenes_dataset.py +++ b/mmdet3d/datasets/nuscenes_dataset.py @@ -7,7 +7,7 @@ from mmdet.datasets import DATASETS from ..core import show_result -from ..core.bbox import Box3DMode, LiDARInstance3DBoxes +from ..core.bbox import Box3DMode, Coord3DMode, LiDARInstance3DBoxes from .custom_3d import Custom3DDataset @@ -504,17 +504,15 @@ def show(self, results, out_dir): pts_path = data_info['lidar_path'] file_name = osp.split(pts_path)[-1].split('.')[0] # for now we convert points into depth mode - points = points[..., [1, 0, 2]] - points[..., 0] *= -1 + points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR, + Coord3DMode.DEPTH) inds = result['pts_bbox']['scores_3d'] > 0.1 gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor gt_bboxes = Box3DMode.convert(gt_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - gt_bboxes[..., 2] += gt_bboxes[..., 5] / 2 pred_bboxes = result['pts_bbox']['boxes_3d'][inds].tensor.numpy() pred_bboxes = Box3DMode.convert(pred_bboxes, Box3DMode.LIDAR, Box3DMode.DEPTH) - pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2 show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name) diff --git a/mmdet3d/datasets/scannet_dataset.py 
b/mmdet3d/datasets/scannet_dataset.py index cded2839cf..4513ca260c 100644 --- a/mmdet3d/datasets/scannet_dataset.py +++ b/mmdet3d/datasets/scannet_dataset.py @@ -106,12 +106,13 @@ def get_ann_info(self, index): pts_semantic_mask_path=pts_semantic_mask_path) return anns_results - def show(self, results, out_dir): + def show(self, results, out_dir, show=True): """Results visualization. Args: results (list[dict]): List of bounding boxes results. out_dir (str): Output directory of visualization result. + show (bool): Visualize the results online. """ assert out_dir is not None, 'Expect out_dir, got none.' for i, result in enumerate(results): @@ -121,8 +122,7 @@ def show(self, results, out_dir): points = np.fromfile( osp.join(self.data_root, pts_path), dtype=np.float32).reshape(-1, 6) - gt_bboxes = np.pad(data_info['annos']['gt_boxes_upright_depth'], - ((0, 0), (0, 1)), 'constant') + gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor pred_bboxes = result['boxes_3d'].tensor.numpy() - pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2 - show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name) + show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name, + show) diff --git a/mmdet3d/datasets/sunrgbd_dataset.py b/mmdet3d/datasets/sunrgbd_dataset.py index be77afd3f3..01c214aceb 100644 --- a/mmdet3d/datasets/sunrgbd_dataset.py +++ b/mmdet3d/datasets/sunrgbd_dataset.py @@ -153,12 +153,13 @@ def get_ann_info(self, index): return anns_results - def show(self, results, out_dir): + def show(self, results, out_dir, show=True): """Results visualization. Args: results (list[dict]): List of bounding boxes results. out_dir (str): Output directory of visualization result. + show (bool): Visualize the results online. """ assert out_dir is not None, 'Expect out_dir, got none.' 
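The `bboxes[..., 2] += bboxes[..., 5] / 2` lines that moved into `show_result` above implement the bottom-center to gravity-center shift these dataset `show` methods used to apply by hand. A tiny numeric check with a made-up box:

```python
import numpy as np

# (x, y, z_bottom, dx, dy, dz, yaw): a 1.6 m tall box standing on z = 0.
box = np.array([[0.0, 0.0, 0.0, 4.0, 2.0, 1.6, 0.0]])
box[..., 2] += box[..., 5] / 2  # same shift as in show_result()
assert box[0, 2] == 0.8  # gravity center sits half the height above the bottom
```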
         for i, result in enumerate(results):
@@ -169,13 +170,10 @@
                 osp.join(self.data_root, pts_path),
                 dtype=np.float32).reshape(-1, 6)
             points[:, 3:] *= 255
-            if data_info['annos']['gt_num'] > 0:
-                gt_bboxes = data_info['annos']['gt_boxes_upright_depth']
-            else:
-                gt_bboxes = np.zeros((0, 7))
+            gt_bboxes = self.get_ann_info(i)['gt_bboxes_3d'].tensor
             pred_bboxes = result['boxes_3d'].tensor.numpy()
-            pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2
-            show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name)
+            show_result(points, gt_bboxes, pred_bboxes, out_dir, file_name,
+                        show)
     def evaluate(self,
                  results,
diff --git a/mmdet3d/models/builder.py b/mmdet3d/models/builder.py
index c0e0ba1c21..c6275b6e2b 100644
--- a/mmdet3d/models/builder.py
+++ b/mmdet3d/models/builder.py
@@ -1,3 +1,5 @@
+import warnings
+
 from mmdet.models.builder import (BACKBONES, DETECTORS, HEADS, LOSSES, NECKS,
                                   ROI_EXTRACTORS, SHARED_HEADS, build)
 from .registry import FUSION_LAYERS, MIDDLE_ENCODERS, VOXEL_ENCODERS
@@ -35,6 +37,14 @@ def build_loss(cfg):
 def build_detector(cfg, train_cfg=None, test_cfg=None):
     """Build detector."""
+    if train_cfg is not None or test_cfg is not None:
+        warnings.warn(
+            'train_cfg and test_cfg are deprecated, '
+            'please specify them in model', UserWarning)
+    assert cfg.get('train_cfg') is None or train_cfg is None, \
+        'train_cfg specified in both outer field and model field '
+    assert cfg.get('test_cfg') is None or test_cfg is None, \
+        'test_cfg specified in both outer field and model field '
     return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
diff --git a/mmdet3d/models/dense_heads/anchor3d_head.py b/mmdet3d/models/dense_heads/anchor3d_head.py
index 55ea1ec936..d55b871664 100644
--- a/mmdet3d/models/dense_heads/anchor3d_head.py
+++ b/mmdet3d/models/dense_heads/anchor3d_head.py
@@ -225,7 +225,8 @@ def loss_single(self, cls_score, bbox_pred, dir_cls_preds, labels,
         bg_class_ind = self.num_classes
         pos_inds = ((labels >= 0)
-                    & (labels < bg_class_ind)).nonzero().reshape(-1)
+                    & (labels < bg_class_ind)).nonzero(
+                        as_tuple=False).reshape(-1)
         num_pos = len(pos_inds)
         pos_bbox_pred = bbox_pred[pos_inds]
diff --git a/mmdet3d/models/dense_heads/centerpoint_head.py b/mmdet3d/models/dense_heads/centerpoint_head.py
index f2abf28c3f..adf92df26e 100644
--- a/mmdet3d/models/dense_heads/centerpoint_head.py
+++ b/mmdet3d/models/dense_heads/centerpoint_head.py
@@ -119,8 +119,8 @@ def forward(self, x):
 @HEADS.register_module()
-class DCNSeperateHead(nn.Module):
-    r"""DCNSeperateHead for CenterHead.
+class DCNSeparateHead(nn.Module):
+    r"""DCNSeparateHead for CenterHead.
     .. code-block:: none
             /-----> DCN for heatmap task -----> heatmap task.
@@ -155,7 +155,7 @@ def __init__(self,
                  norm_cfg=dict(type='BN2d'),
                  bias='auto',
                  **kwargs):
-        super(DCNSeperateHead, self).__init__()
+        super(DCNSeparateHead, self).__init__()
         if 'heatmap' in heads:
             heads.pop('heatmap')
         # feature adaptation with dcn
@@ -250,7 +250,7 @@ class CenterHead(nn.Module):
             Default: dict(type='GaussianFocalLoss', reduction='mean').
         loss_bbox (dict): Config of regression loss function.
             Default: dict(type='L1Loss', reduction='none').
-        seperate_head (dict): Config of seperate head. Default: dict(
+        separate_head (dict): Config of separate head. Default: dict(
             type='SeparateHead', init_bias=-2.19, final_kernel=3)
         share_conv_channel (int): Output channels for share_conv_layer.
             Default: 64.
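Stepping back to the `build_detector` change in `mmdet3d/models/builder.py` above: it starts deprecating top-level `train_cfg`/`test_cfg`. The implied config migration looks roughly like the sketch below; the `VoteNet` field contents are illustrative values, not prescribed by this patch:

```python
# Deprecated layout: cfgs sit next to the model dict and are passed into
# build_detector(cfg, train_cfg, test_cfg), which now emits a UserWarning:
#   model = dict(type='VoteNet', ...)
#   train_cfg = dict(sample_mod='vote')
#   test_cfg = dict(sample_mod='seed', nms_thr=0.25)

# New layout: nest both cfgs inside the model dict itself.
model = dict(
    type='VoteNet',
    train_cfg=dict(sample_mod='vote'),
    test_cfg=dict(sample_mod='seed', nms_thr=0.25),
)
```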
@@ -273,7 +273,7 @@ def __init__(self, loss_cls=dict(type='GaussianFocalLoss', reduction='mean'), loss_bbox=dict( type='L1Loss', reduction='none', loss_weight=0.25), - seperate_head=dict( + separate_head=dict( type='SeparateHead', init_bias=-2.19, final_kernel=3), share_conv_channel=64, num_heatmap_convs=2, @@ -312,9 +312,9 @@ def __init__(self, for num_cls in num_classes: heads = copy.deepcopy(common_heads) heads.update(dict(heatmap=(num_cls, num_heatmap_convs))) - seperate_head.update( + separate_head.update( in_channels=share_conv_channel, heads=heads, num_cls=num_cls) - self.task_heads.append(builder.build_head(seperate_head)) + self.task_heads.append(builder.build_head(separate_head)) def init_weights(self): """Initialize weights.""" diff --git a/mmdet3d/models/dense_heads/imvote_head.py b/mmdet3d/models/dense_heads/imvote_head.py deleted file mode 100644 index c6921a76db..0000000000 --- a/mmdet3d/models/dense_heads/imvote_head.py +++ /dev/null @@ -1,32 +0,0 @@ -from torch import nn as nn - -from mmdet.models import HEADS - - -@HEADS.register_module() -class ImVoteHead(nn.Module): - - def __init__(self, - num_classes, - bbox_coder, - train_cfg=None, - test_cfg=None, - vote_module_cfg=None, - vote_aggregation_cfg=None, - pred_layer_cfg=None, - conv_cfg=dict(type='Conv1d'), - norm_cfg=dict(type='BN1d'), - objectness_loss=None, - center_loss=None, - dir_class_loss=None, - dir_res_loss=None, - size_class_loss=None, - size_res_loss=None, - semantic_loss=None): - super(ImVoteHead, - self).__init__(num_classes, bbox_coder, train_cfg, test_cfg, - vote_module_cfg, vote_aggregation_cfg, - pred_layer_cfg, conv_cfg, norm_cfg, - objectness_loss, center_loss, dir_class_loss, - dir_res_loss, size_class_loss, size_res_loss, - semantic_loss) diff --git a/mmdet3d/models/dense_heads/ssd_3d_head.py b/mmdet3d/models/dense_heads/ssd_3d_head.py index 07758b5e2e..8f516b2405 100644 --- a/mmdet3d/models/dense_heads/ssd_3d_head.py +++ b/mmdet3d/models/dense_heads/ssd_3d_head.py @@ -519,7 +519,8 @@ def multiclass_nms_single(self, obj_scores, sem_scores, bbox, points, # filter empty boxes and boxes with low score scores_mask = (obj_scores >= self.test_cfg.score_thr) - nonempty_box_inds = torch.nonzero(nonempty_box_mask).flatten() + nonempty_box_inds = torch.nonzero( + nonempty_box_mask, as_tuple=False).flatten() nonempty_mask = torch.zeros_like(bbox_classes).scatter( 0, nonempty_box_inds[nms_selected], 1) selected = (nonempty_mask.bool() & scores_mask.bool()) diff --git a/mmdet3d/models/dense_heads/train_mixins.py b/mmdet3d/models/dense_heads/train_mixins.py index c0af5c3522..f785a9dc06 100644 --- a/mmdet3d/models/dense_heads/train_mixins.py +++ b/mmdet3d/models/dense_heads/train_mixins.py @@ -277,11 +277,11 @@ def anchor_target_single_assigner(self, neg_inds = sampling_result.neg_inds else: pos_inds = torch.nonzero( - anchors.new_zeros((anchors.shape[0], ), dtype=torch.bool) > 0 - ).squeeze(-1).unique() + anchors.new_zeros((anchors.shape[0], ), dtype=torch.bool) > 0, + as_tuple=False).squeeze(-1).unique() neg_inds = torch.nonzero( - anchors.new_zeros((anchors.shape[0], ), dtype=torch.bool) == - 0).squeeze(-1).unique() + anchors.new_zeros((anchors.shape[0], ), dtype=torch.bool) == 0, + as_tuple=False).squeeze(-1).unique() if gt_labels is not None: labels += num_classes diff --git a/mmdet3d/models/detectors/base.py b/mmdet3d/models/detectors/base.py index d77fbd0ed7..b079d965f5 100644 --- a/mmdet3d/models/detectors/base.py +++ b/mmdet3d/models/detectors/base.py @@ -1,11 +1,10 @@ -import copy import mmcv import 
torch
from mmcv.parallel import DataContainer as DC
from mmcv.runner import auto_fp16
from os import path as osp
-from mmdet3d.core import Box3DMode, show_result
+from mmdet3d.core import Box3DMode, Coord3DMode, show_result
 from mmdet.models.detectors import BaseDetector
@@ -92,20 +91,17 @@ def show_results(self, data, result, out_dir):
         assert out_dir is not None, 'Expect out_dir, got none.'
-            pred_bboxes = copy.deepcopy(
-                result[batch_id]['boxes_3d'].tensor.numpy())
-            # for now we convert points into depth mode
-            if box_mode_3d == Box3DMode.DEPTH:
-                pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2
-            elif (box_mode_3d == Box3DMode.CAM) or (box_mode_3d
-                                                    == Box3DMode.LIDAR):
-                points = points[..., [1, 0, 2]]
-                points[..., 0] *= -1
+            pred_bboxes = result[batch_id]['boxes_3d']
+
+            # for now we convert points and bbox into depth mode
+            if (box_mode_3d == Box3DMode.CAM) or (box_mode_3d
+                                                  == Box3DMode.LIDAR):
+                points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR,
+                                                   Coord3DMode.DEPTH)
                 pred_bboxes = Box3DMode.convert(pred_bboxes, box_mode_3d,
                                                 Box3DMode.DEPTH)
-                pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2
-            else:
+            elif box_mode_3d != Box3DMode.DEPTH:
                 raise ValueError(
                     f'Unsupported box_mode_3d {box_mode_3d} for conversion!')
-
+            pred_bboxes = pred_bboxes.tensor.cpu().numpy()
             show_result(points, None, pred_bboxes, out_dir, file_name)
diff --git a/mmdet3d/models/detectors/centerpoint.py b/mmdet3d/models/detectors/centerpoint.py
index 7c39a211e5..7705ce1a94 100644
--- a/mmdet3d/models/detectors/centerpoint.py
+++ b/mmdet3d/models/detectors/centerpoint.py
@@ -181,8 +181,6 @@ def aug_test_pts(self, feats, img_metas, rescale=False):
             else:
                 for key in bbox_list[0].keys():
                     bbox_list[0][key] = bbox_list[0][key].to('cpu')
-        import pdb
-        pdb.set_trace()
         return bbox_list[0]
     def aug_test(self, points, img_metas, imgs=None, rescale=False):
diff --git a/mmdet3d/models/detectors/imvotenet.py b/mmdet3d/models/detectors/imvotenet.py
index c02eb7b342..5cdf3c98ac 100644
--- a/mmdet3d/models/detectors/imvotenet.py
+++ b/mmdet3d/models/detectors/imvotenet.py
@@ -1,51 +1,146 @@
+import numpy as np
+import torch
 from torch import nn as nn
-from mmdet.models import DETECTORS, build_backbone, build_head, build_neck
+from mmdet3d.core import bbox3d2result, merge_aug_bboxes_3d
+from mmdet3d.models.model_utils import ImageMLPModule
+from mmdet.models import DETECTORS
+from .. import builder
 from .base import Base3DDetector
+def sample_valid_seeds(mask, num_sampled_seed=1024):
+    """Randomly sample seeds from all imvotes.
+
+    Args:
+        mask (torch.Tensor): Bool tensor in shape
+            (batch_size, seed_num*max_imvote_per_pixel), indicating
+            whether this imvote corresponds to a 2D bbox.
+        num_sampled_seed (int): How many to sample from all imvotes.
+
+    Returns:
+        torch.Tensor: Indices with shape (batch_size, num_sampled_seed).
+ """ + + device = mask.device + batch_size = mask.shape[0] + sample_inds = mask.new_zeros((batch_size, num_sampled_seed), + dtype=torch.int64) + for bidx in range(batch_size): + # return index of non zero elements + valid_inds = torch.nonzero(mask[bidx, :]).squeeze(-1) + if len(valid_inds) < num_sampled_seed: + # compute set t1 - t2 + t1 = torch.arange(num_sampled_seed, device=device) + t2 = valid_inds % num_sampled_seed + combined = torch.cat((t1, t2)) + uniques, counts = combined.unique(return_counts=True) + difference = uniques[counts == 1] + + rand_inds = torch.randperm( + len(difference), + device=device)[:num_sampled_seed - len(valid_inds)] + cur_sample_inds = difference[rand_inds] + cur_sample_inds = torch.cat((valid_inds, cur_sample_inds)) + else: + rand_inds = torch.randperm( + len(valid_inds), device=device)[:num_sampled_seed] + cur_sample_inds = valid_inds[rand_inds] + sample_inds[bidx, :] = cur_sample_inds + return sample_inds + + @DETECTORS.register_module() class ImVoteNet(Base3DDetector): r"""`ImVoteNet `_ for 3D detection.""" def __init__(self, pts_backbone=None, - pts_bbox_head=None, + pts_bbox_heads=None, pts_neck=None, img_backbone=None, img_neck=None, img_roi_head=None, img_rpn_head=None, + img_mlp=None, + fusion_layer=None, + num_sampled_seed=None, train_cfg=None, test_cfg=None, pretrained=None): super(ImVoteNet, self).__init__() + + # point branch if pts_backbone is not None: - self.pts_backbone = build_backbone(pts_backbone) + self.pts_backbone = builder.build_backbone(pts_backbone) if pts_neck is not None: - self.pts_neck = build_neck(pts_neck) - if pts_bbox_head is not None: - pts_bbox_head.update(train_cfg=train_cfg, test_cfg=test_cfg) - self.pts_bbox_head = build_head(pts_bbox_head) - + self.pts_neck = builder.build_neck(pts_neck) + if pts_bbox_heads is not None: + pts_bbox_head_common = pts_bbox_heads.common + pts_bbox_head_common.update( + train_cfg=train_cfg.pts if train_cfg is not None else None) + pts_bbox_head_common.update(test_cfg=test_cfg.pts) + pts_bbox_head_joint = pts_bbox_head_common.copy() + pts_bbox_head_joint.update(pts_bbox_heads.joint) + pts_bbox_head_pts = pts_bbox_head_common.copy() + pts_bbox_head_pts.update(pts_bbox_heads.pts) + pts_bbox_head_img = pts_bbox_head_common.copy() + pts_bbox_head_img.update(pts_bbox_heads.img) + + self.pts_bbox_head_joint = builder.build_head(pts_bbox_head_joint) + self.pts_bbox_head_pts = builder.build_head(pts_bbox_head_pts) + self.pts_bbox_head_img = builder.build_head(pts_bbox_head_img) + self.pts_bbox_heads = [ + self.pts_bbox_head_joint, self.pts_bbox_head_pts, + self.pts_bbox_head_img + ] + self.loss_weights = pts_bbox_heads.loss_weights + + # image branch if img_backbone: - self.img_backbone = build_backbone(img_backbone) + self.img_backbone = builder.build_backbone(img_backbone) if img_neck is not None: - self.img_neck = build_neck(img_neck) + self.img_neck = builder.build_neck(img_neck) if img_rpn_head is not None: rpn_train_cfg = train_cfg.img_rpn if train_cfg \ is not None else None img_rpn_head_ = img_rpn_head.copy() img_rpn_head_.update( train_cfg=rpn_train_cfg, test_cfg=test_cfg.img_rpn) - self.img_rpn_head = build_head(img_rpn_head_) + self.img_rpn_head = builder.build_head(img_rpn_head_) if img_roi_head is not None: rcnn_train_cfg = train_cfg.img_rcnn if train_cfg \ is not None else None img_roi_head.update( train_cfg=rcnn_train_cfg, test_cfg=test_cfg.img_rcnn) - self.img_roi_head = build_head(img_roi_head) + self.img_roi_head = builder.build_head(img_roi_head) + + # fusion + if fusion_layer is 
+            self.fusion_layer = builder.build_fusion_layer(fusion_layer)
+            self.max_imvote_per_pixel = fusion_layer.max_imvote_per_pixel
+
+            # fusion layer exists -> stage 2 training -> freeze img branch
+            if self.with_img_bbox_head:
+                for param in self.img_bbox_head.parameters():
+                    param.requires_grad = False
+            if self.with_img_backbone:
+                for param in self.img_backbone.parameters():
+                    param.requires_grad = False
+            if self.with_img_neck:
+                for param in self.img_neck.parameters():
+                    param.requires_grad = False
+            if self.with_img_rpn:
+                for param in self.img_rpn_head.parameters():
+                    param.requires_grad = False
+            if self.with_img_roi_head:
+                for param in self.img_roi_head.parameters():
+                    param.requires_grad = False
+
+            if img_mlp is not None:
+                self.img_mlp = ImageMLPModule(**img_mlp)
+        self.num_sampled_seed = num_sampled_seed
         self.train_cfg = train_cfg
         self.test_cfg = test_cfg
@@ -87,8 +182,21 @@ def init_weights(self, pretrained=None):
         else:
             self.pts_neck.init_weights()
+    def set_img_branch_eval_mode(self):
+        """Set all modules of the image branch to eval mode."""
+        if self.with_img_bbox_head:
+            self.img_bbox_head.eval()
+        if self.with_img_backbone:
+            self.img_backbone.eval()
+        if self.with_img_neck:
+            self.img_neck.eval()
+        if self.with_img_rpn:
+            self.img_rpn_head.eval()
+        if self.with_img_roi_head:
+            self.img_roi_head.eval()
+
     def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                               missing_keys, unexpected_keys, error_msgs):
+        # override in order to load img network ckpts into img branch
         module_names = ['backbone', 'neck', 'roi_head', 'rpn_head']
         for key in list(state_dict):
             for module_name in module_names:
@@ -100,18 +208,6 @@ def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                                       strict, missing_keys, unexpected_keys,
                                       error_msgs)
-    @property
-    def with_img_shared_head(self):
-        """bool: Whether the detector has a shared head in image branch."""
-        return hasattr(self,
-                       'img_shared_head') and self.img_shared_head is not None
-
-    @property
-    def with_pts_bbox(self):
-        """bool: Whether the detector has a 3D box head."""
-        return hasattr(self,
-                       'pts_bbox_head') and self.pts_bbox_head is not None
-
     @property
     def with_img_bbox(self):
         """bool: Whether the detector has a 2D image box head."""
@@ -119,26 +215,22 @@ def with_img_bbox(self):
                 or (hasattr(self, 'img_bbox_head')
                     and self.img_bbox_head is not None))
+    @property
+    def with_img_bbox_head(self):
+        """bool: Whether the detector has a 2D image box head (not roi)."""
+        return hasattr(self,
+                       'img_bbox_head') and self.img_bbox_head is not None
+
     @property
     def with_img_backbone(self):
         """bool: Whether the detector has a 2D image backbone."""
         return hasattr(self, 'img_backbone') and self.img_backbone is not None
-    @property
-    def with_pts_backbone(self):
-        """bool: Whether the detector has a 3D backbone."""
-        return hasattr(self, 'pts_backbone') and self.pts_backbone is not None
-
     @property
     def with_img_neck(self):
         """bool: Whether the detector has a neck in image branch."""
         return hasattr(self, 'img_neck') and self.img_neck is not None
-    @property
-    def with_pts_neck(self):
-        """bool: Whether the detector has a neck in 3D detector branch."""
-        return hasattr(self, 'pts_neck') and self.pts_neck is not None
-
     @property
     def with_img_rpn(self):
         """bool: Whether the detector has a 2D RPN in image detector branch."""
@@ -149,25 +241,124 @@ def with_img_roi_head(self):
         """bool: Whether the detector has a RoI Head in image branch."""
         return hasattr(self, 'img_roi_head') and self.img_roi_head is not None
+    @property
+    def with_pts_bbox(self):
+        """bool: Whether the detector has a 3D box head."""
+        return hasattr(self,
+                       'pts_bbox_head') and self.pts_bbox_head is not None
+
+    @property
+    def with_pts_backbone(self):
+        """bool: Whether the detector has a 3D backbone."""
+        return hasattr(self, 'pts_backbone') and self.pts_backbone is not None
+
+    @property
+    def with_pts_neck(self):
+        """bool: Whether the detector has a neck in 3D detector branch."""
+        return hasattr(self, 'pts_neck') and self.pts_neck is not None
+
+    def extract_feat(self, imgs):
+        """Just to inherit from abstract method."""
+        pass
+
     def extract_img_feat(self, img):
-        """Directly extract features from the backbone+neck."""
+        """Directly extract features from the img backbone+neck."""
         x = self.img_backbone(img)
         if self.with_img_neck:
             x = self.img_neck(x)
         return x
+    def extract_img_feats(self, imgs):
+        """Extract features from multiple images.
+
+        Args:
+            imgs (list[torch.Tensor]): A list of images. The images are
+                augmented from the same image but in different ways.
+
+        Returns:
+            list[torch.Tensor]: Features of different images
+        """
+
+        assert isinstance(imgs, list)
+        return [self.extract_img_feat(img) for img in imgs]
+
     def extract_pts_feat(self, pts):
         """Extract features of points."""
-        x = self.backbone(pts)
-        if self.with_neck:
-            x = self.neck(x)
-        return x
+        x = self.pts_backbone(pts)
+        if self.with_pts_neck:
+            x = self.pts_neck(x)
+
+        seed_points = x['fp_xyz'][-1]
+        seed_features = x['fp_features'][-1]
+        seed_indices = x['fp_indices'][-1]
+
+        return (seed_points, seed_features, seed_indices)
+
+    def extract_pts_feats(self, pts):
+        """Extract features of points from multiple samples."""
+        assert isinstance(pts, list)
+        return [self.extract_pts_feat(pt) for pt in pts]
-    def extract_feat(self, pts, img):
-        """Extract features from images and points."""
-        img_feats = self.extract_img_feat(img)
-        pts_feats = self.extract_pts_feat(pts)
-        return (img_feats, pts_feats)
+    def extract_bboxes_2d(self,
+                          img,
+                          img_metas,
+                          train=True,
+                          bboxes_2d=None,
+                          **kwargs):
+        """Extract bounding boxes from 2d detector.
+
+        Args:
+            img (torch.Tensor): of shape (N, C, H, W) encoding input images.
+                Typically these should be mean centered and std scaled.
+            img_metas (list[dict]): Image meta info.
+            train (bool): Whether called in training mode; if so, half of
+                the 2d bboxes are randomly dropped.
+            bboxes_2d (list[torch.Tensor]): provided 2d bboxes,
+                not supported yet.
+
+        Returns:
+            list[torch.Tensor]: A list of processed 2d bounding boxes.
+        """
+
+        if bboxes_2d is None:
+            self.set_img_branch_eval_mode()
+
+            x = self.extract_img_feat(img)
+            proposal_list = self.img_rpn_head.simple_test_rpn(x, img_metas)
+            rets = self.img_roi_head.simple_test(
+                x, proposal_list, img_metas, rescale=False)
+
+            rets_processed = []
+            for ret in rets:
+                tmp = np.concatenate(ret, axis=0)
+                sem_class = img.new_zeros((len(tmp)))
+                start = 0
+                for i, bboxes in enumerate(ret):
+                    sem_class[start:start + len(bboxes)] = i
+                    start += len(bboxes)
+                ret = img.new_tensor(tmp)
+
+                # append class index
+                ret = torch.cat([ret, sem_class[:, None]], dim=-1)
+                inds = torch.argsort(ret[:, 4], descending=True)
+                ret = ret.index_select(0, inds)
+
+                # drop half bboxes during training for better generalization
+                if train:
+                    rand_drop = torch.randperm(len(ret))[:(len(ret) + 1) // 2]
+                    rand_drop = torch.sort(rand_drop)[0]
+                    ret = ret[rand_drop]
+
+                rets_processed.append(ret.float())
+            return rets_processed
+        else:
+            rets_processed = []
+            for ret in bboxes_2d:
+                if len(ret) > 0 and train:
+                    rand_drop = torch.randperm(len(ret))[:(len(ret) + 1) // 2]
+                    rand_drop = torch.sort(rand_drop)[0]
+                    ret = ret[rand_drop]
+                rets_processed.append(ret.float())
+            return rets_processed
     def forward_train(self,
                       points=None,
                       img=None,
                       img_metas=None,
                       gt_bboxes=None,
                       gt_labels=None,
                       gt_bboxes_ignore=None,
                       gt_masks=None,
                       proposals=None,
+                      calibs=None,
+                      bboxes_2d=None,
+                      gt_bboxes_3d=None,
+                      gt_labels_3d=None,
+                      pts_semantic_mask=None,
+                      pts_instance_mask=None,
                       **kwargs):
-        """ Forward of training for image only or image and points.
+        """Forward of training for image branch pretrain or stage 2 training.
+
         Args:
-            img (Tensor): of shape (N, C, H, W) encoding input images.
+            points (list[torch.Tensor]): Points of each batch.
+            img (torch.Tensor): of shape (N, C, H, W) encoding input images.
                 Typically these should be mean centered and std scaled.
-            img_metas (list[dict]): list of image info dict where each dict
-                has: 'img_shape', 'scale_factor', 'flip', and may also contain
-                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
-                For details on the values of these keys see
-                `mmdet/datasets/pipelines/formatting.py:Collect`.
-            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
-                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
-            gt_labels (list[Tensor]): class indices corresponding to each box
-            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
-                boxes can be ignored when computing the loss.
-            gt_masks (None | Tensor) : true segmentation masks for each box
-                used if the architecture supports a segmentation task.
-            proposals : override rpn proposals with custom proposals. Use when
-                `with_rpn` is False.
+            img_metas (list[dict]): list of image and point cloud meta info
+                dict. For example, keys include 'ori_shape', 'img_norm_cfg',
+                and 'transformation_3d_flow'. For details on the values of
+                the keys see `mmdet/datasets/pipelines/formatting.py:Collect`.
+            gt_bboxes (list[torch.Tensor]): Ground truth bboxes for each image
+                with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
+            gt_labels (list[torch.Tensor]): class indices for each
+                2d bounding box.
+            gt_bboxes_ignore (None | list[torch.Tensor]): specify which
+                2d bounding boxes can be ignored when computing the loss.
+            gt_masks (None | torch.Tensor): true segmentation masks for each
+                2d bbox, used if the architecture supports a segmentation task.
+            proposals: override rpn proposals (2d) with custom proposals.
+                Use when `with_rpn` is False.
+ calibs (dict[str, torch.Tensor]): camera calibration matrices, + Rt and K. + bboxes_2d (list[torch.Tensor]): provided 2d bboxes, + not supported yet. + gt_bboxes_3d (:obj:`BaseInstance3DBoxes`): 3d gt bboxes. + gt_labels_3d (list[torch.Tensor]): gt class labels for 3d bboxes. + pts_semantic_mask (None | list[torch.Tensor]): point-wise semantic + label of each batch. + pts_instance_mask (None | list[torch.Tensor]): point-wise instance + label of each batch. Returns: - dict[str, Tensor]: a dictionary of loss components + dict[str, torch.Tensor]: a dictionary of loss components. """ if points is None: @@ -209,7 +418,7 @@ def forward_train(self, # RPN forward and loss if self.with_img_rpn: proposal_cfg = self.train_cfg.get('img_rpn_proposal', - self.test_cfg.rpn) + self.test_cfg.img_rpn) rpn_losses, proposal_list = self.img_rpn_head.forward_train( x, img_metas, @@ -227,10 +436,96 @@ def forward_train(self, losses.update(roi_losses) return losses else: - return None + with torch.no_grad(): + bboxes_2d = self.extract_bboxes_2d( + img, img_metas, bboxes_2d=bboxes_2d, **kwargs) + + points = torch.stack(points) + seeds_3d, seed_3d_features, seed_indices = \ + self.extract_pts_feat(points) + + img_features, masks = self.fusion_layer(img, bboxes_2d, seeds_3d, + img_metas, calibs) + + inds = sample_valid_seeds(masks, self.num_sampled_seed) + batch_size, img_feat_size = img_features.shape[:2] + pts_feat_size = seed_3d_features.shape[1] + inds_img = inds.reshape(batch_size, 1, + -1).repeat(1, img_feat_size, 1) + img_features = img_features.gather(-1, inds_img) + inds = inds % inds.shape[1] + inds_seed_xyz = inds.reshape(batch_size, -1, 1).repeat(1, 1, 3) + seeds_3d = seeds_3d.gather(1, inds_seed_xyz) + inds_seed_feats = inds.reshape(batch_size, 1, + -1).repeat(1, pts_feat_size, 1) + seed_3d_features = seed_3d_features.gather(-1, inds_seed_feats) + seed_indices = seed_indices.gather(1, inds) + + img_features = self.img_mlp(img_features) + fused_features = torch.cat([seed_3d_features, img_features], dim=1) + + feat_dict_joint = dict( + seed_points=seeds_3d, + seed_features=fused_features, + seed_indices=seed_indices) + feat_dict_pts = dict( + seed_points=seeds_3d, + seed_features=seed_3d_features, + seed_indices=seed_indices) + feat_dict_img = dict( + seed_points=seeds_3d, + seed_features=img_features, + seed_indices=seed_indices) + + loss_inputs = (points, gt_bboxes_3d, gt_labels_3d, + pts_semantic_mask, pts_instance_mask, img_metas) + bbox_preds_joints = self.pts_bbox_head_joint( + feat_dict_joint, self.train_cfg.pts.sample_mod) + bbox_preds_pts = self.pts_bbox_head_pts( + feat_dict_pts, self.train_cfg.pts.sample_mod) + bbox_preds_img = self.pts_bbox_head_img( + feat_dict_img, self.train_cfg.pts.sample_mod) + losses_towers = [] + losses_joint = self.pts_bbox_head_joint.loss( + bbox_preds_joints, + *loss_inputs, + gt_bboxes_ignore=gt_bboxes_ignore) + losses_pts = self.pts_bbox_head_pts.loss( + bbox_preds_pts, + *loss_inputs, + gt_bboxes_ignore=gt_bboxes_ignore) + losses_img = self.pts_bbox_head_img.loss( + bbox_preds_img, + *loss_inputs, + gt_bboxes_ignore=gt_bboxes_ignore) + losses_towers.append(losses_joint) + losses_towers.append(losses_pts) + losses_towers.append(losses_img) + combined_losses = dict() + for loss_term in losses_joint: + if 'loss' in loss_term: + combined_losses[loss_term] = 0 + for i in range(len(losses_towers)): + combined_losses[loss_term] += \ + losses_towers[i][loss_term] * \ + self.loss_weights[i] + else: + # only save the metric of the joint head + # if it is not a loss + 
combined_losses[loss_term] = \
+                        losses_towers[0][loss_term]
+
+        return combined_losses
+
+    def forward_test(self,
+                     points=None,
+                     img_metas=None,
+                     img=None,
+                     calibs=None,
+                     bboxes_2d=None,
+                     **kwargs):
+        """Forward of testing for image branch pretraining or stage 2 testing.
-    def forward_test(self, points=None, img_metas=None, img=None, **kwargs):
-        """
         Args:
             points (list[torch.Tensor]): the outer list indicates test-time
                 augmentations and inner torch.Tensor should have a shape NxC,
@@ -242,7 +537,15 @@ def forward_test(self, points=None, img_metas=None, img=None, **kwargs):
                 list indicates test-time augmentations and inner
                 torch.Tensor should have a shape NxCxHxW, which contains
                 all images in the batch. Defaults to None.
+            calibs (dict[str, torch.Tensor]): camera calibration matrices,
+                Rt and K.
+            bboxes_2d (list[torch.Tensor]): provided 2d bboxes,
+                not supported yet.
+
+        Returns:
+            list[torch.Tensor]: Predicted 3d boxes.
         """
+
         if points is None:
             for var, name in [(img, 'img'), (img_metas, 'img_metas')]:
                 if not isinstance(var, list):
@@ -262,7 +565,7 @@
                 # proposals.
                 if 'proposals' in kwargs:
                     kwargs['proposals'] = kwargs['proposals'][0]
-                return self.simple_test(
+                return self.simple_test_img_only(
                     img=img[0], img_metas=img_metas[0], **kwargs)
             else:
                 assert img[0].size(0) == 1, 'aug test does not support ' \
@@ -270,7 +573,8 @@
                                             f'{img[0].size(0)}'
                 # TODO: support test augmentation for predefined proposals
                 assert 'proposals' not in kwargs
-                return self.aug_test(img=img, img_metas=img_metas, **kwargs)
+                return self.aug_test_img_only(
+                    img=img, img_metas=img_metas, **kwargs)
         else:
             for var, name in [(points, 'points'), (img_metas, 'img_metas')]:
@@ -285,41 +589,28 @@
                        format(len(points), len(img_metas)))
             if num_augs == 1:
-                img = [img] if img is None else img
-                return self.simple_test(points[0], img_metas[0], img[0],
-                                        **kwargs)
+                return self.simple_test(
+                    points[0],
+                    img_metas[0],
+                    img[0],
+                    calibs=calibs[0],
+                    bboxes_2d=bboxes_2d[0] if bboxes_2d is not None else None,
+                    **kwargs)
             else:
-                return self.aug_test(points, img_metas, img, **kwargs)
-
-    def simple_test(self,
-                    points=None,
-                    img_metas=None,
-                    img=None,
-                    rescale=False):
-        """Forward of testing.
-
-        Args:
-            points (list[torch.Tensor]): Points of each sample.
-            img_metas (list): Image metas.
-            img (list[torch.Tensor]): Images of each sample.
-            rescale (bool): Whether to rescale results.
-
-        Returns:
-            list: Predicted 3d boxes.
-        """
-        if points is None:
-            return self.simple_test_img_only(img, img_metas, rescale=rescale)
-        else:
-            return self.simple_test_both(
-                points, img_metas, img, rescale=rescale)
+                return self.aug_test(points, img_metas, img, calibs, bboxes_2d,
+                                     **kwargs)
     def simple_test_img_only(self,
                              img,
                              img_metas,
                              proposals=None,
                              rescale=False):
-        """Test without augmentation."""
-        assert self.with_img_bbox, 'Bbox head must be implemented.'
+        """Test without augmentation, image network pretrain."""
+
+        assert self.with_img_bbox, 'Img bbox head must be implemented.'
+        assert self.with_img_backbone, 'Img backbone must be implemented.'
+        assert self.with_img_rpn, 'Img rpn must be implemented.'
+        assert self.with_img_roi_head, 'Img roi head must be implemented.'
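Both `forward_train` above and `simple_test` below funnel seed selection through `sample_valid_seeds`; its padding branch relies on a "set difference via unique counts" trick that is easiest to see standalone. The numbers below are toy values, not from the patch:

```python
import torch

num_sampled_seed = 8
valid_inds = torch.tensor([1, 3, 5])   # fewer valid imvotes than needed
t1 = torch.arange(num_sampled_seed)
t2 = valid_inds % num_sampled_seed
combined = torch.cat((t1, t2))
uniques, counts = combined.unique(return_counts=True)
difference = uniques[counts == 1]      # elements of t1 not already in t2
print(difference)                      # tensor([0, 2, 4, 6, 7])
```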
x = self.extract_img_feat(img) @@ -333,56 +624,136 @@ def simple_test_img_only(self, return ret - def simple_test_both(self, - points=None, - img_metas=None, - img=None, - rescale=False): - """Forward of testing. - - Args: - points (list[torch.Tensor]): Points of each sample. - img_metas (list): Image metas. - img (list[torch.Tensor]): Images of each sample. - rescale (bool): Whether to rescale results. - - Returns: - list: Predicted 3d boxes. - """ - return None - - def aug_test(self, points=None, img_metas=None, imgs=None, rescale=False): - """Test function with augmentaiton.""" - if points is None: - return self.aug_test_img_only(imgs, img_metas, rescale=rescale) - else: - return self.aug_test_both(points, img_metas, imgs, rescale=rescale) - - def aug_test_img_only(self, img, img_metas, proposals=None, rescale=False): - """Test with augmentations. + def simple_test(self, + points=None, + img_metas=None, + img=None, + calibs=None, + bboxes_2d=None, + rescale=False, + **kwargs): + """Test without augmentation, stage 2.""" + + bboxes_2d = self.extract_bboxes_2d( + img, img_metas, train=False, bboxes_2d=bboxes_2d, **kwargs) + + points = torch.stack(points) + seeds_3d, seed_3d_features, seed_indices = \ + self.extract_pts_feat(points) + + img_features, masks = self.fusion_layer(img, bboxes_2d, seeds_3d, + img_metas, calibs) + + inds = sample_valid_seeds(masks, self.num_sampled_seed) + batch_size, img_feat_size = img_features.shape[:2] + pts_feat_size = seed_3d_features.shape[1] + inds_img = inds.reshape(batch_size, 1, -1).repeat(1, img_feat_size, 1) + img_features = img_features.gather(-1, inds_img) + inds = inds % inds.shape[1] + inds_seed_xyz = inds.reshape(batch_size, -1, 1).repeat(1, 1, 3) + seeds_3d = seeds_3d.gather(1, inds_seed_xyz) + inds_seed_feats = inds.reshape(batch_size, 1, + -1).repeat(1, pts_feat_size, 1) + seed_3d_features = seed_3d_features.gather(-1, inds_seed_feats) + seed_indices = seed_indices.gather(1, inds) + + img_features = self.img_mlp(img_features) + + fused_features = torch.cat([seed_3d_features, img_features], dim=1) + + feat_dict = dict( + seed_points=seeds_3d, + seed_features=fused_features, + seed_indices=seed_indices) + bbox_preds = self.pts_bbox_head_joint(feat_dict, + self.test_cfg.pts.sample_mod) + bbox_list = self.pts_bbox_head_joint.get_bboxes( + points, bbox_preds, img_metas, rescale=rescale) + bbox_results = [ + bbox3d2result(bboxes, scores, labels) + for bboxes, scores, labels in bbox_list + ] + return bbox_results + + def aug_test_img_only(self, img, img_metas, rescale=False): + """Test function with augmentation, image network pretrain. If rescale is False, then returned bboxes and masks will fit the scale of imgs[0]. """ + + assert self.with_img_bbox, 'Img bbox head must be implemented.' + assert self.with_img_backbone, 'Img backbone must be implemented.' + assert self.with_img_rpn, 'Img rpn must be implemented.' + assert self.with_img_roi_head, 'Img roi head must be implemented.' + x = self.extract_img_feats(img) proposal_list = self.img_rpn_head.aug_test_rpn(x, img_metas) + return self.img_roi_head.aug_test( x, proposal_list, img_metas, rescale=rescale) - def aug_test_both(self, - points=None, - img_metas=None, - img=None, - rescale=False): - """Forward of testing. - - Args: - points (list[torch.Tensor]): Points of each sample. - img_metas (list): Image metas. - img (list[torch.Tensor]): Images of each sample. - rescale (bool): Whether to rescale results. - - Returns: - list: Predicted 3d boxes. 
- """ - return None + def aug_test(self, + points=None, + img_metas=None, + imgs=None, + calibs=None, + bboxes_2d=None, + rescale=False, + **kwargs): + """Test function with augmentation, stage 2.""" + + points_cat = [torch.stack(pts) for pts in points] + feats = self.extract_pts_feats(points_cat, img_metas) + + # only support aug_test for one sample + aug_bboxes = [] + for x, pts_cat, img_meta, bbox_2d, img, calib in zip( + feats, points_cat, img_metas, bboxes_2d, imgs, calibs): + + bbox_2d = self.extract_bboxes_2d( + img, img_metas, train=False, bboxes_2d=bbox_2d, **kwargs) + + seeds_3d, seed_3d_features, seed_indices = x + + img_features, masks = self.fusion_layer(img, bbox_2d, seeds_3d, + img_metas, calib) + + inds = sample_valid_seeds(masks, self.num_sampled_seed) + batch_size, img_feat_size = img_features.shape[:2] + pts_feat_size = seed_3d_features.shape[1] + inds_img = inds.reshape(batch_size, 1, + -1).repeat(1, img_feat_size, 1) + img_features = img_features.gather(-1, inds_img) + inds = inds % inds.shape[1] + inds_seed_xyz = inds.reshape(batch_size, -1, 1).repeat(1, 1, 3) + seeds_3d = seeds_3d.gather(1, inds_seed_xyz) + inds_seed_feats = inds.reshape(batch_size, 1, + -1).repeat(1, pts_feat_size, 1) + seed_3d_features = seed_3d_features.gather(-1, inds_seed_feats) + seed_indices = seed_indices.gather(1, inds) + + img_features = self.img_mlp(img_features) + + fused_features = torch.cat([seed_3d_features, img_features], dim=1) + + feat_dict = dict( + seed_points=seeds_3d, + seed_features=fused_features, + seed_indices=seed_indices) + bbox_preds = self.pts_bbox_head_joint(feat_dict, + self.test_cfg.pts.sample_mod) + bbox_list = self.pts_bbox_head_joint.get_bboxes( + pts_cat, bbox_preds, img_metas, rescale=rescale) + + bbox_list = [ + dict(boxes_3d=bboxes, scores_3d=scores, labels_3d=labels) + for bboxes, scores, labels in bbox_list + ] + aug_bboxes.append(bbox_list[0]) + + # after merging, bboxes will be rescaled to the original image size + merged_bboxes = merge_aug_bboxes_3d(aug_bboxes, img_metas, + self.bbox_head.test_cfg) + + return [merged_bboxes] diff --git a/mmdet3d/models/detectors/mvx_two_stage.py b/mmdet3d/models/detectors/mvx_two_stage.py index 0e1b1a9167..8297f2e746 100644 --- a/mmdet3d/models/detectors/mvx_two_stage.py +++ b/mmdet3d/models/detectors/mvx_two_stage.py @@ -1,4 +1,3 @@ -import copy import mmcv import torch from mmcv.parallel import DataContainer as DC @@ -7,8 +6,8 @@ from torch import nn as nn from torch.nn import functional as F -from mmdet3d.core import (Box3DMode, bbox3d2result, merge_aug_bboxes_3d, - show_result) +from mmdet3d.core import (Box3DMode, Coord3DMode, bbox3d2result, + merge_aug_bboxes_3d, show_result) from mmdet3d.ops import Voxelization from mmdet.core import multi_apply from mmdet.models import DETECTORS @@ -486,19 +485,18 @@ def show_results(self, data, result, out_dir): assert out_dir is not None, 'Expect out_dir, got none.' 
         inds = result[batch_id]['pts_bbox']['scores_3d'] > 0.1
-            pred_bboxes = copy.deepcopy(
-                result[batch_id]['pts_bbox']['boxes_3d'][inds].tensor.numpy())
-            # for now we convert points into depth mode
-            if box_mode_3d == Box3DMode.DEPTH:
-                pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2
-            elif (box_mode_3d == Box3DMode.CAM) or (box_mode_3d
-                                                    == Box3DMode.LIDAR):
-                points = points[..., [1, 0, 2]]
-                points[..., 0] *= -1
+            pred_bboxes = result[batch_id]['pts_bbox']['boxes_3d'][inds]
+
+            # for now we convert points and bbox into depth mode
+            if (box_mode_3d == Box3DMode.CAM) or (box_mode_3d
+                                                  == Box3DMode.LIDAR):
+                points = Coord3DMode.convert_point(points, Coord3DMode.LIDAR,
+                                                   Coord3DMode.DEPTH)
                 pred_bboxes = Box3DMode.convert(pred_bboxes, box_mode_3d,
                                                 Box3DMode.DEPTH)
-                pred_bboxes[..., 2] += pred_bboxes[..., 5] / 2
-            else:
+            elif box_mode_3d != Box3DMode.DEPTH:
                 raise ValueError(
                     f'Unsupported box_mode_3d {box_mode_3d} for conversion!')
+
+            pred_bboxes = pred_bboxes.tensor.cpu().numpy()
             show_result(points, None, pred_bboxes, out_dir, file_name)
diff --git a/mmdet3d/models/fusion_layers/point_fusion.py b/mmdet3d/models/fusion_layers/point_fusion.py
index b93b0a2ba1..388a225f8a 100644
--- a/mmdet3d/models/fusion_layers/point_fusion.py
+++ b/mmdet3d/models/fusion_layers/point_fusion.py
@@ -274,7 +274,7 @@ def sample_single(self, img_feats, pts, img_meta):
         Args:
             img_feats (torch.Tensor): Image feature map in shape
-                (N, C, H, W).
+                (1, C, H, W).
             pts (torch.Tensor): Points of a single sample.
             img_meta (dict): Meta information of the single sample.
diff --git a/mmdet3d/models/model_utils/__init__.py b/mmdet3d/models/model_utils/__init__.py
index b8276279b9..4f4fef0887 100644
--- a/mmdet3d/models/model_utils/__init__.py
+++ b/mmdet3d/models/model_utils/__init__.py
@@ -1,3 +1,4 @@
+from .img_feat_mlp_module import ImageMLPModule
 from .vote_module import VoteModule
-__all__ = ['VoteModule']
+__all__ = ['VoteModule', 'ImageMLPModule']
diff --git a/mmdet3d/models/model_utils/img_feat_mlp_module.py b/mmdet3d/models/model_utils/img_feat_mlp_module.py
new file mode 100644
index 0000000000..fdd89f9594
--- /dev/null
+++ b/mmdet3d/models/model_utils/img_feat_mlp_module.py
@@ -0,0 +1,48 @@
+from mmcv.cnn import ConvModule
+from torch import nn as nn
+
+
+class ImageMLPModule(nn.Module):
+    """Image MLP module.
+
+    Pass the image vote features through an MLP.
+
+    Args:
+        in_channel (int): Number of channels of image vote features.
+            Default: 18.
+        conv_channels (tuple[int]): Out channels of the convolution.
+            Default: (256,).
+        conv_cfg (dict): Config of convolution.
+            Default: dict(type='Conv1d').
+        norm_cfg (dict): Config of normalization.
+            Default: dict(type='BN1d').
+        act_cfg (dict): Config of activation.
+            Default: dict(type='ReLU').
+ """ + + def __init__(self, + in_channel=18, + conv_channels=(256, ), + conv_cfg=dict(type='Conv1d'), + norm_cfg=dict(type='BN1d'), + act_cfg=dict(type='ReLU')): + super().__init__() + self.mlp = nn.Sequential() + prev_channels = in_channel + for i, conv_channel in enumerate(conv_channels): + self.mlp.add_module( + f'layer{i}', + ConvModule( + prev_channels, + conv_channels[i], + 1, + padding=0, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=act_cfg, + bias=True, + inplace=True)) + prev_channels = conv_channels[i] + + def forward(self, img_features): + return self.mlp(img_features) diff --git a/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py b/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py index 17f1953272..f7b554b0d1 100644 --- a/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py +++ b/mmdet3d/models/roi_heads/bbox_heads/h3d_bbox_head.py @@ -525,7 +525,8 @@ def multiclass_nms_single(self, obj_scores, sem_scores, bbox, points, # filter empty boxes and boxes with low score scores_mask = (obj_scores > self.test_cfg.score_thr) - nonempty_box_inds = torch.nonzero(nonempty_box_mask).flatten() + nonempty_box_inds = torch.nonzero( + nonempty_box_mask, as_tuple=False).flatten() nonempty_mask = torch.zeros_like(bbox_classes).scatter( 0, nonempty_box_inds[nms_selected], 1) selected = (nonempty_mask.bool() & scores_mask.bool()) diff --git a/mmdet3d/models/roi_heads/mask_heads/primitive_head.py b/mmdet3d/models/roi_heads/mask_heads/primitive_head.py index f4e2303a17..0cac77e2c8 100644 --- a/mmdet3d/models/roi_heads/mask_heads/primitive_head.py +++ b/mmdet3d/models/roi_heads/mask_heads/primitive_head.py @@ -369,7 +369,7 @@ def get_targets_single(self, pts_instance_mask[background_mask] = gt_labels_3d.shape[0] instance_flag = torch.nonzero( - pts_semantic_mask != self.num_classes).squeeze(1) + pts_semantic_mask != self.num_classes, as_tuple=False).squeeze(1) instance_labels = pts_instance_mask[instance_flag].unique() with_yaw = gt_bboxes_3d.with_yaw diff --git a/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py b/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py index ddb1df0f4e..1298ee025c 100644 --- a/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py +++ b/mmdet3d/ops/roiaware_pool3d/points_in_boxes.py @@ -21,6 +21,21 @@ def points_in_boxes_gpu(points, boxes): box_idxs_of_pts = points.new_zeros((batch_size, num_points), dtype=torch.int).fill_(-1) + + # If manually put the tensor 'points' or 'boxes' on a device + # which is not the current device, some temporary variables + # will be created on the current device in the cuda op, + # and the output will be incorrect. + # Therefore, we force the current device to be the same + # as the device of the tensors if it was not. + # Please refer to https://github.com/open-mmlab/mmdetection3d/issues/305 + # for the incorrect output before the fix. 
+    points_device = points.get_device()
+    assert points_device == boxes.get_device(), \
+        'Points and boxes should be put on the same device'
+    if torch.cuda.current_device() != points_device:
+        torch.cuda.set_device(points_device)
+
     roiaware_pool3d_ext.points_in_boxes_gpu(boxes.contiguous(),
                                             points.contiguous(),
                                             box_idxs_of_pts)
@@ -75,6 +90,14 @@ def points_in_boxes_batch(points, boxes):
     box_idxs_of_pts = points.new_zeros((batch_size, num_points, num_boxes),
                                        dtype=torch.int).fill_(0)
+
+    # Same reason as line 25-32
+    points_device = points.get_device()
+    assert points_device == boxes.get_device(), \
+        'Points and boxes should be put on the same device'
+    if torch.cuda.current_device() != points_device:
+        torch.cuda.set_device(points_device)
+
     roiaware_pool3d_ext.points_in_boxes_batch(boxes.contiguous(),
                                               points.contiguous(),
                                               box_idxs_of_pts)
diff --git a/mmdet3d/ops/voxel/scatter_points.py b/mmdet3d/ops/voxel/scatter_points.py
index c2355af604..faf44ad2fb 100644
--- a/mmdet3d/ops/voxel/scatter_points.py
+++ b/mmdet3d/ops/voxel/scatter_points.py
@@ -9,57 +9,42 @@
 class _dynamic_scatter(Function):
     @staticmethod
-    def forward(ctx, points, coors, voxel_size, coors_range):
+    def forward(ctx, feats, coors, reduce_type='max'):
         """Scatter point features into voxels.
         Args:
-            points: [N, ndim] float tensor. points[:, :3] contain xyz
-                points and points[:, 3:] contain other information
-                such as reflectivity.
-            voxel_size: [3] list/tuple or array, float. xyz, indicate
-                voxel size
-            coors_range: [6] list/tuple or array, float. indicate voxel range.
-                format: xyzxyz, minmax
-            max_points: int. indicate maximum points contained in a voxel.
-                if max_points=-1, it means using dynamic_voxelize
-            max_voxels: int. indicate maximum voxels this function create.
-                for second, 20000 is a good choice. you should shuffle
-                points before call this function because max_voxels may
-                drop some points.
+            feats: [N, C] float tensor. point features to be reduced
+                into voxels.
+            coors: [N, ndim] int tensor. corresponding voxel coordinates
+                (specifically multi-dim voxel index) of each point.
+            reduce_type: str. reduce op. Supports 'max', 'sum' and 'mean'.
         Returns:
             tuple
-            voxels: [M, max_points, ndim] float tensor. only contain points
-                and returned when max_points != -1.
-            coordinates: [M, 3] int32 tensor, always returned.
-            num_points_per_voxel: [M] int32 tensor. Only returned when
-                max_points != -1.
+            voxel_feats: [M, C] float tensor. reduced features. Input features
+                that share the same voxel coordinates are reduced to one row.
+            coordinates: [M, ndim] int tensor, voxel coordinates.
""" - results = dynamic_point_to_voxel_forward(points, coors, voxel_size, - coors_range) - (voxels, voxel_coors, num_points_per_voxel, point_to_voxelidx, - coor_to_voxelidx) = results - ctx.save_for_backward(num_points_per_voxel, point_to_voxelidx, - coor_to_voxelidx) - return voxels, voxel_coors, num_points_per_voxel.float() + results = dynamic_point_to_voxel_forward(feats, coors, reduce_type) + (voxel_feats, voxel_coors, point2voxel_map, + voxel_points_count) = results + ctx.reduce_type = reduce_type + ctx.save_for_backward(feats, voxel_feats, point2voxel_map, + voxel_points_count) + ctx.mark_non_differentiable(voxel_coors) + return voxel_feats, voxel_coors @staticmethod - def backward(ctx, - grad_output_voxel, - grad_output_voxel_coors=None, - grad_output_num_points=None): - (num_points_per_voxel, point_to_voxelidx, - coor_to_voxelidx) = ctx.saved_tensors - # grad_output_voxel shape: NxMxC - num_points = point_to_voxelidx.size(0) - num_features = grad_output_voxel.size(-1) - grad_points = grad_output_voxel.new_zeros( - size=(num_points, num_features)) + def backward(ctx, grad_voxel_feats, grad_voxel_coors=None): + (feats, voxel_feats, point2voxel_map, + voxel_points_count) = ctx.saved_tensors + grad_feats = torch.zeros_like(feats) # TODO: whether to use index put or use cuda_backward # To use index put, need point to voxel index - dynamic_point_to_voxel_backward(grad_points, - grad_output_voxel.contiguous(), - point_to_voxelidx, coor_to_voxelidx) - return grad_points, None, None, None + dynamic_point_to_voxel_backward(grad_feats, + grad_voxel_feats.contiguous(), feats, + voxel_feats, point2voxel_map, + voxel_points_count, ctx.reduce_type) + return grad_feats, None, None dynamic_scatter = _dynamic_scatter.apply @@ -87,15 +72,8 @@ def __init__(self, voxel_size, point_cloud_range, average_points: bool): self.average_points = average_points def forward_single(self, points, coors): - voxels, voxel_coors, num_points = dynamic_scatter( - points.contiguous(), coors.contiguous(), self.voxel_size, - self.point_cloud_range) - if not self.average_points: - voxels = torch.max(voxels, dim=1)[0] # voxels: NxMxC -> NxC - else: - voxels = ( - voxels.sum(dim=1, keepdim=False).div(num_points.view(-1, 1))) - return voxels, voxel_coors + reduce = 'mean' if self.average_points else 'max' + return dynamic_scatter(points.contiguous(), coors.contiguous(), reduce) def forward(self, points, coors): """ diff --git a/mmdet3d/ops/voxel/src/scatter_points_cuda.cu b/mmdet3d/ops/voxel/src/scatter_points_cuda.cu index eb320fab8b..52a70543df 100644 --- a/mmdet3d/ops/voxel/src/scatter_points_cuda.cu +++ b/mmdet3d/ops/voxel/src/scatter_points_cuda.cu @@ -1,284 +1,384 @@ #include #include -#include #include - #include -#define CHECK_CUDA(x) \ +typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t; + +#define CHECK_CUDA(x) \ TORCH_CHECK(x.device().is_cuda(), #x " must be a CUDA tensor") -#define CHECK_CONTIGUOUS(x) \ +#define CHECK_CONTIGUOUS(x) \ TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") -#define CHECK_INPUT(x) \ - CHECK_CUDA(x); \ +#define CHECK_INPUT(x) \ + CHECK_CUDA(x); \ CHECK_CONTIGUOUS(x) namespace { -int const threadsPerBlock = sizeof(unsigned long long) * 8; +int const threadsPerBlock = 512; +int const maxGridDim = 50000; +} // namespace + +__device__ __forceinline__ static void reduceMax(float *address, float val) { + int *address_as_i = reinterpret_cast(address); + int old = *address_as_i, assumed; + do { + assumed = old; + old = atomicCAS(address_as_i, assumed, + __float_as_int(fmaxf(val, 
__int_as_float(assumed)))); + } while (assumed != old || __int_as_float(old) < val); } -template -__global__ void scatter_point_to_voxel_kernel( - const T* points, T_int* coor, T_int* point_to_voxelidx, - T_int* coor_to_voxelidx, T* voxels, T_int* coors, const int num_features, - const int num_points, const int max_points, const int NDim) { - const int index = blockIdx.x * threadsPerBlock + threadIdx.x; - if (index >= num_points) return; - - int num = point_to_voxelidx[index]; - int voxelidx = coor_to_voxelidx[index]; - if (num > -1 && voxelidx > -1) { - const int feature_per_thread = 1; - - int start = threadIdx.y * feature_per_thread; - auto voxels_offset = - voxels + voxelidx * max_points * num_features + num * num_features; - auto points_offset = points + index * num_features; - for (int k = start; k < start + feature_per_thread; k++) { - voxels_offset[k] = points_offset[k]; +__device__ __forceinline__ static void reduceMax(double *address, double val) { + unsigned long long *address_as_ull = + reinterpret_cast(address); + unsigned long long old = *address_as_ull, assumed; + do { + assumed = old; + old = atomicCAS( + address_as_ull, assumed, + __double_as_longlong(fmax(val, __longlong_as_double(assumed)))); + } while (assumed != old || __longlong_as_double(old) < val); +} + +// get rid of meaningless warnings when compiling host code +#ifdef __CUDA_ARCH__ +__device__ __forceinline__ static void reduceAdd(float *address, float val) { +#if (__CUDA_ARCH__ < 200) +#warning \ + "compute capability lower than 2.x. fall back to use CAS version of atomicAdd for float32" + int *address_as_i = reinterpret_cast(address); + int old = *address_as_i, assumed; + do { + assumed = old; + old = atomicCAS(address_as_i, assumed, + __float_as_int(val + __int_as_float(assumed))); + } while (assumed != old); +#else + atomicAdd(address, val); +#endif +} + +__device__ __forceinline__ static void reduceAdd(double *address, double val) { +#if (__CUDA_ARCH__ < 600) +#warning \ + "compute capability lower than 6.x. fall back to use CAS version of atomicAdd for float64" + unsigned long long *address_as_ull = + reinterpret_cast(address); + unsigned long long old = *address_as_ull, assumed; + do { + assumed = old; + old = atomicCAS(address_as_ull, assumed, + __double_as_longlong(val + __longlong_as_double(assumed))); + } while (assumed != old); +#else + atomicAdd(address, val); +#endif +} +#endif + +template +__global__ void coors_id_kernel(const T_int *coors, const T_int *dim, + int64_t *coors_id, const int num_input, + const int NDim) { + for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input; + x += gridDim.x * blockDim.x) { + const T_int *coor_x = coors + x * NDim; + auto coor_id = 0; + for (int i = 0; i < NDim && coor_id != -1; i++) { + coor_id *= dim[i]; + auto t = static_cast(coor_x[i]); + coor_id = (t < 0) ? 
-1 : coor_id + t; } - if (num == 0 && start < NDim) { - auto coors_offset = coors + voxelidx * NDim; - auto coor_offset = coor + index * NDim; - for (int k = start; k < NDim; k++) { - coors_offset[k] = coor_offset[k]; + coors_id[x] = coor_id; + } +} + +template +__global__ void coors_map_init_kernel(const int64_t *coors_id, + const T_int *coors_id_argsort, + int32_t *coors_map, const int num_input) { + for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input; + x += gridDim.x * blockDim.x) { + auto here = coors_id[coors_id_argsort[x]]; + if (x == 0) { + if (here == -1) { // there is invalid points + coors_map[0] = -1; + } else { + coors_map[0] = 0; } + continue; } + auto left = coors_id[coors_id_argsort[x - 1]]; + coors_map[x] = (left < here) ? 1 : 0; } } template -__global__ void map_voxel_to_point_kernel( - T* points, T* voxels, T_int* point_to_voxelidx, T_int* coor_to_voxelidx, - const int num_features, const int num_points, const int max_points) { - const int index = blockIdx.x * threadsPerBlock + threadIdx.x; - if (index >= num_points) return; - auto num = point_to_voxelidx[index]; - if (num > -1) { - const int feature_per_thread = 1; - auto voxelidx = coor_to_voxelidx[index]; - - int start = threadIdx.y * feature_per_thread; - auto voxels_offset = - voxels + voxelidx * max_points * num_features + num * num_features; - auto points_offset = points + index * num_features; - for (int k = start; k < start + feature_per_thread; k++) { - points_offset[k] = voxels_offset[k]; +__global__ void +feats_reduce_kernel(const T *feats, const T_int *coors, int32_t *coors_map, + int32_t *reduce_count, // shall be 0 at initialization + T *reduced_feats, // shall be 0 at initialization + T_int *out_coors, const int num_input, const int num_feats, + const int NDim, const reduce_t reduce_type) { + for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input; + x += gridDim.x * blockDim.x) { + int32_t reduce_to = coors_map[x]; + if (reduce_to == -1) + continue; + + const T_int *coors_offset = coors + x * NDim; + T_int *out_coors_offset = out_coors + reduce_to * NDim; + for (int i = 0; i < NDim; i++) { + out_coors_offset[i] = coors_offset[i]; + } + + const T *feats_offset = feats + x * num_feats; + T *reduced_feats_offset = reduced_feats + reduce_to * num_feats; + if (reduce_type == reduce_t::MAX) { + for (int i = 0; i < num_feats; i++) { + reduceMax(&reduced_feats_offset[i], feats_offset[i]); + } + } else { + if (reduce_type == reduce_t::MEAN) { + atomicAdd(&reduce_count[reduce_to], static_cast(1)); + } + for (int i = 0; i < num_feats; i++) { + reduceAdd(&reduced_feats_offset[i], feats_offset[i]); + } } } } -template -__global__ void point_to_voxelidx_kernel(const T_int* coor, - T_int* point_to_voxelidx, - T_int* point_to_pointidx, - const int num_points, const int NDim) { - const int index = blockIdx.x * threadsPerBlock + threadIdx.x; - auto coor_offset = coor + index * NDim; - // skip invalid points - if ((index >= num_points) || (coor_offset[0] == -1)) return; - - int num = 0; - int coor_x = coor_offset[0]; - int coor_y = coor_offset[1]; - int coor_z = coor_offset[2]; - // only calculate the coors before this coor[index] - for (int i = 0; i < index; ++i) { - auto prev_coor = coor + i * NDim; - if (prev_coor[0] == -1) continue; - - // record voxel - if ((prev_coor[0] == coor_x) && (prev_coor[1] == coor_y) && - (prev_coor[2] == coor_z)) { - num++; - if (num == 1) { - point_to_pointidx[index] = i; +template +__global__ void add_reduce_traceback_grad_kernel( + T *grad_feats, const T 
+    T *grad_feats, const T *grad_reduced_feats, const int32_t *coors_map,
+    const int32_t *reduce_count, const int num_input, const int num_feats,
+    const reduce_t reduce_type) {
+  for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input;
+       x += gridDim.x * blockDim.x) {
+    int32_t reduce_to = coors_map[x];
+    if (reduce_to == -1) {
+      continue;
+    }
+
+    const int input_offset = x * num_feats;
+    T *grad_feats_offset = grad_feats + input_offset;
+    const int reduced_offset = reduce_to * num_feats;
+    const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;
+
+    if (reduce_type == reduce_t::SUM) {
+      for (int i = 0; i < num_feats; i++) {
+        grad_feats_offset[i] = grad_reduced_feats_offset[i];
+      }
+    } else if (reduce_type == reduce_t::MEAN) {
+      for (int i = 0; i < num_feats; i++) {
+        grad_feats_offset[i] = grad_reduced_feats_offset[i] /
+                               static_cast<T>(reduce_count[reduce_to]);
}
}
}
-  if (num == 0) {
-    point_to_pointidx[index] = index;
-  }
-  point_to_voxelidx[index] = num;
}

-template <typename T_int>
-__global__ void determin_voxel_num(
-    const T_int* coor, T_int* num_points_per_voxel, T_int* point_to_voxelidx,
-    T_int* point_to_pointidx, T_int* coor_to_voxelidx, T_int* voxel_num,
-    T_int* max_points, const int num_points, const int NDim) {
-  // only calculate the coors before this coor[index]
-  for (int i = 0; i < num_points; ++i) {
-    auto coor_offset = coor + i * NDim;
-    if (coor_offset[0] == -1) continue;
-    int point_pos_in_voxel = point_to_voxelidx[i];
-    // record voxel
-    if (point_pos_in_voxel == -1) {
-      // out of max_points or invalid point
-      printf("point_pos_in_voxel == -1, point:%d", i);
+template <typename T>
+__global__ void max_reduce_traceback_scatter_idx_kernel(
+    const T *feats, const T *reduced_feats, int32_t *reduce_from,
+    const int32_t *coors_map, const int num_input, const int num_feats) {
+  for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_input;
+       x += gridDim.x * blockDim.x) {
+    int32_t reduce_to = coors_map[x];
+
+    const int input_offset = x * num_feats;
+    const T *feats_offset = feats + input_offset;
+
+    if (reduce_to == -1) {
continue;
-    } else if (point_pos_in_voxel == 0) {
-      // record new voxel
-      int voxelidx = voxel_num[0];
-      voxel_num[0] += 1;
-      coor_to_voxelidx[i] = voxelidx;
-      num_points_per_voxel[voxelidx] = 1;
-    } else {
-      int point_idx = point_to_pointidx[i];
-      int voxelidx = coor_to_voxelidx[point_idx];
-      if (voxelidx != -1) {
-        num_points_per_voxel[voxelidx] += 1;
-        coor_to_voxelidx[i] = voxelidx;
-        max_points[0] = max(max_points[0], point_pos_in_voxel + 1);
-      } else {
-        printf("voxelidx = -1, point:%d", i);
+    }
+
+    const int reduced_offset = reduce_to * num_feats;
+    const T *reduced_feats_offset = reduced_feats + reduced_offset;
+    int32_t *reduce_from_offset = reduce_from + reduced_offset;
+
+    for (int i = 0; i < num_feats; i++) {
+      if (feats_offset[i] == reduced_feats_offset[i]) {
+        atomicMin(&reduce_from_offset[i], static_cast<int32_t>(x));
}
}
}
}

+template <typename T>
+__global__ void
+max_reduce_scatter_grad_kernel(T *grad_feats, const T *grad_reduced_feats,
+                               const int32_t *reduce_from,
+                               const int num_reduced, const int num_feats) {
+  for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < num_reduced;
+       x += gridDim.x * blockDim.x) {
+
+    const int reduced_offset = x * num_feats;
+    const int32_t *scatter_to_offset = reduce_from + reduced_offset;
+    const T *grad_reduced_feats_offset = grad_reduced_feats + reduced_offset;
+
+    for (int i = 0; i < num_feats; i++) {
+      grad_feats[scatter_to_offset[i] * num_feats + i] =
+          grad_reduced_feats_offset[i];
+    }
+  }
+}
+
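Taken together, these kernels implement a scatter-reduce: each point's feature row is folded into the output row of the voxel it falls in, with `sum`, `mean`, or `max` semantics, and the max variant remembering (via `reduce_from`) which point supplied each output value so the gradient can be routed back. A slow pure-PyTorch reference of the forward semantics, for orientation only (hypothetical function; `feats` of shape `(N, C)`, integer `coors` of shape `(N, NDim)`, rows with a negative coordinate treated as invalid):

```python
import torch

def dynamic_scatter_reference(feats, coors, reduce='mean'):
    """Per-voxel reduction over points sharing a coordinate (slow but clear)."""
    valid = (coors >= 0).all(dim=1)
    feats, coors = feats[valid], coors[valid]
    out_coors, inverse = coors.unique(dim=0, return_inverse=True)
    reduced = []
    for i in range(out_coors.size(0)):
        rows = feats[inverse == i]
        if reduce == 'max':
            reduced.append(rows.max(dim=0).values)
        elif reduce == 'mean':
            reduced.append(rows.mean(dim=0))
        else:  # 'sum'
            reduced.append(rows.sum(dim=0))
    return torch.stack(reduced), out_coors
```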
namespace voxelization {

-std::vector<at::Tensor> dynamic_point_to_voxel_forward_gpu(
-    const at::Tensor& points, const at::Tensor& voxel_mapping,
-    const std::vector<float> voxel_size, const std::vector<float> coors_range) {
-  CHECK_INPUT(points);
-  at::cuda::CUDAGuard device_guard(points.device());
+std::vector<torch::Tensor>
+dynamic_point_to_voxel_forward_gpu(const at::Tensor &feats,
+                                   const at::Tensor &coors,
+                                   const reduce_t reduce_type) {
+  CHECK_INPUT(feats);
+  CHECK_INPUT(coors);

-  const int NDim = voxel_mapping.size(1);
-  const int num_points = points.size(0);
-  const int num_features = points.size(1);
+  const int NDim = coors.size(1);
+  const int num_input = feats.size(0);
+  const int num_feats = feats.size(1);

-  std::vector<int> grid_size(NDim);
-  for (int i = 0; i < NDim; ++i) {
-    grid_size[i] =
-        round((coors_range[NDim + i] - coors_range[i]) / voxel_size[i]);
-  }
+  auto coors_id = at::empty({num_input}, coors.options().dtype(torch::kInt64));
+  auto coor_space_dim = coors.max_values(0) + 1;
+  auto coors_map_sorted =
+      at::empty({num_input}, coors.options().dtype(torch::kInt32));
+  auto coors_map =
+      at::empty({num_input}, coors.options().dtype(torch::kInt32));
+  auto num_coors = at::zeros({1}, coors.options().dtype(torch::kInt32));

-  // assume the mapping is already given
-  auto point_to_pointidx = -at::ones(
-      {
-          num_points,
-      },
-      voxel_mapping.options());
-  auto point_to_voxelidx = -at::ones(
-      {
-          num_points,
-      },
-      voxel_mapping.options());
-  auto max_points = at::zeros(
-      {
-          1,
-      },
-      voxel_mapping.options());  // must be zero from the begining
-
-  int col_blocks = at::cuda::ATenCeilDiv(num_points, threadsPerBlock);
-  dim3 blocks(col_blocks);
-  dim3 threads(threadsPerBlock);
-  cudaStream_t map_stream = at::cuda::getCurrentCUDAStream();
-  AT_DISPATCH_ALL_TYPES(
-      voxel_mapping.scalar_type(), "determin_duplicate", ([&] {
-        point_to_voxelidx_kernel<<<blocks, threads, 0, map_stream>>>(
-            voxel_mapping.data_ptr<int>(), point_to_voxelidx.data_ptr<int>(),
-            point_to_pointidx.data_ptr<int>(), num_points, NDim);
+  AT_DISPATCH_INTEGRAL_TYPES(
+      coors.scalar_type(), "coors_id_kernel", ([&] {
+        dim3 blocks(std::min(at::cuda::ATenCeilDiv(num_input, threadsPerBlock),
+                             maxGridDim));
+        dim3 threads(threadsPerBlock);
+        coors_id_kernel<<<blocks, threads>>>(
+            coors.data_ptr<scalar_t>(), coor_space_dim.data_ptr<scalar_t>(),
+            coors_id.data_ptr<int64_t>(), num_input, NDim);
      }));
-  cudaDeviceSynchronize();
  AT_CUDA_CHECK(cudaGetLastError());

-  // make the logic in the CUDA device could accelerate about 10 times
-  auto num_points_per_voxel = at::zeros(
-      {
-          num_points,
-      },
-      voxel_mapping.options());
-  auto coor_to_voxelidx = -at::ones(
-      {
-          num_points,
-      },
-      voxel_mapping.options());
-  auto voxel_num = at::zeros(
-      {
-          1,
-      },
-      voxel_mapping.options());  // must be zero from the begining
-  cudaStream_t logic_stream = at::cuda::getCurrentCUDAStream();
-  AT_DISPATCH_ALL_TYPES(
-      voxel_mapping.scalar_type(), "determin_duplicate", ([&] {
-        determin_voxel_num<<<1, 1, 0, logic_stream>>>(
-            voxel_mapping.data_ptr<int>(), num_points_per_voxel.data_ptr<int>(),
-            point_to_voxelidx.data_ptr<int>(),
-            point_to_pointidx.data_ptr<int>(), coor_to_voxelidx.data_ptr<int>(),
-            voxel_num.data_ptr<int>(), max_points.data_ptr<int>(), num_points,
-            NDim);
+  auto coors_id_argsort = coors_id.argsort();
+
+  AT_DISPATCH_INTEGRAL_TYPES(
+      coors_id_argsort.scalar_type(), "coors_map_init_kernel", ([&] {
+        dim3 blocks(std::min(at::cuda::ATenCeilDiv(num_input, threadsPerBlock),
+                             maxGridDim));
+        dim3 threads(threadsPerBlock);
+        coors_map_init_kernel<<<blocks, threads>>>(
+            coors_id.data_ptr<int64_t>(), coors_id_argsort.data_ptr<scalar_t>(),
+            coors_map_sorted.data_ptr<int32_t>(), num_input);
      }));
-  cudaDeviceSynchronize();
  AT_CUDA_CHECK(cudaGetLastError());
-  // some temporary data
-  auto max_points_cpu = max_points.to(at::kCPU);
-  int max_points_int = max_points_cpu.data_ptr<int>()[0];
-  auto voxel_num_cpu = voxel_num.to(at::kCPU);
-  int voxel_num_int = voxel_num_cpu.data_ptr<int>()[0];
-  at::Tensor coors =
-      at::zeros({voxel_num_int, NDim}, points.options().dtype(at::kInt));
-  at::Tensor voxels = at::zeros({voxel_num_int, max_points_int, num_features},
-                                points.options());
-
-  // copy point features to voxels
-  dim3 cp_threads(threadsPerBlock, num_features);
-  cudaStream_t cp_stream = at::cuda::getCurrentCUDAStream();
-  AT_DISPATCH_ALL_TYPES(
-      points.scalar_type(), "scatter_point_to_voxel", ([&] {
-        scatter_point_to_voxel_kernel
-            <<<blocks, cp_threads, 0, cp_stream>>>(
-                points.data_ptr<scalar_t>(), voxel_mapping.data_ptr<int>(),
-                point_to_voxelidx.data_ptr<int>(),
-                coor_to_voxelidx.data_ptr<int>(), voxels.data_ptr<scalar_t>(),
-                coors.data_ptr<int>(), num_features, num_points, max_points_int,
-                NDim);
+  coors_map_sorted = coors_map_sorted.cumsum(0, torch::kInt32);
+  coors_map.index_put_(coors_id_argsort, coors_map_sorted);
+
+  const int num_coors_cpu =
+      coors_map_sorted[-1].cpu().data_ptr<int32_t>()[0] + 1;
+  auto out_coors = at::empty({num_coors_cpu, NDim}, coors.options());
+  auto reduced_feats =
+      at::empty({num_coors_cpu, num_feats}, feats.options());
+  auto reduce_count =
+      at::zeros({num_coors_cpu}, coors.options().dtype(torch::kInt32));
+
+  AT_DISPATCH_FLOATING_TYPES(
+      feats.scalar_type(), "feats_reduce_kernel", ([&] {
+        using F_t = scalar_t;
+        AT_DISPATCH_INTEGRAL_TYPES(
+            coors.scalar_type(), "feats_reduce_kernel", ([&] {
+              using I_t = scalar_t;
+
+              if (reduce_type == reduce_t::MAX)
+                reduced_feats.fill_(-std::numeric_limits<F_t>::infinity());
+              else
+                reduced_feats.fill_(static_cast<F_t>(0));
+
+              dim3 blocks(
+                  std::min(at::cuda::ATenCeilDiv(num_input, threadsPerBlock),
+                           maxGridDim));
+              dim3 threads(threadsPerBlock);
+              feats_reduce_kernel<<<blocks, threads>>>(
+                  feats.data_ptr<F_t>(), coors.data_ptr<I_t>(),
+                  coors_map.data_ptr<int32_t>(),
+                  reduce_count.data_ptr<int32_t>(),
+                  reduced_feats.data_ptr<F_t>(), out_coors.data_ptr<I_t>(),
+                  num_input, num_feats, NDim, reduce_type);
+              if (reduce_type == reduce_t::MEAN)
+                reduced_feats /=
+                    reduce_count.unsqueeze(-1).to(reduced_feats.dtype());
+            }));
      }));
-  cudaDeviceSynchronize();
  AT_CUDA_CHECK(cudaGetLastError());

-  at::Tensor num_points_per_voxel_out =
-      num_points_per_voxel.slice(/*dim=*/0, /*start=*/0, /*end=*/voxel_num_int);
-  return {voxels, coors, num_points_per_voxel_out, point_to_voxelidx,
-          coor_to_voxelidx};
+  return {reduced_feats, out_coors, coors_map, reduce_count};
}

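Note how the host code above builds the point-to-voxel mapping without any per-point loop: sort the linearized ids, mark a 1 wherever the sorted id increases (and `-1` at position zero when invalid ids are present, since they sort first), then a cumulative sum gives every point the compacted index of its voxel. A hedged PyTorch rendering of the same bookkeeping (hypothetical helper name):

```python
import torch

def build_coors_map(coors_id: torch.Tensor) -> torch.Tensor:
    """Mirror the argsort + boundary-flag + cumsum trick from the code above."""
    order = coors_id.argsort()
    sorted_id = coors_id[order]
    flags = torch.zeros_like(sorted_id, dtype=torch.int32)
    flags[1:] = (sorted_id[1:] > sorted_id[:-1]).int()  # 1 at each new voxel
    if sorted_id[0] == -1:  # invalid ids sort first and stay mapped to -1
        flags[0] = -1
    coors_map = torch.empty_like(flags)
    coors_map[order] = flags.cumsum(0, dtype=torch.int32)
    # the number of voxels is coors_map.max() + 1, matching num_coors_cpu above
    return coors_map
```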
-void dynamic_point_to_voxel_backward_gpu(at::Tensor& grad_input_points,
-                                         const at::Tensor& grad_output_voxels,
-                                         const at::Tensor& point_to_voxelidx,
-                                         const at::Tensor& coor_to_voxelidx) {
-  CHECK_INPUT(grad_input_points);
-  CHECK_INPUT(grad_output_voxels);
-  CHECK_INPUT(point_to_voxelidx);
-  CHECK_INPUT(coor_to_voxelidx);
-  at::cuda::CUDAGuard device_guard(grad_input_points.device());
+void dynamic_point_to_voxel_backward_gpu(
+    at::Tensor &grad_feats, const at::Tensor &grad_reduced_feats,
+    const at::Tensor &feats, const at::Tensor &reduced_feats,
+    const at::Tensor &coors_map, const at::Tensor &reduce_count,
+    const reduce_t reduce_type) {
+  CHECK_INPUT(grad_feats);
+  CHECK_INPUT(grad_reduced_feats);
+  CHECK_INPUT(feats);
+  CHECK_INPUT(reduced_feats);
+  CHECK_INPUT(coors_map);
+  CHECK_INPUT(reduce_count);

-  const int num_points = grad_input_points.size(0);
-  const int num_features = grad_input_points.size(1);
-  const int max_points = grad_output_voxels.size(1);
+  const int num_input = feats.size(0);
+  const int num_reduced = reduced_feats.size(0);
+  const int num_feats = feats.size(1);
+
+  grad_feats.fill_(0);

  // copy voxel grad to points
-  int col_blocks = at::cuda::ATenCeilDiv(num_points, threadsPerBlock);
-  dim3 blocks(col_blocks);
-  dim3 cp_threads(threadsPerBlock, num_features);
-  cudaStream_t cp_stream = at::cuda::getCurrentCUDAStream();
-  AT_DISPATCH_ALL_TYPES(grad_input_points.scalar_type(),
-                        "scatter_point_to_voxel", ([&] {
-                          map_voxel_to_point_kernel
-                              <<<blocks, cp_threads, 0, cp_stream>>>(
-                                  grad_input_points.data_ptr<scalar_t>(),
-                                  grad_output_voxels.data_ptr<scalar_t>(),
-                                  point_to_voxelidx.data_ptr<int>(),
-                                  coor_to_voxelidx.data_ptr<int>(),
-                                  num_features, num_points, max_points);
-                        }));
-  cudaDeviceSynchronize();
-  AT_CUDA_CHECK(cudaGetLastError());
+  if (reduce_type == reduce_t::MEAN || reduce_type == reduce_t::SUM) {
+    AT_DISPATCH_FLOATING_TYPES(
+        grad_reduced_feats.scalar_type(), "add_reduce_traceback_grad_kernel",
+        ([&] {
+          dim3 blocks(std::min(
+              at::cuda::ATenCeilDiv(num_input, threadsPerBlock), maxGridDim));
+          dim3 threads(threadsPerBlock);
+          add_reduce_traceback_grad_kernel<<<blocks, threads>>>(
+              grad_feats.data_ptr<scalar_t>(),
+              grad_reduced_feats.data_ptr<scalar_t>(),
+              coors_map.data_ptr<int32_t>(), reduce_count.data_ptr<int32_t>(),
+              num_input, num_feats, reduce_type);
+        }));
+    AT_CUDA_CHECK(cudaGetLastError());
+  } else {
+    auto reduce_from = at::full({num_reduced, num_feats}, num_input,
+                                coors_map.options().dtype(torch::kInt32));
+    AT_DISPATCH_FLOATING_TYPES(
+        grad_reduced_feats.scalar_type(),
+        "max_reduce_traceback_scatter_idx_kernel", ([&] {
+          dim3 blocks(std::min(
+              at::cuda::ATenCeilDiv(num_input, threadsPerBlock), maxGridDim));
+          dim3 threads(threadsPerBlock);
+          max_reduce_traceback_scatter_idx_kernel<<<blocks, threads>>>(
+              feats.data_ptr<scalar_t>(), reduced_feats.data_ptr<scalar_t>(),
+              reduce_from.data_ptr<int32_t>(), coors_map.data_ptr<int32_t>(),
+              num_input, num_feats);
+        }));
+    AT_CUDA_CHECK(cudaGetLastError());
+
+    AT_DISPATCH_FLOATING_TYPES(
+        grad_reduced_feats.scalar_type(),
+        "max_reduce_traceback_scatter_idx_kernel", ([&] {
+          dim3 blocks(
+              std::min(at::cuda::ATenCeilDiv(num_reduced, threadsPerBlock),
+                       maxGridDim));
+          dim3 threads(threadsPerBlock);
+          max_reduce_scatter_grad_kernel<<<blocks, threads>>>(
+              grad_feats.data_ptr<scalar_t>(),
+              grad_reduced_feats.data_ptr<scalar_t>(),
+              reduce_from.data_ptr<int32_t>(), num_reduced, num_feats);
+        }));
+    AT_CUDA_CHECK(cudaGetLastError());
+  }
  return;
}

-} // namespace voxelization
+}  // namespace voxelization
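For `max` the backward above runs in two passes: `max_reduce_traceback_scatter_idx_kernel` records, per output feature, the lowest-indexed point whose value equals the reduced maximum (`atomicMin` resolves ties), and `max_reduce_scatter_grad_kernel` then copies the output gradient to exactly those points. The same routing in plain PyTorch, as an illustrative sketch only (hypothetical function; loops kept for clarity):

```python
import torch

def max_reduce_backward(grad_reduced, feats, reduced, coors_map):
    """Route the gradient of a max-reduction back to each voxel's arg-max point."""
    grad_feats = torch.zeros_like(feats)
    num_reduced, num_feats = grad_reduced.shape
    # sentinel one past the last point index; every voxel holds >= 1 point,
    # so each slot is overwritten in the traceback loop below
    reduce_from = torch.full((num_reduced, num_feats), feats.size(0),
                             dtype=torch.long)
    for x in range(feats.size(0) - 1, -1, -1):  # descending ~ atomicMin on ties
        to = int(coors_map[x])
        if to >= 0:
            hit = feats[x] == reduced[to]
            reduce_from[to, hit] = x
    cols = torch.arange(num_feats)
    for v in range(num_reduced):  # scatter the gradient to the arg-max points
        grad_feats[reduce_from[v], cols] = grad_reduced[v]
    return grad_feats
```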
diff --git a/mmdet3d/ops/voxel/src/voxelization.h b/mmdet3d/ops/voxel/src/voxelization.h
index 59cd65dd24..47df1d34ec 100644
--- a/mmdet3d/ops/voxel/src/voxelization.h
+++ b/mmdet3d/ops/voxel/src/voxelization.h
@@ -1,50 +1,55 @@
#pragma once
#include <torch/extension.h>

+typedef enum { SUM = 0, MEAN = 1, MAX = 2 } reduce_t;
+
namespace voxelization {

-int hard_voxelize_cpu(const at::Tensor& points, at::Tensor& voxels,
-                      at::Tensor& coors, at::Tensor& num_points_per_voxel,
+int hard_voxelize_cpu(const at::Tensor &points, at::Tensor &voxels,
+                      at::Tensor &coors, at::Tensor &num_points_per_voxel,
                      const std::vector<float> voxel_size,
                      const std::vector<float> coors_range,
                      const int max_points, const int max_voxels,
                      const int NDim = 3);

-void dynamic_voxelize_cpu(const at::Tensor& points, at::Tensor& coors,
+void dynamic_voxelize_cpu(const at::Tensor &points, at::Tensor &coors,
                          const std::vector<float> voxel_size,
                          const std::vector<float> coors_range,
                          const int NDim = 3);

std::vector<at::Tensor> dynamic_point_to_voxel_cpu(
-    const at::Tensor& points, const at::Tensor& voxel_mapping,
+    const at::Tensor &points, const at::Tensor &voxel_mapping,
    const std::vector<float> voxel_size, const std::vector<float> coors_range);

#ifdef WITH_CUDA
-int hard_voxelize_gpu(const at::Tensor& points, at::Tensor& voxels,
-                      at::Tensor& coors, at::Tensor& num_points_per_voxel,
+int hard_voxelize_gpu(const at::Tensor &points, at::Tensor &voxels,
+                      at::Tensor &coors, at::Tensor &num_points_per_voxel,
                      const std::vector<float> voxel_size,
                      const std::vector<float> coors_range,
                      const int max_points, const int max_voxels,
                      const int NDim = 3);

-void dynamic_voxelize_gpu(const at::Tensor& points, at::Tensor& coors,
+void dynamic_voxelize_gpu(const at::Tensor &points, at::Tensor &coors,
                          const std::vector<float> voxel_size,
                          const std::vector<float> coors_range,
                          const int NDim = 3);

-std::vector<at::Tensor> dynamic_point_to_voxel_forward_gpu(
-    const at::Tensor& points, const at::Tensor& voxel_mapping,
-    const std::vector<float> voxel_size, const std::vector<float> coors_range);
+std::vector<torch::Tensor> dynamic_point_to_voxel_forward_gpu(const torch::Tensor &feats,
+                                                              const torch::Tensor &coors,
+                                                              const reduce_t reduce_type);

-void dynamic_point_to_voxel_backward_gpu(at::Tensor& grad_input_points,
-                                         const at::Tensor& grad_output_voxels,
-                                         const at::Tensor& point_to_voxelidx,
-                                         const at::Tensor& coor_to_voxelidx);
+void dynamic_point_to_voxel_backward_gpu(torch::Tensor &grad_feats,
+                                         const torch::Tensor &grad_reduced_feats,
+                                         const torch::Tensor &feats,
+                                         const torch::Tensor &reduced_feats,
+                                         const torch::Tensor &coors_idx,
+                                         const torch::Tensor &reduce_count,
+                                         const reduce_t reduce_type);
#endif

// Interface for Python
-inline int hard_voxelize(const at::Tensor& points, at::Tensor& voxels,
-                         at::Tensor& coors, at::Tensor& num_points_per_voxel,
+inline int hard_voxelize(const at::Tensor &points, at::Tensor &voxels,
+                         at::Tensor &coors, at::Tensor &num_points_per_voxel,
                         const std::vector<float> voxel_size,
                         const std::vector<float> coors_range,
                         const int max_points, const int max_voxels,
@@ -63,7 +68,7 @@ inline int hard_voxelize(const at::Tensor& points, at::Tensor& voxels,
                        NDim);
}

-inline void dynamic_voxelize(const at::Tensor& points, at::Tensor& coors,
+inline void dynamic_voxelize(const at::Tensor &points, at::Tensor &coors,
                             const std::vector<float> voxel_size,
                             const std::vector<float> coors_range,
                             const int NDim = 3) {
@@ -77,37 +82,49 @@ inline void dynamic_voxelize(const at::Tensor& points, at::Tensor& coors,
  return dynamic_voxelize_cpu(points, coors, voxel_size, coors_range, NDim);
}

-inline std::vector<at::Tensor> dynamic_point_to_voxel_forward(
-    const at::Tensor& points, const at::Tensor& voxel_mapping,
-    const std::vector<float> voxel_size, const std::vector<float> coors_range) {
-  if (points.device().is_cuda()) {
+inline reduce_t convert_reduce_type(const std::string &reduce_type) {
+  if (reduce_type == "max")
+    return reduce_t::MAX;
+  else if (reduce_type == "sum")
+    return reduce_t::SUM;
+  else if (reduce_type == "mean")
+    return reduce_t::MEAN;
+  else TORCH_CHECK(false, "do not support reduce type " + reduce_type)
+  return reduce_t::SUM;
+}
+
+inline std::vector<torch::Tensor> dynamic_point_to_voxel_forward(const torch::Tensor &feats,
+                                                                 const torch::Tensor &coors,
+                                                                 const std::string &reduce_type) {
+  if (feats.device().is_cuda()) {
#ifdef WITH_CUDA
-    return dynamic_point_to_voxel_forward_gpu(points, voxel_mapping, voxel_size,
-                                              coors_range);
+    return dynamic_point_to_voxel_forward_gpu(feats, coors, convert_reduce_type(reduce_type));
#else
-    AT_ERROR("Not compiled with GPU support");
+    TORCH_CHECK(false, "Not compiled with GPU support");
#endif
  }
-  return dynamic_point_to_voxel_cpu(points, voxel_mapping, voxel_size,
-                                    coors_range);
+  TORCH_CHECK(false, "do not support cpu yet");
+  return std::vector<torch::Tensor>();
}

-inline void dynamic_point_to_voxel_backward(
-    at::Tensor& grad_input_points, const at::Tensor& grad_output_voxels,
-    const at::Tensor& point_to_voxelidx, const at::Tensor& coor_to_voxelidx) {
-  if (grad_input_points.device().is_cuda()) {
+inline void dynamic_point_to_voxel_backward(torch::Tensor &grad_feats,
+                                            const torch::Tensor &grad_reduced_feats,
+                                            const torch::Tensor &feats,
+                                            const torch::Tensor &reduced_feats,
+                                            const torch::Tensor &coors_idx,
+                                            const torch::Tensor &reduce_count,
+                                            const std::string &reduce_type) {
+  if (grad_feats.device().is_cuda()) {
#ifdef WITH_CUDA
-    return dynamic_point_to_voxel_backward_gpu(
-        grad_input_points, grad_output_voxels, point_to_voxelidx,
-        coor_to_voxelidx);
+    dynamic_point_to_voxel_backward_gpu(
+        grad_feats, grad_reduced_feats, feats, reduced_feats, coors_idx, reduce_count,
+        convert_reduce_type(reduce_type));
+    return;
#else
-    AT_ERROR("Not compiled with GPU support");
+    TORCH_CHECK(false, "Not compiled with GPU support");
#endif
  }
-  // return dynamic_point_to_voxel_cpu(points,
-  //                                   voxel_mapping,
-  //                                   voxel_size,
-  //                                   coors_range);
+  TORCH_CHECK(false, "do not support cpu yet");
}

}  // namespace voxelization
diff --git a/mmdet3d/version.py b/mmdet3d/version.py
index 78b3d45dc4..66df47efb6 100644
--- a/mmdet3d/version.py
+++ b/mmdet3d/version.py
@@ -1,6 +1,6 @@
# Copyright (c) Open-MMLab. All rights reserved.

-__version__ = '0.10.0'
+__version__ = '0.11.0'
short_version = __version__

diff --git a/requirements/optional.txt b/requirements/optional.txt
index a9b8ad8e25..d3434d5046 100644
--- a/requirements/optional.txt
+++ b/requirements/optional.txt
@@ -1 +1,2 @@
+open3d
waymo-open-dataset-tf-2-1-0==1.2.0
diff --git a/resources/open3d_visual.gif b/resources/open3d_visual.gif
new file mode 100644
index 0000000000..02b1f86977
Binary files /dev/null and b/resources/open3d_visual.gif differ
diff --git a/setup.cfg b/setup.cfg
index a0521648c2..b4c6c44fee 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -8,6 +8,6 @@ line_length = 79
multi_line_output = 0
known_standard_library = setuptools
known_first_party = mmdet,mmdet3d
-known_third_party = load_scannet_data,lyft_dataset_sdk,m2r,matplotlib,mmcv,nuimages,numba,numpy,nuscenes,pandas,plyfile,pycocotools,pyquaternion,pytest,recommonmark,scannet_utils,scipy,seaborn,shapely,skimage,tensorflow,terminaltables,torch,trimesh,waymo_open_dataset
+known_third_party = cv2,load_scannet_data,lyft_dataset_sdk,m2r,matplotlib,mmcv,nuimages,numba,numpy,nuscenes,pandas,plyfile,pycocotools,pyquaternion,pytest,recommonmark,scannet_utils,scipy,seaborn,shapely,skimage,tensorflow,terminaltables,torch,trimesh,waymo_open_dataset
no_lines_before = STDLIB,LOCALFOLDER
default_section = THIRDPARTY
diff --git a/tests/test_data/test_datasets/test_kitti_dataset.py b/tests/test_data/test_datasets/test_kitti_dataset.py
index 15f2b7a6fd..ebfbbc5484 100644
--- a/tests/test_data/test_datasets/test_kitti_dataset.py
+++ b/tests/test_data/test_datasets/test_kitti_dataset.py
@@ -157,7 +157,7 @@ def test_show():
    labels_3d = torch.tensor([0, 0, 1, 1, 2])
    result = dict(boxes_3d=boxes_3d, scores_3d=scores_3d, labels_3d=labels_3d)
    results = [result]
-    kitti_dataset.show(results, temp_dir)
+    kitti_dataset.show(results, temp_dir, show=False)
    pts_file_path = osp.join(temp_dir, '000000', '000000_points.obj')
    gt_file_path = osp.join(temp_dir, '000000', '000000_gt.ply')
    pred_file_path = osp.join(temp_dir, '000000', '000000_pred.ply')
diff --git a/tests/test_data/test_datasets/test_scannet_dataset.py b/tests/test_data/test_datasets/test_scannet_dataset.py
index ca5c1f5b42..04997fee29 100644
--- a/tests/test_data/test_datasets/test_scannet_dataset.py
+++ b/tests/test_data/test_datasets/test_scannet_dataset.py
@@ -201,7 +201,7 @@ def test_show():
    labels_3d = torch.tensor([0, 0, 0, 0, 0])
    result = dict(boxes_3d=boxes_3d, scores_3d=scores_3d, labels_3d=labels_3d)
    results = [result]
-    scannet_dataset.show(results, temp_dir)
+    scannet_dataset.show(results, temp_dir, show=False)
    pts_file_path = osp.join(temp_dir, 'scene0000_00',
                             'scene0000_00_points.obj')
    gt_file_path = osp.join(temp_dir, 'scene0000_00', 'scene0000_00_gt.ply')
diff --git a/tests/test_data/test_datasets/test_sunrgbd_dataset.py b/tests/test_data/test_datasets/test_sunrgbd_dataset.py
index 532e920e0a..80f8c99cf5 100644
--- a/tests/test_data/test_datasets/test_sunrgbd_dataset.py
+++ b/tests/test_data/test_datasets/test_sunrgbd_dataset.py
@@ -145,7 +145,7 @@ def test_show():
    labels_3d = torch.tensor([0, 0, 0, 0, 0])
    result = dict(boxes_3d=boxes_3d, scores_3d=scores_3d, labels_3d=labels_3d)
    results = [result]
-    sunrgbd_dataset.show(results, temp_dir)
+    sunrgbd_dataset.show(results, temp_dir, show=False)
    pts_file_path = osp.join(temp_dir, '000001', '000001_points.obj')
    gt_file_path = osp.join(temp_dir, '000001', '000001_gt.ply')
    pred_file_path = osp.join(temp_dir, '000001', '000001_pred.ply')
diff --git a/tests/test_models/test_common_modules/test_roiaware_pool3d.py b/tests/test_models/test_common_modules/test_roiaware_pool3d.py
index b85d941c6e..5b40decf86 100644
--- a/tests/test_models/test_common_modules/test_roiaware_pool3d.py
+++ b/tests/test_models/test_common_modules/test_roiaware_pool3d.py
@@ -63,6 +63,14 @@ def test_points_in_boxes_gpu():
    assert point_indices.shape == torch.Size([2, 8])
    assert (point_indices == expected_point_indices).all()

+    if torch.cuda.device_count() > 1:
+        pts = pts.to('cuda:1')
+        boxes = boxes.to('cuda:1')
+        expected_point_indices = expected_point_indices.to('cuda:1')
+        point_indices = points_in_boxes_gpu(points=pts, boxes=boxes)
+        assert point_indices.shape == torch.Size([2, 8])
+        assert (point_indices == expected_point_indices).all()
+

def test_points_in_boxes_cpu():
    boxes = torch.tensor(
@@ -110,3 +118,11 @@ def test_points_in_boxes_batch():
        dtype=torch.int32).cuda()
    assert point_indices.shape == torch.Size([1, 15, 2])
    assert (point_indices == expected_point_indices).all()
+
+    if torch.cuda.device_count() > 1:
+        pts = pts.to('cuda:1')
+        boxes = boxes.to('cuda:1')
+        expected_point_indices = expected_point_indices.to('cuda:1')
+        point_indices = points_in_boxes_batch(points=pts, boxes=boxes)
+        assert point_indices.shape == torch.Size([1, 15, 2])
+        assert (point_indices == expected_point_indices).all()
diff --git a/tests/test_models/test_detectors.py b/tests/test_models/test_detectors.py
index bc7ae5d00d..90ad69566f 100644
--- a/tests/test_models/test_detectors.py
+++ b/tests/test_models/test_detectors.py
@@ -62,8 +62,8 @@ def _get_detector_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg))
    model.update(train_cfg=train_cfg)
    model.update(test_cfg=test_cfg)
diff --git a/tests/test_models/test_forward.py b/tests/test_models/test_forward.py
index b30bfe1a11..aa8c5679d6 100644
--- a/tests/test_models/test_forward.py
+++ b/tests/test_models/test_forward.py
@@ -1,8 +1,8 @@
"""Test model forward process.
CommandLine:
-    pytest tests/test_forward.py
-    xdoctest tests/test_forward.py zero
+    pytest tests/test_models/test_forward.py
+    xdoctest tests/test_models/test_forward.py zero
"""
import copy
import numpy as np
@@ -40,20 +40,17 @@ def _get_detector_cfg(fname):
    These are deep copied to allow for safe modification of parameters
    without influencing other tests.
    """
-    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
-    return model, train_cfg, test_cfg
+    return model


def _test_two_stage_forward(cfg_file):
-    model, train_cfg, test_cfg = _get_detector_cfg(cfg_file)
+    model = _get_detector_cfg(cfg_file)
    model['pretrained'] = None
    from mmdet.models import build_detector
-    detector = build_detector(model, train_cfg=train_cfg, test_cfg=test_cfg)
+    detector = build_detector(model)

    input_shape = (1, 3, 256, 256)

@@ -107,11 +104,11 @@ def _test_two_stage_forward(cfg_file):


def _test_single_stage_forward(cfg_file):
-    model, train_cfg, test_cfg = _get_detector_cfg(cfg_file)
+    model = _get_detector_cfg(cfg_file)
    model['pretrained'] = None
    from mmdet.models import build_detector
-    detector = build_detector(model, train_cfg=train_cfg, test_cfg=test_cfg)
+    detector = build_detector(model)

    input_shape = (1, 3, 300, 300)
    mm_inputs = _demo_mm_inputs(input_shape)
diff --git a/tests/test_models/test_heads/test_heads.py b/tests/test_models/test_heads/test_heads.py
index f167aba357..cc1c208be5 100644
--- a/tests/test_models/test_heads/test_heads.py
+++ b/tests/test_models/test_heads/test_heads.py
@@ -52,8 +52,8 @@ def _get_head_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg))

    bbox_head = model.bbox_head
    bbox_head.update(train_cfg=train_cfg)
@@ -70,8 +70,8 @@ def _get_rpn_head_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg))

    rpn_head = model.rpn_head
    rpn_head.update(train_cfg=train_cfg.rpn)
@@ -88,8 +88,8 @@ def _get_roi_head_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg))

    roi_head = model.roi_head
    roi_head.update(train_cfg=train_cfg.rcnn)
@@ -106,8 +106,8 @@ def _get_pts_bbox_head_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg.pts))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg.pts))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg.pts))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg.pts))

    pts_bbox_head = model.pts_bbox_head
    pts_bbox_head.update(train_cfg=train_cfg)
@@ -124,8 +124,8 @@ def _get_vote_head_cfg(fname):
    import mmcv
    config = _get_config_module(fname)
    model = copy.deepcopy(config.model)
-    train_cfg = mmcv.Config(copy.deepcopy(config.train_cfg))
-    test_cfg = mmcv.Config(copy.deepcopy(config.test_cfg))
+    train_cfg = mmcv.Config(copy.deepcopy(config.model.train_cfg))
+    test_cfg = mmcv.Config(copy.deepcopy(config.model.test_cfg))

    vote_head = model.bbox_head
    vote_head.update(train_cfg=train_cfg)
@@ -806,8 +806,8 @@ def test_dcn_center_head():
            out_size_factor=4,
            voxel_size=voxel_size[:2],
            code_size=9),
-        seperate_head=dict(
-            type='DCNSeperateHead',
+        separate_head=dict(
+            type='DCNSeparateHead',
            dcn_config=dict(
                type='DCN',
                in_channels=64,
@@ -815,7 +815,7 @@
                kernel_size=3,
                padding=1,
                groups=4,
-                bias=True),
+                bias=False),  # mmcv 1.2.6 doesn't support bias=True anymore
            init_bias=-2.19,
            final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
diff --git a/tests/test_models/test_voxel_encoder/test_dynamic_scatter.py b/tests/test_models/test_voxel_encoder/test_dynamic_scatter.py
new file mode 100644
index 0000000000..2c0bb4a23b
--- /dev/null
+++ b/tests/test_models/test_voxel_encoder/test_dynamic_scatter.py
@@ -0,0 +1,60 @@
+import pytest
+import torch
+from torch.autograd import gradcheck
+
+from mmdet3d.ops import DynamicScatter
+
+
+def test_dynamic_scatter():
+    if not torch.cuda.is_available():
+        pytest.skip('test requires GPU and torch+cuda')
+
+    feats = torch.rand(
+        size=(200000, 3), dtype=torch.float32, device='cuda') * 100 - 50
+    coors = torch.randint(
+        low=-1, high=20, size=(200000, 3), dtype=torch.int32, device='cuda')
+    coors[coors.min(dim=-1).values < 0] = -1
+
+    dsmean = DynamicScatter([0.32, 0.32, 6],
+                            [-74.88, -74.88, -2, 74.88, 74.88, 4], True)
+    dsmax = DynamicScatter([0.32, 0.32, 6],
+                           [-74.88, -74.88, -2, 74.88, 74.88, 4], False)
+
+    ref_voxel_coors = coors.unique(dim=0, sorted=True)
+    ref_voxel_coors = ref_voxel_coors[ref_voxel_coors.min(dim=-1).values >= 0]
+    ref_voxel_feats_mean = []
+    ref_voxel_feats_max = []
+    for ref_voxel_coor in ref_voxel_coors:
+        voxel_mask = (coors == ref_voxel_coor).all(dim=-1)
+        ref_voxel_feats_mean.append(feats[voxel_mask].mean(dim=0))
+        ref_voxel_feats_max.append(feats[voxel_mask].max(dim=0).values)
+    ref_voxel_feats_mean = torch.stack(ref_voxel_feats_mean)
+    ref_voxel_feats_max = torch.stack(ref_voxel_feats_max)
+
+    feats_out_mean, coors_out_mean = dsmean(feats, coors)
+    seq_mean = (coors_out_mean[:, 0] * 400 + coors_out_mean[:, 1] * 20 +
+                coors_out_mean[:, 2]).argsort()
+    feats_out_mean = feats_out_mean[seq_mean]
+    coors_out_mean = coors_out_mean[seq_mean]
+
+    feats_out_max, coors_out_max = dsmax(feats, coors)
+    seq_max = (coors_out_max[:, 0] * 400 + coors_out_max[:, 1] * 20 +
+               coors_out_max[:, 2]).argsort()
+    feats_out_max = feats_out_max[seq_max]
+    coors_out_max = coors_out_max[seq_max]
+
+    assert (coors_out_mean == ref_voxel_coors).all()
+    assert torch.allclose(
+        feats_out_mean, ref_voxel_feats_mean, atol=1e-2, rtol=1e-5)
+    assert (coors_out_max == ref_voxel_coors).all()
+    assert torch.allclose(
+        feats_out_max, ref_voxel_feats_max, atol=1e-2, rtol=1e-5)
+
+    # test grad #
+    feats = torch.rand(
+        size=(100, 4), dtype=torch.float32, device='cuda') * 100 - 50
+    coors = torch.randint(
+        low=-1, high=3, size=(100, 3), dtype=torch.int32, device='cuda')
+    feats.requires_grad_()
+    gradcheck(dsmean, (feats, coors), eps=1e-2, atol=1e-2, rtol=1e-5)
+    gradcheck(dsmax, (feats, coors), eps=1e-2, atol=1e-2, rtol=1e-5)
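The new test above doubles as a usage reference for `DynamicScatter`: the third constructor argument selects mean (`True`) versus max (`False`) reduction, exactly as `dsmean`/`dsmax` are built. In a voxel feature encoder the module then acts as a plain scatter step; a minimal sketch with made-up tensor sizes:

```python
import torch
from mmdet3d.ops import DynamicScatter

# voxel_size and point_cloud_range as in the test; True -> mean reduction
scatter = DynamicScatter([0.32, 0.32, 6],
                         [-74.88, -74.88, -2, 74.88, 74.88, 4], True)
point_feats = torch.rand(1000, 64, device='cuda')
coors = torch.randint(0, 20, (1000, 3), dtype=torch.int32, device='cuda')
voxel_feats, voxel_coors = scatter(point_feats, coors)  # one row per voxel
```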
diff --git a/tests/test_runtime/test_apis.py b/tests/test_runtime/test_apis.py
index a78b1b6efa..edf531c3e5 100644
--- a/tests/test_runtime/test_apis.py
+++ b/tests/test_runtime/test_apis.py
@@ -51,7 +51,8 @@ def test_single_gpu_test():
    if not torch.cuda.is_available():
        pytest.skip('test requires GPU and torch+cuda')
    cfg = _get_config_module('votenet/votenet_16x8_sunrgbd-3d-10class.py')
-    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
+    cfg.model.train_cfg = None
+    model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
    dataset_cfg = cfg.data.test
    dataset_cfg.data_root = './tests/data/sunrgbd'
    dataset_cfg.ann_file = 'tests/data/sunrgbd/sunrgbd_infos.pkl'
diff --git a/tests/test_runtime/test_config.py b/tests/test_runtime/test_config.py
index 8774fc3f3c..22d995ce3d 100644
--- a/tests/test_runtime/test_config.py
+++ b/tests/test_runtime/test_config.py
@@ -38,18 +38,15 @@ def test_config_build_detector():
        config_mod = Config.fromfile(config_fpath)
        config_mod.model
-        config_mod.train_cfg
-        config_mod.test_cfg
+        config_mod.model.train_cfg
+        config_mod.model.test_cfg
        print('Building detector, config_fpath = {!r}'.format(config_fpath))

        # Remove pretrained keys to allow for testing in an offline environment
        if 'pretrained' in config_mod.model:
            config_mod.model['pretrained'] = None

-        detector = build_detector(
-            config_mod.model,
-            train_cfg=config_mod.train_cfg,
-            test_cfg=config_mod.test_cfg)
+        detector = build_detector(config_mod.model)
        assert detector is not None

        if 'roi_head' in config_mod.model.keys():
diff --git a/tests/test_utils/test_anchors.py b/tests/test_utils/test_anchors.py
index 4e73a7c458..eb6172d6d7 100644
--- a/tests/test_utils/test_anchors.py
+++ b/tests/test_utils/test_anchors.py
@@ -1,7 +1,7 @@
"""
CommandLine:
-    pytest tests/test_anchor.py
-    xdoctest tests/test_anchor.py zero
+    pytest tests/test_utils/test_anchors.py
+    xdoctest tests/test_utils/test_anchors.py zero
"""
import torch
diff --git a/tests/test_utils/test_assigners.py b/tests/test_utils/test_assigners.py
index b681a48c2b..38e58bfe47 100644
--- a/tests/test_utils/test_assigners.py
+++ b/tests/test_utils/test_assigners.py
@@ -1,8 +1,8 @@
"""Tests the Assigner objects.
CommandLine:
-    pytest tests/test_assigner.py
-    xdoctest tests/test_assigner.py zero
+    pytest tests/test_utils/test_assigners.py
+    xdoctest tests/test_utils/test_assigners.py zero
"""
import torch
diff --git a/tools/analyze_logs.py b/tools/analysis_tools/analyze_logs.py
similarity index 100%
rename from tools/analyze_logs.py
rename to tools/analysis_tools/analyze_logs.py
diff --git a/tools/benchmark.py b/tools/analysis_tools/benchmark.py
similarity index 96%
rename from tools/benchmark.py
rename to tools/analysis_tools/benchmark.py
index b77b2cfc82..919e258b68 100644
--- a/tools/benchmark.py
+++ b/tools/analysis_tools/benchmark.py
@@ -48,7 +48,8 @@ def main():
        shuffle=False)

    # build the model and load checkpoint
-    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
+    cfg.model.train_cfg = None
+    model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
diff --git a/tools/fuse_conv_bn.py b/tools/misc/fuse_conv_bn.py
similarity index 100%
rename from tools/fuse_conv_bn.py
rename to tools/misc/fuse_conv_bn.py
diff --git a/tools/print_config.py b/tools/misc/print_config.py
similarity index 100%
rename from tools/print_config.py
rename to tools/misc/print_config.py
diff --git a/tools/misc/visualize_results.py b/tools/misc/visualize_results.py
new file mode 100644
index 0000000000..38b4309eba
--- /dev/null
+++ b/tools/misc/visualize_results.py
@@ -0,0 +1,43 @@
+import argparse
+import mmcv
+from mmcv import Config
+
+from mmdet3d.datasets import build_dataset
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='MMDet3D visualize the results')
+    parser.add_argument('config', help='test config file path')
+    parser.add_argument('--result', help='results file in pickle format')
+    parser.add_argument(
+        '--show-dir', help='directory where visualize results will be saved')
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    args = parse_args()
+
+    if args.result is not None and \
+            not args.result.endswith(('.pkl', '.pickle')):
+        raise ValueError('The results file must be a pkl file.')
+
+    cfg = Config.fromfile(args.config)
+    cfg.data.test.test_mode = True
+
+    # build the dataset
+    dataset = build_dataset(cfg.data.test)
+    results = mmcv.load(args.result)
+
+    if getattr(dataset, 'show', None) is not None:
+        dataset.show(results, args.show_dir)
+    else:
+        raise NotImplementedError(
+            'Show is not implemented for dataset {}!'.format(
+                type(dataset).__name__))
+
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/convert_votenet_checkpoints.py b/tools/model_converters/convert_votenet_checkpoints.py
similarity index 97%
rename from tools/convert_votenet_checkpoints.py
rename to tools/model_converters/convert_votenet_checkpoints.py
index 5996ec5af8..15db9ac02d 100644
--- a/tools/convert_votenet_checkpoints.py
+++ b/tools/model_converters/convert_votenet_checkpoints.py
@@ -77,7 +77,10 @@ def main():
    checkpoint = torch.load(args.checkpoint)
    cfg = parse_config(checkpoint['meta']['config'])
    # Build the model and load checkpoint
-    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
+    model = build_detector(
+        cfg.model,
+        train_cfg=cfg.get('train_cfg'),
+        test_cfg=cfg.get('test_cfg'))
    orig_ckpt = checkpoint['state_dict']
    converted_ckpt = orig_ckpt.copy()
diff --git a/tools/publish_model.py b/tools/model_converters/publish_model.py
similarity index 100%
rename from tools/publish_model.py
rename to tools/model_converters/publish_model.py
diff --git a/tools/regnet2mmdet.py b/tools/model_converters/regnet2mmdet.py
similarity index 100%
rename from tools/regnet2mmdet.py
rename to tools/model_converters/regnet2mmdet.py
diff --git a/tools/test.py b/tools/test.py
index 260d360e02..967cfcd182 100644
--- a/tools/test.py
+++ b/tools/test.py
@@ -2,16 +2,18 @@
import mmcv
import os
import torch
+import warnings
from mmcv import Config, DictAction
+from mmcv.cnn import fuse_conv_bn
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
-from mmcv.runner import get_dist_info, init_dist, load_checkpoint
+from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
+                         wrap_fp16_model)

from mmdet3d.apis import single_gpu_test
from mmdet3d.datasets import build_dataloader, build_dataset
from mmdet3d.models import build_detector
from mmdet.apis import multi_gpu_test, set_random_seed
-from mmdet.core import wrap_fp16_model
-from tools.fuse_conv_bn import fuse_module
+from mmdet.datasets import replace_ImageToTensor


def parse_args():
@@ -47,14 +49,35 @@ def parse_args():
    parser.add_argument(
        '--tmpdir',
        help='tmp directory used for collecting results from multiple '
-        'workers, available when gpu_collect is not specified')
+        'workers, available when gpu-collect is not specified')
    parser.add_argument('--seed', type=int, default=0, help='random seed')
    parser.add_argument(
        '--deterministic',
        action='store_true',
        help='whether to set deterministic options for CUDNN backend.')
    parser.add_argument(
-        '--options', nargs='+', action=DictAction, help='custom options')
+        '--cfg-options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file. If the value to '
+        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
+        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
key="[(a,b),(c,d)]" ' + 'Note that the quotation marks are necessary and that no white space ' + 'is allowed.') + parser.add_argument( + '--options', + nargs='+', + action=DictAction, + help='custom options for evaluation, the key-value pair in xxx=yyy ' + 'format will be kwargs for dataset.evaluate() function (deprecate), ' + 'change to --eval-options instead.') + parser.add_argument( + '--eval-options', + nargs='+', + action=DictAction, + help='custom options for evaluation, the key-value pair in xxx=yyy ' + 'format will be kwargs for dataset.evaluate() function') parser.add_argument( '--launcher', choices=['none', 'pytorch', 'slurm', 'mpi'], @@ -64,16 +87,25 @@ def parse_args(): args = parser.parse_args() if 'LOCAL_RANK' not in os.environ: os.environ['LOCAL_RANK'] = str(args.local_rank) + + if args.options and args.eval_options: + raise ValueError( + '--options and --eval-options cannot be both specified, ' + '--options is deprecated in favor of --eval-options') + if args.options: + warnings.warn('--options is deprecated in favor of --eval-options') + args.eval_options = args.options return args def main(): args = parse_args() - assert args.out or args.eval or args.format_only or args.show, \ + assert args.out or args.eval or args.format_only or args.show \ + or args.show_dir, \ ('Please specify at least one operation (save/eval/format/show the ' - 'results) with the argument "--out", "--eval", "--format_only" ' - 'or "--show"') + 'results / save the results) with the argument "--out", "--eval"' + ', "--format-only", "--show" or "--show-dir"') if args.eval and args.format_only: raise ValueError('--eval and --format_only cannot be both specified') @@ -82,12 +114,34 @@ def main(): raise ValueError('The output file must be a pkl file.') cfg = Config.fromfile(args.config) + if args.cfg_options is not None: + cfg.merge_from_dict(args.cfg_options) + # import modules from string list. + if cfg.get('custom_imports', None): + from mmcv.utils import import_modules_from_strings + import_modules_from_strings(**cfg['custom_imports']) # set cudnn_benchmark if cfg.get('cudnn_benchmark', False): torch.backends.cudnn.benchmark = True cfg.model.pretrained = None - cfg.data.test.test_mode = True + # in case the test dataset is concatenated + samples_per_gpu = 1 + if isinstance(cfg.data.test, dict): + cfg.data.test.test_mode = True + samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) + if samples_per_gpu > 1: + # Replace 'ImageToTensor' to 'DefaultFormatBundle' + cfg.data.test.pipeline = replace_ImageToTensor( + cfg.data.test.pipeline) + elif isinstance(cfg.data.test, list): + for ds_cfg in cfg.data.test: + ds_cfg.test_mode = True + samples_per_gpu = max( + [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test]) + if samples_per_gpu > 1: + for ds_cfg in cfg.data.test: + ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline) # init distributed env first, since logger depends on the dist info. 
    if args.launcher == 'none':
@@ -101,7 +155,6 @@ def main():
        set_random_seed(args.seed, deterministic=args.deterministic)

    # build the dataloader
-    samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
    dataset = build_dataset(cfg.data.test)
    data_loader = build_dataloader(
        dataset,
@@ -111,16 +164,17 @@ def main():
        shuffle=False)

    # build the model and load checkpoint
-    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
+    cfg.model.train_cfg = None
+    model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
    if args.fuse_conv_bn:
-        model = fuse_module(model)
+        model = fuse_conv_bn(model)
    # old versions did not save class info in checkpoints, this workaround is
    # for backward compatibility
-    if 'CLASSES' in checkpoint['meta']:
+    if 'CLASSES' in checkpoint.get('meta', {}):
        model.CLASSES = checkpoint['meta']['CLASSES']
    else:
        model.CLASSES = dataset.CLASSES
@@ -141,11 +195,19 @@ def main():
    if args.out:
        print(f'\nwriting results to {args.out}')
        mmcv.dump(outputs, args.out)
-    kwargs = {} if args.options is None else args.options
+    kwargs = {} if args.eval_options is None else args.eval_options
    if args.format_only:
        dataset.format_results(outputs, **kwargs)
    if args.eval:
-        dataset.evaluate(outputs, args.eval, **kwargs)
+        eval_kwargs = cfg.get('evaluation', {}).copy()
+        # hard-code way to remove EvalHook args
+        for key in [
+                'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
+                'rule'
+        ]:
+            eval_kwargs.pop(key, None)
+        eval_kwargs.update(dict(metric=args.eval, **kwargs))
+        print(dataset.evaluate(outputs, **eval_kwargs))


if __name__ == '__main__':
diff --git a/tools/train.py b/tools/train.py
index e38b2a6e2a..d6626898f0 100644
--- a/tools/train.py
+++ b/tools/train.py
@@ -7,8 +7,9 @@ import os
import time

import torch
+import warnings
from mmcv import Config, DictAction
-from mmcv.runner import init_dist
+from mmcv.runner import get_dist_info, init_dist
from os import path as osp

from mmdet3d import __version__
@@ -46,7 +47,22 @@ def parse_args():
        action='store_true',
        help='whether to set deterministic options for CUDNN backend.')
    parser.add_argument(
-        '--options', nargs='+', action=DictAction, help='arguments in dict')
+        '--options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file (deprecate), '
+        'change to --cfg-options instead.')
+    parser.add_argument(
+        '--cfg-options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file. If the value to '
+        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
+        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
key="[(a,b),(c,d)]" ' + 'Note that the quotation marks are necessary and that no white space ' + 'is allowed.') parser.add_argument( '--launcher', choices=['none', 'pytorch', 'slurm', 'mpi'], @@ -61,6 +77,14 @@ def parse_args(): if 'LOCAL_RANK' not in os.environ: os.environ['LOCAL_RANK'] = str(args.local_rank) + if args.options and args.cfg_options: + raise ValueError( + '--options and --cfg-options cannot be both specified, ' + '--options is deprecated in favor of --cfg-options') + if args.options: + warnings.warn('--options is deprecated in favor of --cfg-options') + args.cfg_options = args.options + return args @@ -68,8 +92,12 @@ def main(): args = parse_args() cfg = Config.fromfile(args.config) - if args.options is not None: - cfg.merge_from_dict(args.options) + if args.cfg_options is not None: + cfg.merge_from_dict(args.cfg_options) + # import modules from string list. + if cfg.get('custom_imports', None): + from mmcv.utils import import_modules_from_strings + import_modules_from_strings(**cfg['custom_imports']) # set cudnn_benchmark if cfg.get('cudnn_benchmark', False): @@ -100,9 +128,14 @@ def main(): else: distributed = True init_dist(args.launcher, **cfg.dist_params) + # re-set gpu_ids with distributed training mode + _, world_size = get_dist_info() + cfg.gpu_ids = range(world_size) # create work_dir mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) + # dump config + cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) # init the logger before other steps timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) log_file = osp.join(cfg.work_dir, f'{timestamp}.log') @@ -122,6 +155,7 @@ def main(): logger.info('Environment info:\n' + dash_line + env_info + '\n' + dash_line) meta['env_info'] = env_info + meta['config'] = cfg.pretty_text # log some basic info logger.info(f'Distributed training: {distributed}') @@ -134,9 +168,13 @@ def main(): set_random_seed(args.seed, deterministic=args.deterministic) cfg.seed = args.seed meta['seed'] = args.seed + meta['exp_name'] = osp.basename(args.config) model = build_detector( - cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg) + cfg.model, + train_cfg=cfg.get('train_cfg'), + test_cfg=cfg.get('test_cfg')) + logger.info(f'Model:\n{model}') datasets = [build_dataset(cfg.data.train)] if len(cfg.workflow) == 2: