-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improve] Migrating DIVUP to GET_BLOCKS #1586
Conversation
Please @AllentDan have a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly replace DIVUP
with GET_BLOCKS
may need kernel loop then
@@ -9,10 +9,10 @@ | |||
|
|||
#define THREADS_PER_BLOCK 512 | |||
|
|||
#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) | |||
// #define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be removed if useless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mmcv/ops/csrc/parrots/iou3d.cpp
Outdated
@@ -134,7 +134,7 @@ void iou3d_nms_forward(Tensor boxes, Tensor keep, Tensor keep_num, | |||
int64_t *keep_data = keep.data_ptr<int64_t>(); | |||
int64_t *keep_num_data = keep_num.data_ptr<int64_t>(); | |||
|
|||
const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); | |||
const int col_blocks = GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if actual col_blocks
is greater than 4096?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be unsafe and we should only use GET_BLOCKS
for block allocation. I will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
mmcv/ops/csrc/parrots/iou3d.cpp
Outdated
@@ -189,7 +189,7 @@ void iou3d_nms_normal_forward(Tensor boxes, Tensor keep, Tensor keep_num, | |||
int64_t *keep_data = keep.data_ptr<int64_t>(); | |||
int64_t *keep_num_data = keep_num.data_ptr<int64_t>(); | |||
|
|||
const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); | |||
const int col_blocks = GET_BLOCKS(boxes_num, THREADS_PER_BLOCK_NMS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if actual col_blocks
is greater than 4096?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after removing useless lines
const int blocks = | ||
(boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS; | ||
CUDA_2D_KERNEL_BLOCK_LOOP(col_start, blocks, row_start, blocks) { | ||
// if (row_start > col_start) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may remove if useless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This annotation was left before. Shall we delete it? And this statement seems like https://github.com/open-mmlab/mmcv/blame/master/mmcv/ops/csrc/common/cuda/nms_cuda_kernel.cuh#L37, I don't know which is more suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that line seems to depend on how the block is initialized. It is useful when the block is a square or a wide rectangle. As we don't know how the user may initialize the block. Just keep it commented then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it.
const int blocks = | ||
(boxes_num + THREADS_PER_BLOCK_NMS - 1) / THREADS_PER_BLOCK_NMS; | ||
CUDA_2D_KERNEL_BLOCK_LOOP(col_start, blocks, row_start, blocks) { | ||
// if (row_start > col_start) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may remove if useless
Need to resolve conflicts. |
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
resolves #1400
Migrating DIVUP to GET_BLOCKS because GET_BLOCKS looks safer.
Modification
As above.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist
Before PR:
After PR: