
[TOPI] Fix GPU Dynamic Op Schedule #7117

Merged: 4 commits merged into apache:main on Dec 17, 2020

Conversation

kevinthesun (Contributor)

This PR limits the resources used by dynamic-shape GPU kernels to avoid runtime errors. It also makes the VM skip CallPacked when a kernel has only one output and that output is empty, e.g. shape (1, 0, 6).

After this PR, TensorFlow and PyTorch object detection models should be runnable on Nvidia GPUs.
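For illustration only (the actual skip lives in the VM runtime; the helper below is hypothetical): a single output whose shape contains a zero extent, such as (1, 0, 6), holds no elements, so the packed call for it can be bypassed.

```python
def output_is_empty(shape):
    # Hypothetical helper, not the VM's C++ implementation: a shape such as
    # (1, 0, 6) has a zero-length dimension, so the tensor contains no
    # elements and the kernel invocation can be skipped.
    return any(int(dim) == 0 for dim in shape)

assert output_is_empty((1, 0, 6))
assert not output_is_empty((1, 100, 6))
```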

@zhiics @Laurawly @mbrookhart

mbrookhart (Contributor) left a comment:


A couple of nitpicks, but overall it looks great. Awesome work.

mod,
[np_indices_result, np_valid_box_count],
only_vm=False,
disable_targets=["nvptx"],
mbrookhart (Contributor):


This tests the empty output VM change 👍
Why disable nvptx?

kevinthesun (Contributor, Author):


There is an issue causing a segfault in dynamic NMS on nvptx, and in general we need Thrust for any dynamic-shape sorting. For now, nvptx is not ready for these operations.
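As a side note, whether Thrust-backed sorting is available in a given build can be probed from Python. A minimal sketch, assuming the usual contrib Thrust packed-function registration in builds configured with USE_THRUST:

```python
import tvm

def thrust_sort_available():
    # Returns True only if the Thrust sort kernel was compiled into this
    # TVM build (assumes the standard contrib registration name).
    return tvm.get_global_func("tvm.contrib.thrust.sort", allow_missing=True) is not None
```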

mbrookhart (Contributor):


Makes sense. I'm trying to fix the default sort kernel in #7099, if you want to take a look.

@@ -199,6 +199,15 @@ def test_any_concat():
ref = np.concatenate([x_np - 3.0, y_np * 5.0], axis=0)
check_result([x_np, y_np], mod, ref)

num_inputs = 25
x = [relay.var("x", shape=(relay.Any(),), dtype="float32") for _ in range(num_inputs)]
z = relay.op.concatenate(x, axis=0)
mbrookhart (Contributor):


This tests the injective schedule 👍

@@ -754,7 +782,22 @@ def non_max_suppression(
)
score_axis = score_index
score_shape = (batch_size, num_anchors)
score_tensor = te.compute(score_shape, lambda i, j: data[i, j, score_axis], tag=tag.ELEMWISE)
data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
mbrookhart (Contributor):


This looks fine, but I'm a little surprised it's necessary. Do you have a test case that breaks the current code, or is this mostly for performance?

kevinthesun (Contributor, Author):


When the NMS workload is large, as in RCNN models, the general CUDA injective schedule can still cause runtime errors even with the improvements in this PR. In general, any dynamic injective op can hit runtime issues with the current uniform CUDA injective schedule.

This problem is not directly related to NMS but to the CUDA injective schedule. Later we may need to revisit this part for GPU dynamic ops and find a better, more general solution (together with more tests).
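To illustrate the kind of resource capping being discussed (a minimal sketch, not the schedule added in this PR; the helper name and the default caps are assumptions), a bounded injective schedule can fuse all axes, bind a fixed number of blocks and threads, and let a serial inner loop absorb whatever a dynamic extent leaves over:

```python
import tvm
from tvm import te

def bounded_injective_schedule(out, num_thread=256, max_block=256):
    # Sketch only: cap the launch at max_block * num_thread GPU threads so a
    # dynamic extent can never exceed the device's launch limits; the serial
    # loop xo covers the remaining elements.
    s = te.create_schedule(out.op)
    fused = s[out].fuse(*s[out].op.axis)
    xo, xi = s[out].split(fused, factor=num_thread * max_block)
    bx, tx = s[out].split(xi, factor=num_thread)
    s[out].reorder(bx, tx, xo)
    s[out].bind(bx, te.thread_axis("blockIdx.x"))
    s[out].bind(tx, te.thread_axis("threadIdx.x"))
    return s

# Example with a dynamic first dimension:
n = te.var("n")
x = te.placeholder((n, 6), name="x")
y = te.compute(x.shape, lambda i, j: x[i, j] + 1.0, name="y")
s = bounded_injective_schedule(y)
```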

@@ -194,6 +197,8 @@ def _callback(op):

if cfg.is_fallback:
N, F, Y, X = get_const_tuple(conv.shape)
if not isinstance(N, int):
N = 1
mbrookhart (Contributor):


Can we add a test that hits this change?

kevinthesun (Contributor, Author):


Yeah, we do have a test for this. I have now enabled all targets for it.

mbrookhart (Contributor) left a comment:


LGTM

kevinthesun merged commit bad149e into apache:main on Dec 17, 2020
kevinthesun (Contributor, Author):

Thanks @mbrookhart

masahi pushed a commit to masahi/tvm that referenced this pull request Dec 18, 2020
* Fix GPU dynamic op schedules

* Fix dynamic shape nms

* Fix

* Fix test format
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021