Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Torch, CI] Support PyTorch 1.7 #6594

Closed
4 tasks done
masahi opened this issue Sep 30, 2020 · 9 comments · Fixed by #6828
Closed
4 tasks done

[Torch, CI] Support PyTorch 1.7 #6594

masahi opened this issue Sep 30, 2020 · 9 comments · Fixed by #6828

Comments

@masahi
Copy link
Member

masahi commented Sep 30, 2020

Let's summarize steps needed to update our PyTorch support for the latest version, 1.6 (the CI is at 1.4).

cc @t-vi @siju-samuel @kevinthesun @yongwww @alexwong @tqchen

@masahi
Copy link
Member Author

masahi commented Oct 1, 2020

@kevinthesun @yongwww I've taken a look at supporting detection models in 1.6. It turned out we need to convert a function batched_nms that was traced in 1.4, but now it is scripted. This brings a very nasty typing problem.

Looking at this code,
https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L75-L86

One of the types of if branch is a zero dim empty tensor of dtype int64 https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L76
while the type of other branch is a dynamic 1D tensor of dtype int32 https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L85-L86

We cannot type check the converted Relay model with If, because dtype of two branches are different. Even if I force the dtype to be the same, if one branch is Any shape while the other is static, Relay type inference chooses a static shape for the type of If. So the return type of above function becomes a zero dim tensor.

Is there a way to turn a static shape tensor to dynamic, when one of If branch is dynamic while other is static?

@masahi
Copy link
Member Author

masahi commented Oct 1, 2020

@yongwww It seems torchvision nms op https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py#L35 returns int64 indices, while nms in Relay returns int32 bit indices. We need to cast the output indices to int64, that should resolve the one of typing problem (two branches of If having different dtypes, int32 and int64).

@kevinthesun
Copy link
Contributor

@masahi That shape problem coming from If branch looks like a Relay type inference issue to me. Type inference should generate dynamic shape in this case.

@kevinthesun
Copy link
Contributor

kevinthesun commented Oct 10, 2020

@masahi

import tvm  
from tvm import relay

dtype = "float32"
branch_a = relay.var("a", shape=(relay.Any(),), dtype=dtype)
branch_b = relay.var("b", shape=(0,), dtype=dtype)
cond = relay.var("cond", shape=(), dtype="bool")

out = relay.If(cond, branch_a, branch_b)

mod = tvm.IRModule()
mod["main"] = relay.Function([cond, branch_a,branch_b], out)
mod = relay.transform.InferType()(mod)
print(mod["main"])

Will generate

fn (%cond: bool, %a: Tensor[(0), float32], %b: Tensor[(0), float32]) -> Tensor[(0), float32] {
  if (%cond) {
    %a
  } else {
    %b
  }
}

while input a should be with shape(?,). There is an issue when inferring such an expression.

@masahi
Copy link
Member Author

masahi commented Oct 10, 2020

@kevinthesun thanks for pointing this out and providing a minimum test case. I'm going to take a look at what's going on inside type checker and hopefully fix it. With this fixed, I can send a PR to support PyTorch 1.6 mask rcnn and faster rcnn.

@masahi
Copy link
Member Author

masahi commented Oct 10, 2020

Finding suspicious code was easy, here, when lhs is IntImmNode and rhs is AnyNode, ulhs is returned.

https://github.com/apache/incubator-tvm/blob/98c2096f4944bdbdbbb2b7b20ccd35c6c11dfbf6/src/relay/analysis/type_solver.cc#L195-L198

Returning urhs there seems to fix this issue. I'm not sure if this is intended or a bug.

@masahi
Copy link
Member Author

masahi commented Oct 10, 2020

@kevinthesun @jroesch @lixiaoquan @MarisaKirisame I found that this change was introduced in #5795

If I make that above change, that effectively undos that PR and breaks the test case introduced there. Basically, what we want from type checking If involving both static and dynamic shape are complete opposite to the motivation of #5795.

What should we do about this? I think #5795 should be reverted, since type inference is supposed to pick the most general types.

@kevinthesun
Copy link
Contributor

kevinthesun commented Oct 10, 2020

Hmmm, I think that PR is an optimization for some cases but is actually not correct since static dim is just a case of Any but not vise versa. The case provided in that PR only stands when False is fed with (Any, 1) tensor at runtime.

In TF frontend there are lots of cases when we want to make expression as static as possible. My guessing is that False actually has shape (Any, 1) but somehow not fully optimized to be less dynamic. Usually such kind of optimization is done in the frontend.

@lixiaoquan We should revert it?

@masahi
Copy link
Member Author

masahi commented Oct 10, 2020

I've sent the revert to #6658

With this and other minor fixes, I can finally run torchvision detection models from v1.6. See #6659.

@masahi masahi changed the title [Torch, CI] Support PyTorch 1.6 [Torch, CI] Support PyTorch 1.7 Oct 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants