Fix cuda kernel launch of grid sampler #33100
Thanks for your contribution!
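Suggest adding a unit test case with a larger input.
Explanation of the failing benchmark CI, for the inputs:
Before this PR, block_size = 4256246 / 512 and grid_size = 512. That block_size exceeded max_thread_num, so the computed results were actually wrong. The timed-out part of the benchmark log is as follows: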
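For reference, the launch-configuration arithmetic in the explanation above can be sketched in a few lines of Python. This is only an illustration, not the kernel code from this PR: the function names are hypothetical, the 1024-thread limit is the usual CUDA per-block maximum and is assumed here rather than taken from the thread, and the "fixed" variant shows one conventional way to size a 1-D launch (fixed block size, grid obtained by ceiling division).

```python
# Assumed per-block thread limit (typical for current CUDA GPUs); not stated in the thread.
MAX_THREADS_PER_BLOCK = 1024


def launch_config_before(n, grid_size=512):
    """Configuration described in the comment: fixed grid, block size derived from n.

    Integer division is an assumption; the thread only writes "4256246 / 512".
    """
    block_size = n // grid_size
    return grid_size, block_size


def launch_config_fixed(n, block_size=512):
    """Hypothetical corrected configuration: fixed block size, grid by ceiling division."""
    grid_size = (n + block_size - 1) // block_size
    return grid_size, block_size


n = 4256246  # element count quoted in the comment

grid_before, block_before = launch_config_before(n)
grid_fixed, block_fixed = launch_config_fixed(n)

# The old block size is far above the per-block limit, so the launch is invalid.
print("before:", grid_before, block_before, block_before > MAX_THREADS_PER_BLOCK)
# The corrected configuration keeps the block within the limit and covers all n elements.
print("fixed: ", grid_fixed, block_fixed, grid_fixed * block_fixed >= n)
```

With a block size this far over the limit the kernel launch is rejected, which is consistent with the comment's point that the earlier results were wrong rather than merely slow.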
        self.mode = "bilinear"

    def test_check_grad_normal(self):
        pass
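- Is the large input slow because the unit test framework is slow to compute the expected gradient? Have you measured roughly how long it takes?
- If skip_check_grad_ci is already used, lines 259~260 below do not need to be written.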
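- Yes, the unit test framework is slow to compute the expected gradient. A single case ran for 20 min and still had not finished.
- Removed lines 259~260.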
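To illustrate why an expected gradient can become this slow for a large input, here is a minimal sketch. It assumes the expected gradient is obtained numerically by central finite differences, perturbing one input element at a time; the thread does not say how the framework computes it, so treat this as an assumption. The names `numeric_gradient` and `sum_of_squares` are hypothetical and are not part of any test framework.

```python
def numeric_gradient(f, x, delta=1e-3):
    """Central-difference gradient of a scalar function f at the point x (a list).

    Returns the gradient and the number of forward evaluations of f that were needed.
    """
    grad = []
    calls = 0
    for i in range(len(x)):
        original = x[i]
        x[i] = original + delta
        f_pos = f(x)
        x[i] = original - delta
        f_neg = f(x)
        x[i] = original  # restore the element before moving on
        calls += 2
        grad.append((f_pos - f_neg) / (2 * delta))
    return grad, calls


def sum_of_squares(x):
    """Stand-in for an operator's forward pass reduced to a scalar."""
    return sum(v * v for v in x)


grad, calls = numeric_gradient(sum_of_squares, [1.0, 2.0, 3.0])
print(grad, calls)
```

Under this assumption the cost is two full forward passes per input element, so the total work grows with the product of the input size and the cost of one forward pass. For an input with millions of elements that can plausibly run for far longer than a CI time limit.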
53903ae to 47dfe55
LGTM for skip_check_grad_ci
47dfe55 to 1175e54
LGTM
PR types
Bug fixes
PR changes
OPs
Describe
Fix cuda kernel launch of grid sampler
Fix #29066