Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix linalg vector norm backward bug #8015

Conversation

BBuf
Copy link
Contributor

@BBuf BBuf commented Apr 13, 2022

之前linalg.vector_norm在ord=0的情况下,是用ScalarLogicalNotEqual+ReduceSum来做的,这会导致在反向的时候后向图断掉。这里修改新增了一个Unary算子NotEqualZero解决了这个问题。

需要说明的是pytorch的linalg.vector_norm在ord=0的时候求梯度是直接用的flow.zeros_like来设置,见:https://github.com/pytorch/pytorch/pull/59135/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5L231-L232 。导致只要ord=0,那么输入的梯度就永远都是0。但这明显是不符合这个api的语义的,倾向于这是一个pytorch bug。所以我还是坚持我们的做法,感觉我们这样才是正确的。

image

例子:

import torch as flow
from torch import linalg as LA
import numpy as np
a = flow.tensor([1.0]).cuda()
a.requires_grad=True
res = LA.vector_norm(a, ord=0)

print(res)
res.sum().backward()
print(a.grad)

pytorch输出0, oneflow输出1. 按照语义来看,梯度确实应该是1.

@BBuf BBuf changed the base branch from master to fix_linalg_vector_norm_and_clip_grad_bug April 13, 2022 09:58
@Flowingsun007
Copy link
Contributor

pytorch输出0, oneflow输出1. 按照语义来看,梯度确实应该是1.

感觉可以试试其他框架譬如paddle的?

@BBuf
Copy link
Contributor Author

BBuf commented Apr 13, 2022

pytorch输出0, oneflow输出1. 按照语义来看,梯度确实应该是1.

感觉可以试试其他框架譬如paddle的?

@BBuf
Copy link
Contributor Author

BBuf commented Apr 13, 2022

pytorch输出0, oneflow输出1. 按照语义来看,梯度确实应该是1.

感觉可以试试其他框架譬如paddle的?

看了一下,paddle没这个接口

@BBuf BBuf merged commit a2f3499 into fix_linalg_vector_norm_and_clip_grad_bug Apr 14, 2022
@BBuf BBuf deleted the fix_linalg_vector_norm_backward_bug branch April 14, 2022 01:41
@liufengwei0103
Copy link
Contributor

pytorch输出0, oneflow输出1. 按照语义来看,梯度确实应该是1.

感觉可以试试其他框架譬如paddle的?

看了一下,paddle没这个接口

我用paddle试了一下,也是0

@@ -153,6 +153,13 @@ struct AtanhFunctor<float> {
}
};

template<>
struct NotEqualZeroFunctor<float> {
static OF_DEVICE_FUNC float Forward(const float x) { return x != 0; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

应该用 static_cast(0.0), 你这样是拿着float和int比较吧

mergify bot added a commit that referenced this pull request Apr 18, 2022
* fix reduce_sum scalar check bug

* fix linalg vector norm and clip grad bug

* fix comment

* auto format by CI

* Fix linalg vector norm backward bug (#8015)

* has multi definition bug

* fix bug

* fix commnet

* fix bug

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants