[Hackathon 5 No.5] Enhance the scatter API for Paddle #57748
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Sorry to inform you that 7dff65c's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that b7aafba's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
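Sorry, this PR is fairly complex and the internal evaluation took some time. The current approach is OK. Please resolve the conflicts and then make the changes requested in the review comments.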
    old = atomicCAS(address_as_ui, assumed, old);
  } while (assumed != old);
  hsum.x = (size_t)address & 2 ? (old >> 16) : (old & 0xffff);  // NOLINT
  return hsum;
Is there a reference for the algorithms used to implement these newly added atomic operations?
They are based on implementations posted on Stack Overflow.
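For readers unfamiliar with the pattern in the fragment above, here is a minimal Python simulation of a compare-and-swap retry loop that updates one 16-bit half of a 32-bit word and returns that half's previous value. It is a sketch, not the PR's CUDA code: `Word32`, `compare_and_swap`, and `atomic_update_u16` are hypothetical names, the lanes hold plain integers rather than float16 bit patterns, and the simulation is single-threaded, so the swap succeeds on the first attempt.

```python
class Word32:
    """Hypothetical stand-in for one 32-bit memory word."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still equals `expected`.

        Always returns the value seen before the attempt, which is the
        contract the retry loop below relies on.
        """
        old = self.value
        if old == expected:
            self.value = new & 0xFFFFFFFF
        return old


def atomic_update_u16(word, high_half, op, operand):
    """Apply `op` to one 16-bit lane of `word`; return the lane's old value."""
    old = word.value
    while True:
        assumed = old
        # Pick the lane that the 16-bit address would point at.
        lane = (assumed >> 16) if high_half else (assumed & 0xFFFF)
        new_lane = op(lane, operand) & 0xFFFF
        # Re-pack the word, leaving the neighbouring lane untouched.
        if high_half:
            packed = (assumed & 0x0000FFFF) | (new_lane << 16)
        else:
            packed = (assumed & 0xFFFF0000) | new_lane
        old = word.compare_and_swap(assumed, packed)
        if old == assumed:  # nobody changed the word in between
            break
    return (old >> 16) if high_half else (old & 0xFFFF)


word = Word32(0x00030005)
previous = atomic_update_u16(word, True, max, 7)
print(previous, hex(word.value))
```

With real concurrency another thread may modify the neighbouring lane between the read and the swap. In that case the swap fails, `old` holds the fresh word, and the loop recomputes from it.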
    def test_check_output(self):
        self.check_output()

    def test_check_grad(self):
        self.check_grad(["X", "Updates"], "Out", check_prim=True)
        self.check_grad(["X", "Updates"], "Out")
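This is a newly added unit test, and I am not sure exactly why it fails. I called the API on its own and the results were all correct. The failure only appears after adding the OpTest in the usual way. Judging from the results, the automatically derived gradient here seems to be incorrect. The gradient from actual execution is correct; I compared it against torch.
How about posting the test results for this case here?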
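One way to narrow down a disagreement like this is to compare a hand-derived gradient against a finite-difference estimate outside the test harness. The sketch below does that in plain Python for a scatter-add reference. It assumes row-free 1-D inputs and an "add" reduction, which may differ from the failing test case, and `scatter_add`, `weighted_loss`, `analytic_grads`, and `numeric_grad` are hypothetical helpers, not Paddle functions.

```python
def scatter_add(x, index, updates):
    """Reference forward: start from x, then out[index[k]] += updates[k]."""
    out = list(x)
    for i, u in zip(index, updates):
        out[i] += u
    return out


def weighted_loss(x, index, updates, weights):
    """Scalar loss whose gradient w.r.t. out is exactly `weights`."""
    out = scatter_add(x, index, updates)
    return sum(w * o for w, o in zip(weights, out))


def analytic_grads(index, weights):
    """Hand-derived gradients of weighted_loss for the add reduction."""
    grad_x = list(weights)                      # x passes straight through
    grad_updates = [weights[i] for i in index]  # gather of the upstream grad
    return grad_x, grad_updates


def numeric_grad(f, values, eps=1e-6):
    """Central-difference estimate of df/dvalues[k] for every k."""
    grads = []
    for k in range(len(values)):
        plus = list(values)
        plus[k] += eps
        minus = list(values)
        minus[k] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


x = [1.0, 2.0, 3.0]
index = [2, 0, 2]
updates = [10.0, 20.0, 30.0]
weights = [0.5, -1.0, 2.0]

grad_x, grad_updates = analytic_grads(index, weights)
num_x = numeric_grad(lambda v: weighted_loss(v, index, updates, weights), x)
num_u = numeric_grad(lambda v: weighted_loss(x, index, v, weights), updates)

print(max(abs(a - n) for a, n in zip(grad_x, num_x)))
print(max(abs(a - n) for a, n in zip(grad_updates, num_u)))
```

If the hand-derived and numeric values agree here but the harness still reports a mismatch, the difference is more likely in how the harness builds its expected gradient than in the kernel.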
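@luotao1 @zoooo0820 Please review again, thanks.
@lisamhy Hello, there are still a few open issues:
A lot of functionality has been added compared with the original API, so a drop in performance is reasonable.
I do not have an XPU machine to test on. Judging from the unit test, though, this looks like a bug that existed before this PR. Please ask the relevant colleagues to look into it as well. With the XPU code unchanged and run on the CPU, the results are correct.
I will take care of this one.
@zoooo0820 @luotao1 Please review, thanks.
Hello. According to the earlier CI test data, the scenarios covered by the original test cases are 2.x times slower, which is a fairly serious performance regression. In principle the new functionality is an extension on top of the existing behavior. Could you analyze where this performance cost comes from, and whether it can be optimized?
The amount of code alone shows how much functionality has been added. This PR has been open for a long time, so the points above may have omissions. Being slower is therefore reasonable. @zoooo0820 @luotao1 When can this PR be merged? It keeps running into conflicts.
@luotao1 @zoooo0820 Please review, thanks.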
Sorry to inform you that 25b7d8f's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
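Hello. According to the earlier api-benchmark data, the forward pass is about 2x slower. This still needs to be confirmed.
Some additional operations were added to support 0-dim tensors. With this much new functionality, how could the forward pass still run at the original speed? @zoooo0820 @luotao1 Please evaluate. Thanks.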
  if (index_type == phi::DataType::INT32) {
    phi::funcs::ScatterAssign<T, int32_t>(ctx, updates, index, out);
  } else {
    phi::funcs::ScatterAssign<T, int64_t>(ctx, updates, index, out);
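Is it really necessary to merge the case where reduce is 'assign' into IndexReduceBaseKernel? It looks like the performance problem comes from IndexReduceBaseKernel calling many other kernels to do its computation.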
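Yes, it is necessary. It unifies the code and makes it easier to maintain. Once IndexReduceBaseKernel gets faster, this kernel gets faster too. Also, the performance figure in CI measures calls to this one API with all of its functionality; it does not compare only the assign case.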
![image](https://private-user-images.githubusercontent.com/22932751/285445953-0e335b94-2209-412e-8c53-a08e6c99f2ca.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNDU5NjAsIm5iZiI6MTczOTI0NTY2MCwicGF0aCI6Ii8yMjkzMjc1MS8yODU0NDU5NTMtMGUzMzViOTQtMjIwOS00MTJlLThjNTMtYTA4ZTZjOTlmMmNhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDAzNDc0MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTkyOTJiZDI0ODc4NDhmNDU0NTkyYjIzMDBkMjM2NTk1YjdlZjk3YzllNzIyZDY2ZmM2OTcwNDIyOTM4MmY0YTQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.KezuD8cdZAl0mJsVnIWKE34wmoqyW4yUp_I3ADZtySQ)
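@lisamhy I suggest not merging this into IndexReduceBaseKernel, to avoid affecting the original functionality.
This is new functionality and all CI checks passed, so it will not affect the existing functionality.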
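The two designs debated in this thread can be sketched in a few lines of Python: a dedicated branch for 'assign', as in the `ScatterAssign` hunk, versus routing every mode through one general reduce routine. This is an illustration under assumptions, not the PR's kernels. `scatter`, `scatter_assign`, and `scatter_reduce_generic` are hypothetical names, the mode names other than 'assign' are made up for the example, and duplicate indices are applied in list order here, which a parallel kernel would not guarantee.

```python
# Illustrative reduction modes; only "assign" is named in the discussion.
REDUCERS = {
    "add": lambda current, new: current + new,
    "mul": lambda current, new: current * new,
    "amax": max,
    "amin": min,
}


def scatter_assign(x, index, updates):
    """Dedicated path: plain overwrite, no reduction machinery."""
    out = list(x)
    for i, u in zip(index, updates):
        out[i] = u
    return out


def scatter_reduce_generic(x, index, updates, reduce):
    """General path: every mode, including assign, goes through `op`."""
    if reduce == "assign":
        op = lambda current, new: new
    else:
        op = REDUCERS[reduce]
    out = list(x)
    for i, u in zip(index, updates):
        out[i] = op(out[i], u)
    return out


def scatter(x, index, updates, reduce="assign", fast_path=True):
    """Dispatch: keep the dedicated assign branch unless it is disabled."""
    if reduce == "assign" and fast_path:
        return scatter_assign(x, index, updates)
    return scatter_reduce_generic(x, index, updates, reduce)


x = [1, 2, 3]
index = [0, 2, 0]
updates = [10, 20, 30]
print(scatter(x, index, updates))                   # dedicated branch
print(scatter(x, index, updates, fast_path=False))  # general routine
print(scatter(x, index, updates, reduce="add"))
```

Both branches must return the same result for 'assign'. The trade-off is that the dedicated branch keeps the original cost for the original use case, while the single general routine has less code to maintain.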
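Thank you for supporting the Hackathon and for the effort you have put in. A few points need to be explained regarding the issues raised in this PR:
Admittedly, this PR implements the functionality the task asked for, but the problems it introduces still have to be resolved. That is a requirement for every PR, not a "new requirement", and the PR can be merged only after the issues above are resolved.
Declaration: copyright belongs to the individual author. This PR is solely for the author's participation in the Hackathon 5 competition. No other individual or organization may modify or use this PR in any form.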
PR types
Function optimization
PR changes
APIs
Description
PaddlePaddle/community#631
#57262