【Hackathon No.19】Add ASGD RFC #68

tiancaishaonvjituizi · 2022-03-28T13:03:01Z

Add the RFC for Hackathon task 19, ASGD.

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

CLAassistant · 2022-03-28T13:03:09Z

All committers have signed the CLA.

paddle-bot-old · 2022-03-28T13:03:26Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please check that its format and content are complete. For details, refer to the Template and Demo.

tiancaishaonvjituizi · 2022-03-28T13:26:48Z

While studying the code, I submitted a PR that fixes some problematic code in Paddle. It would be good to review it together with this one: PaddlePaddle/Paddle#41045

paddle-bot-old · 2022-03-29T01:59:13Z

The format inspection passed. Your PR will be reviewed by Paddle experts and developers from the open-source community. Stay tuned.

zhiboniu · 2022-03-30T04:18:21Z

rfcs/APIs/20220327_api_design_for_ASGD.md

+
+## Underlying OP design
+
+This can largely follow the implementation of Paddle's SGD optimizer: implement paddle/fluid/operators/optimizers/asgd_op.cc and the corresponding asgd_kernel.h/.cc/.cu.


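The survey of PyTorch is very detailed. Please also describe the approach to the OP implementation in more detail, so that other developers can work out how to implement it from the design document.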

OK, added.

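It is still too brief 😂, so it does not give other developers practical guidance.
I would like the ideas from the paper to be presented together with the code, so it would be best to include these two parts:
1) An explanation of the ideas behind ASGD
2) How the method is implemented in Paddle's optimizer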

I have added fairly detailed pseudocode. Please take another look.

zhiboniu · 2022-03-31T10:01:52Z

rfcs/APIs/20220327_api_design_for_ASGD.md

+class paddle.fluid.optimizer.ASGDOptimizer(learning_rate, parameter_list=None, regularization=None, name=None)
+```
+
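This stays consistent with the style of the other optimizers in Paddle. weight_decay is set through the `regularization` parameter, which supports L1/L2 regularization. The learning rate is controlled by an LR Scheduler rather than built into the optimizer. A new LRScheduler will be added to implement the learning rate update strategy proposed in [2] (the exact name can be decided later); users are also free to choose a different update strategy by using another LRScheduler or LambdaLR.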
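As a rough illustration of what such a standalone schedule might compute, here is a minimal Python sketch. It assumes the polynomial-decay form used by PyTorch's ASGD, `lr / (1 + lambd * lr * step) ** alpha`; the function names and default values are hypothetical and are not part of the RFC.

```python
def asgd_lr(step, base_lr=0.01, lambd=1e-4, alpha=0.75):
    """Learning rate at `step`, assuming base_lr / (1 + lambd * base_lr * step) ** alpha."""
    return base_lr / (1.0 + lambd * base_lr * step) ** alpha


def asgd_lr_factor(step, base_lr=0.01, lambd=1e-4, alpha=0.75):
    """The same schedule expressed as a multiplicative factor on base_lr,
    which is the shape a LambdaLR-style scheduler expects."""
    return 1.0 / (1.0 + lambd * base_lr * step) ** alpha


if __name__ == "__main__":
    # Print the decayed learning rate at a few steps.
    for step in (0, 1_000, 100_000, 10_000_000):
        print(step, asgd_lr(step))
```

Because the schedule is an ordinary function of the step count, a factor function like `asgd_lr_factor` could in principle be handed to a lambda-based scheduler such as `paddle.optimizer.lr.LambdaDecay`, leaving the optimizer itself unaware of how the learning rate is produced.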


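Is the ASGD learning rate update method required by ASGD itself? If the learning rate update method is specific to ASGD, a separate LR Scheduler should not be added for it.
[Suggestion: analyze the theoretical formulas in the ASGD paper side by side with the code implementation.]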

Relying entirely on the PyTorch code as a reference is not reliable.

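Perhaps I did not write this clearly enough. ASGD itself does not assume any particular learning rate; it simply averages the parameters. This can be seen on Wikipedia and in some university course materials.
Moreover, ASGD was proposed in https://dl.acm.org/doi/10.1137/0330046 and https://ecommons.cornell.edu/handle/1813/8664, whereas the learning rate used by PyTorch ASGD was only proposed much later, in https://arxiv.org/abs/1107.2490. This also shows that the learning rate update method used by PyTorch ASGD cannot be something ASGD itself requires, which is stated explicitly in the README of bottou-sgd, the project that PyTorch ASGD refers to.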
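To make the "it only averages the parameters" point concrete, here is a minimal pure-Python sketch. It is not the RFC's pseudocode and not Paddle's or PyTorch's implementation; the names `averaged_sgd`, `grad_fn` and `lr_fn` are hypothetical. The learning rate comes from an arbitrary callable, so the averaging step makes no assumption about the schedule.

```python
import random


def averaged_sgd(grad_fn, x0, lr_fn, steps):
    """Run plain SGD and return (last_iterate, running_average_of_iterates)."""
    x = list(x0)
    avg = list(x0)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        lr = lr_fn(t)  # any schedule: constant, decaying, ...
        # Ordinary SGD step.
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        # Incremental mean of x_1..x_t: avg_t = avg_{t-1} + (x_t - avg_{t-1}) / t
        avg = [ai + (xi - ai) / t for ai, xi in zip(avg, x)]
    return x, avg


# Toy problem (an assumption for illustration): minimize 0.5 * (x - 3)^2
# using gradients corrupted by Gaussian noise.
rng = random.Random(0)


def noisy_grad(x):
    return [(xi - 3.0) + rng.gauss(0.0, 1.0) for xi in x]


last, avg = averaged_sgd(noisy_grad, [0.0], lr_fn=lambda t: 0.1, steps=2000)
print("last iterate:", last[0], "averaged iterate:", avg[0])
```

Swapping `lr_fn` for a decaying schedule changes nothing in the averaging code, which is the separation of concerns argued for here.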

I have added the explanation above to the RFC. Please take another look.

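Does the t^-1 in the original paper, Acceleration of stochastic approximation by averaging, indicate that the size of the parameter update also needs to decrease gradually over time?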

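The bottou-sgd README also states that ASGD needs to be paired with a gradually decaying learning rate.
So my understanding is that this learning rate update method is required by ASGD. (Note that the reason for this learning rate decay is not the same as the reason for learning rate decay in deep learning. Some LR Schedulers can have a similar effect, but the two are independent.)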

What I actually care about more is not whether the learning rate decays, but whether we have analyzed the ASGD method and papers thoroughly.

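> Does the t^-1 in the original paper, Acceleration of stochastic approximation by averaging, indicate that the size of the parameter update also needs to decrease gradually over time?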

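That passage is not about ASGD. It says that "SGD also achieves the same optimal convergence rate as ASGD when this condition is met, but the condition is hard to satisfy in practice."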

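> Note that the reason for this learning rate decay is not the same as the reason for learning rate decay in deep learning. Some LR Schedulers can have a similar effect, but the two are independent.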

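Is this what that sentence is meant to say: "ASGD has theoretical requirements on the learning rate decay strategy, and if they are not met, ASGD will not work in theory"?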

That statement is correct. The original ASGD paper makes some assumptions about the learning rate:

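However, these assumptions are fairly broad, and countless concrete decay strategies can be constructed that satisfy them. Furthermore, should decay strategies that do not satisfy these assumptions be prohibited? I do not think so: researchers may specifically want to study how Averaged SGD behaves under certain special conditions. The paper https://arxiv.org/abs/1107.2490 mentions two studies that use a fixed learning rate: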

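Therefore, whether in terms of the ASGD algorithm itself or in terms of API orthogonality and composability, I think implementing this as an LR Scheduler plus an Averaged SGD that makes no assumptions about the learning rate is the better approach.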

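As further supporting evidence, here is sklearn's SGD implementation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier . There, average is just one of SGD's parameters, on an equal footing with and orthogonal to parameters such as learning_rate and penalty; setting average=True does not stop users from choosing the learning rate freely.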

add ASGD rfc

8a36802

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

paddle-bot-old bot added contributor status: proposed labels Mar 28, 2022

tiancaishaonvjituizi mentioned this pull request Mar 28, 2022

【PaddlePaddle Hackathon Phase 2】Task Overview PaddlePaddle/Paddle#40234

Closed

Refine ASGD RFC

ffd494a

dingjiaweiww added status: open review and removed status: proposed labels Mar 29, 2022

dingjiaweiww assigned zhiboniu and DDDivano Mar 29, 2022

tiancaishaonvjituizi added 4 commits March 29, 2022 10:05

Add PyTorch issue link

4e987e9

Refine RFC

f8a5914

Fix typo

c1588d7

Refine the test

05b4053

zhiboniu reviewed Mar 30, 2022


tiancaishaonvjituizi added 2 commits March 30, 2022 17:13

Add more content about op implementation

145d2c1

Add pseudo code

423d6a2

zhiboniu reviewed Mar 31, 2022


tiancaishaonvjituizi added 2 commits March 31, 2022 21:58

Address reviews

2daa584

Add more references

a52851e

zhiboniu approved these changes Apr 7, 2022


zhiboniu merged commit 9eaf7de into PaddlePaddle:master Apr 8, 2022

tiancaishaonvjituizi mentioned this pull request May 2, 2022

【Hackathon No.19】Implement ASGD optimizer PaddlePaddle/Paddle#42431

Closed
