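Modify COPY-FROM No.13 distributed #6004
Thanks for contributing to the PaddlePaddle documentation. The documentation preview is being built; once Docs-New finishes running, you can preview it at: http://preview-pr-6004.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html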
Let's fix these first... it feels like there are quite a few pitfalls here.
strategy.recompute = True
strategy.recompute_configs = {"checkpoints": ["x"]}
strategy.save_to_prototxt("dist_strategy.prototxt")
COPY-FROM: paddle.distributed.fleet.DistributedStrategy
Suggested change:
- COPY-FROM: paddle.distributed.fleet.DistributedStrategy
+ COPY-FROM: paddle.distributed.fleet.DistributedStrategy.save_to_prototxt
import paddle.distributed.fleet as fleet
fleet.init()
COPY-FROM: paddle.distributed.fleet.Fleet:code-example1
import paddle.distributed.fleet as fleet
fleet.init(is_collective=True)
COPY-FROM: paddle.distributed.fleet.Fleet:code-example2
Suggested change:
- COPY-FROM: paddle.distributed.fleet.Fleet:code-example2
+ COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example2
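The suggestions in this thread all adjust the COPY-FROM target: a dotted API path, optionally followed by a colon and an example name such as code-example2. A minimal sketch of how such a line splits into those two parts is shown here. `parse_copy_from` and `CopyFromTarget` are hypothetical names used only for illustration; they are not part of the Paddle docs tooling, whose actual parser may behave differently.

```python
from typing import NamedTuple, Optional


class CopyFromTarget(NamedTuple):
    api_path: str                # dotted path, e.g. "paddle.distributed.fleet.Fleet.init"
    example_name: Optional[str]  # e.g. "code-example2", or None when omitted


def parse_copy_from(line: str) -> Optional[CopyFromTarget]:
    """Split a 'COPY-FROM: <api path>[:<example name>]' line (hypothetical helper)."""
    prefix = "COPY-FROM:"
    stripped = line.strip()
    if not stripped.startswith(prefix):
        return None  # not a COPY-FROM line
    target = stripped[len(prefix):].strip()
    if not target:
        return None  # directive without a target
    api_path, sep, example = target.partition(":")
    example_name = (example.strip() or None) if sep else None
    return CopyFromTarget(api_path.strip(), example_name)


if __name__ == "__main__":
    print(parse_copy_from("COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example2"))
    print(parse_copy_from("COPY-FROM: paddle.distributed.fleet.UtilBase.all_reduce"))
```

Seen this way, the review suggestions only change the `api_path` part (class to method) and leave the example name untouched.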
import paddle.distributed.fleet as fleet
role = fleet.PaddleCloudRoleMaker()
fleet.init(role)
COPY-FROM: paddle.distributed.fleet.Fleet:code-example3
Suggested change:
- COPY-FROM: paddle.distributed.fleet.Fleet:code-example3
+ COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example3
adam.step()
adam.clear_grad()
COPY-FROM: paddle.distributed.fleet.Fleet.clear_grad

minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None)
# [8, 12]
if __name__ == "__main__":
    train()
COPY-FROM: paddle.distributed.fleet.UtilBase
Suggested change:
- COPY-FROM: paddle.distributed.fleet.UtilBase
+ COPY-FROM: paddle.distributed.fleet.UtilBase.all_reduce
**代码示例**
.. code-block:: text
Suggested change:
- **代码示例**
- .. code-block:: text
+ **代码示例**
+ .. code-block:: text
@SigureMo None of the DistributedStrategy code examples have ever been pulled in successfully with COPY-FROM. Master 001, could you take a look?
@SigureMo Some of the Fleet code examples can't be pulled in with COPY-FROM, probably for much the same reason as in tensor_cn.rst. Should we skip this one?
How about we skip all of them? Distributed is too painful.
> How about we skip all of them? Distributed is too painful.

+1, that way this task can be wrapped up sooner.
> How about we skip all of them? Distributed is too painful.

Actually, we could skip everything under paddle.distributed.fleet.
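As a rough illustration of what skipping everything under paddle.distributed.fleet would mean for a list of COPY-FROM targets, here is a minimal prefix-filter sketch. `is_skipped` and `split_targets` are hypothetical helpers written for this example, and `paddle.distributed.fleet_utils.foo` is a made-up path included only to show that a bare string-prefix match would be too greedy; none of this reflects how the docs CI actually implements its skip list.

```python
from typing import Iterable, List, Tuple


def is_skipped(api_path: str, skip_prefixes: Iterable[str]) -> bool:
    """Return True if api_path equals a skipped module or lives under it."""
    for prefix in skip_prefixes:
        # Match on a dotted-path boundary so "fleet" does not match "fleet_utils".
        if api_path == prefix or api_path.startswith(prefix + "."):
            return True
    return False


def split_targets(
    api_paths: Iterable[str], skip_prefixes: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition API paths into (kept, skipped) lists, preserving order."""
    paths = list(api_paths)
    prefixes = list(skip_prefixes)
    kept = [p for p in paths if not is_skipped(p, prefixes)]
    skipped = [p for p in paths if is_skipped(p, prefixes)]
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = split_targets(
        [
            "paddle.distributed.fleet.Fleet.init",
            "paddle.distributed.QueueDataset.init",
            "paddle.distributed.fleet_utils.foo",  # made-up path, for illustration
        ],
        ["paddle.distributed.fleet"],
    )
    print("kept:", kept)
    print("skipped:", skipped)
```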
4ce6d38 to 6ca612c
Signed-off-by: jjyaoao <jjyaoao@126.com>
The code in PaddleCloudRoleMaker and UserDefinedRoleMaker can simply use COPY-FROM, since I see the examples have all been added on the Paddle side. Nothing else is a big problem~
.. code-block:: python
.. code-block:: text

    import os
    import paddle.distributed.fleet as fleet

    os.environ["PADDLE_PSERVER_NUMS"] = "2"
    os.environ["PADDLE_TRAINERS_NUM"] = "2"

    os.environ["POD_IP"] = "127.0.0.1"
    os.environ["PADDLE_PORT"] = "36001"
    os.environ["TRAINING_ROLE"] = "PSERVER"
    os.environ["PADDLE_PSERVERS_IP_PORT_LIST"] = \
        "127.0.0.1:36001,127.0.0.2:36001"

    os.environ["PADDLE_TRAINER_ID"] = "0"

    fleet.PaddleCloudRoleMaker(is_collective=False)

    from paddle.distributed.fleet.base.role_maker import Role
    fleet.UserDefinedRoleMaker(
        current_id=0,
        role=Role.SERVER,
        worker_num=2,
        server_endpoints=["127.0.0.1:36011", "127.0.0.1:36012"])
For this part, just use COPY-FROM with PaddleCloudRoleMaker; I see the English source in PaddlePaddle/Paddle#55236 has added all of the examples.
@@ -38,15 +37,13 @@ string
All of the code above can simply use COPY-FROM with UserDefinedRoleMaker, since the English source has added all of the examples.
Signed-off-by: jjyaoao <jjyaoao@126.com>
One last change: don't use COPY-FROM for QueueDataset; just change it to text.. T.T
os.remove("./test_queue_dataset_run_a.txt")
os.remove("./test_queue_dataset_run_b.txt")
COPY-FROM: paddle.distributed.QueueDataset.init
Better not to use COPY-FROM for this init method... even the English code preview doesn't work, so let's change the block to text instead.
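The fallback discussed here, switching an example from `.. code-block:: python` to `.. code-block:: text` instead of using COPY-FROM, can be sketched as a small text rewrite. `downgrade_code_blocks` is a hypothetical helper for illustration only; in this PR the change was presumably made by hand in the .rst files.

```python
import re

# Matches a whole '.. code-block:: python' directive line, keeping its indentation.
# [ \t]* is used instead of \s* so the match never swallows neighbouring newlines.
_PY_BLOCK = re.compile(r"^([ \t]*\.\. code-block::[ \t]*)python[ \t]*$", re.MULTILINE)


def downgrade_code_blocks(rst_text: str) -> str:
    """Rewrite '.. code-block:: python' directives to '.. code-block:: text'."""
    return _PY_BLOCK.sub(r"\1text", rst_text)


if __name__ == "__main__":
    sample = (
        "**代码示例**\n"
        "\n"
        ".. code-block:: python\n"
        "\n"
        "    import os\n"
        "    os.remove(\"./test_queue_dataset_run_a.txt\")\n"
    )
    print(downgrade_code_blocks(sample))
```

Only the directive line changes; the indented example body is left exactly as written.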
Signed-off-by: jjyaoao <jjyaoao@126.com>
LGTM
Replace the .. code-block:: code blocks with the COPY-FROM directive
PADDLEPADDLE_PR=55236