Design document of RemoteExecutor. #7720
Conversation
Thanks, this is a very good starting point! I put some comments as suggestions; I would love to know what you think about them.
> We propose some details to implement `RemoteExecutor`:
>
> - Each `Partitioned IR(intermediate representation)` has unique `IRID`.
> - Each `Partitioned IR` with its relationship with others and its resource need are stored into etcd.
What is "others"? Maybe replace others with what others really are here.
> - Each `Partitioned IR(intermediate representation)` has unique `IRID`.
> - Each `Partitioned IR` with its relationship with others and its resource need are stored into etcd.
> - Each `PaddlePaddle Runtime` runs `Partitioned IR` got from etcd by `IRID`.
This makes our runtime depend on etcd. I think the dependency direction is wrong. If we really want to use etcd as a communication method, our runtime should define a communication interface, and we should implement an etcd adapter class that implements that interface. That way the framework does not depend on etcd; only the adapter does. This is called the Dependency Inversion Principle (in my opinion, a major benefit of using an OOP programming language).
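To illustrate the direction I mean, here is a minimal sketch. All class and method names below are hypothetical, not an existing Paddle API: the runtime is written only against an abstract communication interface, and only the adapter depends on etcd, so an RPC adapter could be swapped in without touching the runtime.

```python
# Hedged sketch: hypothetical interface names, not an existing Paddle API.
from abc import ABC, abstractmethod


class Channel(ABC):
    """Communication interface the runtime depends on."""

    @abstractmethod
    def send(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def recv(self, key: str) -> bytes: ...


class EtcdChannel(Channel):
    """Adapter: only this class knows about etcd."""

    def __init__(self, etcd_client):
        self._etcd = etcd_client

    def send(self, key: str, value: bytes) -> None:
        self._etcd.put(key, value)

    def recv(self, key: str) -> bytes:
        value, _ = self._etcd.get(key)
        return value


class RpcChannel(Channel):
    """An RPC-based adapter can be swapped in without changing the runtime."""

    def __init__(self, rpc_client):
        self._rpc = rpc_client

    def send(self, key: str, value: bytes) -> None:
        self._rpc.call("send", key, value)

    def recv(self, key: str) -> bytes:
        return self._rpc.call("recv", key)


def run_partitioned_ir(ir_id: str, channel: Channel) -> None:
    # The runtime is written against Channel only; etcd never appears here.
    ir_bytes = channel.recv(f"/ir/{ir_id}")
    ...
```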
Anyway, is etcd necessary? Why is communicating through etcd better than communicating through RPC? As described in the PR, it's so that "the executed Partitioned IR can communicate with each other by IRID even if some of them restart." However, isn't fault tolerance unrelated to the communication method? We can still communicate using RPC; when we want to support fault tolerance, the runtime can store its state in disk/etcd.
> Anyway, is etcd necessary? Why is communicating through etcd better than communicating through RPC?
The partitioned IRs communicate with each other via RPC. When the Paddle runtime starts, it gets the IR from etcd (storage) and caches it locally. The local cache should be updated when a send/get hits an error (the remote may no longer exist).
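To make that cache-refresh behaviour concrete, here is a hedged sketch. The injected callables, key layout, and the use of `ConnectionError` as the failure signal are all assumptions, not part of the design:

```python
# Hedged sketch of the cache-refresh behaviour described above.
class IrCache:
    """Caches partitioned IR metadata fetched from storage (e.g. etcd)."""

    def __init__(self, fetch_from_storage, rpc_send):
        # Both callables are injected so the sketch stays storage/RPC agnostic.
        self._fetch = fetch_from_storage   # ir_id -> IR metadata (incl. endpoint)
        self._send = rpc_send              # (endpoint, message), raises on failure
        self._cache = {}

    def get(self, ir_id):
        # Use the local copy if we have one; otherwise load it from storage.
        if ir_id not in self._cache:
            self._cache[ir_id] = self._fetch(ir_id)
        return self._cache[ir_id]

    def send(self, ir_id, message):
        try:
            self._send(self.get(ir_id)["endpoint"], message)
        except ConnectionError:
            # The remote may have restarted; refresh the cached entry and retry once.
            self._cache.pop(ir_id, None)
            self._send(self.get(ir_id)["endpoint"], message)
```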
> ## Architect graph
> <div style="align: center">
> <img src="src/remote_executor2.png" width="700" align=center/>
I think it makes more sense for the transpiler to run in the cloud, because the transpiler should know the resource information (e.g., how many trainers are running; with autoscaling, the number of running trainers changes).
I think the controller should tell the remote executor how many trainers are currently running, not the other way around. In this way, we can run the remote executor on bare metal, where there is no controller.
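For example, a hypothetical sketch of that direction of information flow (names and arguments are assumptions): the controller pushes the trainer count into the executor, and on bare metal the user simply sets it.

```python
# Hedged sketch; class and method names are hypothetical, not an existing Paddle API.
class RemoteExecutor:
    def __init__(self, trainer_count=1):
        # On bare metal there is no controller, so the caller sets the count directly.
        self.trainer_count = trainer_count

    def set_trainer_count(self, n):
        # A k8s controller (or autoscaler) calls this when the number of trainers
        # changes; the executor never queries the controller itself.
        self.trainer_count = n


# Bare metal: the user provides the count.
executor = RemoteExecutor(trainer_count=4)

# Kubernetes: the controller pushes updates into the executor, e.g.
# controller.on_scale(lambda n: executor.set_trainer_count(n))
```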
Suggestion: draw one graph that specifies how this system runs on a bare-metal cluster (e.g., no k8s controller, no etcd), and another graph specifying how we integrate it with a k8s cloud. In the current graph, the remote executor cannot run without etcd or the controller.
I have one question: where can the RemoteExecutor save its state (the IR)? On a distributed storage system?
It looks to me like there is one global gatekeeper (RemoteExecutor) orchestrating all the roles and traffic in and out of the cluster. Shouldn't this kind of remote executor be one per job?
> - Foreground Job: when the client exits, the jobs will be killed.
>   - It's convenient for the users to debug their program.
>   - It needs a `HeartBeat` to `RemoteExecutor` to report that the client is living. Otherwise, the `RemoteExecutor` will kill the job.
> - Background Job: the client's death doesn't affect the job.
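For illustration, a minimal sketch of the `HeartBeat` mechanism described above; the function names, interval, and timeout are assumptions, not part of the design:

```python
# Hedged sketch of the client-liveness heartbeat described above.
import threading
import time


def start_heartbeat(report, interval=5.0):
    """Client side: report liveness to the RemoteExecutor every `interval` seconds."""
    def loop():
        while True:
            report()            # e.g. an RPC such as remote_executor.heartbeat(job_id)
            time.sleep(interval)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def should_kill_job(last_heartbeat_time, timeout=30.0):
    """RemoteExecutor side: kill a foreground job whose client stopped reporting."""
    return time.time() - last_heartbeat_time > timeout
```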
I think we just need the background job; all jobs should be killed explicitly by the Paddle cloud management Python API.
> There is no fundamental difference between the `Trainer` and the `Parameter server`; they are all `PaddleRunTime`. They do different tasks just because they execute different `ProgramDesc`s.
> Although we keep the `Trainer` and `Pserver` concepts in the pseudo code below, that is just for users to distinguish among different `ProgramDesc`s. They are just names for `ProgramDesc`s.
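A tiny illustrative sketch of that point (names are hypothetical): the same runtime entry point acts as a trainer or a parameter server depending only on the `ProgramDesc` it receives.

```python
# Hedged sketch: one runtime for all roles; names are hypothetical.
def paddle_runtime(program_desc, scope):
    """Behaviour differs only by the ProgramDesc this runtime is given."""
    for op in program_desc.ops:
        op.run(scope)


# "Trainer" and "Pserver" are just names for different ProgramDescs, e.g.:
#   paddle_runtime(trainer_program_desc, scope)   # acts as a trainer
#   paddle_runtime(pserver_program_desc, scope)   # acts as a parameter server
```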
> ## Pseudo code of users
I think we should aim for the user only having to change the executor to a remote executor in the Python code, and nothing else. This pseudo code requires the user to change too many lines.
I know that currently we need the user to change many lines, but we should aim for changing only a few.
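For illustration, a hedged sketch of that goal; `RemoteExecutor`, its arguments, and the elided model/feeder names are assumptions, not a finalized API. The training script stays the same except for the line that creates the executor.

```python
# Hedged sketch: only the executor line changes; RemoteExecutor and its
# arguments are assumptions, not a finalized API.
import paddle.fluid as fluid

# ... model definition producing avg_cost, a DataFeeder `feeder`, and a
#     train_reader, exactly as in a normal local script (unchanged) ...

# Local execution today:
#   exe = fluid.Executor(fluid.CPUPlace())
# Remote execution -- ideally the only change the user makes:
exe = fluid.RemoteExecutor(job_name="mnist", trainers=4, pservers=2)

for data in train_reader():
    exe.run(fluid.default_main_program(),
            feed=feeder.feed(data),
            fetch_list=[avg_cost])
```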
> ## Storage
I think we are mixing fault tolerance with communication here (as mentioned in #7720 (comment)). In my opinion we should separate them, because the two are very different and have different libraries/tools that fit each best.
Another point: if we store all the IR communication in etcd, who will delete the IRs that are no longer necessary? In my opinion, we should use the best library for communication and not persist the messages, and use the best library for fault tolerance, overwriting only the necessary state.
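For illustration, a hedged sketch of that separation (all names are assumptions): tensors travel over RPC and are never persisted, while fault tolerance is a periodic checkpoint that overwrites a single key per job, so nothing accumulates that would need garbage collection.

```python
# Hedged sketch of separating communication from fault tolerance; names are assumptions.
class Runtime:
    def __init__(self, rpc_send, checkpoint_store, job_id):
        self._send = rpc_send            # communication: RPC only, messages never persisted
        self._store = checkpoint_store   # fault tolerance: e.g. etcd or a filesystem
        self._job_id = job_id

    def send_grad(self, peer, tensor):
        # Hot path: plain RPC, nothing written to etcd.
        self._send(peer, tensor)

    def checkpoint(self, step, state):
        # Cold path: overwrite a single key per job, so there is nothing to garbage-collect.
        self._store.put(f"/checkpoint/{self._job_id}", {"step": step, "state": state})

    def recover(self):
        return self._store.get(f"/checkpoint/{self._job_id}")
```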