Basic Ray execution backend #2921
# The client goes to the Ray server first, then the Ray server makes Ray calls to other actors, so the Ray server needs to
# register the Ray serializers.
# TODO: Need a way to check whether the current process is a Ray server.
register_ray_serializers()
If the Ray client server doesn't call register_ray_serializers(), could serialization of Mars objects fail?
Especially for Ray ObjectRef and ActorHandle, since they are serialized by the pickler.
> If the Ray client server doesn't call register_ray_serializers(), could serialization of Mars objects fail?

Currently, no test case covers this code. I believe these lines fix some issues, but the current implementation is too rough.
I think we can find a better solution after this PR. For example:
- Call register_ray_serializers() only when necessary.
- Add some test cases that cover this fix.
If the Ray client server is started in a different process from the driver, skipping register_ray_serializers in the client will cause serialization issues. The current test case test_ray.py::test_ray_client doesn't start a Ray client server in another process, which is why removing the call didn't throw an error.
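The process-local nature of this problem can be illustrated with the standard library alone. A minimal sketch, assuming a hypothetical `FakeActorHandle` that stands in for Ray's `ActorHandle`: a custom reducer is registered on one pickler's `dispatch_table`, and a pickler without it fails, much as a Ray client server in another process would if it never registered the serializers. This uses `pickle`/`copyreg`, not Ray's actual serialization path.

```python
import copyreg
import io
import pickle
import threading


class FakeActorHandle:
    """Hypothetical stand-in for a handle that plain pickle cannot serialize."""

    def __init__(self, actor_id: str):
        self.actor_id = actor_id
        self._lock = threading.Lock()  # locks are not picklable


def _reduce_handle(handle):
    # Serialize only the id; the receiving side rebuilds a fresh handle.
    return FakeActorHandle, (handle.actor_id,)


def dumps(obj, dispatch_table=None) -> bytes:
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    if dispatch_table is not None:
        # Per-pickler reducers, like a registry that exists in one process only.
        pickler.dispatch_table = dispatch_table
    pickler.dump(obj)
    return buf.getvalue()


handle = FakeActorHandle("actor-1")

# Without the reducer registered, pickling hits the lock and fails.
try:
    dumps(handle)
    unregistered_ok = True
except TypeError:
    unregistered_ok = False

# With the reducer registered, the handle round-trips.
table = copyreg.dispatch_table.copy()
table[FakeActorHandle] = _reduce_handle
restored = pickle.loads(dumps(handle, table))
```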
I think we can skip the register_ray_serializers issue in this PR and fix it in the next PR.
Thanks. I will create a new PR with fixes and test cases.
I only have one doubt, about backend and execution_backend: is it possible to unify them? Other parts look good to me.
 ) -> ClientType:
     backend = backend or "oscar"
     session = await _new_session(
-        cluster.external_address, backend=backend, default=True, timeout=timeout
+        cluster.external_address,
+        backend=backend,
Now it looks odd that backend here actually has no effect: both Mars and Ray pass backend with the value oscar. I wonder if we can unify backend and execution_backend?
So could we reuse backend for the Ray execution backend? For example:
- backend == 'oscar' for Mars & Mars on Ray
- backend == 'ray' for Mars on Ray DAG
Maybe just name it mars for Mars, and ray for Mars on Ray?
But backend is bound to the session class, while the execution backend is only for the execution implementation. I am not sure it is a good idea to merge them into one.
Currently:
- backend
  - oscar: _IsolatedSession
  - test: CheckedSession
- execution backend
  - mars: Mars, Mars on Ray
  - ray: Mars on Ray DAG
I can merge them into one:
- backend
  - mars: Mars, Mars on Ray
  - ray: Mars on Ray DAG
Then backend will no longer be bound to the session class.
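The merged scheme can be sketched as a single name-to-backend lookup that validates user input. The `ExecutionBackend` enum, the `resolve_backend` helper, and the `"mars"` default are assumptions for illustration, not Mars APIs.

```python
from enum import Enum
from typing import Optional


class ExecutionBackend(str, Enum):
    # Names follow the merged proposal; the comments list what each would run.
    MARS = "mars"  # Mars, and Mars on Ray
    RAY = "ray"    # Mars on Ray DAG


def resolve_backend(backend: Optional[str] = None) -> ExecutionBackend:
    """Normalize a user-supplied backend name, defaulting to 'mars' (assumed)."""
    name = (backend or ExecutionBackend.MARS.value).lower()
    try:
        return ExecutionBackend(name)
    except ValueError:
        valid = ", ".join(b.value for b in ExecutionBackend)
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of: {valid}"
        ) from None
```

With one validated name, the session class no longer has to be chosen by the backend value, and an old value such as `"oscar"` fails loudly instead of being silently ignored.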
I think that's OK: backend is for selecting different backends, not the session class.
I will push a commit to fix this. Thanks.
I left two questions.
LGTM
LGTM
What do these changes do?
- mars.utils.get_chunk_params
- Rename .get_available_band_slots to get_available_band_resources.
- Change .execute_subtask_graph from List[ExecutionChunkResult] to Dict[Chunk, ExecutionChunkResult].
- ExecutionChunkResult contains the meta to be updated.
- ExecutionChunkResult
- .Fetcher to fetch data from different execution backends.
- xgboost < 1.6.0 because the latest xgboost breaks the API.
Related issue number
#2893
Fixes #xxxx
Check code requirements
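A minimal sketch of why a `Dict[Chunk, ExecutionChunkResult]` return value is handier than a list: callers can look up each result, and the meta to update, by chunk instead of relying on list order. The `Chunk` and `ExecutionChunkResult` classes and their fields here are hypothetical stand-ins, not the Mars definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Chunk:
    # Hypothetical minimal chunk: frozen so it is hashable and usable as a key.
    key: str


@dataclass
class ExecutionChunkResult:
    # Hypothetical fields: the meta to be updated plus an opaque result object.
    meta: Dict[str, Any] = field(default_factory=dict)
    context: Any = None


def results_by_chunk(
    chunks: List[Chunk], results: List[ExecutionChunkResult]
) -> Dict[Chunk, ExecutionChunkResult]:
    """Pair each chunk with its result so callers can look results up by chunk."""
    if len(chunks) != len(results):
        raise ValueError("chunks and results must have the same length")
    return dict(zip(chunks, results))
```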