-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Datasets] Improved naming of Ray Data map
tasks
#32585
Changes from 21 commits
3bb31a5
b82b028
5756e47
9467992
9f07bc8
9b49e12
b977c4c
16295f8
48a67d9
4dc1ea2
003c521
7770385
92aa93c
501372d
05adb73
5b9e2b9
2e89515
20200a8
66f3668
58bb391
cf2de1f
2f5845c
b1b3fbe
01eac28
69c6d1c
40fd9ea
6f70ef8
d2df035
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -93,7 +93,7 @@ def _start_actor(self): | |
"""Start a new actor and add it to the actor pool as a pending actor.""" | ||
assert self._cls is not None | ||
ctx = DatasetContext.get_current() | ||
actor = self._cls.remote(ctx) | ||
actor = self._cls.remote(ctx, src_fn_name=self.name) | ||
self._actor_pool.add_pending_actor(actor, actor.get_location.remote()) | ||
|
||
def _add_bundled_input(self, bundle: RefBundle): | ||
|
@@ -122,7 +122,7 @@ def _dispatch_tasks(self): | |
bundle = self._bundle_queue.popleft() | ||
input_blocks = [block for block, _ in bundle.blocks] | ||
ctx = TaskContext(task_idx=self._next_task_idx) | ||
ref = actor.submit.options(num_returns="dynamic").remote( | ||
ref = actor.submit.options(num_returns="dynamic", name=self.name).remote( | ||
self._transform_fn_ref, ctx, *input_blocks | ||
) | ||
self._next_task_idx += 1 | ||
|
@@ -284,8 +284,9 @@ def _apply_default_remote_args(ray_remote_args: Dict[str, Any]) -> Dict[str, Any | |
class _MapWorker: | ||
"""An actor worker for MapOperator.""" | ||
|
||
def __init__(self, ctx: DatasetContext): | ||
def __init__(self, ctx: DatasetContext, src_fn_name: str): | ||
DatasetContext._set_current(ctx) | ||
self.src_fn_name: str = src_fn_name | ||
|
||
def get_location(self) -> NodeIdStr: | ||
return ray.get_runtime_context().get_node_id() | ||
|
@@ -298,6 +299,9 @@ def submit( | |
) -> Iterator[Union[Block, List[BlockMetadata]]]: | ||
yield from _map_task(fn, ctx, *blocks) | ||
|
||
def __repr__(self): | ||
return f"MapWorker({self.src_fn_name})" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am actually feeling the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, but I think that the current There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Once #32922 is implemented, this should allow us to send this actor name to the dash |
||
|
||
|
||
# TODO(Clark): Promote this to a public config once we deprecate the legacy compute | ||
# strategies. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkooo567 - I thought you mentioned there's problem that actor should not be set name here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we look at the dash screenshot above, I believe this isn't actually setting the name of the actor -- as we can see
_MapWorker
still being used in theActive Actors by Name
chart. I think this part instead sets the name of the submitted task instead (e.g.fn_actors
in above example).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the name of actors should be unique across the cluster. So let's avoid setting the name for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is just setting the name of the actor task, so this should be fine, right?