[data] MapOperator.num_active_tasks should exclude pending actors #46364

Merged
raulchen merged 9 commits into ray-project:master from map-num-active
Jul 2, 2024

raulchen
Contributor

@raulchen raulchen commented Jul 1, 2024

Why are these changes needed?

MapOperator.num_active_tasks should exclude pending actors, for two reasons:

  1. PhysicalOperator.completed checks num_active_tasks. The operator should be considered completed even if there are still pending actors.
  2. The number of active tasks shown in the progress bar will more accurately reflect the actual data processing tasks.
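
The two reasons above can be illustrated with a minimal, self-contained sketch. It does not use Ray; `ToyTask`, `ToyMapOperator`, and the `is_readiness_check` flag are hypothetical names invented for illustration, and the completion rule shown is an assumption rather than the actual `PhysicalOperator.completed` logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTask:
    """Hypothetical stand-in for an operator task (not a Ray Data class)."""
    name: str
    # True for tasks that only wait on a pending actor to start.
    is_readiness_check: bool = False


@dataclass
class ToyMapOperator:
    """Toy model of an actor-pool map operator; all names are illustrative."""
    inputs_done: bool = False
    tasks: List[ToyTask] = field(default_factory=list)

    def get_active_tasks(self) -> List[ToyTask]:
        # Everything the scheduler waits on, including pending actors.
        return list(self.tasks)

    def num_active_tasks(self) -> int:
        # Count only tasks that process data; pending-actor checks are excluded.
        return sum(1 for t in self.tasks if not t.is_readiness_check)

    def completed(self) -> bool:
        # Assumed rule: all inputs consumed and no data task in flight.
        return self.inputs_done and self.num_active_tasks() == 0


# One actor is still pending, but no data task is running.
op = ToyMapOperator(
    inputs_done=True,
    tasks=[ToyTask("actor_ready", is_readiness_check=True)],
)
print(len(op.get_active_tasks()), op.num_active_tasks(), op.completed())
# prints: 1 0 True
```

If `num_active_tasks` counted the pending actor as well, `completed()` in this toy model would stay `False` until the actor started, even though no data remains to be processed.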

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Hao Chen <chenh1024@gmail.com>
Member

@bveeramani bveeramani left a comment

LGTM as a short-term fix, but I'm wondering whether there are alternative solutions that could avoid the issue I commented on.

@@ -414,6 +414,17 @@ def implements_accurate_memory_accounting(self) -> bool:
    def supports_fusion(self) -> bool:
        return self._supports_fusion

    def num_active_tasks(self) -> int:
Member

If I'm understanding correctly, this change makes len(get_active_tasks()) != num_active_tasks for map operators. I think that might get confusing.

@raulchen do you have any ideas for how we can address the issues while avoiding this discrepancy?

Contributor Author

I thought of this as well. It is indeed a bit confusing.
I also considered separating data tasks from metadata tasks, but that seems like overkill.
For now, I think I'll just update the comments to clarify.
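
For reference, the alternative mentioned above (tracking data and metadata tasks separately) might look roughly like the sketch below. This is a hypothetical illustration only, not code from this PR or from Ray Data; `TaskKind`, `ToyOperator`, and `submit` are invented names.

```python
from enum import Enum
from typing import Dict, List


class TaskKind(Enum):
    DATA = "data"          # tasks that produce output blocks
    METADATA = "metadata"  # bookkeeping tasks, e.g. actor readiness checks


class ToyOperator:
    """Hypothetical operator that tracks the two kinds of task separately."""

    def __init__(self) -> None:
        self._tasks: Dict[TaskKind, List[str]] = {kind: [] for kind in TaskKind}

    def submit(self, name: str, kind: TaskKind) -> None:
        self._tasks[kind].append(name)

    def get_active_tasks(self, kind: TaskKind) -> List[str]:
        return list(self._tasks[kind])

    def num_active_tasks(self, kind: TaskKind) -> int:
        # By construction this always equals len(get_active_tasks(kind)).
        return len(self._tasks[kind])


op = ToyOperator()
op.submit("map_block_0", TaskKind.DATA)
op.submit("actor_ready_0", TaskKind.METADATA)
op.submit("actor_ready_1", TaskKind.METADATA)
print(op.num_active_tasks(TaskKind.DATA), op.num_active_tasks(TaskKind.METADATA))
# prints: 1 2
```

With this split the count and the list never disagree for a given kind, at the cost of every caller having to say which kind it means, which is the extra complexity judged to be overkill here.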

Signed-off-by: Hao Chen <chenh1024@gmail.com>
* Displaying active task info in the progress bar.
Thus, the return value can be less than `len(get_active_tasks())`,
if some tasks are not needed for the above purposes. E.g., for the
actor pool map operator, readiness checking tasks can be excluded
Contributor

What does "readiness checking task" refer to in this case?

Contributor Author

It's `_MapWorker.get_location`.

@raulchen raulchen enabled auto-merge (squash) July 2, 2024 00:33
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Jul 2, 2024
@github-actions github-actions bot disabled auto-merge July 2, 2024 17:21
Signed-off-by: Hao Chen <chenh1024@gmail.com>
@raulchen raulchen enabled auto-merge (squash) July 2, 2024 17:56
@raulchen raulchen merged commit acf792e into ray-project:master Jul 2, 2024
7 checks passed
@raulchen raulchen deleted the map-num-active branch July 2, 2024 21:51
Labels
go (add ONLY when ready to merge, run all tests)
3 participants