[train] Use the actual task name being executed for _RayTrainWorker__execute. #33065

Merged: 12 commits, Mar 13, 2023

@xwjiang2010 xwjiang2010 commented Mar 6, 2023

Why are these changes needed?

So that the dashboard can show more meaningful names for the tasks being executed, instead of the generic _RayTrainWorker__execute.
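
For readers skimming this PR, here is a minimal sketch of the idea, not the actual diff. It assumes the display name is derived from the function passed to the worker and then handed to Ray as a task name (for example through the `name` option on an actor method call). The helper `describe_task`, the constant `GENERIC_PREFIX`, and the stand-in `setup_torch_process_group` are hypothetical names used only for illustration.

```python
import functools
from typing import Any, Callable

# Hypothetical prefix: the generic name the dashboard showed before this change.
GENERIC_PREFIX = "_RayTrainWorker__execute"


def describe_task(func: Callable[..., Any]) -> str:
    """Build a display name that includes the name of the function being run.

    Illustrative helper only; it is not taken from the PR diff.
    """
    # functools.partial objects have no __name__, so unwrap them first.
    while isinstance(func, functools.partial):
        func = func.func
    # Fall back to the class name for callable instances without __name__.
    name = getattr(func, "__name__", type(func).__name__)
    return f"{GENERIC_PREFIX}.{name}"


def setup_torch_process_group() -> None:
    """Stand-in for a backend setup function executed on each worker."""


if __name__ == "__main__":
    print(describe_task(setup_torch_process_group))
    # prints "_RayTrainWorker__execute.setup_torch_process_group"
    print(describe_task(functools.partial(setup_torch_process_group)))
    # prints "_RayTrainWorker__execute.setup_torch_process_group"
```

With a name like this attached to each call, the dashboard could distinguish, say, a backend setup step from a training step, rather than listing every call under the same generic entry.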

Related issue number

#32763

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
@xwjiang2010 xwjiang2010 changed the title [train] Propagate task name being executed for _RayTrainWorker__execute. [train] Use the actual task name being executed for _RayTrainWorker__execute. Mar 9, 2023
@amogkam amogkam left a comment

Lgtm, thanks! but we don't need to add the name for all the CI tests right?

@xwjiang2010

> Lgtm, thanks! but we don't need to add the name for all the CI tests right?

It's not strictly needed, but I want the tests to simulate the real case where possible. I don't think it's a big deal, tbh.

@xwjiang2010 xwjiang2010 reopened this Mar 13, 2023
@xwjiang2010 xwjiang2010 merged commit b8d6d00 into ray-project:master Mar 13, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request Mar 21, 2023
…execute. (ray-project#33065)

* [train] Propagate task name being executed for _RayTrainWorker__execute.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* add to args doc string.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix tensorflow/config.py

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* update torch/config.py

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* horovod config.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix worker_group

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix train tests.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix test_training_iterator.py

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* fix test

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* address comments

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* nit

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

---------

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
Signed-off-by: Jack He <jackhe2345@gmail.com>
@xwjiang2010 xwjiang2010 deleted the executor branch July 26, 2023 19:50
2 participants