[Data] Disable Ray Data operator progress bars when running in Ray job #46826
Signed-off-by: Scott Lee <sjl@anyscale.com>
progress_bar_enabled = (
    DataContext.get_current().enable_progress_bars
    and (is_all_to_all or verbose_progress)
    and not is_ray_job
Can we instead make the default of `enable_progress_bars` `False` for Ray jobs? That way users can still set it to `True` if needed.
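The difference between the hard gate in the diff and the default-based approach suggested here can be sketched as follows. This is a minimal illustration and not Ray's code: `hard_gate`, `default_gate`, and `resolve_enable_progress_bars` are hypothetical names, and representing "the user did not choose" as `None` is an assumption made for the example.

```python
from typing import Optional


def hard_gate(
    enable_progress_bars: bool,
    is_all_to_all: bool,
    verbose_progress: bool,
    is_ray_job: bool,
) -> bool:
    """Mirrors the condition in the diff: a Ray job always hides the bar."""
    return (
        enable_progress_bars
        and (is_all_to_all or verbose_progress)
        and not is_ray_job
    )


def resolve_enable_progress_bars(
    user_value: Optional[bool], is_ray_job: bool
) -> bool:
    """Hypothetical default: off inside a Ray job unless the user chose a value."""
    if user_value is not None:
        return user_value
    return not is_ray_job


def default_gate(
    user_value: Optional[bool],
    is_all_to_all: bool,
    verbose_progress: bool,
    is_ray_job: bool,
) -> bool:
    """The Ray job only changes the default, so an explicit True still wins."""
    enabled = resolve_enable_progress_bars(user_value, is_ray_job)
    return enabled and (is_all_to_all or verbose_progress)
```

With the hard gate, a user inside a Ray job has no way to turn the bar back on; with the default-based gate, passing `True` explicitly re-enables it.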
"Running",
dag.num_outputs_total(),
unit="bundle",
enabled=True,
`enabled=True` always enables the global progress bar.
Signed-off-by: Hao Chen <chenh1024@gmail.com>
…47029) #46826 introduced a bug: the info log about hiding operator-level progress bars is always shown, whether or not the code is run via a Ray Job. This PR fixes the bug by moving the Ray Job check into the `DataContext.__post_init__()` method, so that the check is done only after the DataContext singleton is initialized.
Signed-off-by: Scott Lee <sjl@anyscale.com>
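The shape of that fix can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the actual `DataContext` code: `ExampleContext`, `operator_bars_enabled`, and the environment variable name `EXAMPLE_JOB_ID` are all hypothetical, and detecting a job through an environment variable is an assumption made for the example.

```python
import logging
import os
from dataclasses import dataclass
from typing import ClassVar, Optional

logger = logging.getLogger(__name__)

# Assumption: a hypothetical variable name that marks "running inside a job".
JOB_ENV_VAR = "EXAMPLE_JOB_ID"


@dataclass
class ExampleContext:
    """Hypothetical stand-in for a singleton context object."""

    operator_bars_enabled: bool = True
    # ClassVar fields are ignored by @dataclass, so this is not an init argument.
    _current: ClassVar[Optional["ExampleContext"]] = None

    def __post_init__(self) -> None:
        # The job check runs when the instance is created, so the log line
        # is emitted only when a job is actually detected.
        if os.environ.get(JOB_ENV_VAR) is not None:
            self.operator_bars_enabled = False
            logger.info("Hiding operator-level progress bars in this job.")

    @classmethod
    def get_current(cls) -> "ExampleContext":
        # Lazily create the singleton on first access.
        if cls._current is None:
            cls._current = cls()
        return cls._current
```

Because the check lives in `__post_init__`, it is evaluated once per context instance rather than at every call site that asks whether a progress bar should be shown.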
Why are these changes needed?
When using Ray Data in a Ray job, the logs fill up with garbled progress bar output. This is due to how the `ray_tqdm` module handles multiple progress bars (or rather, how it does not handle them): each progress bar writes independently, and the output includes `\r` (normally used to overwrite the progress bar in the console in real time), which results in empty spaces, blank lines, and overlapping progress bar output.
To clean up the logs from Ray jobs, we disable the individual operator progress bars and keep only the "global" overall progress bar for the Ray Dataset. Existing progress bar behavior outside of Ray jobs (e.g. using Ray Data in a console or Jupyter notebook) is unaffected and still shows all progress bars.
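The effect of uncoordinated `\r` writes can be reproduced with a toy model. This sketch does not use `ray_tqdm` or Ray; `render_terminal_line` and the bar labels `map` and `sort` are made up for illustration, and the model only emulates a single terminal line.

```python
def render_terminal_line(stream: str) -> str:
    """Emulate one terminal line: '\r' moves the cursor back to column 0,
    and the text that follows overwrites whatever is already there."""
    line = []
    for chunk in stream.split("\r"):
        # Overwrite the first len(chunk) cells; longer old text stays visible.
        line[: len(chunk)] = list(chunk)
    return "".join(line)


# Each bar redraws itself by emitting '\r' followed by its current state.
map_updates = ["\rmap: 1/3", "\rmap: 2/3", "\rmap: 3/3"]
sort_updates = ["\rsort: 1/3", "\rsort: 2/3", "\rsort: 3/3"]

# One bar alone redraws cleanly in place.
single = "".join(map_updates)

# Two bars writing independently to the same stream, interleaved.
interleaved = "".join(s + m for s, m in zip(sort_updates, map_updates))

print(repr(render_terminal_line(single)))
print(repr(render_terminal_line(interleaved)))
# A log sink that stores the raw stream keeps every '\r' verbatim.
print(interleaved.count("\r"))
```

In this model the single bar ends as `map: 3/3`, while the interleaved stream leaves a stray character from the longer `sort` bar behind. A log file that does not interpret `\r` instead stores every intermediate redraw.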
For the following script:
Output in console, not in Ray Job:
Output from Ray Job:
Related issue number
Checks
- I've signed off every commit (using `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.