Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ray Core] Fix the Paths in the Generated Monitoring Configs to Consider temp-dir #47871

Merged
merged 8 commits into from
Oct 15, 2024

Conversation

MengjinYan
Copy link
Collaborator

@MengjinYan MengjinYan commented Sep 30, 2024

Why are these changes needed?

Today, when a user specifies the --temp-dir when starting ray, the paths in the default config files generated are still with the default root directory /tmp/ray. The issue is because the default config files are directly copied from the package when ray starts and the paths are all hardcoded.

This PR fix the issue by generating the corresponding config files with the corresponding path at runtime.

Also:

  • keep using the default path with the hard coded file in the ray metrics launch-prometheus command because the command can be executed with/without ray cluster started. The doc is updated accordingly to suggest the user to use manual mode to start prometheus is --temp-dir will be specified
  • Updated the tests to:
    • add verification of the monitoring config generation for the case of specify temp-dir
    • add verification of the paths in the config content

Related issue number

Closes #41648

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… temp-dir

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@MengjinYan MengjinYan changed the title Issue-41648: Fix the Paths in the Generated Monitoring Configs to Consider temp-dir [Ray Core] Fix the Paths in the Generated Monitoring Configs to Consider temp-dir Sep 30, 2024
doc/source/cluster/metrics.md Outdated Show resolved Hide resolved
doc/source/cluster/metrics.md Outdated Show resolved Hide resolved
2. fix comment
3. Update the logic in the install_and_start_prometheus to continue leverage the hard coded file, in order to avoid generating a file in user's package or working directory. Tests added accordingly

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
# in install_and_start_prometheus.py is updated to reload the config file.
shutil.copy(PROMETHEUS_CONFIG_INPUT_PATH, prometheus_config_output_path)

# The code here is slightly different from how the prometheus config is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully understand this comment section. What about:

# This code generates the Prometheus config based on the custom temporary root path 
# set by the user at Ray cluster start up (via --temp-dir). In contrast, start_prometheus in
# install_and_start_prometheus.py uses a hardcoded Prometheus config at
# PROMETHEUS_CONFIG_INPUT_PATH that always uses "/tmp/ray". Other than the root path,
# the config file generated here is identical to that hardcoded config file.

# However, if in the future Ray ever modifies this file at runtime, we'll
# need to use the user-friendly location instead, and reload the config
# file after it's updated by Ray.
# The function assumes, to-be-prometheus-monitored ray cluster uses the default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

# This function assumes the Ray cluster to be monitored by Prometheus uses the default 
# configuration with "/tmp/ray" as the root temporary directory.
# 
# This is to support the `ray metrics launch-prometheus` command, when a ray cluster
# is not started yet and we have no way to get a `--temp-dir` anywhere. So we choose to
# use a hardcoded default value.

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Oct 4, 2024
@rynewang rynewang enabled auto-merge (squash) October 7, 2024 22:20
@rynewang
Copy link
Contributor

rynewang commented Oct 7, 2024

@angelinalg please take a look at this - Mengjin fixed an issue about when users specify --temp-dir, the default grafana config no longer works. We added docs for those people about how to config their grafana.

Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some style nits and a typo. For future reference, please run Vale to get style feedback: https://docs.ray.io/en/latest/ray-contribute/docs.html#how-to-use-vale

@@ -22,6 +22,12 @@ For a quick demo, you can run Prometheus locally on your machine. Follow the qui

### Quickstart: Running Prometheus locally

```{admonition} Note
:class: note
If you need to change the root temporary directory by using "--temp-dir" in your ray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you need to change the root temporary directory by using "--temp-dir" in your ray
If you need to change the root temporary directory by using "--temp-dir" in your Ray

```{admonition} Note
:class: note
If you need to change the root temporary directory by using "--temp-dir" in your ray
cluster setup, please follow the [manual steps](#optional-manual-running-prometheus-locally) to setup Prometheus locally.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cluster setup, please follow the [manual steps](#optional-manual-running-prometheus-locally) to setup Prometheus locally.
cluster setup, follow these [manual steps](#optional-manual-running-prometheus-locally) to set up Prometheus locally.

@@ -76,7 +82,7 @@ tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

Ray provides a Prometheus config that works out of the box. After running Ray, you can find the config at `/tmp/ray/session_latest/metrics/prometheus/prometheus.yml`.
Ray provides a Prometheus config that works out of the box. After running Ray, you can find the config at `/tmp/ray/session_latest/metrics/prometheus/prometheus.yml`. If you specify the `--temp-dir={your_temp_path}` when starting the ray cluster, the config file will be at `{yout_temp_path}/session_latest/metrics/prometheus/prometheus.yml`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ray provides a Prometheus config that works out of the box. After running Ray, you can find the config at `/tmp/ray/session_latest/metrics/prometheus/prometheus.yml`. If you specify the `--temp-dir={your_temp_path}` when starting the ray cluster, the config file will be at `{yout_temp_path}/session_latest/metrics/prometheus/prometheus.yml`
Ray provides a Prometheus config that works out of the box. After running Ray, you can find the config at `/tmp/ray/session_latest/metrics/prometheus/prometheus.yml`. If you specify the `--temp-dir={your_temp_path}` when starting the Ray cluster, the config file is at: `{your_temp_path}/session_latest/metrics/prometheus/prometheus.yml`

@rynewang rynewang merged commit f04ddd6 into master Oct 15, 2024
6 checks passed
@rynewang rynewang deleted the issue-41648 branch October 15, 2024 18:58
MengjinYan added a commit that referenced this pull request Oct 15, 2024
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
angelinalg added a commit that referenced this pull request Oct 16, 2024
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

This PR fixed the style suggestions in
#47871.

The original PR got accidentally merged before the suggestions are
fixed.

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
can-anyscale added a commit that referenced this pull request Oct 16, 2024
can-anyscale added a commit that referenced this pull request Oct 16, 2024
can-anyscale added a commit that referenced this pull request Oct 16, 2024
MengjinYan added a commit that referenced this pull request Oct 16, 2024
… Configs to Consider temp-dir

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
MengjinYan added a commit that referenced this pull request Oct 16, 2024
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
rynewang pushed a commit that referenced this pull request Oct 18, 2024
…48054)

* Resubmit #47871 and #48030 as they are reverted as release blocker
* Fixed the failed test in windows CI test. 
* The test `test_metrics_folder_and_content` failed because of the
different file separator between linux and Windows
* The PR fix the issue by using `os.path.join` to generate the path for
validation instead of directly using formatted string

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
…led Test for Windows (ray-project#48054)

* Resubmit ray-project#47871 and ray-project#48030 as they are reverted as release blocker
* Fixed the failed test in windows CI test. 
* The test `test_metrics_folder_and_content` failed because of the
different file separator between linux and Windows
* The PR fix the issue by using `os.path.join` to generate the path for
validation instead of directly using formatted string

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
…led Test for Windows (ray-project#48054)

* Resubmit ray-project#47871 and ray-project#48030 as they are reverted as release blocker
* Fixed the failed test in windows CI test. 
* The test `test_metrics_folder_and_content` failed because of the
different file separator between linux and Windows
* The PR fix the issue by using `os.path.join` to generate the path for
validation instead of directly using formatted string

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
…led Test for Windows (ray-project#48054)

* Resubmit ray-project#47871 and ray-project#48030 as they are reverted as release blocker
* Fixed the failed test in windows CI test.
* The test `test_metrics_folder_and_content` failed because of the
different file separator between linux and Windows
* The PR fix the issue by using `os.path.join` to generate the path for
validation instead of directly using formatted string

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ray Core] "/tmp/ray" should not be hard coded for dashboard/observation.
3 participants