[Dashboard] Enable customizable refresh frequency for the Metrics page #44037

Merged 10 commits into ray-project:master on Jun 20, 2024

Conversation

@liuxsh9 liuxsh9 commented Mar 15, 2024

Why are these changes needed?

Currently, each chart on the Metrics page refreshes continuously, which strains both the network and the frontend. In our deployment with @nemo9cby and @Bye-legumes, we have seen responses time out or get lost when the cluster network is unstable.

Therefore, having the ability to customize the refresh frequency would be helpful.

This change sets the default refresh frequency to 1 second and offers several frequency options, modeled on Grafana's, so users can choose the refresh interval that best suits their needs.

[Image: refresh_frequency]

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@anyscalesam anyscalesam added triage Needs triage (eg: priority, bug/not-bug, and owning component) observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Apr 29, 2024
@anyscalesam anyscalesam added the dashboard Issues specific to the Ray Dashboard label May 14, 2024
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
Signed-off-by: Xiaoshuang Liu <liuxiaoshuang4@huawei.com>

@alanwguo alanwguo left a comment

Nice, thanks for adding this.

Can you also add this to the metrics section of the service page and the serve deployment page?

@@ -377,6 +421,8 @@ export const Metrics = () => {
const toParam = to !== null ? `&to=${to}` : "";
const timeRangeParams = `${fromParam}${toParam}`;

const refreshParams = refresh !== "" ? `&refresh=${refresh}` : "";
Contributor

Suggested change
- const refreshParams = refresh !== "" ? `&refresh=${refresh}` : "";
+ const refreshParams = refresh ? `&refresh=${refresh}` : "";

Since refresh can also be null?

Contributor Author

That's right, modified.
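
For readers following this thread, here is a minimal sketch of why the truthiness check matters when `refresh` can be `null`. The function names and the `Refresh` type are hypothetical and exist only for illustration; the PR inlines the expression in the `Metrics` component.

```typescript
// Hypothetical helpers for illustration only; the PR inlines the expression
// inside the Metrics component rather than defining functions like these.
type Refresh = string | null | undefined;

// Original check: only the empty string is treated as "no refresh".
function refreshParamsStrict(refresh: Refresh): string {
  return refresh !== "" ? `&refresh=${refresh}` : "";
}

// Suggested check: any falsy value (null, undefined, "") omits the parameter.
function refreshParamsTruthy(refresh: Refresh): string {
  return refresh ? `&refresh=${refresh}` : "";
}

console.log(refreshParamsStrict(null)); // "&refresh=null"
console.log(refreshParamsTruthy(null)); // ""
console.log(refreshParamsTruthy("5s")); // "&refresh=5s"
```

With the strict comparison, a `null` value is interpolated into the URL as the literal text `null`, which is the case the suggested change avoids.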

@@ -358,13 +390,25 @@ export const Metrics = () => {

const grafanaDefaultDatasource = dashboardDatasource ?? "Prometheus";

const [refreshOption, setRefreshOption] = useState<RefreshOptions>(
RefreshOptions.ONE_SECOND,
Contributor

Can we make the default five seconds? one second seems too frequent.

Contributor Author

Sure! Five seconds is also the frequency we prefer. I was worried that switching from continuous updates to five seconds would disrupt the habits of existing users, but that seems to have been an unnecessary worry.
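
As a rough sketch of the default discussed above, the option set could be modeled as a string enum whose values match the interval strings used in the `refresh` URL parameter. Only the name `RefreshOptions` appears in the diff; the members, their values, and the helper `toRefreshValue` are assumptions made for this example.

```typescript
// Assumed option set; only the enum name RefreshOptions appears in the diff.
enum RefreshOptions {
  OFF = "off",
  FIVE_SECONDS = "5s",
  TEN_SECONDS = "10s",
  THIRTY_SECONDS = "30s",
  ONE_MINUTE = "1m",
}

// Default requested in review: five seconds instead of one second.
const DEFAULT_REFRESH_OPTION = RefreshOptions.FIVE_SECONDS;

// Hypothetical helper: map a selected option to the value placed in the
// `refresh` URL parameter. "off" maps to an empty string, so that a
// truthiness check on the result omits the parameter entirely.
function toRefreshValue(option: RefreshOptions): string {
  return option === RefreshOptions.OFF ? "" : option;
}

console.log(toRefreshValue(DEFAULT_REFRESH_OPTION)); // "5s"
```

In the component, the default would be passed as the initial value of `useState<RefreshOptions>(...)`, as in the hunk above.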

Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>

liuxsh9 commented Jun 14, 2024

> Can you also add this to the metrics section of the service page and the serve deployment page?

Sure! It has also been added to the service page and the serve deployment page.

@anyscalesam anyscalesam added enhancement Request for new feature and/or capability P2 Important issue, but not time-critical labels Jun 14, 2024
@anyscalesam anyscalesam removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Jun 14, 2024
@scottsun94

@liuxsh9 can you add a short recording of what it looks like? (Sorry, I'm not used to building the frontend and checking it out myself, so a recording would be easier for me to review.)

alanwguo commented Jun 14, 2024

[Image: Screenshot 2024-06-14 at 2 53 57 PM]

Can we make the styling of the buttons consistent? One is outlined and the other is underlined; can we use underlined for both?

While you're here, can you also expand the width of the "Last 5..." box so we can see the full text?

Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
…text appears

Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>

liuxsh9 commented Jun 17, 2024

Great suggestion! How about this? @alanwguo @scottsun94
[Image: dashboard_refresh_interval]

@scottsun94

LGTM. Thanks!

liuxsh9 and others added 2 commits June 18, 2024 08:54
Co-authored-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Xiaoshuang Liu <liuxiaoshuang4@huawei.com>
Signed-off-by: liuxsh9 <liuxiaoshuang4@huawei.com>

liuxsh9 commented Jun 18, 2024

Testing showed that the shortest refresh interval that actually takes effect is 5 seconds, so I removed the 1-second option.
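
One way to express that constraint in code is to filter candidate intervals against a minimum. This is only a sketch: `intervalToSeconds`, `supportedIntervals`, and the candidate list are hypothetical and not taken from the PR, and the 5-second floor is the value reported in the comment above.

```typescript
// Convert an interval such as "5s", "1m", or "2h" to seconds.
// Hypothetical helper; supports only whole numbers with an s, m, or h suffix.
function intervalToSeconds(interval: string): number {
  const match = /^(\d+)([smh])$/.exec(interval);
  if (match === null) {
    throw new Error(`Unrecognized interval: ${interval}`);
  }
  const unitSeconds: Record<string, number> = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * unitSeconds[match[2]];
}

// Keep only the candidates that are at least `minSeconds` long.
function supportedIntervals(candidates: string[], minSeconds: number): string[] {
  return candidates.filter((c) => intervalToSeconds(c) >= minSeconds);
}

console.log(supportedIntervals(["1s", "5s", "10s", "1m"], 5)); // ["5s", "10s", "1m"]
```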

liuxsh9 commented Jun 19, 2024

@alanwguo Hi alan, do you think this PR is ready to be merged?

@scottsun94

> @alanwguo Hi alan, do you think this PR is ready to be merged?

@alanwguo is out until Thursday. Let's wait a little bit?

@alanwguo alanwguo added the go add ONLY when ready to merge, run all tests label Jun 20, 2024

@alanwguo alanwguo left a comment

lgtm, thanks!

I'll have someone merge after tests pass

@rkooo567 rkooo567 merged commit 52947d7 into ray-project:master Jun 20, 2024
7 checks passed
Labels
  • dashboard: Issues specific to the Ray Dashboard
  • enhancement: Request for new feature and/or capability
  • go: add ONLY when ready to merge, run all tests
  • observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
  • P2: Important issue, but not time-critical
5 participants