[dashboard/core?] Disk space displayed in dashboard doesn't match size of disk. #48783

Open
Joshuaalbert opened this issue Nov 18, 2024 · 3 comments
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
dashboard: Issues specific to the Ray Dashboard
good first issue: Great starter issue for someone just starting to contribute to Ray
P1: Issue that should be fixed within a few weeks

Comments


Joshuaalbert commented Nov 18, 2024

What happened + What you expected to happen

When I set --temp-dir on the head node, the disk size shown in the dashboard does not reflect the disk that path is on.

ray start --head --dashboard-host=0.0.0.0 --metrics-export-port=8090 --temp-dir=/path/bigdisk/temp

The worker is started like this:

ray start --address="ray_head:${RAY_REDIS_PORT}"

The dashboard shows this

[image: dashboard screenshot]

The first line is the head node, the second is a worker node. Both run in containers whose working directories are volumes mounted from 10TB disks. Why is the head showing only 50GB? That's the size of / on the host, which the container shouldn't have access to.
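To narrow this down, it can help to compare, from inside the head container, the size of the filesystem behind / with the one behind the temp dir. The following is a minimal sketch using only the Python standard library. The helper name disk_report is made up for illustration, and it is only an assumption (not confirmed anywhere in this thread) that the dashboard number comes from a query on / rather than on the temp dir.

```python
import shutil

GIB = 1024 ** 3


def disk_report(paths):
    """Return {path: (total_gib, used_gib)} for every path that can be queried."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Path is missing or not accessible inside this container; skip it.
            continue
        report[path] = (usage.total / GIB, usage.used / GIB)
    return report


if __name__ == "__main__":
    # "/path/bigdisk/temp" is the --temp-dir value from the command above.
    for path, (total, used) in disk_report(["/", "/path/bigdisk/temp"]).items():
        print(f"{path}: {used:.1f} GiB used of {total:.1f} GiB")
```

If the total reported for / is about 50GB while the temp dir reports about 10TB, the container sees both filesystems and the dashboard is likely reporting the former.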

Versions / Dependencies

ray 2.37
also same on 2.39

Reproduction script

ray start --head --dashboard-host=0.0.0.0 --metrics-export-port=8090 --temp-dir=/path/bigdisk/temp

look at dashboard

Issue Severity

High: It blocks me from completing my task.

@Joshuaalbert Joshuaalbert added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 18, 2024
@jcotant1 jcotant1 added dashboard Issues specific to the Ray Dashboard core Issues that should be addressed in Ray Core labels Nov 18, 2024
@jjyao jjyao added good first issue Great starter issue for someone just starting to contribute to Ray P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 18, 2024

gitlijian commented Nov 19, 2024

Hi @Joshuaalbert,

  1. Run the mount command on the head node and on the worker node to confirm that your mount paths are correct.
  2. If the mount paths are correct, there may be a bug in Ray's logic.
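The mount check in step 1 can also be scripted. Below is a rough sketch, assuming a Linux container where /proc/mounts is readable; parse_mounts and find_mount are hypothetical helper names, and the parser ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in unusual mount points.

```python
import posixpath


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, device, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fstype = fields[:3]
            entries.append((mount_point, device, fstype))
    return entries


def find_mount(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for entry in entries:
        mount_point = entry[0]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win, since it shadows earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = entry
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    for p in ("/", "/path/bigdisk/temp"):
        print(p, "->", find_mount(p, mounts))
```

Running it on both nodes shows which device and filesystem type back / and the temp dir on each of them.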


Joshuaalbert commented Nov 19, 2024

Okay, I solved it, not by changing anything on the Ray side but on the Docker side. That makes me suspect Ray has some unwanted behaviour when running under Docker. I'll explain.

I noticed that the 50GB shown for the node in the above screenshot was the same size as the disk that stores Docker images. This is really weird, because nowhere am I mounting that partition as a volume in the container. So I tried moving the Docker data dir to a different disk and, lo and behold, the storage shown in the dashboard changed to reflect that.

@Joshuaalbert

Another useful piece of info: the head and worker nodes use different Docker storage drivers. The head node uses overlay2 (which is kernel space) and the worker uses fuse-overlayfs (which is user space). When I changed the storage driver on the head node to fuse-overlayfs, the dashboard started showing the correct storage size.
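For anyone trying to reproduce this, the storage driver and the Docker data dir can be read on each host with docker info. The snippet below is a sketch under a few assumptions: it runs on the host (not inside the container), the docker CLI is on PATH, and the daemon is reachable. The function name docker_storage_info is made up for illustration.

```python
import shutil
import subprocess


def docker_storage_info(timeout=10):
    """Return (driver, data_root) as reported by `docker info`, or None if unavailable."""
    cmd = ["docker", "info", "--format", "{{.Driver}}\t{{.DockerRootDir}}"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # docker CLI not installed, not executable, or the call hung.
        return None
    if result.returncode != 0:
        return None
    parts = result.stdout.strip().split("\t")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


if __name__ == "__main__":
    info = docker_storage_info()
    if info is None:
        print("docker info is not available on this machine")
    else:
        driver, data_root = info
        total_gib = shutil.disk_usage(data_root).total / 1024 ** 3
        print(f"driver={driver} data_root={data_root} size={total_gib:.1f} GiB")
```

Comparing the printed size with the number in the dashboard shows whether the dashboard is reporting the disk that holds the Docker data dir.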
