
[UX] Friendly error message when mounting fails with non-empty mount path #1908

Merged
merged 25 commits into from
May 20, 2023

Conversation

landscapepainter
Collaborator

@landscapepainter landscapepainter commented Apr 28, 2023

This resolves #1895

When launching a remote VM, the mount path set by the user may already be occupied, for example when it is a system directory (e.g. /tmp or /sys). In that case, sky raises CommandError, which shows the command that caused the error. Here the command is a heredoc, so the error message is very long and difficult to interpret. This PR fixes the issue by suppressing the long command shown with CommandError and raising a concise, straightforward message instead.

Before:

sky.exceptions.CommandError: Command (cat <<-\EOF > ~/.sky/mount_300531.sh
#!/usr/bin/env bash
set -e

MOUNT_PATH=/tmp
MOUNT_BINARY=gcsfuse

# Check if path is already mounted
if grep -q $MOUNT_PATH /proc/mounts ; then
    echo "Path already mounted - unmounting..."
    fusermount -uz "$MOUNT_PATH"
    echo "Successfully unmounted $MOUNT_PATH."
fi

# Install MOUNT_BINARY if not already installed
if [ -x "$(command -v gcsfuse)" ] && gcsfuse --version | grep -q 0.42.3; then
  echo "$MOUNT_BINARY already installed. Proceeding..."
else
  echo "Installing $MOUNT_BINARY..."
  wget -nc https://github.com/GoogleCloudPlatform/gcsfuse/releases/download/v0.42.3/gcsfuse_0.42.3_amd64.deb -O /tmp/gcsfuse.deb && sudo dpkg --install /tmp/gcsfuse.deb
fi

# Check if mount path exists
if [ ! -d "$MOUNT_PATH" ]; then
  echo "Mount path $MOUNT_PATH does not exist. Creating..."
  sudo mkdir -p $MOUNT_PATH
  sudo chmod 777 $MOUNT_PATH
else
  # Check if mount path contains files
  if [ "$(ls -A $MOUNT_PATH)" ]; then
    echo "Mount path $MOUNT_PATH is not empty. Please make sure its empty."
    exit 1
  fi
fi
echo "Mounting $SOURCE_BUCKET to $MOUNT_PATH with $MOUNT_BINARY..."
gcsfuse -o allow_other --implicit-dirs --stat-cache-capacity 4096 --stat-cache-ttl 5s --type-cache-ttl 5s --rename-dir-limit 10000 tmp-doyoung-testing-root /tmp
echo "Mounting done."
) && chmod +x ~/.sky/mount_300531.sh && bash ~/.sky/mount_300531.sh && rm ~/.sky/mount_300531.sh failed with return code 1.
Failed to run command before rsync ~/tmp-workdir -> /tmp. Ensure that the network is stable, then retry.

After:

sky.exceptions.StorageMountPathError: Mount path /tmp is non-empty. /tmp may have been already taken by the Kernel. Please set the mount path to another name.

Tested (run the relevant ones):

  • Launching a single node cluster with pre-occupied directory, /tmp, set as mount path
  • Launching a multi-node cluster with pre-occupied directory, /tmp, set as mount path

@landscapepainter landscapepainter changed the title [UX] Friendlier message when mounting fails with non-empty mount path [UX] Friendly error message when mounting fails with non-empty mount path Apr 28, 2023
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @landscapepainter! Left some comments.

sky/exceptions.py (outdated)
f' {mount_path} may have been already '
f'taken by the Kernel. Please set the '
f'mount path to another name.')
raise exceptions.StorageMountPathError(error_msg) from None
Collaborator

It's good to catch exceptions.MOUNT_PATH_NON_EMPTY_CODE since that's a common case.

What happens if e.returncode != exceptions.MOUNT_PATH_NON_EMPTY_CODE? We should perhaps add a || echo like mentioned here, so the error is surfaced nicely instead of the entire heredoc.

Collaborator Author

With the current implementation, when e.returncode != exceptions.MOUNT_PATH_NON_EMPTY_CODE (tested by manually setting the exit code of the heredoc to 99 while MOUNT_PATH_NON_EMPTY_CODE is 42) and attempting to mount to a non-empty directory, the error message from the heredoc is still surfaced without displaying the entire heredoc: E 05-05 03:11:59 subprocess_utils.py:70] Mount path /tmp is not empty. Please make sure its empty.

Does it still need the || echo implementation in mounting_utils.py? If so, what error should it be catching? The only place the heredoc from mounting_utils.py exits with an explicit code is when $MOUNT_PATH is non-empty. In that case it is hard-coded to exit with 42 and it also echoes an error message: E 05-05 03:11:59 subprocess_utils.py:70] Mount path /tmp is not empty. Please make sure its empty.

Or perhaps, I misunderstood what you meant. Would you mind elaborating a bit more if this is the case?
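
For readers following the return-code discussion above, here is a minimal sketch of the pattern, not the actual SkyPilot code. The two exception classes are simplified stand-ins, `run_mount` and `run_script` are hypothetical names, and the value 42 is taken from the comment above as an assumption.

```python
# Simplified stand-ins; the real classes live in sky/exceptions.py and may differ.
class CommandError(Exception):
    def __init__(self, returncode: int, command: str, error_msg: str) -> None:
        self.returncode = returncode
        self.command = command
        self.error_msg = error_msg
        super().__init__(
            f'Command {command} failed with return code {returncode}.\n'
            f'{error_msg}')


class StorageMountPathError(Exception):
    """Raised when the mount path is non-empty."""


# Assumed value, taken from the discussion in this thread.
MOUNT_PATH_NON_EMPTY_CODE = 42


def run_mount(mount_path: str, run_script) -> None:
    """Run a mounting callable and translate one known failure.

    `run_script` is a hypothetical zero-argument callable that raises
    CommandError when the mounting script exits with a non-zero code.
    """
    try:
        run_script()
    except CommandError as e:
        if e.returncode == MOUNT_PATH_NON_EMPTY_CODE:
            error_msg = (f'Mount path {mount_path} is non-empty.'
                         f' {mount_path} may have been already '
                         f'taken by the Kernel. Please set the '
                         f'mount path to another name.')
            # `from None` hides the long CommandError (and its heredoc)
            # from the traceback shown to the user.
            raise StorageMountPathError(error_msg) from None
        # Any other return code is re-raised unchanged.
        raise
```

Only the known exit code is translated here. Other failures still propagate as CommandError, which is the case the reviewer asks about.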

sky/backends/cloud_vm_ray_backend.py (outdated)
@landscapepainter
Collaborator Author

@romilbhardwaj Ready for another look :)

Member

@concretevitamin concretevitamin left a comment

Quick look at the UX.

sky/exceptions.py (outdated)
@@ -60,7 +60,7 @@ def get_mounting_command(
# Check if mount path contains files
if [ "$(ls -A $MOUNT_PATH)" ]; then
echo "Mount path $MOUNT_PATH is not empty. Please make sure its empty."
Member

Suggested change
- echo "Mount path $MOUNT_PATH is not empty. Please make sure its empty."
+ echo "Mount path $MOUNT_PATH is not empty. Please mount to another path or remove it first."

(Suggest a fix for common scenarios first; e.g., if people accidentally mount to /tmp, the first fix should be to use another path.)


NOTE: I find it confusing that we allow mounting to an existing /empty-dir that has no files. The convention in other tools seems to be: if the path exists, regardless of whether it contains files, back off.
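
The two policies contrasted in this note can be sketched as follows. This is an illustrative Python rendering of the shell check, not code from the PR. `mount_path_is_usable` and `allow_empty_dir` are hypothetical names.

```python
import os


def mount_path_is_usable(mount_path: str, allow_empty_dir: bool = True) -> bool:
    """Decide whether a path may be used as a mount point.

    allow_empty_dir=True mirrors the script in this PR: a missing path or
    an existing empty directory is accepted.
    allow_empty_dir=False is the stricter convention mentioned above:
    back off whenever the path already exists.
    """
    if not os.path.exists(mount_path):
        return True
    if not allow_empty_dir:
        return False
    # Existing path: accept only an empty directory.
    return os.path.isdir(mount_path) and not os.listdir(mount_path)
```

With the default policy, a path like /tmp is typically rejected only because it contains files. With the strict policy it is rejected simply because it exists.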

Collaborator Author

@landscapepainter landscapepainter May 18, 2023

Updated the echo message.

What kind of potential issues could arise from allowing mounts on existing directories?

sky/exceptions.py
@romilbhardwaj romilbhardwaj added this to the v0.3 milestone May 16, 2023
@landscapepainter
Collaborator Author

@concretevitamin @romilbhardwaj Ready for another look! Thanks!

Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @landscapepainter! Left some comments. Tried it out and the error is much cleaner now. It also prints the stdout/stderr of the mounting script, which I think is helpful for debugging.

sky/data/mounting_utils.py (outdated)
sky/exceptions.py (outdated)
@landscapepainter
Collaborator Author

@romilbhardwaj ready for another look!

Collaborator

@romilbhardwaj romilbhardwaj left a comment

Great work @landscapepainter, thanks!

@landscapepainter landscapepainter merged commit c991c7d into skypilot-org:master May 20, 2023
@concretevitamin
Member

Works nicely @landscapepainter - one UX observation, with the /tmp: ... repro:

...
I 05-20 08:56:40 backend_utils.py:1220] Mounting (to 1 node): ~/tmp-workdir -> /tmp
⠼ MountingE 05-20 08:56:44 subprocess_utils.py:70] bash: warning: here-document at line 37 delimited by end-of-file (wanted `EOF')
E 05-20 08:56:44 subprocess_utils.py:70] gcsfuse version 0.41.9 (Go version go1.18.4)
E 05-20 08:56:44 subprocess_utils.py:70] Installing gcsfuse...
...
E 05-20 08:56:44 subprocess_utils.py:70] Setting up gcsfuse (0.42.3) ...
E 05-20 08:56:44 subprocess_utils.py:70] Mount path /tmp is not empty. Please mount to another path or remove it first.
E 05-20 08:56:44 subprocess_utils.py:70] Warning: Permanently added '35.226.87.123' (ED25519) to the list of known hosts.
  • Is the "here-document at line 37 delimited by end-of-file (wanted `EOF')" warning expected? Should we write the string EOF somewhere?
  • Minor: The install log of gcsfuse (many lines) is printed with E, error. That seems a bit unexpected as really only the second to last line is erroring. If it's too hard to fix we can certainly drop it.

@landscapepainter
Collaborator Author

@concretevitamin Thanks for bringing up those cases:
The EOF warning seems to be a known issue.
[Screenshot 2023-05-20 at 5 50 37 PM]
I tried adding the string 'EOF' at the end of the here-doc and as part of the command after {script}, but neither fixed it.

For the gcsfuse install log, subprocess_utils.py/handle_returncode() outputs stderr + stdout when the returncode != 0, and the install log is part of the stdout from backend_utils.py/parallel_data_transfer_to_nodes(). The EOF warning is part of stdout as well. So I think it is purposely designed to show both stdout and stderr to help debugging when returncode != 0.

@concretevitamin
Member

> outputs stderr + stdout when the returncode != 0

Nit: Maybe we can logger.info() stdout and logger.error() stderr in this case?
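
A minimal sketch of what this nit could look like, assuming stdout and stderr are captured separately. `log_failed_command_output` and the logger name are hypothetical. The real handle_returncode() in subprocess_utils.py has a different signature.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger('mount_sketch')


def log_failed_command_output(returncode: int, stdout: str,
                              stderr: str) -> None:
    """Log a failed command's output at levels matching its stream.

    stdout lines (e.g. install logs) go to INFO and stderr lines go to
    ERROR, so only genuine error output carries the error prefix.
    """
    if returncode == 0:
        return
    for line in stdout.splitlines():
        logger.info(line)
    for line in stderr.splitlines():
        logger.error(line)
```

This only helps if the two streams are actually kept apart. Per the previous comment, the install log and the EOF warning both arrive on stdout, so they would be logged at INFO under this scheme.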


Successfully merging this pull request may close these issues.

Bucket mounting should print friendlier error messages
3 participants