Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Add .skyignore support #4038

Merged
merged 14 commits into from
Oct 8, 2024
Merged

[Storage] Add .skyignore support #4038

merged 14 commits into from
Oct 8, 2024

Conversation

yika-luo
Copy link
Collaborator

@yika-luo yika-luo commented Oct 5, 2024

If you have a .skyignore file in your sky working directory, sky will exclude the listed files when uploading your work dir to the sky clusters, and sky will NOT use any of your .gitignore files. However, if you don't have a .skyignore, sky will fallback to use your .gitignore. In other words, sky will never use both .skyignore and .gitignore.

Example .skyignore:

# Files that match pattern under ONE directory
/hello.py
/*.txt
/dir
/dir/*.txt

# Files that match pattern under ALL directories
*.txt
hello.py

Please do NOT use patterns like ./*.txt because these expressions do not behave consistently across the APIs.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Tested sky launch on a simple GCP .yaml and .skyignore, ssh into sky cluster to make sure both workdir and mounted dir include the correct list of files
  • Tested sky jobs launch on the same .yaml and .skyignore, check the GCS buckets to make sure both workdir and mounted buckets include the correct list of files

@yika-luo yika-luo force-pushed the ignore branch 2 times, most recently from 6014d1a to 78fef30 Compare October 5, 2024 01:01
@yika-luo yika-luo requested a review from Michaelvll October 5, 2024 01:04
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @yika-luo! It is very useful to our users. The PR looks mostly good to me. To expose this option to users, let's add some instructions in our doc as well (can be another PR): https://skypilot.readthedocs.io/en/latest/examples/syncing-code-artifacts.html

sky/constants.py Outdated Show resolved Hide resolved
sky/data/storage_utils.py Outdated Show resolved Hide resolved
sky/data/storage_utils.py Outdated Show resolved Hide resolved
sky/data/storage_utils.py Outdated Show resolved Hide resolved
sky/data/storage_utils.py Outdated Show resolved Hide resolved
sky/utils/command_runner.py Outdated Show resolved Hide resolved
sky/utils/command_runner.py Outdated Show resolved Hide resolved
sky/utils/command_runner.py Outdated Show resolved Hide resolved
tests/unit_tests/test_storage_utils.py Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @yika-luo! It mostly looks good to me now.

Comment on lines 85 to 86
if line.startswith('*.'):
line = '**/' + line
Copy link
Collaborator

@Michaelvll Michaelvll Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if this will be too specific, e.g., does test.log match test.log in all subdir; does test*.log match all sub dir? Wondering if we should add **/ to all lines, except the ones start with /
(the newly proposed version might be very slow in a large folder with many files/folders)

Suggested change
if line.startswith('*.'):
line = '**/' + line
if line.startswith('*.'):
line = '**/' + line

Copy link
Collaborator Author

@yika-luo yika-luo Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test.log and test*.log should be just matching in current dir. That being said, I should remove this logic and requires users to do **/* if they want to match ALL directories. I also added in doc saying users should avoid doing *.txt or ./*.txt

sky/utils/command_runner.pyi Outdated Show resolved Hide resolved
sky/data/storage_utils.py Show resolved Hide resolved
docs/source/examples/syncing-code-artifacts.rst Outdated Show resolved Hide resolved
sky/data/storage_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this support @yika-luo! LGTM with the minor comment of /* vs /.

docs/source/reference/yaml-spec.rst Show resolved Hide resolved
@yika-luo yika-luo added this pull request to the merge queue Oct 8, 2024
@Michaelvll Michaelvll removed this pull request from the merge queue due to a manual request Oct 8, 2024
@yika-luo yika-luo added this pull request to the merge queue Oct 8, 2024
@yika-luo yika-luo removed this pull request from the merge queue due to a manual request Oct 8, 2024
@yika-luo yika-luo added this pull request to the merge queue Oct 8, 2024
Merged via the queue into master with commit 3f898ab Oct 8, 2024
20 checks passed
@yika-luo yika-luo deleted the ignore branch October 8, 2024 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants