[Storage] Add .skyignore support #4038
6014d1a to 78fef30
Thanks for fixing this @yika-luo! It is very useful to our users. The PR looks mostly good to me. To expose this option to users, let's add some instructions in our doc as well (can be another PR): https://skypilot.readthedocs.io/en/latest/examples/syncing-code-artifacts.html
Thanks for the updates @yika-luo! It mostly looks good to me now.
sky/data/storage_utils.py (outdated)
    if line.startswith('*.'):
        line = '**/' + line
Just wondering if this will be too specific, e.g., does `test.log` match `test.log` in all subdirs; does `test*.log` match in all subdirs? Wondering if we should add `**/` to all lines, except the ones that start with `/` (the newly proposed version might be very slow in a large folder with many files/folders).
`test.log` and `test*.log` should match only in the current dir. That being said, I should remove this logic and require users to write `**/*` if they want to match ALL directories. I also added a note in the doc saying users should avoid patterns like `*.txt` or `./*.txt`.
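To make the distinction between current-directory and recursive matching concrete, here is a minimal sketch using Python's standard `glob` module. It only illustrates glob semantics; whether SkyPilot resolves `.skyignore` patterns the same way on every upload path is an assumption, and the helper names and demo file names are hypothetical.

```python
import glob
import os
import tempfile


def match_patterns(root, pattern):
    """Return sorted '/'-separated paths, relative to `root`, matching `pattern`."""
    # glob.escape protects against special characters in the temp dir name.
    full_pattern = os.path.join(glob.escape(root), pattern)
    matches = glob.glob(full_pattern, recursive=True)
    return sorted(os.path.relpath(m, root).replace(os.sep, '/') for m in matches)


def build_demo_tree(root):
    """Create a small hypothetical work dir with top-level and nested logs."""
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('test.log', 'test1.log', 'sub/test.log', 'main.py'):
        with open(os.path.join(root, *rel.split('/')), 'w') as f:
            f.write('')


with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)
    # Without '**/', the pattern only matches entries directly under root.
    top_only = match_patterns(root, 'test*.log')
    # With '**/', it matches in root and in every subdirectory.
    recursive = match_patterns(root, '**/test*.log')
    print(top_only)   # ['test.log', 'test1.log']
    print(recursive)  # ['sub/test.log', 'test.log', 'test1.log']
```

With `recursive=True`, `**` matches zero or more directories, which is why the top-level files appear in the second result as well.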
Thanks for adding this support @yika-luo! LGTM with the minor comment of `/*` vs `/`.
If you have a `.skyignore` file in your sky working directory, sky will exclude the listed files when uploading your work dir to the sky clusters, and sky will NOT use any of your `.gitignore` files. However, if you don't have a `.skyignore`, sky will fall back to using your `.gitignore`. In other words, sky will never use both `.skyignore` and `.gitignore`.

Example `.skyignore`:

Please do NOT use patterns like `./*.txt` because these expressions do not behave consistently across the APIs.

Tested (run the relevant ones):
- `bash format.sh`
- `sky launch` on a simple GCP .yaml and .skyignore, then ssh into the sky cluster to make sure both the workdir and the mounted dir include the correct list of files
- `sky jobs launch` on the same .yaml and .skyignore, then check the GCS buckets to make sure both the workdir and the mounted buckets include the correct list of files