-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix streaming from arrow files #7083
Conversation
src/datasets/data_files.py
Outdated
@@ -386,7 +386,7 @@ def resolve_pattern( | |||
matched_paths = [ | |||
filepath if filepath.startswith(protocol_prefix) else protocol_prefix + filepath | |||
for filepath, info in fs.glob(pattern, detail=True, **glob_kwargs).items() | |||
if info["type"] == "file" | |||
if info["type"] in ("file", "other") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand why you did this change ? can you elaborate ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my Ubuntu machine, the info of symlinked files is "other". Currently, if datafiles are symlinks, they are not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is an issue for the LocalFileSystem
implementation of fsspec
no ?
I'm afraid of future errors if we support "other" as a valid file type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It was just my quick hack to get it to work locally for me.
On the other hand, I'm not sure why checking for the type is necessary at all since a user should probably only specify glob patterns to valid files anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, maybe let's try to run the tests without this line ?
anyway for this PR feel free to just keep the "arrow" change and we can tackle your other issue in another PR - feel free to ping me there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. I've reverted the second commit and merged the recent changes from main. The second PR is here #7133
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks ! I started the CI on your other PR
No description provided.