Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix open_mfdataset for list of fsspec files #9785

Merged
merged 3 commits into from
Nov 15, 2024
Merged

Conversation

phofl
Copy link
Contributor

@phofl phofl commented Nov 15, 2024

@dcherian
Copy link
Contributor

mypy needs a fix. Can you expand that logic out to an explicit for-loop please. The comprehension is quite hard to read now.

@phofl
Copy link
Contributor Author

phofl commented Nov 15, 2024

I added type ignores, mypy should have raised earlier as well since the else clause was incorrect, but type inference was probably a bit smarter here. I really don't know enough about mypy to properly troubleshoot this

@phofl
Copy link
Contributor Author

phofl commented Nov 15, 2024

The type hints are generally incorrect in open_mfdataset (I think) since the fsspec version doesn't produce a string or os.PathLike

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sry for breaking that.

paths.append(_normalize_path(p))
elif isinstance(p, list):
paths.append(_normalize_path_list(p)) # type: ignore[arg-type]
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me what the expected type here is?
The type hints only "allow" str, os.PathLike or a list of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically whatever

fs = fsspec.filesystem("file")
fs.open(my_file)

returns

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI trying to determine what type fs.open returns could be a massive rabbit hole - see fsspec/filesystem_spec#579 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess a minimal protocol similar to how pandas handles it should be the solution then.

@headtr1ck
Copy link
Collaborator

The type hints are generally incorrect in open_mfdataset (I think) since the fsspec version doesn't produce a string or os.PathLike

Do you know what this returns? Is it some buffered reader or similar?

@phofl
Copy link
Contributor Author

phofl commented Nov 15, 2024

No, no clue. Nothing is typed over there and there are so many subclasses out there...

The class AbstractFileSystem implements the open but again not really clear

We typed this as

ReadBuffer[bytes]

in pandas, but this is very generic. The implementation lives here: https://github.com/pandas-dev/pandas/blob/ee3c18f51b393893ed6e31214c7be2f9427ce0c9/pandas/_typing.py#L270

@dcherian
Copy link
Contributor

OK merging since this fixes a regression. We can followup with typing improvements.

@dcherian dcherian merged commit 9f459c3 into pydata:main Nov 15, 2024
29 checks passed
@phofl phofl deleted the fsspec-open branch November 15, 2024 20:19
@headtr1ck headtr1ck mentioned this pull request Nov 15, 2024
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 16, 2024
* main:
  Add download stats badges (pydata#9786)
  Fix open_mfdataset for list of fsspec files (pydata#9785)
  add 'User-Agent'-header to pooch.retrieve (pydata#9782)
  Optimize `ffill`, `bfill` with dask when `limit` is specified (pydata#9771)
dcherian added a commit that referenced this pull request Nov 19, 2024
* main: (24 commits)
  Bump minimum versions (#9796)
  Namespace-aware `xarray.ufuncs` (#9776)
  Add prettier and pygrep hooks to pre-commit hooks (#9644)
  `rolling.construct`: Add `sliding_window_kwargs` to pipe arguments down to `sliding_window_view` (#9720)
  Bump codecov/codecov-action from 4.6.0 to 5.0.2 in the actions group (#9793)
  Buffer types (#9787)
  Add download stats badges (#9786)
  Fix open_mfdataset for list of fsspec files (#9785)
  add 'User-Agent'-header to pooch.retrieve (#9782)
  Optimize `ffill`, `bfill` with dask when `limit` is specified (#9771)
  fix cf decoding of grid_mapping (#9765)
  Allow wrapping `np.ndarray` subclasses (#9760)
  Optimize polyfit (#9766)
  Use `map_overlap` for rolling reductions with Dask (#9770)
  fix html repr indexes section (#9768)
  Bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.2 in the actions group (#9763)
  unpin array-api-strict, as issues are resolved upstream (#9762)
  rewrite the `min_deps_check` script (#9754)
  CI runs ruff instead of pep8speaks (#9759)
  Specify copyright holders in main license file (#9756)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 19, 2024
* main:
  Bump minimum versions (pydata#9796)
  Namespace-aware `xarray.ufuncs` (pydata#9776)
  Add prettier and pygrep hooks to pre-commit hooks (pydata#9644)
  `rolling.construct`: Add `sliding_window_kwargs` to pipe arguments down to `sliding_window_view` (pydata#9720)
  Bump codecov/codecov-action from 4.6.0 to 5.0.2 in the actions group (pydata#9793)
  Buffer types (pydata#9787)
  Add download stats badges (pydata#9786)
  Fix open_mfdataset for list of fsspec files (pydata#9785)
  add 'User-Agent'-header to pooch.retrieve (pydata#9782)
  Optimize `ffill`, `bfill` with dask when `limit` is specified (pydata#9771)
  fix cf decoding of grid_mapping (pydata#9765)
  Allow wrapping `np.ndarray` subclasses (pydata#9760)
  Optimize polyfit (pydata#9766)
  Use `map_overlap` for rolling reductions with Dask (pydata#9770)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

open_mfdataset with remote files is broken because of #9687
4 participants