Providing filesystem credentials through storage_options #436
Conversation
This looks great - thanks for submitting it @balanz24.

I've made a couple of review comments, and some formatting fixes are needed too. You can install pre-commit with `pip install pre-commit`, then run `pre-commit run --all-files` to fix them.
cubed/core/ops.py (outdated)

```diff
@@ -308,6 +308,7 @@ def blockwise(
     extra_func_kwargs=extra_func_kwargs,
     fusable=fusable,
     num_input_blocks=num_input_blocks,
+    storage_options=spec.storage_options,
```
I know they are all kwargs, but the `storage_options` parameter would be better placed immediately after `target_path`, since it belongs with `target_store` and `target_path`. This applies to all the changes in ops.py and blockwise.py, but not rechunk.py.
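For illustration, a sketch of the suggested placement; the surrounding signature is hypothetical and only loosely based on the parameters visible in the diff:

```python
# Hypothetical keyword-argument ordering: keep storage_options next to the
# other storage-related parameters instead of appending it at the end.
def blockwise_sketch(
    func,
    *args,
    target_store=None,
    target_path=None,
    storage_options=None,   # goes right after target_path
    extra_func_kwargs=None,
    fusable=True,
    num_input_blocks=None,
):
    ...
```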
cubed/storage/zarr.py (outdated)

```python
    **kwargs,
):
    """Create a Zarr array lazily in memory."""
    # use an empty in-memory Zarr array as a template since it normalizes its properties
    template = zarr.empty(
        shape, dtype=dtype, chunks=chunks, store=zarr.storage.MemoryStore()
    )
    if storage_options:
        s3 = s3fs.S3FileSystem(**storage_options)
        store = s3fs.S3Map(root=store, s3=s3, check=False)
```
This code is S3-specific, which we don't want to hardcode. In fact, I think you don't need any changes in zarr.py at all, since the `storage_options` from the spec will be passed straight through to `zarr.open_array`. Does that work in your case?
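For illustration, a minimal sketch of that pass-through, assuming zarr-python 2.x with s3fs installed; the URL and credentials are placeholders:

```python
import zarr

# fsspec-style credentials; zarr forwards storage_options to fsspec when the
# store is given as a URL, so no S3-specific code is needed in cubed itself.
storage_options = {
    "key": "minio-access-key",          # placeholder credentials
    "secret": "minio-secret-key",
    "client_kwargs": {"endpoint_url": "http://localhost:9000"},
}

arr = zarr.open_array(
    "s3://my-bucket/array.zarr",        # placeholder URL
    mode="r",
    storage_options=storage_options,
)
```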
You are right, they can be passed through `**kwargs`. However, the kwargs were not passed to `zarr.open_array` inside `LazyZarrArray.open()`, so I made that modification.
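For illustration, a rough sketch of the kind of change described; this is not cubed's actual `LazyZarrArray` implementation, it only shows stored kwargs being forwarded so that `storage_options` reaches `zarr.open_array`:

```python
import zarr


class LazyZarrArraySketch:
    """Illustrative stand-in for cubed's LazyZarrArray (simplified)."""

    def __init__(self, store, shape, dtype, chunks, path=None, **kwargs):
        self.store = store
        self.shape = shape
        self.dtype = dtype
        self.chunks = chunks
        self.path = path
        self.kwargs = kwargs  # may include storage_options

    def open(self):
        # Forward the stored kwargs so storage_options (and any other
        # fsspec options) reach zarr.open_array.
        return zarr.open_array(
            self.store, mode="r+", path=self.path, **self.kwargs
        )
```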
Thanks for the comments @tomwhite. I was looking at …

This is the most consistent way of doing it.

It would be good to add a unit test. Perhaps use an fsspec local filesystem and set …
```diff
@@ -109,7 +109,7 @@ def from_zarr(store, path=None, spec=None) -> "Array":
         The array loaded from Zarr storage.
     """
     name = gensym()
-    target = zarr.open(store, path=path, mode="r")
+    target = zarr.open(store, path=path, mode="r", storage_options=spec.storage_options)
```
This will fail if `spec` is `None`, which it can be when the default config is being used. You can fix this by adding a line like this (line 47 in 3d08513): `self.spec = spec or spec_from_config(config)`
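As an aside, a minimal runnable sketch of that guard (not cubed's real code; `DefaultSpec` is a stand-in for whatever `spec_from_config(config)` returns):

```python
import zarr


class DefaultSpec:
    """Stand-in for the default Spec that spec_from_config(config) would build."""
    storage_options = None


def open_target(store, path=None, spec=None):
    # Fall back to a default spec so spec.storage_options is safe to read
    # even when the caller did not provide one.
    spec = spec or DefaultSpec()
    return zarr.open(store, path=path, mode="r", storage_options=spec.storage_options)
```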
Done. Sorry for the confusion.

Regarding the unit test, I'm trying something like this:

```python
import pytest

from cubed.storage.zarr import lazy_zarr_array


def test_storage_options(tmp_path):
    zarr_path = f"file://{tmp_path}/dir/array.zarr"
    arr = lazy_zarr_array(
        zarr_path,
        shape=(3, 3),
        dtype=int,
        chunks=(2, 2),
        storage_options={"auto_mkdir": False},
    )
    with pytest.raises(ValueError):
        arr.open()
        arr.create()
```

The problem is that when running it locally, the non-existing intermediate directory gets created anyway.
Ah, it looks like Zarr is doing this (creating the directory) itself, so that's not going to work as a test. I can't think of another simple way of testing this - have you got any ideas? Also, how has the manual testing been going?
With this current version the credentials are passed fine and I'm able to use a MinIO server as the storage backend. I don't know how we can test it without having access to an external filesystem. It seems that on Linux you can't create a directory with credentials through POSIX, nor simulate an S3 API over local directories.
Fixes #432. This allows specifying the credentials of a storage filesystem via the Spec class. Needed for S3-compatible storage systems such as MinIO.
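For context, a hedged sketch of how this might be used against an S3-compatible store such as MinIO; the endpoint, bucket, and keys are placeholders, and the exact Spec parameters should be checked against cubed's documentation:

```python
import cubed
import cubed.array_api as xp

# Hypothetical usage of the storage_options introduced by this PR:
# fsspec-style credentials for an S3-compatible backend (e.g. MinIO).
spec = cubed.Spec(
    work_dir="s3://my-bucket/cubed-tmp",      # placeholder bucket
    allowed_mem="1GB",
    storage_options={
        "key": "minio-access-key",            # placeholder credentials
        "secret": "minio-secret-key",
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)

a = xp.ones((3, 3), chunks=(2, 2), spec=spec)
```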