
Providing filesystem credentials to cubed.Spec #432

Closed
balanz24 opened this issue Mar 19, 2024 · 2 comments · Fixed by #436

Comments

@balanz24
Contributor

Currently cubed doesn't allow specifying the credentials of a storage filesystem in the Spec class. Although the class allows passing some storage_options, these are never actually used.

This is a limitation because, when working with an S3-compatible storage system such as MinIO, the user should be able to specify credentials.

The solution could come from:

  1. Specifying credentials and additional configuration parameters to Spec:

```python
spec = cubed.Spec(
    work_dir="s3://cubed",
    allowed_mem="2GB",
    reserved_mem="100MB",
    executor_name="lithops",
    storage_options={
        "client_kwargs": {"endpoint_url": "http://<ip>:<port>"},
        "key": "key",
        "secret": "secret",
    },
)
```
  2. Wrapping the store path with the spec's storage_options when creating the temp_store and the target_store, e.g. in rechunk:

```python
if target_store is None:
    target_store = new_temp_path(name=name, spec=spec)  # return not only the path, but also store arguments
name_int = f"{name}-int"
temp_store = new_temp_path(name=name_int, spec=spec)  # return not only the path, but also store arguments
```
  3. Before calling zarr.open_array, checking whether self.store is just a string or a string packed with additional arguments. If the latter, create an S3-compatible store like:

```python
if isinstance(self.store, dict):
    s3 = s3fs.S3FileSystem(**self.store["storage_options"])
    self.store = s3fs.S3Map(root=self.store["path"], s3=s3, check=False)

target = zarr.open_array(
    self.store,
    mode=mode,
    shape=self.shape,
    dtype=self.dtype,
    chunks=self.chunks,
    path=self.path,
    fill_value=self.fill_value,
    **self.kwargs,
)
```

For reference, accessing a MinIO server from zarr is done like this:

```python
s3 = s3fs.S3FileSystem(key="key", secret="secret", client_kwargs={"endpoint_url": "http://<ip>:<port>"})
store = s3fs.S3Map(root="s3://cubed/test.zarr", s3=s3, check=False)
a = zarr.open_array(shape=(10, 100, 100), chunks=(1, 100, 100), dtype="float64", store=store)
```
@tomwhite
Member

Thanks for opening this @balanz24.

> Although the class allows passing some storage_options, these are never actually used.

Good catch!

> The solution could come from:
>
> 1. Specifying credentials and additional configuration parameters to Spec:

This seems like the way to go. The storage_options then need to be passed to the blockwise and rechunk functions, and from there on to lazy_zarr_array.

I don't think you need to check whether store is a string or not, since the storage_options are separate from the store.
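To illustrate that separation, here is a minimal sketch. `LazyZarrArray` below is a simplified hypothetical stand-in for cubed's class, and its `open` just echoes its inputs rather than actually calling zarr.

```python
class LazyZarrArray:
    """Simplified stand-in: keep storage_options alongside the store path
    instead of packing them into the store value itself."""

    def __init__(self, store, storage_options=None):
        self.store = store                        # plain path string, unchanged
        self.storage_options = storage_options or {}

    def open(self):
        # In cubed this step would open the array with zarr, forwarding the
        # credentials; here we just show both pieces remain independently
        # available, so no isinstance check on the store is needed.
        return {"store": self.store, "storage_options": self.storage_options}


arr = LazyZarrArray("s3://cubed/a.zarr", storage_options={"key": "key"})
opened = arr.open()
```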

Do you want me to create a PR or do you want to have a go?

@balanz24
Contributor Author

Thanks for your quick response!
I can create the PR myself, with the changes we discussed as a starting point.
