v0.5.0 Array overhaul #13

Merged
merged 54 commits into from
Aug 29, 2024

@pattonw pattonw commented Aug 25, 2024

The main changes:

  1. Support for applying lazy functions to modify the data, with a special emphasis on slicing. You can now do something like `time_slice_array = open_ds("path/to/data.zarr/array").lazy_op(np.s_[0:5])`, which opens your data as a `funlib.persistence.Array` and then slices the first 5 time steps (assuming time is the first axis of your data). You can also apply functions, such as `thresholded_array = open_ds("path/to/data.zarr/array").adapt(lambda x: x > 0.5)`, which lazily applies the function and updates `thresholded_array.dtype` accordingly, so `assert thresholded_array.dtype == bool` should pass. The array remains writable as long as you only use slicing operations; once you apply a function to your data it is no longer writable. Arrays are now backed by dask, so our support extends to, but is also limited by, the lazy slicing and processing that dask supports.
  2. Slight interface change. `open_ds` and `prepare_ds` take a single `store` argument, which is passed directly to `zarr.open`. This expands our support to anything zarr supports (zipped stores, cloud stores, etc.) but also limits us to zarr (no more hdf5 etc.). Note that this limitation only applies to the convenience functions `open_ds` and `prepare_ds`, which come with expectations on data format and metadata format. `Array` will still work with any array-like object that can be converted to a dask array with `dask.array.from_array`. If your data does not match our priors, we recommend writing custom `open_ds` and `prepare_ds` alternatives.
  3. `total_roi` and `num_channels` are no longer provided when using `prepare_ds` or directly calling `Array`. We now just pass `offset` (in units defined by the "units" attribute) and `shape` (in voxels). This means we now support any number of channel dimensions. For example, you can do `prepare_ds(..., offset = (100,200,300), shape = (3, 3, 300, 300, 300))` to have 2 channel dimensions and 3 physical dimensions, which previously wouldn't have been straightforward.
  4. Expanded metadata. We now have `axis_names`, `units`, `voxel_size`, and `offset`. I have separated out a metadata class and a metadata parsing class that can be modified to cover a fairly large variety of simple metadata schemes, and added some reasonable defaults, so this metadata will always be present, or an error will be thrown if the metadata is contradictory. If your metadata requires special parsing (e.g. you store your metadata on the multiscale group instead of directly on the array you are opening), it is easy to pass in metadata fields to skip the automatic parsing, so you can write your own thin wrapper for your specific data.
  5. Added support for configuring the default metadata schema. We check the following paths for configs: `"pyproject.toml"`, `"funlib_persistence.toml"`, `Path.home() / ".config/funlib_persistence/funlib_persistence.toml"`, and `"/etc/funlib_persistence/funlib_persistence.toml"`. The attributes that can be provided are `voxel_size_attr`, `axis_names_attr`, `units_attr`, and `offset_attr`. Whatever attributes you provide will be used for both reading and writing metadata. You can also override the default metadata format in each Python script via `funlib.persistence.arrays.metadata.set_default_metadata_format(...)`.
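
Since arrays are now backed by dask, the lazy behaviour described in item 1 can be illustrated with dask alone. This is a minimal sketch and not the `funlib.persistence` implementation: it assumes an in-memory NumPy array as a stand-in for zarr data, and the variable names (`raw`, `lazy`, `first_five`, `thresholded`) are illustrative only.

```python
import dask.array as da
import numpy as np

# Stand-in for data on disk: 10 time steps of a 4x4 image, values in [0, 1).
raw = np.random.default_rng(0).random((10, 4, 4))
lazy = da.from_array(raw, chunks=(5, 4, 4))

# Lazy slicing: this only records the slice, no data is loaded or copied.
first_five = lazy[np.s_[0:5]]

# Lazy function application: dask infers the resulting dtype without
# evaluating the function on the actual data.
thresholded = (lambda x: x > 0.5)(first_five)

print(first_five.shape)   # (5, 4, 4)
print(thresholded.dtype)  # bool

# The work only happens when the result is explicitly requested.
result = thresholded.compute()
```

The dtype is known before `compute()` is called, which is the property that lets `assert thresholded_array.dtype == bool` pass on a lazily thresholded array.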

pattonw and others added 30 commits June 14, 2024 09:26
allow integer indexing to eliminate dimensions
`MetaData` class now needs to take a `shape` argument to unambiguously determine the number of physical/channel dimensions.
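
The note above about needing `shape` can be illustrated with a small sketch of how a shape and an offset together fix the number of channel and physical dimensions. This is not the library's code: the helper names `split_dims` and `physical_end` are hypothetical, and the sketch assumes that channel dimensions come first, that `offset` has one entry per physical dimension, and that the physical extent is `shape * voxel_size` in world units.

```python
def split_dims(shape, offset):
    """Split `shape` into (channel_shape, physical_shape).

    Hypothetical helper: assumes `offset` has one entry per physical
    dimension and that channel dimensions lead the shape.
    """
    n_physical = len(offset)
    if n_physical > len(shape):
        raise ValueError("offset has more dimensions than shape")
    n_channel = len(shape) - n_physical
    return tuple(shape[:n_channel]), tuple(shape[n_channel:])


def physical_end(shape, offset, voxel_size):
    """World-unit end coordinate, assuming extent = voxels * voxel_size."""
    _, physical_shape = split_dims(shape, offset)
    if len(voxel_size) != len(offset):
        raise ValueError("voxel_size and offset must have the same length")
    return tuple(o + s * v for o, s, v in zip(offset, physical_shape, voxel_size))


# Values taken from the example in the description; voxel_size is made up.
channels, physical = split_dims((3, 3, 300, 300, 300), (100, 200, 300))
end = physical_end((3, 3, 300, 300, 300), (100, 200, 300), (4, 4, 4))
print(channels, physical)  # (3, 3) (300, 300, 300)
print(end)                 # (1300, 1400, 1500)
```

Without `shape`, an offset alone cannot say how many leading channel dimensions the array has, which is why the two are needed together.
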
@pattonw pattonw merged commit c26e256 into main Aug 29, 2024
7 checks passed
@pattonw pattonw deleted the array-overhaul branch August 29, 2024 17:04