Adopt Zarr v3 #182

Closed
normanrz opened this issue Mar 21, 2023 · 9 comments

@normanrz
Contributor

The Zarr v3 specification draft is in its final stages. The current status can be viewed here: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html

The main changes from the OME-Zarr perspective are:

  • The core metadata of Zarr has changed to support extensibility
  • .zarray + .zgroup + .zattrs are now all stored in zarr.json files
  • dimension_names will be part of the core metadata

Other interesting changes:

  • Codecs become chainable and nestable
  • Configurable chunk keys, e.g. c/1/2/3, c.1.2.3, 1/2/3 (current OME-Zarr default), 1.2.3
  • Added storage transformers
  • Sharding is not yet part of v3, but will follow shortly after with ZEP2
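As a rough illustration of the configurable chunk keys listed above, the following sketch builds the four key styles from a chunk's grid coordinates. The function name `encode_chunk_key` is hypothetical, and the behaviour (a `c` prefix for the "default" encoding, no prefix for the "v2" encoding) reflects one reading of the Zarr v3 core specification rather than any particular library's API.

```python
def encode_chunk_key(coords, encoding="default", separator=None):
    """Return the key of a chunk, relative to its array, for the given grid coordinates.

    Hypothetical helper for illustration only; not part of any Zarr library.
    """
    parts = [str(c) for c in coords]
    if encoding == "default":
        # Assumption: the "default" encoding prefixes keys with "c" and uses "/" unless told otherwise.
        sep = "/" if separator is None else separator
        return sep.join(["c"] + parts)
    if encoding == "v2":
        # Assumption: the "v2" encoding has no prefix and uses "." unless told otherwise.
        sep = "." if separator is None else separator
        return sep.join(parts) if parts else "0"
    raise ValueError(f"unknown chunk key encoding: {encoding!r}")


print(encode_chunk_key((1, 2, 3)))                            # default with "/"
print(encode_chunk_key((1, 2, 3), "default", "."))            # default with "."
print(encode_chunk_key((1, 2, 3), "v2", "/"))                 # v2 with "/"
print(encode_chunk_key((1, 2, 3), "v2"))                      # v2 with "."
```

The third form, `1/2/3`, corresponds to the layout described above as the current OME-Zarr default.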

To move OME-Zarr from v2 to v3, I would propose the following high-level changes:

  • Move the metadata from .zattrs to the attributes key in a group's zarr.json
  • Introduce an ome key under which all the metadata goes
  • Move version key up
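The three steps above could be sketched as a small conversion function. This is only an illustration under assumptions: the name `v2_attrs_to_v3_group` is hypothetical, every key in `.zattrs` is assumed to be OME metadata, the existing `version` is assumed to live inside each `multiscales` entry, and the target version string is a placeholder parameter.

```python
import json


def v2_attrs_to_v3_group(zattrs, ome_version="1.0"):
    """Wrap the contents of a v2 .zattrs dict in a v3 group zarr.json document.

    Hypothetical helper; assumes all keys in `zattrs` are OME metadata.
    """
    ome = {"version": ome_version}  # the version key moves up to the "ome" level
    for key, value in zattrs.items():
        if key == "multiscales":
            # Drop the per-multiscale "version" entries, since the key now lives one level up.
            value = [{k: v for k, v in ms.items() if k != "version"} for ms in value]
        ome[key] = value
    return {
        "zarr_format": 3,
        "node_type": "group",
        "attributes": {"ome": ome},
    }


old_zattrs = {
    "multiscales": [
        {"version": "0.4", "name": "example", "axes": [], "datasets": []}
    ]
}
print(json.dumps(v2_attrs_to_v3_group(old_zattrs), indent=2))
```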

Example group metadata for a multiscale image:

{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "ome": {
      "version": "1.0",
      "multiscales": [
        {
          "name": "example",
          "axes": [
            {"name": "c", "type": "channel"},
            {"name": "z", "type": "space", "unit": "micrometer"},
            {"name": "y", "type": "space", "unit": "micrometer"},
            {"name": "x", "type": "space", "unit": "micrometer"}
          ],
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [{
                "type": "scale",
                "scale": [1.0, 0.5, 0.5, 0.5]
              }]
            },
            {
              "path": "1",
              "coordinateTransformations": [{
                "type": "scale",
                "scale": [1.0, 1.0, 1.0, 1.0]
              }]
            }
          ]
        }
      ]
    }
  }
}

Example array metadata:

{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [1, 4096, 4096, 1536],
  "data_type": "uint8",
  "chunk_grid": {
    "configuration": { "chunk_shape": [1, 64, 64, 64] },
    "name": "regular"
  },
  "chunk_key_encoding": {
    "configuration": { "separator": "/" },
    "name": "default"
  },
  "fill_value": 0,
  "codecs": [
    { "name": "transpose", "configuration": { "order": "F" } },
    {
      "name": "blosc",
      "configuration": {
        "cname": "zstd",
        "clevel": 5,
        "shuffle": 0,
        "blocksize": 0
      }
    }
  ],
  "dimension_names": ["c", "z", "y", "x"],
  "attributes": {}
}

Example directory structure:

└── example
    │
    ├── zarr.json             # Each image is a Zarr group, or a folder, of other groups and arrays.
    ├── 0                     # Each multiscale level is stored as a separate Zarr array,
    │   ...                   # which is a folder containing chunk files which compose the array.
    ├── n                     # The name of the array is arbitrary, with the ordering defined
    │   │                     # by the "multiscales" metadata, but is often a sequence starting at 0.
    │   │
    │   ├── zarr.json         # Each resolution level is an array.
    │   └─ c                  # Chunks are stored with the chunk key encoding as specified in the zarr.json.
    │      └─ 0               # In this example, the chunk key encoding is "default" with separator "/".
    │         └─ 1            # All but the last chunk element are stored as directories. The terminal chunk is a
    │            └─ 2         # file. Together the directory and file names provide the "chunk coordinate" (c, z, y, x)
    │               └─ 3      # where the maximum coordinate will be dimension_size / chunk_size.
    │
    └── labels
        │
        ...
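To make the chunk coordinate range in the directory listing concrete, here is a minimal sketch using the shape and chunk shape from the example array metadata. The helper name `chunk_grid_shape` is hypothetical. It assumes zero-based chunk indices, so the last index along each axis is the ceiling of dimension_size / chunk_size, minus one.

```python
def chunk_grid_shape(shape, chunk_shape):
    """Number of chunks along each axis for a regular chunk grid (hypothetical helper)."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same number of dimensions")
    # Integer ceiling division, so a partially filled edge chunk still counts.
    return [-(-size // chunk) for size, chunk in zip(shape, chunk_shape)]


shape = [1, 4096, 4096, 1536]    # from the example array metadata
chunk_shape = [1, 64, 64, 64]

grid = chunk_grid_shape(shape, chunk_shape)
last_index = [n - 1 for n in grid]
# Key of the last chunk with the "default" encoding and "/" separator.
last_key = "/".join(["c"] + [str(i) for i in last_index])

print(grid)        # chunks per axis (c, z, y, x)
print(last_index)  # largest chunk coordinate per axis
print(last_key)
```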
@sbesson
Member

sbesson commented Mar 21, 2023

Thanks for the write-up and the initial version of the proposal @normanrz. From an external perspective, this looks fairly minimal and self-contained, although I assume the bulk of the work will be updating all implementations across languages.

A couple of immediate high-level thoughts/questions:

  • To move OME-Zarr from v2 to v3 probably foreshadows a big initial question: is the intent to make a breaking transition in the specification to switch the Zarr format requirement from v2 to v3? Or would we consider one or multiple versions of the specifications that would remain compatible with both layouts?
  • the addition of the dimension_names also raises a few questions of unification as similar metadata is defined in a few other places:

@will-moore
Member

I think it could be pretty hard to support Zarr v2 and v3 for a single version of OME-NGFF - or it would be as hard as supporting a big change in the NGFF spec. Don't know how the timing of NGFF v0.5 and Zarr v3 compare?

@normanrz
Contributor Author

I think it could be pretty hard to support Zarr v2 and v3 for a single version of OME-NGFF - or it would be as hard as supporting a big change in the NGFF spec.

I agree. OME-Zarr 0.1-0.5 should support v2 and subsequent versions should only support v3. @joshmoore floated the idea of making the transition to v3 the 1.0 release of OME-Zarr.

the addition of the dimension_names also raises a few questions of unification as similar metadata is defined in a few other places:

I don't think the dimension_names will be particularly useful for OME-Zarr because (a) they are specified at the array level and (b) they are just names and cannot carry additional metadata. Unification within OME-Zarr may be useful, though.

@normanrz
Contributor Author

Don't know how the timing of NGFF v0.5 and Zarr v3 compare?

Timing is hard to predict. Best case scenario: Zarr v3 is formally accepted end of April.

@normanrz
Contributor Author

Zarr v3 was approved last week: zarr-developers/zarr-specs#227

@joshmoore
Member

❤️ which means that the implementors will now get busily underway.

As I mentioned to Norman elsewhere, I can certainly see this as being a good time to start the specification for this issue, but in that process we should define which metrics we want to use for adoption: number of implementations, must-have implementations, performance metrics, etc.

@clbarnes

clbarnes commented Jul 12, 2023

Introduce an ome key under which all the metadata goes

+1 for this

Configurable chunk keys, e.g. c/1/2/3, c.1.2.3, 1/2/3 (current OME-Zarr default), 1.2.3

Is there any need to specify this in the NGFF spec? It would be nice to punt the responsibility for that on to the underlying zarr IO utility.

@normanrz
Contributor Author

Configurable chunk keys, e.g. c/1/2/3, c.1.2.3, 1/2/3 (current OME-Zarr default), 1.2.3

Is there any need to specify this in the NGFF spec? It would be nice to punt the responsibility for that on to the underlying zarr IO utility.

Yes! I was just listing new features of Zarr v3.

@joshmoore
Member

Closing along with #242. Final testing and announcement are forthcoming.
