Clarify plate and well specifications for sparse plates #24

melissalinkert · 2020-12-14T01:26:58Z

Starting point for discussion. The main scenarios to clarify are plates that are missing entire row(s)/column(s), and wells with a field in some (but not all) of the defined acquisitions.

will-moore · 2020-12-14T11:39:47Z

Looks good. 👍

sbesson · 2020-12-17T11:25:17Z

The suggested changes also read fine to me and are in line with the decisions made for the first version of the HCS specification. From my side, this commit could also be ported directly to the 0.1/index.bs specification.

As discussed recently, as we start applying the OME-Zarr HCS specification to more real-world HCS use cases, especially sparse plates, we might need to review and reconsider how we handle these in the specification. This can be captured and discussed as a separate issue.

melissalinkert · 2021-03-17T18:06:40Z

0c28690 expands on the sparse plate handling to explicitly identify the row and column for each well. glencoesoftware/bioformats2raw#91 is a corresponding proposed implementation.

Both are based on discussion with @kkoz and @chris-allan. In the sparse plate example where only C5 and D7 are acquired, a human reading the JSON can clearly see that C/5 means C5 and D/7 means D7, but the only way to automatically calculate that is to split the well path on / and match each token against rows and columns.
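A minimal Python sketch of that split-and-match approach follows. It assumes every well path is exactly "<row name>/<column name>" and uses a hypothetical 8 x 12 plate (rows A to H, columns 1 to 12) where only C5 and D7 are acquired. `infer_indices` is an illustrative helper, not part of bioformats2raw or any other library:

```python
# Hypothetical sparse plate: only C5 and D7 acquired on an 8 x 12 layout.
ROWS = [{"name": name} for name in "ABCDEFGH"]
COLUMNS = [{"name": str(n)} for n in range(1, 13)]
SPARSE_WELLS = [{"path": "C/5"}, {"path": "D/7"}]


def infer_indices(wells, rows, columns):
    """Recover (row index, column index) for each well by splitting its path
    on "/" and matching the two tokens against the row and column names.

    Assumes every path is exactly "<row name>/<column name>"; any other
    layout raises ValueError (wrong token count) or KeyError (unknown name).
    """
    # Build the name -> index lookups once rather than scanning per well.
    row_lookup = {row["name"]: i for i, row in enumerate(rows)}
    col_lookup = {col["name"]: i for i, col in enumerate(columns)}
    indices = []
    for well in wells:
        row_name, col_name = well["path"].split("/")
        indices.append((row_lookup[row_name], col_lookup[col_name]))
    return indices


print(infer_indices(SPARSE_WELLS, ROWS, COLUMNS))  # [(2, 4), (3, 6)]
```

Any path that does not split into exactly two known names fails, which is the fragility that explicit per-well indices would remove.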

Happy to split 0c28690 into a separate issue if that's easier to discuss.

will-moore · 2021-03-18T14:52:34Z

There is a proposal to simplify the specifications of "collections" #31.
I assume that this will replace the existing HCS spec with something more generic.
Currently it looks like we are nearing some consensus on the overall structure of the data,
but we haven't yet decided on any specific keywords for adding e.g. HCS metadata.
I'll try and come up with a suggestion, although it may initially not include plate-acquisition info.

sbesson · 2021-03-19T13:58:40Z

latest/index.bs

  <dt><strong>version</strong></dt>
  <dd>A string defining the version of the specification.</dd>
  <dt><strong>wells</strong></dt>
  <dd>A list of JSON objects defining the wells of the plate. Each well object 
-      MUST contain a `path` key identifying the path to the well subgroup.</dd>
+      MUST contain a `path` key identifying the path to the well subgroup.
+      Each well object MUST contain both a `row_index` key identifying the index into


Regarding #24 (comment), are there cases where it is not possible to recompute these indexes from each well's path together with the rows and columns lists? If recomputing is always possible (albeit at a cost to the consumer), my primary consideration is whether the requirement level for these new fields should be SHOULD rather than MUST.

For real-world examples, I can definitely see how row_index/column_index make sense in terms of optimizing some of the queries. In addition to testing this with sparse plates, it will also be useful to generate a representative plate with many wells (at least 384) to check that the extra JSON metadata has no performance impact.
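One rough starting point for that check is to compare the serialized size of dense plate metadata with and without the two index keys. This Python sketch is hypothetical: it assumes "<row>/<column>" well paths and spreadsheet-style row letters beyond Z (so the 27th row is AA), and it measures only JSON byte size, not parsing or query time. `make_plate` and `metadata_size` are illustrative names:

```python
import json


def row_name(index):
    """Spreadsheet-style row letters: 0 -> "A", 25 -> "Z", 26 -> "AA"."""
    name = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def make_plate(n_rows, n_cols, with_indices=True):
    """Build dense plate metadata with "<row>/<column>" well paths."""
    rows = [{"name": row_name(r)} for r in range(n_rows)]
    columns = [{"name": str(c + 1)} for c in range(n_cols)]
    wells = []
    for r in range(n_rows):
        for c in range(n_cols):
            well = {"path": f"{rows[r]['name']}/{columns[c]['name']}"}
            if with_indices:
                well["rowIndex"] = r
                well["columnIndex"] = c
            wells.append(well)
    return {"rows": rows, "columns": columns, "wells": wells}


def metadata_size(plate):
    """Size in bytes of the serialized {"plate": ...} attributes."""
    return len(json.dumps({"plate": plate}).encode("utf-8"))


for n_rows, n_cols in [(16, 24), (32, 48)]:  # 384- and 1536-well layouts
    without = metadata_size(make_plate(n_rows, n_cols, with_indices=False))
    with_idx = metadata_size(make_plate(n_rows, n_cols))
    print(f"{n_rows * n_cols} wells: {without} -> {with_idx} bytes")
```

The same generator could be reused to write test plates for timing reads in a real implementation.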

chris-allan · 2021-03-19

In order for these indexes to be forward or reverse computable, path would need to be much more explicitly defined than it is now:

A list of JSON objects defining the fields of views for a given well. Each object MUST contain a path key identifying the path to the field of view. If multiple acquisitions were performed in the plate, it SHOULD contain an acquisition key identifying the id of the acquisition which must match one of acquisition JSON objects defined in the plate metadata.

Furthermore, the wells array would need to be padded with null (or similar) entries in order for those indexes to make sense.

Neither of these is ideal, obviously. I don't think there's a way to avoid making these MUST if we want to guarantee that lookups can happen based on physical plate characteristics.

melissalinkert · 2021-12-02T19:33:18Z

5a1ddc7 is based on glencoesoftware/bioformats2raw#119 and discussion with @chris-allan earlier today, in preparation for discussion with @sbesson tomorrow. The proposed changes around well path in particular are still up for debate.

will-moore · 2021-12-03T11:52:10Z

latest/index.bs

+      additional leading or trailing directories.
+      Each well object MUST contain both a `row_index` key identifying the index into
+      the `rows` list and a `column_index` key identifying the index into
+      the `columns` list. `row_index` and `column_index` MUST be 0-based.</dd>


I realise that #70 has been added after this PR was opened, but the decision there means these new attributes should now be named rowIndex and columnIndex.

melissalinkert · 2021-12-03

Should be fixed in 7c2536a.

melissalinkert · 2021-12-09T17:50:03Z

Following discussion with @sbesson and @chris-allan, 3c31c14 relaxes the "no empty groups" statement to address #24 (comment). There are also some clarifications to the row and column naming, intended to be consistent with https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#NamingConvention.

sbesson · 2021-12-15T22:41:10Z

Overview

The specification changes proposed in this PR closely reflect several choices made in omero-cli-zarr when implementing the first version of the HCS metadata. Effectively, this moves implementation details to the specification level, clearing several ambiguities when dealing with sparse plates. The advantages are:

reduce the divergence between writing implementations
clarify the expectations for consumers when dealing with NGFF datasets implementing the HCS specification

RFC and Community call

This PR has now passed several rounds of internal review and is reaching the state where community feedback would be useful before formally integrating it into an upcoming version of the specification. Given the latest announcement of the next NGFF call, I would propose setting the week of 2022-01-24 as the deadline for public comments. Ideally, during this community call, we can review the state of this proposal, reach an agreement and decide on the timeline for getting these changes released.

Specific comments

Empty Zarr groups

A former version of the proposal forbade the existence of Zarr groups for wells and well rows containing no images. Following the feedback from #24 (comment), the latest version relaxes this to a recommendation. I can think of rationales backing both specifications. Importantly, the biggest decision factor might be at the level of the consumer library:

assuming a strict proposal (MUST NOT), a library can use either the existence of Zarr groups or the wells metadata to determine whether wells are populated with images
assuming the more lenient proposal (SHOULD NOT), a library cannot rely on the existence of Zarr groups. Instead, the wells field acts as the single source of truth for whether wells are populated.
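To illustrate the lenient case, here is a small Python sketch in which the wells list alone decides which positions are populated, while an on-disk group missing from that list is reported rather than trusted. The plate dictionary, the GROUPS_ON_DISK set (standing in for a real store listing) and both helper names are hypothetical:

```python
# Hypothetical plate attributes following the proposed layout.
PLATE = {
    "rows": [{"name": n} for n in "ABCDEFGH"],
    "columns": [{"name": str(n)} for n in range(1, 13)],
    "wells": [
        {"path": "C/5", "rowIndex": 2, "columnIndex": 4},
        {"path": "D/7", "rowIndex": 3, "columnIndex": 6},
    ],
}
# Hypothetical result of listing well groups in the store; "B/3" is an empty
# group that a writer may leave behind under SHOULD NOT (but not MUST NOT).
GROUPS_ON_DISK = {"C/5", "D/7", "B/3"}


def populated_positions(plate):
    """(rowIndex, columnIndex) pairs declared in `wells`, the source of truth."""
    return {(w["rowIndex"], w["columnIndex"]) for w in plate["wells"]}


def stray_groups(plate, groups_on_disk):
    """Groups present in the store but not declared in `wells`; these are
    not treated as populated wells."""
    declared = {w["path"] for w in plate["wells"]}
    return groups_on_disk - declared


print((1, 2) in populated_positions(PLATE))  # False: B3 is not declared
print(stray_groups(PLATE, GROUPS_ON_DISK))  # {'B/3'}
```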

Rows/columns names

The new requirements regarding the content of the rows and columns arrays make it possible to communicate a representation of the physical plate layout independently of whether wells are populated or not.
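As a sketch of that point, the rows and columns lists alone are enough to draw the full physical layout, and the wells list only marks which positions hold images. The plate below is a hypothetical 8 x 12 example with C5 and D7 acquired, and `layout_grid` is an illustrative helper:

```python
# Hypothetical 8 x 12 plate with only C5 and D7 acquired.
PLATE = {
    "rows": [{"name": n} for n in "ABCDEFGH"],
    "columns": [{"name": str(n)} for n in range(1, 13)],
    "wells": [
        {"path": "C/5", "rowIndex": 2, "columnIndex": 4},
        {"path": "D/7", "rowIndex": 3, "columnIndex": 6},
    ],
}


def layout_grid(plate):
    """One text line per physical row: 'X' for wells listed in `wells`,
    '.' for every other position defined by `rows` x `columns`."""
    populated = {(w["rowIndex"], w["columnIndex"]) for w in plate["wells"]}
    n_cols = len(plate["columns"])
    lines = []
    for r, row in enumerate(plate["rows"]):
        cells = "".join("X" if (r, c) in populated else "." for c in range(n_cols))
        lines.append(f"{row['name']} {cells}")
    return "\n".join(lines)


print(layout_grid(PLATE))
```

The output has one line per row A to H, each 12 cells wide, with an X only at C5 and D7; empty rows still appear because they are defined in rows.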

Constraints have been added to the name definition of rows and columns. This should make broken scenarios like duplicate row/column names invalid as per the specification. Additionally, these constraints support the ubiquitous convention in the High-Content Screening domain of using letters/numbers for rows/columns e.g. row A, column 2 while still catering for some flexibility in the naming of rows/columns.
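A minimal Python validator for such name constraints is sketched below. It assumes the constraints are that each name contains only alphanumeric characters, is case-sensitive and is unique within its list (my reading of the proposed wording); restricting "alphanumeric" to ASCII letters and digits is an additional assumption, and `name_problems` is a hypothetical helper:

```python
import re

# Assumed rule: ASCII letters and digits only, at least one character.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def name_problems(entries, kind):
    """List problems with the `name` of each entry in `rows` or `columns`.

    Comparison is case-sensitive, so "a" and "A" are distinct names.
    """
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        name = entry.get("name")
        if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
            problems.append(f"{kind}[{i}]: invalid name {name!r}")
        elif name in seen:
            problems.append(f"{kind}[{i}]: duplicate name {name!r}")
        else:
            seen.add(name)
    return problems


print(name_problems([{"name": "A"}, {"name": "B"}, {"name": "A"}], "rows"))
# ["rows[2]: duplicate name 'A'"]
```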

Wells indices

A major change in this proposal is that each well element now requires three keys: a path AND a rowIndex AND a columnIndex. The first is unchanged compared to the previous specs and specifies the path to the Zarr group. The two indices link this group to the associated row/column in the plate metadata.

The examples in the specification page as well as the bioformats2raw and omero-cli-zarr implementations use systematic naming conventions where the path to the well is derived from the names of the corresponding row and column, e.g. the well corresponding to row A (rowIndex: 0) and column 2 (columnIndex: 1) is located at path A/2. This representation has obvious readability advantages, but this behavior is not enforced by the current proposal, i.e. libraries should assume that the path to individual wells is independent of the row/column names.
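A short Python sketch of a consumer that follows this rule: it treats path as opaque and resolves row and column names only through the 0-based indices. The plate uses hypothetical 0/0-style paths that are not derived from the names, and `well_positions` is an illustrative name:

```python
# Hypothetical plate whose well paths are not derived from row/column names.
PLATE = {
    "rows": [{"name": "A"}, {"name": "B"}],
    "columns": [{"name": "1"}, {"name": "2"}],
    "wells": [
        {"path": "0/0", "rowIndex": 0, "columnIndex": 0},
        {"path": "1/1", "rowIndex": 1, "columnIndex": 1},
    ],
}


def well_positions(plate):
    """Yield (path, row name, column name) for each well.

    The path is treated as an opaque group location; the row and column
    are resolved only through rowIndex/columnIndex (0-based).
    """
    rows, columns = plate["rows"], plate["columns"]
    for well in plate["wells"]:
        yield (
            well["path"],
            rows[well["rowIndex"]]["name"],
            columns[well["columnIndex"]]["name"],
        )


for path, row, col in well_positions(PLATE):
    print(f"{path} -> {row}{col}")
# 0/0 -> A1
# 1/1 -> B2
```

Because nothing parses the path, the same code works unchanged for A/2-style and 0/0-style layouts.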

As discussed in #24 (comment), an alternate proposal would be to force a mapping between the path to the Zarr group of the well and the names of the row and column associated with this well. Under such a proposal, it would become superfluous to require both the path and rowIndex/columnIndex attributes in the wells array as one could be recomputed from the other.
Probably the biggest trade-offs up for discussion here are:

flexibility e.g. support for well paths of type 0/0
size of the plate metadata e.g. for 1536-well plates
performance i.e. cost of the name <-> index lookup

Well metadata

At the well group level, for multi-acquisition plates, the acquisition key is now mandatory. An alternative would be to define some default behavior if this field is absent, e.g. using the first element of the top-level acquisitions array. Multi-acquisition plates are the exception rather than the norm, but even in these scenarios this change sounds completely reasonable to me.
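The mandatory-key rule could be checked roughly as follows. This Python sketch assumes that well images carry path and acquisition keys and that plate acquisitions carry an id, as in the well metadata quoted earlier in this thread. It does not implement the alternative default-to-first-acquisition behavior, and `acquisition_problems` is a hypothetical helper:

```python
def acquisition_problems(plate, well):
    """Check the `acquisition` key of each image in a well.

    With more than one acquisition declared on the plate, a missing key is
    an error; any key present must match a declared acquisition `id`.
    """
    acquisitions = plate.get("acquisitions", [])
    known_ids = {acq["id"] for acq in acquisitions}
    problems = []
    for image in well["images"]:
        if "acquisition" not in image:
            if len(acquisitions) > 1:
                problems.append(f"{image['path']}: missing acquisition")
        elif image["acquisition"] not in known_ids:
            problems.append(
                f"{image['path']}: unknown acquisition {image['acquisition']!r}"
            )
    return problems


# Hypothetical two-acquisition plate and one of its wells.
PLATE = {"acquisitions": [{"id": 1}, {"id": 2}]}
WELL = {"images": [{"path": "0", "acquisition": 1},
                   {"path": "1"},
                   {"path": "2", "acquisition": 3}]}
print(acquisition_problems(PLATE, WELL))
# ['1: missing acquisition', '2: unknown acquisition 3']
```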

Samples and implementations

The specification includes a few examples of metadata for sparse HCS data that complement the existing examples of dense plates. As for every release, representative real-world HCS examples should be generated covering as many features as possible.

In terms of implementation, glencoesoftware/bioformats2raw#119 contains the implementation of these changes for bioformats2raw, and changes to omero-cli-zarr are expected in order to support the new attributes. Consuming libraries such as vizarr could possibly be updated to benefit from the index lookup.

sbesson · 2021-12-16T09:05:34Z

Another comment while looking at validation this morning is that the current specification does not define the level of requirement for the keys under plate and well.

The initial JSON schema introduced in https://github.com/ome/ngff/pull/76/files#diff-2e387106f2f394aca19236f21c170f70b94c7931f60bfc2d8f6549941105e0cfR105 defines version, rows, columns and wells as required for plate. This would leave acquisitions, field_count and name as optional. This is largely in line with the spirit of the changes here, with possibly a discussion around version, as this key is marked as recommended in the other specifications. That being said, I am personally in favor of enforcing version as a requirement everywhere in the mid-term.

For well, I assume images is required and version is either required or recommended, aligning with the decision regarding plates.
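A minimal Python sketch of those requirement levels as a key check. It is an assumption, not the draft JSON schema itself: version is treated as required for plate (as in that schema) and only recommended for well, which is one of the two options above; `check_keys` is a hypothetical helper:

```python
# Assumed requirement levels: `version` required for plate (as in the draft
# JSON schema) but only recommended for well.
REQUIRED = {"plate": {"version", "rows", "columns", "wells"}, "well": {"images"}}
RECOMMENDED = {"plate": set(), "well": {"version"}}


def check_keys(attrs, kind):
    """Return (missing required keys, missing recommended keys) for the
    `plate` or `well` block of a group's attributes."""
    present = set(attrs.get(kind, {}))
    return sorted(REQUIRED[kind] - present), sorted(RECOMMENDED[kind] - present)


print(check_keys({"plate": {"rows": [], "columns": [], "wells": []}}, "plate"))
# (['version'], [])
print(check_keys({"well": {"images": []}}, "well"))
# ([], ['version'])
```

Switching well to the stricter option would only mean moving "version" from RECOMMENDED to REQUIRED.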

latest/index.bs

sbesson

After reviewing the discussions from the 6th OME-NGFF community call, no objection/amendment was made for this proposal. The various HCS-aware OME-Zarr implementations have been updated to support the new proposed layout. Merging in preparation for the upcoming 0.4 specification announcement.


imagesc-bot · 2022-02-09T10:06:25Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-2022-01-27/60885/11

Clarify plate and well specifications for sparse plates

8b24328

Add "row_index" and "column_index" to "wells" spec

0c28690

will-moore mentioned this pull request Mar 18, 2021

Collections Specification #31

Open

sbesson reviewed Mar 19, 2021


chris-allan mentioned this pull request Nov 22, 2021

Fix problems with missing groups/metadata and sparse plates glencoesoftware/bioformats2raw#119

Merged

Stricter requirements for row/column names, well groups, and well paths

5a1ddc7

will-moore reviewed Dec 3, 2021


melissalinkert added 3 commits December 3, 2021 12:40

Merge branch 'main' of git://github.com/ome/ngff into sparse-plate

f20c03c

Change row_index and column_index to rowIndex and columnIndex

7c2536a

Relax row/well group requirement and clarify naming expectations

3c31c14

sbesson mentioned this pull request Dec 17, 2021

Add new API to write multiscales metadata ome/ome-zarr-py#149

Merged

sbesson mentioned this pull request Jan 5, 2022

Add API for writing HCS metadata ome/ome-zarr-py#153

Merged

This was referenced Jan 13, 2022

Add support for passing wells as List[dict] in write_plate_metadata ome/ome-zarr-py#157

Merged

Add support for HCS 0.4 specification ome/ome-zarr-py#159

Merged

sbesson reviewed Jan 18, 2022


latest/index.bs

sbesson mentioned this pull request Jan 28, 2022

Finalizing v0.4 #86

Open


sbesson added this to the 0.4 milestone Feb 1, 2022

sbesson approved these changes Feb 2, 2022


sbesson merged commit 416a377 into ome:main Feb 2, 2022

sbesson mentioned this pull request Mar 16, 2022

OME Metadata Support #104

Open

