Table spec proposal #64

Closed
wants to merge 35 commits into from
Changes from 28 commits
35 commits
4559ce5
add tables spec draft
kevinyamauchi Oct 1, 2021
c9148a7
X must be a single dtype
kevinyamauchi Oct 1, 2021
d3330a2
add points spec draft
kevinyamauchi Oct 1, 2021
957555f
add spec for axis names
kevinyamauchi Oct 20, 2021
f8f2fd0
remove points spec
kevinyamauchi Jun 19, 2022
83c9491
Merge branch 'main' into add-tables-points
kevinyamauchi Jun 19, 2022
8d2d64e
add layers
kevinyamauchi Jun 19, 2022
d715323
add varm/obsm
kevinyamauchi Jun 19, 2022
98be5fa
add obsp/varp
kevinyamauchi Jun 19, 2022
bc72166
add missing .zarray
kevinyamauchi Jun 19, 2022
f4a13c6
add @type
kevinyamauchi Jun 19, 2022
ffdd046
Apply suggestions from code review
kevinyamauchi Jun 20, 2022
0d5636e
update uns
kevinyamauchi Jun 20, 2022
1d59ca4
add parent
kevinyamauchi Jun 20, 2022
d91da64
MAY -> MUST for anndata encoding
kevinyamauchi Jun 20, 2022
4ad6a90
indices, indptr dtype
kevinyamauchi Jun 20, 2022
fac5855
csc matrix
kevinyamauchi Jun 20, 2022
e3f779c
update table metadata
kevinyamauchi Jun 20, 2022
75fb5e1
Update index.bs
kevinyamauchi Jun 20, 2022
00e85c1
Update latest/index.bs
kevinyamauchi Jun 20, 2022
5d0b5b4
Update latest/index.bs
kevinyamauchi Jun 20, 2022
fe2cfad
update var table
kevinyamauchi Jun 29, 2022
692464f
update path spec
kevinyamauchi Oct 8, 2022
cc83a82
typo
kevinyamauchi Oct 8, 2022
b2de201
Apply suggestions from @ivirshup
kevinyamauchi Oct 28, 2022
283d21b
clarify allowed locations
kevinyamauchi Oct 28, 2022
bf81797
remove intensity image metadata
kevinyamauchi Oct 28, 2022
4b46882
remove reference to image id
kevinyamauchi Nov 15, 2022
f3e960b
add tables metadata
kevinyamauchi Jan 8, 2023
ad48296
add list of keys for subgroups
kevinyamauchi Jan 8, 2023
55fb1a8
add tables
giovp Feb 22, 2023
337971b
rename table
giovp Feb 22, 2023
e9f707b
add my_table consistently
giovp Feb 22, 2023
22844a4
remove comment
giovp Feb 23, 2023
ea8a622
Merge pull request #1 from giovp/add-tables-points
kevinyamauchi Feb 23, 2023
285 changes: 285 additions & 0 deletions latest/index.bs
@@ -175,7 +175,292 @@ For this example we assume an image with 5 dimensions and axes called `t,c,z,y,x
└── n
</pre>

Tables {#table-layout}
----------------------
The following describes the expected layout for tabular data.
OME-NGFF tables are compatible with the [AnnData model](https://github.com/scverse/anndata).

<pre>
. # Root folder, potentially in S3,
│ # with a flat list of images.
└── 123.zarr
└── table # The table group is a container which holds a table that is compatible with AnnData.
Member

@will-moore Nov 15, 2022

In the example at https://github.com/kevinyamauchi/ome-ngff-tables-prototype/blob/4fdde521b6b5514424f9c6508a8b1fc3a2cff86e/src/ngff_tables_prototype/writer.py#L248 this group is called "tables" with path of tables/regions_table/.zattrs whereas here it is just table/.zattrs (without the nested regions_table group.)

Contributor Author

Sorry for the confusion! As specified, the group containing a given table can either be in root or inside of another group.

In the ome-ngff-tables-prototype example, the table called regions_table is within a group called tables. In this case, one could have multiple tables (e.g., regions_table_1 and regions_table_2) within the tables group. As written here, the table is called table and is stored in root.

I can see how the usage of table here is confusing. Should I rename the table to something else or maybe just specify in the comment that this is a table stored in root?

Member

In the 'labels' case the image.zarr/labels/.zattrs contains a list of child labels, e.g.

{
    "labels": [
        "label_image"
    ]
}

This allows you to find the child labels without having to ls to find all child directories (which you can't do on s3 or http etc.).
So we should do something similar with tables...

image.zarr/tables/.zattrs lists one or more tables:

{
    "tables": [
        "regions_table_1",
        "regions_table_2"
    ]
}
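
A reader that cannot list directories could discover the tables from this metadata alone. The following is a minimal sketch under stated assumptions: the store behaves like a mapping from key to JSON bytes (a plain `dict` stands in for S3 or HTTP here), and `list_tables` is a hypothetical helper, not part of any existing library.

```python
import json


def list_tables(store, image_path="image.zarr"):
    """Return the paths of all tables listed in <image_path>/tables/.zattrs."""
    key = f"{image_path}/tables/.zattrs"
    if key not in store:
        # No metadata for a tables group: treat the image as having no tables.
        return []
    attrs = json.loads(store[key])
    return [f"{image_path}/tables/{name}" for name in attrs.get("tables", [])]


# A dict stands in for a remote store that only supports key lookups.
store = {
    "image.zarr/tables/.zattrs": json.dumps(
        {"tables": ["regions_table_1", "regions_table_2"]}
    ).encode()
}
table_paths = list_tables(store)
```

Only key lookups are used, so the same logic would work on backends where `ls` is unavailable.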

Member

I have implemented this proposal in kevinyamauchi/ome-ngff-tables-prototype#12

Contributor Author

I have added this to the spec here: f3e960b

Member

Thanks for the update @kevinyamauchi. However, that's not quite the same as what we have for labels. In the above suggestions (and for labels) we have an extra level of the hierarchy that is not what's currently in this PR:

image.123/tables/my_table/ where image.123/tables/.zattrs has {"tables": ["my_table"]}.

In this case, if there are any tables for the image.123 then image.123/tables/.zattrs should exist.

Currently this PR has image.123/table/ where image.123/.zattrs has {"tables": ["table"]}.
So the note about .zattrs listing "tables" needs to be moved one level lower in the hierarchy, and rename image.123/table to image.123/tables.

Contributor

Hi @will-moore, I think there shouldn't be problems here, although @ivirshup and @kevinyamauchi should confirm. I made a PR on kevin's branch here: kevinyamauchi#1. Is that what you had in mind?

Contributor Author

Sorry for the delay, @will-moore . I have merged @giovp 's PR, which should address this issue.

│ # The table group MAY be in the root of the zarr file.
├── .zgroup # The table group MAY be in root or in another group.
|
├── .zattrs # `.zattrs` MUST contain "type", which is set to `"ngff:region_table"`
| # `.zattrs` MUST contain "region", which is the path to the data the table is annotating.

I think that this MUST is too strong for two reasons.

  • The tables that we are experimentally using for points and polygons adhere to this spec except that they do not specify region, region_key and instance_key. Relaxing this requirement would make the table future-proof and let it serve as a general base for other types of tables (for instance annotating a collection of images/samples, describing points/shapes, etc.). This is also the reason why I would call this table ngff:table. An alternative is to make this table specific to annotating labels and then relax this in further specifications, but I think that people would start using the table for more general uses anyway.
  • Linked to the above, I think it would be useful to store also general expression (non-spatial tables) in this way, for instance when the lazy loading from the chunked storage is ready. An application to spatial data would be the possibility to save in one OME-NGFF file (or multiple ones), both spatial data to be deconvoluted and the cell type annotation used to map the cell types.

Contributor

+1 for relaxing the constraint of having an associated region, region_key & instance_key. We also use tables both for region annotation (e.g. feature measurements) and for defining regions of interest in intensity images (=> no label image exists)

| # "region" MUST be a single path (single region) or an array of paths (multiple regions).

It would be more correct to use the terms regions, regions_key, instance_key (notice the plural for regions), because Labels (and in the future Points, Circles, Polygons, ...) are objects that describe multiple regions, not just one region per object. So sentences like the one above ("region" MUST be a single path (single region) or ...) are misleading: there is not a single region, but multiple regions, or a single regions object.

| # "region" paths MUST be objects with a key "path" and the path value MUST be a string.
Contributor Author

@bogovicj , @joshmoore , does this track with your understanding of our proposal for paths?

Is the plan to add a section somewhere on paths that can be referenced?

Member

Current example at https://github.com/kevinyamauchi/ome-ngff-tables-prototype/blob/4fdde521b6b5514424f9c6508a8b1fc3a2cff86e/src/ngff_tables_prototype/writer.py#L60 generates a string e.g, "labels/label_image", not an object with "path".

Also, it'll be easier to validate etc if we have a list with a single entry for single region instead of a "single path or array of paths", since then we have to handle 2 different types of value.
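
To illustrate the extra handling that a "single path or array of paths" value implies, here is a minimal sketch of a reader-side normalizer. It assumes the forms mentioned in this thread (a bare string, an object with a "path" key, or a list of either); `normalize_region` is a hypothetical helper name.

```python
def normalize_region(region):
    """Return "region" as a list of path strings, whatever form it was written in."""
    # A single value is wrapped so the rest of the code only handles lists.
    items = region if isinstance(region, list) else [region]
    paths = []
    for item in items:
        if isinstance(item, dict):
            # Object form: {"path": "labels/label_image"}
            item = item["path"]
        if not isinstance(item, str):
            raise TypeError(
                f"region path must be a string, got {type(item).__name__}"
            )
        paths.append(item)
    return paths


single = normalize_region("labels/label_image")
mixed = normalize_region([{"path": "labels/a"}, "labels/b"])
```

If the spec always required a list, the first line of the function body and the type branching would not be needed.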

| # `.zattrs` MUST contain "region_key" if "region" is an array. "region_key" is the key in `obs` denoting which region a given row corresponds to.

Comment related to the paths. As you pointed out in the presentation, the column named after the value of region_key could hold either a full/relative path/URL (string) or an index relative to the paths described in region. Maybe the second would be preferable. It could also be a workaround for cases in which the same row of a table annotates multiple regions, as mentioned in this PR discussion and developed in more detail here: scverse/spatialdata#34

This review comment was from months ago (forgot to send), I prefer the full path approach since it is less ambiguous, and the workaround for rows annotating multiple regions can still be achieved with full paths, as I described in this comment: #64 (comment)

Member

I think we have used relative paths everywhere else in the NGFF spec to date. This is preferable in the case that you generate the data in one location, then upload it to a different location/server which you don't know about when you generate the data.
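
To make the region_key and instance_key discussion concrete, the sketch below groups table rows by the region they annotate, with relative paths as the values of the region_key column. The column names (`region`, `label_id`), the sample paths, and the helper `instances_by_region` are all made up for illustration; `obs` is modelled as a plain dict of columns.

```python
from collections import defaultdict


def instances_by_region(obs, region_key, instance_key):
    """Map each region path to the instance ids annotated by the table rows."""
    grouped = defaultdict(list)
    # Rows are index-matched across columns, so zip pairs them up row by row.
    for region, instance in zip(obs[region_key], obs[instance_key]):
        grouped[region].append(instance)
    return dict(grouped)


# Hypothetical obs table: three rows annotating two label images.
obs = {
    "region": ["labels/nuclei", "labels/nuclei", "labels/cells"],
    "label_id": [1, 2, 1],
}
grouped = instances_by_region(obs, region_key="region", instance_key="label_id")
```

Because the paths are relative, the grouping stays valid when the whole hierarchy is moved to another server.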

| # `.zattrs` MAY contain "instance_key", which is the key in `obs` that denotes which instance in "region" the row corresponds to. If "instance_key" is not provided, the values from the `obs` `.zattrs` "_index" key is used.
├── X # You MAY add a zarr array `X`.
│ │ # `X` MUST NOT be a complex type (i.e., MUST be a single type)
│ │ # `X` MAY be chunked as the user desires.
│ ├── .zarray
│ ├── 0.0
│ │ ...
│ └── n.m
|
├── layers # You MAY add a `layers` group, which contains dense matrices with the same shape as X.
│ │
│ ├── .zgroup
│ ├── .zattrs
│ │
│ └── layer_0 # You MAY add a zarr array for each layer
| | # Each layer array MUST have the same shape as X
| | # Each layer array SHOULD be chunked the same as X
| ├── .zarray
| |
| ├── 0.0
│ │ ...
│ └── n.m
├── obs # You MUST add an obs group container. The obs group holds a table of annotations on the rows in X.
│ │ # The rows in obs MUST be index-matched to the rows in X.
│ ├── .zgroup
│ │
│ ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index.
│ │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
│ │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
│ │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
│ │
│ └── col_0 # Each column in the obs table is a 1D zarr array. The rows can be chunked as the user desires.
│ ├── .zarray # However, the obs columns SHOULD be chunked in the same way as the rows in X (if present).
│ │
│ └─ 0
├── var # You MAY add a var group container. The var group holds a table of annotations on the columns in X.
| │ # The rows in var MUST be index-matched to the columns in X (if present).
| |
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in var to be used as the index.
| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| │
| ├── array_col # Columns in the var table MAY be a 1D zarr array. The rows can be chunked as the user desires.
| | ├── .zarray # However, the var columns SHOULD be chunked in the same way as the columns in X.
| | │
| | └─ 0
| |
| └── cat_col # Columns in the var table MAY be categorical
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"categorical"` by AnnData.
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| |
| ├── categories
| | ├── .zarray # categories MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| ├── codes
| | ├── .zarray # codes MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| |
| ├── null_col # Columns in the var table MAY be nullable integer
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"nullable-integer"` by AnnData.
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| |
| ├── mask
| | ├── .zarray # mask MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| └── values
| ├── .zarray # values MUST be a 1D zarr array. The rows can be chunked as the user desires.
| |
| └─ 0
|
├── obsm # You MAY add an obsm group container. The obsm group contains arrays that annotate the rows in X.
| │ # The rows in each array MUST be index-matched to the rows in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| │
│ └── obsm_0 # You MAY add a zarr array for each obsm matrix.
Member

I don't see where obsm_0 is referenced in the parent .zattrs?
How do we know to load the obsm_0 data (if we can't ls the directories below obsm)?
The obs/.zattrs has "column-order" which lists columns, but there isn't an equivalent for obsm/.zattrs.

Member

Same issue exists for obsp, varm and varp?

Contributor

We've basically assumed that you can ls, and not maintained order for these.

Member

Unfortunately that's not the case if you're loading data over https (or for some s3 backends), so we've avoided making that assumption elsewhere on the spec to allow web-based accessing of OME-NGFF.

Member

👍 for the locations of everything in the hierarchy being either computable (implementations know how to figure it out) or explicitly listed in the metadata.

FWIW, the OME2022 zarr-java discussion today touched on exactly this point and the need to perhaps bubble this requirement up to the zarr spec itself.

Member

Let's consider the 2 issues separately, since I think the cost/benefit of consolidated metadata are different...

For the top-level image.zarr/tables/.zattrs it is very easy to manually add the metadata to list the child tables as in kevinyamauchi/ome-ngff-tables-prototype#12. This should be a requirement (MUST) in the spec so all readers can rely on it (without needing to try consolidated metadata).

However, in the other case, I don't know how easy it is to adopt a similar approach for the obsm, obsp, layers, raw and uns groups? I'm not very familiar with AnnData and haven't dug into the creation of these groups.

Using a naive approach, I tried simply listing the sub-directories of these groups, and adding those names to the .zattrs of each group.

E.g. add this to write_table_regions() from the ome-ngff-tables-prototype:

    # write group names into .zattrs
    table_path = os.path.join(group.store.dir_path(), group.path, table_group_name)
    for sub_dir in ["layers", "obsm", "obsp", "raw", "uns"]:
        sub_path = os.path.join(table_path, sub_dir)
        if not os.path.exists(sub_path):
            continue
        children = [f.name for f in os.scandir(sub_path) if f.is_dir()]
        sub_group = table_group[sub_dir]
        sub_group.attrs[sub_dir] = children

E.g. obsm/.zattrs now looks like this:

{
    "encoding-type": "dict",
    "encoding-version": "0.1.0",
    "obsm": [
        "X_scanorama",
        "X_umap",
        "spatial"
    ]
}

This approach feels a bit hacky, and would need to be recursive in some cases (e.g. uns).
I don't have a strong preference for this compared with consolidate_metadata().
These sets of metadata aren't the core matrix data for the ann-data table. They're not even displayed in napari (in the screenshot) so it's less critical than listing the tables above.

If a client relies on consolidate_metadata() and it's not available, then they simply wouldn't display this metadata.
A client wouldn't need to always try with and without consolidated metadata.
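
The recursive variant mentioned above could be sketched with the standard library alone. This is only an illustration under stated assumptions: the hierarchy is a zarr v2 directory store in which groups contain `.zgroup` and arrays contain `.zarray`, and the metadata key `"children"` and the helper `write_child_listing` are hypothetical names, not something the spec or AnnData defines.

```python
import json
import os
import tempfile


def write_child_listing(group_dir, key="children"):
    """Recursively record each group's child groups and arrays in its .zattrs."""

    def has(entry, marker):
        return os.path.exists(os.path.join(entry.path, marker))

    entries = [e for e in os.scandir(group_dir) if e.is_dir()]
    # Only directories that are zarr nodes count (this skips nested chunk dirs).
    groups = sorted(e.name for e in entries if has(e, ".zgroup"))
    arrays = sorted(e.name for e in entries if has(e, ".zarray"))

    attrs_path = os.path.join(group_dir, ".zattrs")
    attrs = {}
    if os.path.exists(attrs_path):
        with open(attrs_path) as f:
            attrs = json.load(f)
    attrs[key] = sorted(groups + arrays)
    with open(attrs_path, "w") as f:
        json.dump(attrs, f, indent=4)

    # Arrays have no children, so recursion only descends into groups.
    for name in groups:
        write_child_listing(os.path.join(group_dir, name), key)


# Demo on a throwaway hierarchy: root/uns/nested, all of them groups.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "uns", "nested"))
    for rel in ("", "uns", os.path.join("uns", "nested")):
        with open(os.path.join(root, rel, ".zgroup"), "w") as f:
            json.dump({"zarr_format": 2}, f)
    write_child_listing(root)
    with open(os.path.join(root, ".zattrs")) as f:
        root_attrs = json.load(f)
    with open(os.path.join(root, "uns", ".zattrs")) as f:
        uns_attrs = json.load(f)
```

Like the snippet above, this relies on the writer having local file access; a reader on a remote store would then only need the `.zattrs` files.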

Just a quick side note on the viewer, the new napari-spatialdata, that we haven't involved in the discussion since atm it needs some bugfix and its scope is beyond displaying annotated tables (it experiments with points and polygons as well), shows by default all those entries (obsm, obs, etc), with the exception of uns.

Member

If a client relies on consolidate_metadata() and it's not available, then they simply wouldn't display this metadata.

Having data disappear because of something that did or did not get called in the python code and is otherwise not recorded doesn't feel great.

Member

@joshmoore So you prefer the # write group names into .zattrs approach above (like a manual consolidate metadata)? Or another alternative?

The presence or absence of everything in the viewers always depends on whether the creating code did or did not add something. That's the choice that the code has when the spec says SHOULD or MAY.

Member

All other things being equal, I prefer that all details & members of a fileset are either well-known beforehand (i.e. static in the spec) or can be determined from the metadata.

| | # Each obsm array MUST have the same number of rows as X.
| | # The rows in each obsm array SHOULD be chunked the same as the rows in X.
| ├── .zarray
| |
| ├── 0.0
│ │ ...
│ └── n.m
|
├── varm # You MAY add a varm group container. The varm group contains arrays that annotate the columns in X.
| │ # The rows in each array MUST be index-matched to the columns in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| │
│ └── varm_0 # You MAY add a zarr array for each varm matrix.
| | # Each varm array MUST have the same number of rows as columns in X.
| | # The rows in each varm array SHOULD be chunked the same as the columns in X.
| ├── .zarray
| ├── 0.0
│ │ ...
│ └── n.m
|
├── obsp # You MAY add an obsp group container. The obsp group contains sparse arrays that annotate the rows in X.
| │ # The rows in each array MUST be index-matched to the rows in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| │
│ └── obsp_0 # You MAY add a zarr group for each obsp array.
| | # Each obsp array MUST have the same number of rows as rows in X.
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
| |
| ├── data # You MUST add a one-dimensional zarr array named "data".
| | | # `data` MAY be chunked as the user desires.
| | ├── .zarray
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| ├── indices # You MUST add a one-dimensional zarr array named "indices".
| | | # `indices` MAY be chunked as the user desires.
| | ├── .zarray # `indices` MUST be an `int` dtype.
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| └── indptr # You MUST add a one-dimensional zarr array named "indptr".
| | # `indptr` MAY be chunked as the user desires.
| ├── .zarray # `indptr` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
│ └── n
|
├── varp # You MAY add a varp group container. The varp group contains sparse arrays that annotate the columns in X.
| │ # The rows in each array MUST be index-matched to the columns in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| │
│ └── varp_0 # You MAY add a zarr group for each varp array.
| | # Each varp array MUST have the same number of rows as columns in X.
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
| |
| ├── data # You MUST add a one-dimensional zarr array named "data".
| | | # `data` MAY be chunked as the user desires.
| | ├── .zarray
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| ├── indices # You MUST add a one-dimensional zarr array named "indices".
| | | # `indices` MAY be chunked as the user desires.
| | ├── .zarray # `indices` MUST be an `int` dtype.
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| └── indptr # You MUST add a one-dimensional zarr array named "indptr".
| | # `indptr` MAY be chunked as the user desires.
| ├── .zarray # `indptr` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
│ └── n
|
└── uns # You MAY add a uns container to store unstructured data.
|
├── .zgroup
|
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
├── group # You MAY add zarr groups.
| | # `uns` groups MAY contain groups, dataframes, dense arrays, and sparse arrays.
| |
| ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| ...
|
├── dataframe_0 # You MAY add dataframe group containers.
| | # dataframes MAY be in the `uns` group or in a subgroup.
| │
| ├── .zgroup
| │
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index.
Member

In the example at https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/var/.zattrs the "_index" is "SYMBOL".

{
    "_index": "SYMBOL",
    "column-order": [
        "ENSEMBL",
        "feature_types",
        "genome",
        "mt"
    ],
    "encoding-type": "dataframe",
    "encoding-version": "0.2.0"
}

But I don't see that this refers to anything under
https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/obs/.zattrs

{
    "_index": "_index",
    "column-order": [
        "in_tissue",
        "array_row",
        "array_col",
        "sample",
        "n_genes_by_counts",
        "log1p_n_genes_by_counts",
        "total_counts",
        "log1p_total_counts",
        "pct_counts_in_top_50_genes",
        "pct_counts_in_top_100_genes",
        "pct_counts_in_top_200_genes",
        "pct_counts_in_top_500_genes",
        "mt_frac",
        "img_id",
        "EXP_id",
        "Organ",
        "Fetal_id",
        "SN",
        "Visium_Area_id",
        "Age_PCW",
        "Digestion time",
        "paths",
        "sample_id",
        "_scvi_batch",
        "_scvi_labels",
        "_indices",
        "total_cell_abundance",
        "label_id"
    ],
    "encoding-type": "dataframe",
    "encoding-version": "0.2.0"
}

Also, this line is a duplicate of the same line under 'obs' above (line 238). Is that correct?

It is probably a typo

- # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index
+ # `.zattrs` MUST contain `"_index"`, which is the name of the column in var to be used as the index

In the example, the data for the var index column (whose name is given by _index in var/.zattrs) is located under var/{_index}/ (see https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/var/SYMBOL/.zattrs).

https://github.com/vitessce/vitessce/blob/10f6f3b/packages/file-types/zarr/src/AnnDataSource.js#L247
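
The lookup described above can be sketched as follows: given the `.zattrs` of a dataframe group, the index column and the remaining columns are located without listing the directory. `dataframe_column_paths` is a hypothetical helper, and the attributes are copied from the var example quoted earlier in this thread.

```python
def dataframe_column_paths(group_path, attrs):
    """Return (index_path, column_paths) for an AnnData-style dataframe group."""
    # The index column lives under <group>/<value of "_index">.
    index_path = f"{group_path}/{attrs['_index']}"
    # "column-order" lists the non-index columns in display order.
    column_paths = [f"{group_path}/{name}" for name in attrs["column-order"]]
    return index_path, column_paths


var_attrs = {
    "_index": "SYMBOL",
    "column-order": ["ENSEMBL", "feature_types", "genome", "mt"],
    "encoding-type": "dataframe",
    "encoding-version": "0.2.0",
}
index_path, column_paths = dataframe_column_paths("var", var_attrs)
```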

Member

Ah, great - thanks!

Member

So this line still needs to be fixed to ...the column in var to...

| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| │
| └── col_0 # Each column in the dataframe is a 1D zarr array.
| ├── .zarray # Each column MUST be chunked the same, but the chunking may be chosen by the user.
| │
| └─ 0
|
├── dense_array # You MAY add dense arrays as n-dimensional zarr arrays.
| │ # `dense_array` MUST NOT be a complex type (i.e., MUST be a single type)
| │ # `dense_array` MAY be chunked as the user desires.
| | # `dense_array` MAY be in the `uns` group or in a subgroup.
| |
| ├── .zarray
| ├── 0.0
| │ ...
| └── n.m
|
└── sparse_array # You MAY add sparse arrays as a zarr group for each sparse array.
| # sparse arrays MAY be in the `uns` group or in a subgroup.
|
├── .zgroup
|
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
|
├── data # You MUST add a one-dimensional zarr array named "data".
| | # `data` MAY be chunked as the user desires.
| ├── .zarray
| |
| ├── 0
│ | ...
| └── n
|
├── indices # You MUST add a one-dimensional zarr array named "indices".
| | # `indices` MAY be chunked as the user desires.
| ├── .zarray # `indices` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
| └── n
|
└── indptr # You MUST add a one-dimensional zarr array named "indptr".
| # `indptr` MAY be chunked as the user desires.
├── .zarray # `indptr` MUST be an `int` dtype.
|
├── 0
| ...
└── n


</pre>
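
As an illustration of the sparse encoding in the layout above, the following sketch rebuilds a dense matrix from the `data`, `indices`, and `indptr` arrays and the `shape` attribute. It is written in plain Python under the usual compressed sparse row/column conventions; `sparse_to_dense` is a hypothetical helper, and a real reader would more likely hand these arrays to a sparse matrix library.

```python
def sparse_to_dense(data, indices, indptr, shape, encoding_type="csr_matrix"):
    """Densify a CSR or CSC matrix given its three 1D arrays and its shape."""
    if encoding_type not in ("csr_matrix", "csc_matrix"):
        raise ValueError(f"unsupported encoding-type: {encoding_type}")
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    # The "major" axis is rows for CSR and columns for CSC.
    n_major = n_rows if encoding_type == "csr_matrix" else n_cols
    for major in range(n_major):
        # indptr[major]:indptr[major + 1] delimits the entries of this row/column.
        for k in range(indptr[major], indptr[major + 1]):
            minor = indices[k]
            if encoding_type == "csr_matrix":
                dense[major][minor] = data[k]
            else:
                dense[minor][major] = data[k]
    return dense


# The same 2 x 3 matrix in both encodings.
from_csr = sparse_to_dense([1, 2, 3], [0, 2, 2], [0, 2, 3], (2, 3), "csr_matrix")
from_csc = sparse_to_dense([1, 2, 3], [0, 0, 1], [0, 1, 1, 3], (2, 3), "csc_matrix")
```

Note that `indptr` has one more entry than the number of rows (CSR) or columns (CSC), which is why both `indices` and `indptr` need an integer dtype.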

High-content screening {#hcs-layout}
------------------------------------