Table spec proposal #64

Closed
wants to merge 35 commits into from
35 commits
4559ce5
add tables spec draft
kevinyamauchi Oct 1, 2021
c9148a7
X must be a single dtype
kevinyamauchi Oct 1, 2021
d3330a2
add points spec draft
kevinyamauchi Oct 1, 2021
957555f
add spec for axis names
kevinyamauchi Oct 20, 2021
f8f2fd0
remove points spec
kevinyamauchi Jun 19, 2022
83c9491
Merge branch 'main' into add-tables-points
kevinyamauchi Jun 19, 2022
8d2d64e
add layers
kevinyamauchi Jun 19, 2022
d715323
add varm/obsm
kevinyamauchi Jun 19, 2022
98be5fa
add obsp/varp
kevinyamauchi Jun 19, 2022
bc72166
add missing .zarray
kevinyamauchi Jun 19, 2022
f4a13c6
add @type
kevinyamauchi Jun 19, 2022
ffdd046
Apply suggestions from code review
kevinyamauchi Jun 20, 2022
0d5636e
update uns
kevinyamauchi Jun 20, 2022
1d59ca4
add parent
kevinyamauchi Jun 20, 2022
d91da64
MAY -> MUST for anndata encoding
kevinyamauchi Jun 20, 2022
4ad6a90
indices, indptr dtype
kevinyamauchi Jun 20, 2022
fac5855
csc matrix
kevinyamauchi Jun 20, 2022
e3f779c
update table metadata
kevinyamauchi Jun 20, 2022
75fb5e1
Update index.bs
kevinyamauchi Jun 20, 2022
00e85c1
Update latest/index.bs
kevinyamauchi Jun 20, 2022
5d0b5b4
Update latest/index.bs
kevinyamauchi Jun 20, 2022
fe2cfad
update var table
kevinyamauchi Jun 29, 2022
692464f
update path spec
kevinyamauchi Oct 8, 2022
cc83a82
typo
kevinyamauchi Oct 8, 2022
b2de201
Apply suggestions from @ivirshup
kevinyamauchi Oct 28, 2022
283d21b
clarify allowed locations
kevinyamauchi Oct 28, 2022
bf81797
remove intensity image metadata
kevinyamauchi Oct 28, 2022
4b46882
remove reference to image id
kevinyamauchi Nov 15, 2022
f3e960b
add tables metadata
kevinyamauchi Jan 8, 2023
ad48296
add list of keys for subgroups
kevinyamauchi Jan 8, 2023
55fb1a8
add tables
giovp Feb 22, 2023
337971b
rename table
giovp Feb 22, 2023
e9f707b
add my_table consistently
giovp Feb 22, 2023
22844a4
remove comment
giovp Feb 23, 2023
ea8a622
Merge pull request #1 from giovp/add-tables-points
kevinyamauchi Feb 23, 2023
302 changes: 302 additions & 0 deletions latest/index.bs
@@ -175,7 +175,309 @@ For this example we assume an image with 5 dimensions and axes called `t,c,z,y,x
└── n
</pre>

Tables {#table-layout}
----------------------
The following describes the expected layout for tabular data.
OME-NGFF tables are compatible with the [AnnData model](https://github.com/scverse/anndata).
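
As a rough orientation for the layout in this section, the following sketch writes only the group-level metadata files of a tables container using the Python standard library. It is not the reference implementation: the helper names `write_json` and `write_table_skeleton` and the region path `labels/cells` are hypothetical, the `{"zarr_format": 2}` content of `.zgroup` is the usual Zarr v2 group marker rather than something this proposal defines, and no arrays (including the `_index` column itself) are written.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, payload: dict) -> None:
    """Write a JSON metadata file, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=4))


def write_table_skeleton(root: Path, table_name: str, region_path: str) -> Path:
    """Write group metadata for root/tables/<table_name>/obs (no array data)."""
    zgroup = {"zarr_format": 2}  # assumption: plain Zarr v2 group marker

    tables = root / "tables"
    write_json(root / ".zgroup", zgroup)
    write_json(tables / ".zgroup", zgroup)
    # The tables group lists the keys of the subgroups that are tables.
    write_json(tables / ".zattrs", {"tables": [table_name]})

    table = tables / table_name
    write_json(table / ".zgroup", zgroup)
    # Single-region case: "region" is one object with a string "path".
    write_json(
        table / ".zattrs",
        {"type": "ngff:region_table", "region": {"path": region_path}},
    )

    obs = table / "obs"
    write_json(obs / ".zgroup", zgroup)
    # Dataframe metadata as described for the obs group; columns are omitted.
    write_json(
        obs / ".zattrs",
        {
            "_index": "_index",
            "column-order": [],
            "encoding-type": "dataframe",
            "encoding-version": "0.2.0",
        },
    )
    return table


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = write_table_skeleton(Path(tmp) / "123.zarr", "my_table", "labels/cells")
        for meta in sorted(table.parent.parent.rglob(".z*")):
            print(meta.relative_to(tmp))
```

In practice a library such as AnnData would write both the metadata and the arrays; the sketch only makes the nesting of `.zgroup` and `.zattrs` files concrete.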

<pre>
. # Root folder, potentially in S3,
│ # with a flat list of images.
└── 123.zarr
|
├── .zgroup
|
├── .zattrs
|
└── tables # The tables group is a container which holds one or multiple tables that are compatible with AnnData.
|
│ # The tables group MAY be in the root of the zarr file.
├── .zgroup # The tables group MAY be in root or in another group.
|
├── .zattrs # `.zattrs` MUST contain "tables", which lists the keys of the subgroups that are tables. In this case, the only table is "my_table".
# hence `.zattrs` should be equal to `{ "tables": [ "my_table" ] }`.
|
└── my_table
│ # The table group MAY be in the root of the zarr file.
├── .zgroup # The table group MAY be in root or in another group.
|
├── .zattrs # `.zattrs` MUST contain "type", which is set to `"ngff:region_table"`

I just noticed that I have this as ngff:regions_table in this example

which are you suggesting needs fixing?

I think it was due to a typo in our code, we should be careful to use region and region_key instead of regions and regions_key

@kevinyamauchi I just noticed that I had forgotten to submit my review months ago (it was one of my first reviews with the GitHub interface and I must have forgotten to submit it, sorry about that). I have done it now, and one of the comments was actually on region vs regions. I am now fine with both spellings.

I'm trying to use https://github.com/kevinyamauchi/ome-ngff-tables-prototype/blob/0b7e59c58caf07e5f4e37756b396afe3e05e48e9/src/ngff_tables_prototype/reader.py#L307 to read a table, and this expects "@type": "ngff:points_table".
So there is a difference between @type and type but also the spec says it MUST be ngff:region_table. So is this simply too strict and we support various types here?

Do we want to enumerate the "type" options that we support here?
E.g. regions_table, points_table or some other table that doesn't have any documented structure?

| # `.zattrs` MUST contain "region", which is the path to the data the table is annotating.
| # "region" MUST be a single path (single region) or an array of paths (multiple regions).
| # "region" paths MUST be objects with a key "path" and the path value MUST be a string.
| # `.zattrs` MUST contain "region_key" if "region" is an array. "region_key" is the key in `obs` denoting which region a given row corresponds to.

All these MUST rules apply to regions table, but don't apply to the use-case of storing points/tracking data in tables as in ome/napari-ome-zarr#81
Presumably we do want to allow that use case (and others)? So these rules could be MAY instead of MUST?

| # `.zattrs` MAY contain "instance_key", which is the key in `obs` that denotes which instance in "region" the row corresponds to. If "instance_key" is not provided, the values from the `obs` `.zattrs` "_index" key is used.
├── X # You MAY add a zarr array `X`.
│ │ # `X` MUST NOT be a complex type (i.e., MUST be a single type)
│ │ # `X` MAY be chunked as the user desires.
│ ├── .zarray
│ ├── 0.0
│ │ ...
│ └── n.m
|
├── layers # You MAY add a `layers` group, which contains dense matrices with the same shape as X.
│ │
│ ├── .zgroup
│ ├── .zattrs # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing a `layer`.
│ │
│ └── layer_0 # You MAY add a zarr array for each layer
| | # Each layer array MUST have the same shape as X
| | # Each layer array SHOULD be chunked the same as X
| ├── .zarray
| |
| ├── 0.0
│ │ ...
│ └── n.m
├── obs # You MUST add an obs group container. The obs group holds a table of annotations on the rows in X.
│ │ # The rows in obs MUST be index-matched to the rows in X.
│ ├── .zgroup
│ │
│ ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index.
│ │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
│ │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
│ │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
│ │
│ └── col_0 # Each column in the obs table is a 1D zarr array. The rows can be chunked as the user desires.
│ ├── .zarray # However, the obs columns SHOULD be chunked in the same way as the rows in X (if present).
│ │
│ └─ 0
├── var # You MAY add a var group container. The var group holds a table of annotations on the columns in X.
| │ # The rows in var MUST be index-matched to the columns in X (if present).
| |
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index.

In the example at https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/var/.zattrs the "_index" is "SYMBOL".

{
    "_index": "SYMBOL",
    "column-order": [
        "ENSEMBL",
        "feature_types",
        "genome",
        "mt"
    ],
    "encoding-type": "dataframe",
    "encoding-version": "0.2.0"
}

But I don't see that this refers to anything under
https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/obs/.zattrs

{
    "_index": "_index",
    "column-order": [
        "in_tissue",
        "array_row",
        "array_col",
        "sample",
        "n_genes_by_counts",
        "log1p_n_genes_by_counts",
        "total_counts",
        "log1p_total_counts",
        "pct_counts_in_top_50_genes",
        "pct_counts_in_top_100_genes",
        "pct_counts_in_top_200_genes",
        "pct_counts_in_top_500_genes",
        "mt_frac",
        "img_id",
        "EXP_id",
        "Organ",
        "Fetal_id",
        "SN",
        "Visium_Area_id",
        "Age_PCW",
        "Digestion time",
        "paths",
        "sample_id",
        "_scvi_batch",
        "_scvi_labels",
        "_indices",
        "total_cell_abundance",
        "label_id"
    ],
    "encoding-type": "dataframe",
    "encoding-version": "0.2.0"
}

Also, this line is a duplicate of the same line under 'obs' above (line 238). Is that correct?

It is probably a typo

- # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index
+ # `.zattrs` MUST contain `"_index"`, which is the name of the column in var to be used as the index

In the example, the data for the var index column named by `_index` in var/.zattrs is located under var/{_index}/ (see https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/var/SYMBOL/.zattrs).

https://github.com/vitessce/vitessce/blob/10f6f3b/packages/file-types/zarr/src/AnnDataSource.js#L247

Ah, great - thanks!

So this line still needs to be fixed to ...the column in var to...

| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| │
| ├── array_col # Columns in the var table MAY be a 1D zarr array. The rows can be chunked as the user desires.
| | ├── .zarray # However, the var columns SHOULD be chunked in the same way as the columns in X.
| | │
| | └─ 0
| |
| └── cat_col # Columns in the var table MAY be categorical
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"categorical"` by AnnData.
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| |
| ├── categories
| | ├── .zarray # categories MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| ├── codes
| | ├── .zarray # codes MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| |
| ├── null_col # Columns in the var table MAY be nullable integers
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"nullable-integer"` by AnnData.
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| |
| ├── mask
| | ├── .zarray # mask MUST be a 1D zarr array. The rows can be chunked as the user desires.
| | |
| | └─ 0
| └── values
| ├── .zarray # values MUST be a 1D zarr array. The rows can be chunked as the user desires.
| |
| └─ 0
|
├── obsm # You MAY add an obsm group container. The obsm group contains arrays that annotate the rows in X.
| │ # The rows in each array MUST be index-matched to the rows in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `obsm` arrays.
| │
│ └── obsm_0 # You MAY add a zarr array for each obsm matrix.
| | # Each obsm array MUST have the same number of rows as X.
| | # The rows in each obsm array SHOULD be chunked the same as the rows in X.
| ├── .zarray
| |
| ├── 0.0
│ │ ...
│ └── n.m
|
├── varm # You MAY add a varm group container. The varm group contains arrays that annotate the columns in X.
| │ # The rows in each array MUST be index-matched to the columns in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `varm` arrays.
| │
│ └── varm_0 # You MAY add a zarr array for each varm matrix.
| | # Each varm array MUST have the same number of rows as columns in X.
| | # The rows in each varm array SHOULD be chunked the same as the columns in X.
| ├── .zarray
| ├── 0.0
│ │ ...
│ └── n.m
|
├── obsp # You MAY add an obsp group container. The obsp group contains sparse arrays that annotate the rows in X.
| │ # The rows in each array MUST be index-matched to the rows in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `obsp` arrays.
| │
│ └── obsp_0 # You MAY add a zarr group for each obsp array.
| | # Each obsp array MUST have the same number of rows as rows in X.
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
| |
| ├── data # You MUST add a one-dimensional zarr array named "data".
| | | # `data` MAY be chunked as the user desires.
| | ├── .zarray
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| ├── indices # You MUST add a one-dimensional zarr array named "indices".
| | | # `indices` MAY be chunked as the user desires.
| | ├── .zarray # `indices` MUST be an `int` dtype.
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| └── indptr # You MUST add a one-dimensional zarr array named "indptr".
| | # `indptr` MAY be chunked as the user desires.
| ├── .zarray # `indptr` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
│ └── n
|
├── varp # You MAY add a varp group comtainer. The varp group contains sparse arrays that annotate the columns in X.

comtainer

| │ # The rows in each array MUST be index-matched to the columns in X (if present).
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `varp` arrays.
| │
│ └── varp_0 # You MAY add a zarr group for each varp array.
| | # Each varp array MUST have the same number of rows as columns in X.
| |
│ ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
| |
| ├── data # You MUST add a one-dimensional zarr array named "data".
| | | # `data` MAY be chunked as the user desires.
| | ├── .zarray
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| ├── indices # You MUST add a one-dimensional zarr array named "indices".
| | | # `indices` MAY be chunked as the user desires.
| | ├── .zarray # `indices` MUST be an `int` dtype.
| | |
| | ├── 0
│ │ | ...
│ | └── n
| |
| └── indptr # You MUST add a one-dimensional zarr array named "indptr".
| | # `indptr` MAY be chunked as the user desires.
| ├── .zarray # `indptr` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
│ └── n
|
└── uns # You MAY add a uns container to store unstructured data.
|
├── .zgroup
|
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData.
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
├── group # You MAY add zarr groups.
| | # `uns` groups MAY contain groups, dataframes, dense arrays, and sparse arrays.
| |
| ├── .zgroup
| |
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| ...
|
├── dataframe_0 # You MAY add dataframe group containers.
| | # dataframes MAY be in the `uns` group or in a subgroup.
| │
| ├── .zgroup
| │
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in the dataframe to be used as the index.
| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns.
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData.
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData.
| │
| └── col_0 # Each column in the dataframe is a 1D zarr array.
| ├── .zarray # Each column MUST be chunked the same, but the chunking may be chosen by the user.
| │
| └─ 0
|
├── dense_array # You MAY add dense arrays as n-dimensional zarr arrays.
| │ # `dense_array` MUST NOT be a complex type (i.e., MUST be a single type)
| │ # `dense_array` MAY be chunked as the user desires.
| | # `dense_array` MAY be in the `uns` group or in a subgroup.
| |
| ├── .zarray
| ├── 0.0
| │ ...
| └── n.m
|
└── sparse_array # You MAY add sparse arrays as a zarr group for each sparse array.
| # sparse arrays MAY be in the `uns` group or in a subgroup.
|
├── .zgroup
|
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively.
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData.
| # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array.
|
├── data # You MUST add a one-dimensional zarr array named "data".
| | # `data` MAY be chunked as the user desires.
| ├── .zarray
| |
| ├── 0
│ | ...
| └── n
|
├── indices # You MUST add a one-dimensional zarr array named "indices".
| | # `indices` MAY be chunked as the user desires.
| ├── .zarray # `indices` MUST be an `int` dtype.
| |
| ├── 0
│ | ...
| └── n
|
└── indptr # You MUST add a one-dimensional zarr array named "indptr".
| # `indptr` MAY be chunked as the user desires.
├── .zarray # `indptr` MUST be an `int` dtype.
|
├── 0
| ...
└── n


</pre>
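
The sparse-array groups in the layout above (`obsp`, `varp`, and sparse entries in `uns`) each hold three 1D arrays, `data`, `indices` and `indptr`, plus a `"shape"` attribute giving the shape of the densified array. The sketch below shows how those pieces could be recombined into a dense matrix, assuming the standard compressed sparse row/column convention. The function name `densify` is hypothetical, plain Python lists stand in for arrays already read from zarr, and a real reader would more likely hand the arrays to a sparse-matrix library.

```python
from typing import List, Sequence, Tuple


def densify(
    data: Sequence[float],
    indices: Sequence[int],
    indptr: Sequence[int],
    shape: Tuple[int, int],
    encoding_type: str = "csr_matrix",
) -> List[List[float]]:
    """Rebuild a dense row-major matrix from compressed sparse arrays."""
    if encoding_type not in ("csr_matrix", "csc_matrix"):
        raise ValueError(f"unsupported encoding-type: {encoding_type!r}")

    n_rows, n_cols = shape
    # The compressed ("major") axis is rows for CSR and columns for CSC.
    n_major = n_rows if encoding_type == "csr_matrix" else n_cols
    if len(indptr) != n_major + 1:
        raise ValueError("indptr must have one more entry than the major axis")
    if len(data) != len(indices):
        raise ValueError("data and indices must have the same length")

    dense = [[0] * n_cols for _ in range(n_rows)]
    for major in range(n_major):
        # indptr[major]:indptr[major + 1] selects the stored entries of this slice.
        for k in range(indptr[major], indptr[major + 1]):
            minor = indices[k]
            if encoding_type == "csr_matrix":
                dense[major][minor] += data[k]
            else:
                dense[minor][major] += data[k]
    return dense


if __name__ == "__main__":
    # Two rows, three columns, three stored values.
    print(densify([1, 2, 3], [0, 2, 1], [0, 2, 3], (2, 3)))
```

Because `indptr` indexes into `data` and `indices`, integer dtypes for `indices` and `indptr` (as the layout requires) are what make this slicing well defined.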

High-content screening {#hcs-layout}
------------------------------------