Table spec proposal #64
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -175,7 +175,309 @@ For this example we assume an image with 5 dimensions and axes called `t,c,z,y,x | |
└── n | ||
</pre> | ||
|
||
Tables {#table-layout} | ||
---------------------- | ||
The following describes the expected layout for tabular data. | ||
OME-NGFF tables are compatible with the [AnnData model](https://github.com/scverse/anndata). | ||
|
||
<pre> | ||
. # Root folder, potentially in S3, | ||
│ # with a flat list of images. | ||
│ | ||
└── 123.zarr | ||
| | ||
├── .zgroup | ||
| | ||
├── .zattrs | ||
| | ||
└── tables # The tables group is a container which holds one or multiple tables that are compatible with AnnData. | ||
| | ||
│ # The tables group MAY be in the root of the zarr file. | ||
├── .zgroup # The tables group MAY be in root or in another group. | ||
| | ||
├── .zattrs # `.zattrs` MUST contain "tables", which lists the keys of the subgroups that are tables. In this case, the only table is "my_table". | ||
# hence `.zattrs` should be equal to `{ "tables": [ "my_table" ] }`. | ||
| | ||
└── my_table | ||
│ # The table group MAY be in the root of the zarr file. | ||
├── .zgroup # The table group MAY be in root or in another group. | ||
| | ||
├── .zattrs # `.zattrs` MUST contain "type", which is set to `"ngff:region_table"` | ||
| # `.zattrs` MUST contain "region", which is the path to the data the table is annotating. | ||
| # "region" MUST be a single path (single region) or an array of paths (multiple regions). | ||
| # "region" paths MUST be objects with a key "path" and the path value MUST be a string. | ||
| # `.zattrs` MUST contain "region_key" if "region" is an array. "region_key" is the key in `obs` denoting which region a given row corresponds to. | ||
All these |
||
| # `.zattrs` MAY contain "instance_key", which is the key in `obs` that denotes which instance in "region" the row corresponds to. If "instance_key" is not provided, the `obs` column named by the "_index" key in the `obs` `.zattrs` is used. | ||
│ | ||
├── X # You MAY add a zarr array `X`. | ||
│ │ # `X` MUST NOT be a complex type (i.e., MUST be a single type) | ||
│ │ # `X` MAY be chunked as the user desires. | ||
│ ├── .zarray | ||
│ ├── 0.0 | ||
│ │ ... | ||
│ └── n.m | ||
| | ||
├── layers # You MAY add a `layers` group, which contains dense matrices with the same shape as X. | ||
│ │ | ||
│ ├── .zgroup | ||
│ ├── .zattrs # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing a `layer`. | ||
│ │ | ||
│ └── layer_0 # You MAY add a zarr array for each layer | ||
| | # Each layer array MUST have the same shape as X | ||
| | # Each layer array SHOULD be chunked the same as X | ||
| ├── .zarray | ||
| | | ||
| ├── 0.0 | ||
│ │ ... | ||
│ └── n.m | ||
│ | ||
├── obs # You MUST add an obs group container. The obs group holds a table of annotations on the rows in X. | ||
│ │ # The rows in obs MUST be index-matched to the rows in X. | ||
│ ├── .zgroup | ||
│ │ | ||
│ ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index. | ||
│ │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns. | ||
│ │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData. | ||
│ │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData. | ||
│ │ | ||
│ └── col_0 # Each column in the obs table is a 1D zarr array. The rows can be chunked as the user desires. | ||
│ ├── .zarray # However, the obs columns SHOULD be chunked in the same way as the rows in X (if present). | ||
│ │ | ||
│ └─ 0 | ||
├── var # You MAY add a var group container. The var group holds a table of annotations on the columns in X. | ||
| │ # The rows in var MUST be index-matched to the columns in X (if present). | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in var to be used as the index. | ||
In the example at https://haniffa.cog.sanger.ac.uk/fetal-immune/fetal-liver/visium/0.0.1/visium_1_anndata.zarr/var/.zattrs the "_index" is "SYMBOL".
But I don't see that this refers to anything under
Also, this line is a duplicate of the same line under 'obs' above (line 238). Is that correct?

It is probably a typo
- # `.zattrs` MUST contain `"_index"`, which is the name of the column in obs to be used as the index
+ # `.zattrs` MUST contain `"_index"`, which is the name of the column in var to be used as the index
In the example, the data for the var index column name specified in

Ah, great - thanks!

So this line still needs to be fixed to |
||
| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns. | ||
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData. | ||
| │ | ||
| ├── array_col # Columns in the var table MAY be a 1D zarr array. The rows can be chunked as the user desires. | ||
| | ├── .zarray # However, the var columns SHOULD be chunked in the same way as the columns in X. | ||
| | │ | ||
| | └─ 0 | ||
| | | ||
| └── cat_col # Columns in the var table MAY be categorical | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"categorical"` by AnnData. | ||
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData. | ||
| | | ||
| ├── categories | ||
| | ├── .zarray # categories MUST be a 1D zarr array. The rows can be chunked as the user desires. | ||
| | | | ||
| | └─ 0 | ||
| ├── codes | ||
| | ├── .zarray # codes MUST be a 1D zarr array. The rows can be chunked as the user desires. | ||
| | | | ||
| | └─ 0 | ||
| | | ||
| ├── null_col # Columns in the var table MAY be nullable integers | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"nullable-integer"` by AnnData. | ||
| | # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | | ||
| ├── mask | ||
| | ├── .zarray # mask MUST be a 1D zarr array. The rows can be chunked as the user desires. | ||
| | | | ||
| | └─ 0 | ||
| └── values | ||
| ├── .zarray # values MUST be a 1D zarr array. The rows can be chunked as the user desires. | ||
| | | ||
| └─ 0 | ||
| | ||
├── obsm # You MAY add an obsm group container. The obsm group contains arrays that annotate the rows in X. | ||
| │ # The rows in each array MUST be index-matched to the rows in X (if present). | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `obsm` arrays. | ||
| │ | ||
│ └── obsm_0 # You MAY add a zarr array for each obsm matrix. | ||
| | # Each obsm array MUST have the same number of rows as X. | ||
| | # The rows in each obsm array SHOULD be chunked the same as the rows in X. | ||
| ├── .zarray | ||
| | | ||
| ├── 0.0 | ||
│ │ ... | ||
│ └── n.m | ||
| | ||
├── varm # You MAY add a varm group container. The varm group contains arrays that annotate the columns in X. | ||
| │ # The rows in each array MUST be index-matched to the columns in X (if present). | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `varm` arrays. | ||
| │ | ||
│ └── varm_0 # You MAY add a zarr array for each varm matrix. | ||
| | # Each varm array MUST have the same number of rows as columns in X. | ||
| | # The rows in each varm array SHOULD be chunked the same as the columns in X. | ||
| ├── .zarray | ||
| ├── 0.0 | ||
│ │ ... | ||
│ └── n.m | ||
| | ||
├── obsp # You MAY add an obsp group container. The obsp group contains sparse arrays that annotate the rows in X. | ||
| │ # The rows in each array MUST be index-matched to the rows in X (if present). | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
|
||
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `obsp` arrays. | ||
| │ | ||
│ └── obsp_0 # You MAY add a zarr group for each obsp array. | ||
| | # Each obsp array MUST have the same number of rows as rows in X. | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array. | ||
| | | ||
| ├── data # You MUST add a one-dimensional zarr array named "data". | ||
| | | # `data` MAY be chunked as the user desires. | ||
| | ├── .zarray | ||
| | | | ||
| | ├── 0 | ||
│ │ | ... | ||
│ | └── n | ||
| | | ||
| ├── indices # You MUST add a one-dimensional zarr array named "indices". | ||
| | | # `indices` MAY be chunked as the user desires. | ||
| | ├── .zarray # `indices` MUST be an `int` dtype. | ||
| | | | ||
| | ├── 0 | ||
│ │ | ... | ||
│ | └── n | ||
| | | ||
| └── indptr # You MUST add a one-dimensional zarr array named "indptr". | ||
| | # `indptr` MAY be chunked as the user desires. | ||
| ├── .zarray # `indptr` MUST be an `int` dtype. | ||
| | | ||
| ├── 0 | ||
│ | ... | ||
│ └── n | ||
| | ||
├── varp # You MAY add a varp group container. The varp group contains sparse arrays that annotate the columns in X. | ||
comtainer |
||
| │ # The rows in each array MUST be index-matched to the columns in X (if present). | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | # `.zattrs` MUST contain `"keys"`, which is an array of the names of the subgroups containing `varp` arrays. | ||
| │ | ||
│ └── varp_0 # You MAY add a zarr group for each varp array. | ||
| | # Each varp array MUST have the same number of rows as columns in X. | ||
| | | ||
│ ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| | # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array. | ||
| | | ||
| ├── data # You MUST add a one-dimensional zarr array named "data". | ||
| | | # `data` MAY be chunked as the user desires. | ||
| | ├── .zarray | ||
| | | | ||
| | ├── 0 | ||
│ │ | ... | ||
│ | └── n | ||
| | | ||
| ├── indices # You MUST add a one-dimensional zarr array named "indices". | ||
| | | # `indices` MAY be chunked as the user desires. | ||
| | ├── .zarray # `indices` MUST be an `int` dtype. | ||
| | | | ||
| | ├── 0 | ||
│ │ | ... | ||
│ | └── n | ||
| | | ||
| └── indptr # You MUST add a one-dimensional zarr array named "indptr". | ||
| | # `indptr` MAY be chunked as the user desires. | ||
| ├── .zarray # `indptr` MUST be an `int` dtype. | ||
| | | ||
| ├── 0 | ||
│ | ... | ||
│ └── n | ||
| | ||
└── uns # You MAY add a uns container to store unstructured data. | ||
| | ||
├── .zgroup | ||
| | ||
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dict"` by AnnData. | ||
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
│ | ||
├── group # You MAY add zarr groups. | ||
| | # `uns` groups MAY contain groups, dataframes, dense arrays, and sparse arrays. | ||
| | | ||
| ├── .zgroup | ||
| | | ||
| ├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| ... | ||
| | ||
├── dataframe_0 # You MAY add dataframe group containers. | ||
| | # dataframes MAY be in the `uns` group or in a subgroup. | ||
| │ | ||
| ├── .zgroup | ||
| │ | ||
| ├── .zattrs # `.zattrs` MUST contain `"_index"`, which is the name of the column in the dataframe to be used as the index. | ||
| │ # `.zattrs` MUST contain `"column-order"`, which is a list of the order of the non-_index columns. | ||
| │ # `.zattrs` MUST contain `"encoding-type"`, which is set to `"dataframe"` by AnnData. | ||
| │ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.2.0"` by AnnData. | ||
| │ | ||
| └── col_0 # Each column in the dataframe is a 1D zarr array. | ||
| ├── .zarray # All columns MUST be chunked the same way, but the chunking may be chosen by the user. | ||
| │ | ||
| └─ 0 | ||
| | ||
├── dense_array # You MAY add dense arrays as n-dimensional zarr arrays. | ||
| │ # `dense_array` MUST NOT be a complex type (i.e., MUST be a single type) | ||
| │ # `dense_array` MAY be chunked as the user desires. | ||
| | # `dense_array` MAY be in the `uns` group or in a subgroup. | ||
| | | ||
| ├── .zarray | ||
| ├── 0.0 | ||
| │ ... | ||
| └── n.m | ||
| | ||
└── sparse_array # You MAY add sparse arrays as a zarr group for each sparse array. | ||
| # sparse arrays MAY be in the `uns` group or in a subgroup. | ||
| | ||
├── .zgroup | ||
| | ||
├── .zattrs # `.zattrs` MUST contain `"encoding-type"`, which is set to `"csr_matrix"` or `"csc_matrix"` for compressed sparse row and compressed sparse column, respectively. | ||
│ # `.zattrs` MUST contain `"encoding-version"`, which is set to `"0.1.0"` by AnnData. | ||
| # `.zattrs` MUST contain `"shape"` which is an array giving the shape of the densified array. | ||
| | ||
├── data # You MUST add a one-dimensional zarr array named "data". | ||
| | # `data` MAY be chunked as the user desires. | ||
| ├── .zarray | ||
| | | ||
| ├── 0 | ||
│ | ... | ||
| └── n | ||
| | ||
├── indices # You MUST add a one-dimensional zarr array named "indices". | ||
| | # `indices` MAY be chunked as the user desires. | ||
| ├── .zarray # `indices` MUST be an `int` dtype. | ||
| | | ||
| ├── 0 | ||
│ | ... | ||
| └── n | ||
| | ||
└── indptr # You MUST add a one-dimensional zarr array named "indptr". | ||
| # `indptr` MAY be chunked as the user desires. | ||
├── .zarray # `indptr` MUST be an `int` dtype. | ||
| | ||
├── 0 | ||
| ... | ||
└── n | ||
|
||
|
||
</pre> | ||
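
As a minimal sketch of the table-group rules in the layout above, the following Python function checks a `.zattrs` dictionary that has already been parsed from JSON. The function name `validate_table_attrs` and the error strings are hypothetical and not part of the proposal; only the keys `type`, `region`, `region_key` and the value `ngff:region_table` come from the text as written.

```python
def validate_table_attrs(attrs):
    """Return a list of problems found in a table group's .zattrs dict.

    Hypothetical helper: it only mirrors the MUST rules stated in the
    layout for "type", "region" and "region_key".
    """
    problems = []

    # "type" MUST be set to "ngff:region_table".
    if attrs.get("type") != "ngff:region_table":
        problems.append('"type" must be "ngff:region_table"')

    region = attrs.get("region")
    if region is None:
        problems.append('"region" is required')
        return problems

    # "region" is a single path object or an array of path objects.
    entries = region if isinstance(region, list) else [region]
    for entry in entries:
        if not (isinstance(entry, dict) and isinstance(entry.get("path"), str)):
            problems.append('each "region" entry must be an object with a string "path"')
            break

    # "region_key" MUST be present when "region" is an array.
    if isinstance(region, list) and "region_key" not in attrs:
        problems.append('"region_key" is required when "region" is an array')

    return problems
```

An empty list means none of the checked rules were violated; the optional `instance_key` is deliberately not checked here.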
|
||
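
The sparse groups in the layout above (`obsp`, `varp`, and sparse arrays in `uns`) store three one-dimensional arrays, `data`, `indices`, and `indptr`, plus a `"shape"` attribute. As an illustration only, the sketch below shows how these pieces combine for the `"csr_matrix"` encoding, using plain Python lists in place of zarr arrays. The function name `csr_to_dense` is hypothetical, and the fill value of `0` for unstored entries follows the usual compressed sparse row convention rather than anything stated in the proposal.

```python
def csr_to_dense(data, indices, indptr, shape):
    """Densify a compressed-sparse-row matrix given as plain sequences.

    data    : stored values, in row-major order
    indices : column index of each stored value
    indptr  : indptr[i]..indptr[i + 1] is the slice of data/indices for row i
    shape   : (n_rows, n_cols) of the densified array
    """
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # Walk the slice of stored entries that belongs to this row.
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] = data[k]
    return dense


dense = csr_to_dense(
    data=[1, 2, 3],
    indices=[0, 2, 1],
    indptr=[0, 2, 2, 3],
    shape=(3, 3),
)
# dense == [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
```

For `"csc_matrix"` the roles of rows and columns are swapped: `indptr` delimits columns and `indices` holds row indices.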
High-content screening {#hcs-layout} | ||
------------------------------------ | ||
|
I just noticed that I have this as `ngff:regions_table` in this example

which are you suggesting needs fixing?

I think it was due to a typo in our code, we should be careful to use `region` and `region_key` instead of `regions` and `regions_key`

@kevinyamauchi I just noticed that I had forgotten to submit my review months ago (it was one of my first reviews with the GitHub interface and I must have forgotten to submit, sorry about that). I did it now and one of the comments was actually on `region` vs `regions`. I am now fine with both spellings.

I'm trying to use https://github.com/kevinyamauchi/ome-ngff-tables-prototype/blob/0b7e59c58caf07e5f4e37756b396afe3e05e48e9/src/ngff_tables_prototype/reader.py#L307 to read a table, and this expects `"@type": "ngff:points_table"`.
So there is a difference between `@type` and `type`, but also the spec says it MUST be `ngff:region_table`. So is this simply too strict and we support various types here?

Do we want to enumerate the "type" options that we support here?
E.g. `regions_table`, `points_table` or some other table that doesn't have any documented structure?