[ENH] BEP031 - New columns to participants.tsv file #816
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -162,16 +162,20 @@ participants.json | |
``` | ||
|
||
The purpose of this RECOMMENDED file is to describe properties of participants | ||
such as age, sex, handedness. | ||
such as age, sex, handedness, species, strain, strain_rrid, diagnosis. | ||
If this file exists, it MUST contain the column `participant_id`, | ||
which MUST consist of `sub-<label>` values identifying one row for each participant, | ||
followed by a list of optional columns describing participants. | ||
Each participant MUST be described by one and only one row. | ||
|
||
Commonly used *optional* columns in `participant.tsv` files are `age`, `sex`, | ||
and `handedness`. We RECOMMEND to make use of these columns, and | ||
in case that you do use them, we RECOMMEND to use the following values | ||
for them: | ||
When different from `homo sapiens`, `participants.tsv` SHOULD include a `species` | ||
column, and the value MUST be the string of the binomial species name from | ||
[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi). | ||
|
||
Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`, | ||
`handedness`, `strain`, `strain_rrid` and `diagnosis`. We RECOMMEND to make use | ||
Could `group` be added to this list, as it's used in the example below?
Good question, I'm not sure we should because |
||
of these columns, and in case that you do use them, we RECOMMEND to use the | ||
following values for them: | ||
|
||
- `age`: numeric value in years (float or integer value) | ||
|
||
|
@@ -197,6 +201,15 @@ for them: | |
- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`, | ||
`AMBIDEXTROUS`, `Ambidextrous` | ||
|
||
- `strain`: string value indicating the strain of the species | ||
Examples for each of these would be useful.
Just to clarify, do you mean example directly in the description, like above for handedness with I did not change the example of
Perhaps a few examples from: https://www.jax.org/jax-mice-and-services/find-and-order-jax-mice/most-popular-jax-mice-strains
Examples added in b451671. |
||
|
||
- `strain_rrid`: research resource identifier ([RRID](https://scicrunch.org/resources/Organisms/search)) | ||
of the strain of the species | ||
|
||
- `diagnosis`: string value describing the diagnosis of the participant. | ||
i don't know if this has to be a string value. in many datasets on openneuro diagnosis/dx is present and can be an enumerated type. also, this is one place where one can have multiple designations depending on the study. we should allow for some notion of that, or simply remove diagnosis from this file.
The aim of this PR is to add new columns to describe animal properties. In that context, I agree that diagnosis may be out of scope. For context, I added the columns following this discussion #779 (comment), #779 (comment) and #779 (comment) because we also introduced pathology in
Following your suggestion, I removed the diagnosis column in 7146144 as being out of scope for this PR (animal properties). |
||
The diagnosis MAY instead be specified in [Sessions files](06-longitudinal-and-multi-site-studies.md#sessions-file) | ||
in case it changes over time. | ||
|
||
Throughout BIDS you can indicate missing values with `n/a` (for "not | ||
available"). | ||
|
||
|
@@ -213,9 +226,9 @@ It is RECOMMENDED to accompany each `participants.tsv` file with a sidecar | |
`participants.json` file to describe the TSV column names and properties of their values (see also | ||
the [section on tabular files](02-common-principles.md#tabular-files)). | ||
Such sidecar files are needed to interpret the data, especially so when | ||
optional columns are defined beyond `age`, `sex`, and `handedness`, such as | ||
`group` in this example, or when a different age unit is needed | ||
(for example, gestational weeks). | ||
optional columns are defined beyond `age`, `sex`, `handedness`, `species`, `strain`, | ||
`strain_rrid` and `diagnosis`, such as `group` in this example, or when a different | ||
age unit is needed (for example, gestational weeks). | ||
If no `units` is provided for age, it will be assumed to be in years relative | ||
to date of birth. | ||
|
||
|
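To make the sidecar recommendation in the hunk above concrete, here is a minimal sketch that builds a `participants.json` describing the new columns. The `Description` and `Units` keys follow the tabular-file sidecar convention referenced in the diff; the description strings, the choice of weeks as the age unit, and the helper name `build_sidecar` are illustrative assumptions, not text from this PR.

```python
import json


def build_sidecar():
    """Return an illustrative participants.json structure (hypothetical content)."""
    return {
        "age": {
            "Description": "Age of the participant",
            # Without "Units", age is assumed to be in years since birth.
            "Units": "weeks",
        },
        "species": {
            "Description": "Binomial species name from the NCBI Taxonomy",
        },
        "strain": {
            "Description": "Strain of the species",
        },
        "strain_rrid": {
            "Description": "Research resource identifier (RRID) of the strain",
        },
    }


# Serialize the sidecar as it would be written next to participants.tsv.
sidecar_text = json.dumps(build_sidecar(), indent=2)
```

Writing `sidecar_text` to `participants.json` next to `participants.tsv` would give readers the units and meaning of each column.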
Shouldn't this be MUST, given that all of BIDS assumes humans at this point?
I don't know how we would validate a `MUST` here. Also, there are rodent datasets that may not have this column in this form at this point, so we would be breaking backwards compatibility if we could validate. What about:
Also, REQUIRE-ing a species name from NCBI Taxonomy feels like it's going to be difficult to validate, as we will need to either query the database or maintain a list of accepted names, updating the validator as new use cases arise... Is there a validation plan?
Thank you @effigies for the suggestion, I think assuming `homo sapiens` if the column is omitted is a strong incentive without breaking backward compatibility. I would be in favor of that.
However, I had not thought about the validation. Querying the database seems like the best option to avoid maintaining an up-to-date list in the validator, but it may be difficult to implement. Are there similar requirements elsewhere in the spec? Would the alternative of "SHOULD" or "strongly RECOMMENDED" be advisable?
Also, thinking about it, I think I should add examples other than `homo sapiens`, like `mus musculus` and `rattus norvegicus`, in the description.
@effigies - regarding validation, i will raise the same issue here as the other post. i'm not sure we actually validate values for example for sex or anything that could have levels or enumerations.
while one could at least detect presence, i agree that keeping with the current perspective of the participants.tsv being a recommended file, we can keep things recommended instead of required.
species does get a little complicated, especially for animals, as you start going into species + genotype notions. here is our generic participant at a timepoint model in dandi: https://github.com/dandi/dandischema/blob/master/dandischema/models.py#L642 (technically all of those properties could come into play, with some being more important for animal studies).
@effigies, I modified the species description as you suggested and added examples in 8b41137.
For validation purposes, I kept both the column and the taxonomy as RECOMMENDED and not REQUIRED.
Let me know what you think.
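As a rough illustration of the lenient checking discussed in this thread, the sketch below verifies only what can be checked without querying NCBI Taxonomy: that `participant_id` exists, that its values look like `sub-<label>`, that each participant has one row, and that `species` falls back to `homo sapiens` when the column is omitted. It is not the BIDS validator; the function name `check_participants`, the alphanumeric label pattern, and the message strings are assumptions made for this example.

```python
import csv
import io
import re

# Assumption: labels are alphanumeric, as in other BIDS entities.
PARTICIPANT_ID = re.compile(r"^sub-[0-9a-zA-Z]+$")
# Fallback proposed in the thread for datasets without a species column.
DEFAULT_SPECIES = "homo sapiens"


def check_participants(tsv_text):
    """Return (rows, problems) for the text of a participants.tsv file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "participant_id" not in (reader.fieldnames or []):
        return [], ["missing required column: participant_id"]
    rows, problems, seen = [], [], set()
    for row in reader:
        pid = row["participant_id"]
        if not PARTICIPANT_ID.match(pid):
            problems.append(f"bad participant_id: {pid!r}")
        if pid in seen:
            problems.append(f"duplicate participant_id: {pid!r}")
        seen.add(pid)
        # Presence check only: the species name is not looked up anywhere.
        species = row.get("species") or DEFAULT_SPECIES
        rows.append({**row, "species": species})
    return rows, problems
```

For a file with the header `participant_id`, `species` and a row `sub-rat1`, `rattus norvegicus`, the function returns that row unchanged and an empty problem list. Checking the species string against the taxonomy itself is deliberately left out, in line with keeping the column and the taxonomy RECOMMENDED rather than REQUIRED.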