Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] BEP031 - New columns to participants.tsv file #816

Merged
merged 13 commits into from
Oct 13, 2021

Conversation

mariehbourget
Copy link
Collaborator

Dear BIDS community,

Context

As part of the development of the Microscopy BEP (BEP031 @mariehbourget, @jcohenadad) and Animal Ephys BEP (BEP032 @SylvainTakerkart, @JuliaSprenger), the “sample” entity was introduced along with a samples.tsv file describing properties of the samples in PR #812.

To better describe animal participants, new properties of the subjects also need to be included in participants.tsv.

Contribution

The purpose of this PR is to add new recommended column (species) and optional columns (strain, strain_rrid and diagnosis) to the participants.tsv file.

See issue #779 for related discussions on this topic.

and `handedness`. We RECOMMEND to make use of these columns, and
in case that you do use them, we RECOMMEND to use the following values
for them:
When different from `homo sapiens`, `participants.tsv` SHOULD include a `species`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be MUST, given that all of BIDS assumes humans at this point.

Suggested change
When different from `homo sapiens`, `participants.tsv` SHOULD include a `species`
When different from `homo sapiens`, `participants.tsv` MUST include a `species`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how we would validate a MUST, here. Also, there are rodent datasets that may not have this column in this form at this point, so we would be breaking backwards compatibility if we could validate. What about:

The RECOMMENDED `species` column MUST be a binomial species name from the
[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).
For backwards compatibility, if `species` is absent, the participant is assumed to be
`homo sapiens`.

Also, REQUIRE-ing a species name from NCBI Taxonomy feels like it's going to be difficult to validate, as we will need to either query the database or maintain a list of accepted names, updating the validator as new use cases arise... Is there a validation plan?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @effigies for the suggestion, I think assuming homo sapiens if the column is omitted is a strong incentive without breaking backward compatibility. I would be in favor of that.

However, I had not thought about the validation, querying the database seems like the best option to not have to maintain an up-to-date list in the validator but it may be difficult to implement. Are there similar requirements elsewhere in the spec? Would the alternative of “SHOULD” or “strongly RECOMMENDED” be advisable?

Also, thinking about it, I think I should add examples other than homo sapiens like mus musculus and rattus norvegicus in the description.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@effigies - regarding validation, i will raise the same issue here as the other post. i'm not sure we actually validate values for example for sex or anything that could have levels or enumerations.

I don't know how we would validate a MUST

while one could at least detect presence, i agree that keeping with the current perspective of the participants.tsv being a recommended file, we can keep things recommended instead of required.

species does get a little complicated, especially for animals, as you start going into species + genotype notions. here is our generic participant at a timepoint model in dandi: https://github.com/dandi/dandischema/blob/master/dandischema/models.py#L642 (technically all of those properties could come into play, with some being more important for animal studies).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@effigies, I modified the species description as you suggested and added examples in 8b41137.
For validation purposes, I kept both the column and the taxonomy as RECOMMENDED and not REQUIRED.
Let me know what you think.

- `strain_rrid`: research resource identifier ([RRID](https://scicrunch.org/resources/Organisms/search))
of the strain of the species

- `diagnosis`: string value describing the diagnosis of the participant.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know if this has to be a string value. in many datasets on openneuro diagnosis/dx is present and can be an enumerated type. also, this is one place, where one can have multiple designations depending on the study. we should allow for some notion of that, or simply remove diagnosis from this file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aim of this PR is to add new columns to describe animal properties. In that context, I agree that diagnosis may be out of scope.

For context, I added the columns following this discussion #779 (comment), #779 (comment) and #779 (comment) because we also introduced pathology in samples.tsv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following your suggestion, I removed the diagnosis column in 7146144 as being out of scope for this PR (animal properties).

@@ -197,6 +201,15 @@ for them:
- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`,
`AMBIDEXTROUS`, `Ambidextrous`

- `strain`: string value indicating the strain of the species
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples for each of these would be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, do you mean example directly in the description, like above for handedness with ambidextrous, a, A, etc, or an example of particpants.tsv for an animal.

I did not change the example of participants.tsv below which is an example for human. I thought having complete examples for both human and animal would maybe be too much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples added in b451671.

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a more general and perhaps controversial comment, that I'd like to hear opinions on.

Up until now BIDS has been mostly about humans and their brain+behavior data. I assume that >80% of users will continue to come to BIDS from this angle.

Adding coverage for animal ephys, and microscopy into BIDS is great, and I fully support it. However is it a good idea to introduce very specific concepts (such as "strain") that are not used in 80% of BIDS use-cases* into a very general section that discusses things like age, sex, and handedness (which are ubiquitous)? I feel like this would be better discussed in a dedicated, separate section of the spec (animal ephys / microscopy respectively), as it avoids clutter and potential confusion / annoyance with the spec.

*(this is my assumption/prediction, feel free to challenge).

[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).

Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`,
`handedness`, `strain`, `strain_rrid` and `diagnosis`. We RECOMMEND to make use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could group be added to this list as its used in the example below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'm not sure we should because group is also used below to illustrate how to describe an additional column that is not part of the optional ones in the participants.json example. From my understanding, group may have different meanings depending on the study and it would be hard to define it with a general example

@mariehbourget
Copy link
Collaborator Author

Adding coverage for animal ephys, and microscopy into BIDS is great, and I fully support it. However is it a good idea to introduce very specific concepts (such as "strain") that are not used in 80% of BIDS use-cases into a very general section that discusses things like age, sex, and handedness (which are ubiquitous)? I feel like this would be better discussed in a dedicated, separate section of the spec (animal ephys / microscopy respectively), as it avoids clutter and potential confusion / annoyance with the spec.

Thanks @sappelhoff for your feedback!
I changed the wording to make it clearer that it applies to species other than human and added an example for strain and strain_rrid, so it may create less confusion in that way.

The main goal of this PR is to promote species as a recommended field in participants.tsv to introduce animal data in the spec, as discussed previously with BEP032 and with the steering group.

We foresee that the BIDS-specification for microscopy will be adopted by many users, some have already started converting their datasets or working on compatible acquisition workflows, but I would not be able to quantify such adoption at this point.
So for microscopy and animal ephys, it is likely that strain and strain_rrid will apply for many users, but I agree that it would not be the case for 80% of BIDS at large.

I'm not against moving those field into the dedicated microscopy and animal ephys sections if we want to keep the general section more general. The main drawback being duplication in the specification.

@mariehbourget
Copy link
Collaborator Author

Hi @effigies and @sappelhoff,
I’m currently working on the microscopy .md file for the main BEP031 PR. I would like to know your thoughts about the optional columns strain and strain_rrid, following the latest changes and comment above.

I can move them to the microscopy part if it makes more sense to have them per modalities rather than in the “general” participants file.

In that case, this PR would only add the species columns to participants.
Thank you!

@sappelhoff
Copy link
Member

tagging @Remi-Gau to chime in instead of me :-)

@Remi-Gau
Copy link
Collaborator

Tempted to keep those strain and strain_rrid in the modality agnostic part, because at least 2 BEP will be using them and if we only start allowing in this part of the spec things that can REALLY ONLY apply to any modality then, we might have to move handedness out as I am not sure how it applies to rodents, invertebrates, cell cultures...

@SylvainTakerkart
Copy link

Tempted to keep those strain and strain_rrid in the modality agnostic part, because at least 2 BEP will be using them and if we only start allowing in this part of the spec things that can REALLY ONLY apply to any modality then, we might have to move handedness out as I am not sure how it applies to rodents, invertebrates, cell cultures...

agreed! if we only put them in the modality-specific part, we will have to repeat it for several future modalities and it's going to get more complex to handle the specs...

@mariehbourget
Copy link
Collaborator Author

Thank you everyone for your feedback!
All the comments have been addressed and the PR is ready for re-review.

@mariehbourget mariehbourget mentioned this pull request Sep 16, 2021
8 tasks
@mariehbourget
Copy link
Collaborator Author

mariehbourget commented Oct 8, 2021

Hi @bids-standard/maintainers,
All the comments in the PR have been addressed. I think it would be ready to be merged, is there something else we need to do?

I also wanted to mention that it may have an impact on another opened PR (#827) as it deals with TSV columns.
Thanks!

@effigies
Copy link
Collaborator

effigies commented Oct 8, 2021

I am still +1 for merge. Any seconds?

Copy link
Member

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. I can handle adding the new columns to the TSV columns PR.

@tsalo tsalo mentioned this pull request Oct 9, 2021
2 tasks
@mariehbourget
Copy link
Collaborator Author

Great! thank you @effigies and @tsalo, should I do the merge or are the PRs usually merged by maintainers?

@effigies
Copy link
Collaborator

Since the last meaningful commit was in August, I think we can merge. @sappelhoff are there any blocks I'm forgetting?

@sappelhoff
Copy link
Member

Nope, I think this can go in 👍

@effigies effigies merged commit aae6274 into bids-standard:master Oct 13, 2021
tsalo added a commit to tsalo/bids-specification that referenced this pull request Oct 13, 2021
tsalo added a commit that referenced this pull request Nov 9, 2021
* Add template.

* Add first column.

* Add columns.

* More terms.

* Fill name field in all files.

* More work.

Note that there are three very different uses of "name" columns, and two of them are equally common, so I chose not to specify any of them as the "canonical" definition.

* Add remaining definitions.

* Add macro to render column tables.

* Fix YAML file.

* Consolidate suffixes file.

* Remove old individual files.

* Move columns file.

* Fix things up a bit.

* Add columns I missed for modality-agnostic TSV files.

* Support n/a for duration.

* Apply suggestions from code review

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Code formatting in stim_file definition.

* Allow numbers and strings for value.

* Update src/schema/objects/columns.yaml

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Allow n/a for "z" column.

Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723280787.

* Describe meanings of x, y, and z columns.

Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723283314.

* Allow n/a for status column.

Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723269382.

* Add participant_id to participants.tsv table and append info for other IDs.

* Split type definitions into channels and electrodes versions.

* Update definitions for group based on file type.

* Split reference column definition.

* Clean up name_channels definition.

* Draft new columns from #816

* Add new columns to table.

* Remove list items.

* Update src/04-modality-specific-files/04-intracranial-electroencephalography.md

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

* Apply suggestions from code review

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

* Use two underscores to delineate multiply-defined columns.

* Remove text that is now in table.

* Update src/schema/objects/columns.yaml

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

* Add sections to README on columns file and on reused terms.

* Add EDF info to acq_time definition.

* Remove hardcoded tables.

* Remove unused links.

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants