[ENH] BEP018: Genetic information #287
Hi @CPernet, thank you for converting and opening this PR! We can work on getting the formatting checks to pass. Travis flagged a number of these formatting issues too, and they could have caused the behavior you are seeing with the JSON examples. I can work on opening a PR on your fork. Edit: opened, to render the examples and pass Travis. Edit 2: flagging @bids-standard/everyone, who may be interested in reviewing!
thx @franklin-feingold i think i fixed it (just pushed it in my fork)
Thanks for opening the PR @CPernet - I made some comments regarding the formatting.
## dataset_description.json

Two additional keys related to the genetic data can be added. The Key `GeneticDataBase` (MANDATORY) links to the name of the database and web address. The key `GeneticDescriptor` (OPTIONAL) refers to the descriptor (e.g. journal article) of the genetic data.
I suggest revising this paragraph to use our keywords MUST (= mandatory), SHOULD (= recommended), and MAY (= optional); see RFC 2119.
This would also solve the issue that the first sentence is a bit misleading, because it says "two keys can be added", which sounds like both are optional.
Maybe `GeneticDataBase` could be mandatory if 'GeneticID' is present in any tsv files or if a genetic_info.json file is present.
The key `GeneticDataBase` MUST be added to link to the name of the database and web address. The key `GeneticDescriptor` MAY also be present, referring to the descriptor (e.g. journal article) of the genetic data.
'GeneticID' is not mandatory if the imaging repo and genetic repo use the same ID (which the validator cannot check)
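To make the MUST/MAY wording discussed above concrete, here is a minimal Python sketch. It is not the validator's code. It assumes both keys hold plain strings, which the draft does not state, and the function name `check_genetic_keys` and all example values are hypothetical.

```python
# Requirement levels for the two keys, following the MUST/MAY wording
# suggested in this thread (an assumption about the final text).
GENETIC_KEYS = {
    "GeneticDataBase": "MUST",
    "GeneticDescriptor": "MAY",
}


def check_genetic_keys(description):
    """Return error messages for a parsed dataset_description.json (a dict)."""
    errors = []
    for key, level in GENETIC_KEYS.items():
        # Only a missing MUST key is an error; a missing MAY key is fine.
        if key not in description and level == "MUST":
            errors.append(f"missing required key: {key}")
    return errors


# Hypothetical example values, for illustration only.
complete = {"Name": "Example", "GeneticDataBase": "Example database, https://example.org"}
incomplete = {"Name": "Example", "GeneticDescriptor": "Example article"}

print(check_genetic_keys(complete))    # []
print(check_genetic_keys(incomplete))  # ['missing required key: GeneticDataBase']
```

The conditional variant proposed above (require the key only when a genetic_info.json file or 'GeneticID' column exists) would need an extra argument and is left out here.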
## genetic_info.json

This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The 'GeneticLevel' and 'SampleOrigin' are the only two mandatory fields.
This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The `GeneticLevel` and `SampleOrigin` are the only two mandatory fields.
Use backticks to format field names
| Field name | Definition | Values |
| :----------- | :--------- | :------ |
| GeneticLevel | MANDATORY Describes the level of analysis | `Genetic`, `Genomic`, `Epigenomic`, `Transcriptomic`, `Metabolomic`, or `Proteomic` |
| AnalyticalApproach | OPTIONAL Methodology used to analyse the GeneticLevel | Value must be taken from [gapsolr](https://www.ncbi.nlm.nih.gov/projects/gapsolr/facets.html) under /Study/Molecular Data Type, for instance `SNP Genotypes (Array)` or `Methylation (CpG)` |
When you give an example of a filepath in text, please format that path using backticks like: `/this/is/a/path`
I don't get that? I used e.g. /Study/Molecular, what's wrong?
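For readers following the genetic_info.json table quoted above, the two mandatory fields and the allowed `GeneticLevel` values could be checked along these lines. This is a rough Python sketch, not part of the PR. It assumes `GeneticLevel` is a single string, the function name `check_genetic_info` is hypothetical, and the `SampleOrigin` value is a placeholder because the quoted excerpt does not list its allowed values.

```python
# Allowed values for GeneticLevel, copied from the table in the draft.
GENETIC_LEVELS = {
    "Genetic", "Genomic", "Epigenomic",
    "Transcriptomic", "Metabolomic", "Proteomic",
}

# The draft names these two fields as the only mandatory ones.
MANDATORY_FIELDS = ("GeneticLevel", "SampleOrigin")


def check_genetic_info(info):
    """Return a list of problems found in a parsed genetic_info.json (a dict)."""
    problems = [
        f"missing mandatory field: {name}"
        for name in MANDATORY_FIELDS
        if name not in info
    ]
    level = info.get("GeneticLevel")
    # Assumption: GeneticLevel is a single string, not a list of levels.
    if level is not None and level not in GENETIC_LEVELS:
        problems.append(f"unknown GeneticLevel: {level}")
    return problems


# "placeholder" stands in for a real SampleOrigin value.
print(check_genetic_info({"GeneticLevel": "Genetic", "SampleOrigin": "placeholder"}))
# []
print(check_genetic_info({"GeneticLevel": "Genome"}))
# ['missing mandatory field: SampleOrigin', 'unknown GeneticLevel: Genome']
```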
Travis is flagging some of the formatting. This can be resolved after the formatting is reviewed. I can assist with that at that time.
Co-Authored-By: Franklin Feingold <35307458+franklin-feingold@users.noreply.github.com>
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
making changes in my fork
@franklin-feingold @sappelhoff the build has worked (I think) -- the only thing I did not do was the / in the table, since that is already how this was??
The table fences will have to be fixed :-) could you do that please?
@CPernet I should have fixed the formatting issue in my branch (https://github.com/franklin-feingold/bids-specification/blob/enh/genetics/src/04-modality-specific-files/08-genetic-descriptor.md). My PR in your repo has some conflicts, so perhaps it is easiest to grab my file and bring it over to your repo.
Fix Travis and spacing - merged changes
@franklin-feingold do you know why Travis failed? I couldn't figure it out from 'details'
@CPernet See my suggestions: Travis (=remark-lint) does not like too many empty lines :-)
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
August is a particularly hard month for feedback -- I'm writing this as I'm packing for a flight! I think you'll be likely to see more community input in September. All this to say, I'd be disappointed to see this merged before at least a few external members have chimed in. I've sent it to one of my colleagues working with genetic data, but they're on vacation! So September seems safer to me.
I've added some specific comments, but on the whole I think this could do with an edit for terminological clarity. For instance, database and dataset are used somewhat interchangeably, and we use descriptor to refer to both the entire section, a `.json` file and an associated publication.
Possibly it would be useful to settle on specific vocabulary, introduce it either at the start of the section or at the end, and go through to conform.
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
The dataset_description.json needs to be clarified. It appears inconsistent with the given example (and other references to it later in the document).
Does this spec allow for the data to be split across multiple files (e.g. by chromosome)? Is there supposed to be a convention for naming the genetics data?
I tried to see if it would be possible to apply this spec to imaging + genetic data I can access (UK Biobank), and it's unclear whether I could make the genetics data compliant with this.
@CPernet Here's the start of the bids-validator PR: This checks for 'GeneticDatabase' in dataset_description.json if a genetic_info.json is present at the top level. It also adds a JSON schema to validate genetic_info.json files. Issues so far:
Since this conversation is getting complex I'd be happy to continue the validator talk in the validator PR.
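The conditional check described above can be sketched in Python as follows. This is only an illustration of the rule, not the bids-validator code. The function name `check_dataset` is hypothetical, and because the key is spelled both `GeneticDataBase` and `GeneticDatabase` in this thread, the key name is a parameter.

```python
import json
from pathlib import Path


def check_dataset(root, key="GeneticDataBase"):
    """Return error messages for the dataset rooted at `root`."""
    root = Path(root)
    # The key is only required when a top-level genetic_info.json exists.
    if not (root / "genetic_info.json").exists():
        return []
    description_path = root / "dataset_description.json"
    if not description_path.exists():
        return ["dataset_description.json is missing"]
    description = json.loads(description_path.read_text(encoding="utf-8"))
    if key not in description:
        return [f"{key} is required when genetic_info.json is present"]
    return []
```

Calling `check_dataset("path/to/dataset")` returns an empty list when the rule is satisfied. Validating the contents of genetic_info.json against a JSON schema, as the validator PR does, is a separate step not shown here.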
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
@effigies I pushed the last changes but Travis fails, and I can't figure out why. Help please?
still fails :-(
thx anyway
@CPernet I made a PR from my personal fork to your fork that might fix the problem. The GitHub editor was too finicky for getting the spacing right.
couldn't fix alignment in the GH editor
Success! thx - these trailings are so annoying
@sappelhoff @effigies @franklin-feingold seems ready now? needs approval
we are making examples for the repo + validator testing
Sounds good! Perhaps, for the specification and validator/examples to stay in lock step, the associated materials (e.g., validator) can be prepared alongside, so everything stays in lock step? What do you all think?
yes, since I PR to merge to master - better have the validator tested, good point
I've merged master, pushed this to the @CPernet If it's okay with you, could we switch to that branch for the BEP018 PR? It will make suggestions easier.
Closing in favor of #395.
I created the md file, and this looks fine
one issue, the json examples don't go to the next line (despite spaces ??)