[WIP] [ENH] Provenance BEP028 #439
Conversation
@cmaumet the linkchecker is returning some errors
src/03-modality-agnostic-files.md
Possible places to encode provenance

**Dataset level provenance.** At the dataset level, provenance could be about the dataset itself, or about any entity in the dataset. This provenance may evolve as new data are added, which may include sourcedata, BIDS data, and BIDS derived data. One option is to make use of <code>[https://w3c.github.io/json-ld-syntax/#named-graphs](https://w3c.github.io/json-ld-syntax/#named-graphs)</code>
Why do you want to put your [https://w3c.github.io/json-ld-syntax/#named-graphs]
link into a code block? If you are certain that's necessary, you can use the backtick sign: `
once at the beginning and once at the end for inline code, like so.
Assuming you just want the link DESCRIPTION to be formatted like code, you need to put the backticks inside the square brackets: https://w3c.github.io/json-ld-syntax/#named-graphs
[ + backtick + description + backtick + ] + ( + url + )
see: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code
@sappelhoff OK you are right, fixing this
mh okay, this didn't fix linkchecker. Seems more like a linkchecker issue ... your link works 🤷♂️
discussed with @cmaumet, I'll check by only keeping the link between the brackets, like:
[named-graphs](https://w3c.github.io/json-ld-syntax/#named-graphs)
just referencing the related issues here
and Pull Requests
@sappelhoff the linkchecker you are using seems to be dead: when I look into the CircleCI logs I find a pointer to https://github.com/wummel/linkchecker/issues. Are you sure the linkchecker is up-to-date?
I remember that it was a bit of a pain to get it in back then in #293, but we are also not using the wummel/linkchecker directly, I think. See: https://github.com/yarikoptic/linkchecker. Hopefully @yarikoptic can help you. He implemented the linkchecker for our repo.
so either it is some JavaScript magic or some other goodness which would not work for linkchecker (since it has no JS runtime support AFAIK). Thus either some more conventional permalink could be found, or linkchecker could be disabled altogether.
PS FWIW -- asked the origin: w3c/json-ld-syntax#343
This reverts commit fb87411. According to w3c/json-ld-syntax#343 (comment) references should point to final published versions on https://www.w3.org/TR/json-ld11/
pushed 4a06044 which brings "correct" URL for named-graphs back.
some formatting comments
* Docs to Markdown version 1.0β20
* Tue Mar 24 2020 09:07:42 GMT-0700 (PDT)
* Source doc: BIDS Extension Proposal XX (BEP0XX): Provenance
----->
lines 220 till 235 can be removed (some tool output)
### Available under the CC-BY 4.0 International license

Extension moderator/lead: Satra Ghosh <[satra@mit.edu](mailto:satra@mit.edu)> Camille Maumet <camille.maumet@inria.fr>
lines 237 until 243 would need to be removed as well if this is a proposal to be directly integrated into the specification
This specification is an extension of BIDS, and general principles are shared. The specification should work for many different settings and facilitate the integration with other imaging methods.

To see the original BIDS specification, see this link. This document inherits all components of the original specification (e.g. how to store imaging data, events, stimuli and behavioral data), and should be seen as an extension of it, not a replacement.
same for lines 244 until 250 (can be deleted): This is usually a disclaimer that we use in BEPs. It should not be part of what we put into our specification directly
ii. Provenance records MUST use the [PROV model](https://www.w3.org/TR/prov-o/) ontology and SHOULD be augmented by terms curated in the BIDS specification, the [NIDM](http://nidm.nidash.org/) model, and future enhancements to these models.

iii. If provenance records are included, these records of provenance of a dataset or a file MUST be described using a `[<prefix>_]prov.jsonld` file. Since these [jsonld](https://json-ld.org/) documents are graph objects, they can be aggregated without the need to apply any inheritance principle.
looks like this needs to be fixed: [<prefix>_]prov.jsonld
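For readers following along, here is a minimal sketch of what such a `prov.jsonld` graph could look like. All identifiers, the `bids:` prefix, and the context URL (reusing the placeholder from the draft) are purely illustrative and not defined by this BEP; the context is assumed to define the `prov:`, `rdfs:`, and `bids:` prefixes.

```json
{
  "@context": "https://some/url/to/bids_provenance_context.jsonld",
  "@graph": [
    {
      "@id": "urn:uuid:conversion-0001",
      "@type": "prov:Activity",
      "rdfs:label": "DICOM to NIfTI conversion",
      "prov:used": {"@id": "bids:sourcedata/sub-01/dicom/"},
      "prov:wasAssociatedWith": {"@id": "urn:uuid:dcm2niix"}
    },
    {
      "@id": "bids:sub-01/func/sub-01_task-xyz_acq-test1_run-1_bold.nii.gz",
      "@type": "prov:Entity",
      "prov:wasGeneratedBy": {"@id": "urn:uuid:conversion-0001"}
    },
    {
      "@id": "urn:uuid:dcm2niix",
      "@type": "prov:SoftwareAgent",
      "rdfs:label": "dcm2niix v1.0.20170923"
    }
  ]
}
```

Because each file's `@graph` is a self-contained set of statements, several such documents can simply be merged into one larger graph, which is what the "aggregated without the need to apply any inheritance principle" sentence above refers to.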
Example context: Common

[https://some/url/to/bids_common_context.jsonld]()
Suggested change:
[https://some/url/to/bids_common_context.jsonld](https://some/url/to/bids_common_context.jsonld)
Example context: Provenance

[https://some/url/to/bids_provenance_context.jsonld]()
Suggested change:
[https://some/url/to/bids_provenance_context.jsonld](https://some/url/to/bids_provenance_context.jsonld)
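As a rough illustration only (the placeholder URLs above do not point to actual files, and the term mappings below are hypothetical rather than the real BIDS context), such a shared context would mostly map short term names onto full IRIs:

```json
{
  "@context": {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "Activity": "prov:Activity",
    "Entity": "prov:Entity",
    "SoftwareAgent": "prov:SoftwareAgent",
    "wasGeneratedBy": {"@id": "prov:wasGeneratedBy", "@type": "@id"},
    "used": {"@id": "prov:used", "@type": "@id"}
  }
}
```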
@sappelhoff: Thanks for your review! We'll look into this as soon as possible. Do you have more info on the process to make this an official BEP? Thank you!
@cmaumet @remiadon usually the process goes a bit like this: BEP process
At some point (between step 2 and 3), we make the BEP official by adding a number (e.g. BEP006 ...) and making an entry on our list of active BEPs. In the past, this step of "making a BEP official" was done by @chrisgorgo / a BIDS maintainer, but now we have our @bids-standard/steering group, which will review a BEP and make it official (or request changes). See the "Draft BEP review" point in the BIDS governance. To me it seems like you are already at the "final" stage of your BEP (at least you are converting to markdown and opening a PR), and the issue is that your BEP has not yet been made official? Perhaps it'd be good if you get the BEP (perhaps even this PR) into a presentable state for the @bids-standard/steering group and then ask for an official review. This is also interesting for two more BEPs that will soon want to get an "official" status.
See also:
Hi @sappelhoff, thanks a lot!
Although we did make a pull request, I would not say that our BEP is at the final stage yet. We would like to engage with the community, get feedback and improve it first.
Yes, that's exactly what we are after now. It would be nice to be able to describe our effort as an official BEP (with a number, a link from the BIDS website, etc.) as we get more people onboard. How can we ask @bids-standard/steering to review our BEP and make it official? (Is a ping on GitHub enough?) Thank you!!
Probably yes, but to be sure we can ask @franklin-feingold to also put it on the agenda for the next steering group meeting. He also regularly publishes the meeting notes of these steering group meetings on the webpage news section.
I think it would be nice to
Thanks @yarikoptic! I'd be in favour of creating a space for community discussions around the 'BIDS PROV' BEP (maybe regular calls, a GitHub repo for examples, etc.), and to me both of your suggestions would happen in that space. But first it would be nice to have the green light from BIDS steering to make this an official BEP, no? (Note: we started looking into creating real-life examples with @remiadon; this is work in progress and currently under my lab's GitHub organization at https://github.com/Inria-Visages/BIDS-prov/tree/master/examples. This could be moved and discussed in the 'BIDS PROV' discussion space.)
This BEP is now official: BEP028, see: bids-standard/bids-website#123
sub-01/
    func/
        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
        sub-01_task-xyz_acq-test1_run-1_prov.jsonld
This would appear to apply equally well to `sub-01_task-xyz_acq-test1_run-1_events.tsv`. I would suggest that `prov` is not an appropriate suffix, and could either be made into an extension `.prov` or a double-extension `.prov.jsonld`.
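One reading of this suggestion, with hypothetical file names: the current draft uses `prov` as a suffix, so the provenance file is a sibling whose name does not say which file it describes, whereas a (double) extension would attach the provenance directly to the described file:

```
sub-01_task-xyz_acq-test1_run-1_prov.jsonld         # current draft: prov as a suffix
sub-01_task-xyz_acq-test1_run-1_bold.prov.jsonld    # suggested: .prov.jsonld as a double-extension
sub-01_task-xyz_acq-test1_run-1_events.prov.jsonld  # would work equally for the events file
```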
@effigies - I agree. I'm waiting for a PR to this PR to get merged before making any additional changes.
Closing in favor of #487. @sappelhoff @franklin-feingold - if we can update the BEP028 PR link that would be great.
hey @satra - BEP028 from the website directs to your google doc currently - doesn't appear the PR was linked |
bids-standard#487 (and originally bids-standard#439) is a `WIP ENH` to introduce standardized provenance capture/expression for BIDS datasets. This PR just follows the idea of bids-standard#371 (small atomic ENHs), and is based on the current state of the specification where we have GeneratedBy to describe how a BIDS derivative dataset came to its existence.

## Rationale

As I had previously stated in many (face-to-face when it was still possible ;)) conversations, in my view, any BIDS dataset is a derivative dataset. Even if it contains "raw" data, it is never given by gods, but is a result of some process (let's call it pipeline for consistency) which produced it out of some other data. That is why there is 1) `sourcedata/` to provide placement for such original (as "raw" in terms of processing, but "raw"er in terms of its relation to actual data acquired by equipment), and 2) `code/` to provide placement for scripts used to produce or "tune" the dataset. Typically "sourcedata" is either a collection of DICOMs or a collection of data in some other formats (e.g. nifti) which is then either converted or just renamed into BIDS layout.

When encountering a new BIDS dataset, ATM it requires forensics and/or data archaeology to discover how this BIDS dataset came about, to e.g. possibly figure out the source of the buggy (meta)data it contains. At the level of individual files, some tools already add ad-hoc fields during conversion into the sidecar .json files they produce,

<details>
<summary>e.g. dcm2niix adds ConversionSoftware and ConversionSoftwareVersion</summary>

```shell
(git-annex)lena:~/datalad/dbic/QA[master]git $> git grep ConversionSoftware | head -n 2
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json: "ConversionSoftware": "dcm2niix",
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json: "ConversionSoftwareVersion": "v1.0.20170923 (OpenJPEG build) GCC6.3.0",
```

</details>

ATM I need to add such metadata to datasets produced by heudiconv to make sure that in case of incremental conversions there is no switch in versions of the software.
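For context on the GeneratedBy mechanism mentioned in the commit message above, here is a sketch of how it typically appears in a derivative dataset's `dataset_description.json`; the field values are illustrative only, with the converter name and version taken from the example output above:

```json
{
  "Name": "Example BIDS dataset",
  "BIDSVersion": "1.4.0",
  "DatasetType": "derivative",
  "GeneratedBy": [
    {
      "Name": "dcm2niix",
      "Version": "v1.0.20170923",
      "Description": "DICOM to NIfTI conversion"
    }
  ]
}
```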
This PR has been replaced by #487
With this PR, we would like to introduce a new BEP “BIDS-Provenance” to record provenance information in BIDS, i.e. how the data were generated and processed, going all the way from raw data to results rendering. The proposed model is built as an extension of the W3C PROV model.
Discussions on this BEP were initially started in Google document with @satra, @jbpoline, @yarikoptic, @remiadon, @cmaumet. We would now like to make this effort an official BIDS Extension Proposal (BEP) and continue building this model with the BIDS community. This effort would be co-moderated by @satra and @cmaumet.
@steering: could you let us know what the next steps are to make this an official BEP? We are happy to answer any questions you may have.