[ENH] Major update to pointing to files within, outside of, or remote to a current BIDS dataset #820
Line 388: "Throughout the BIDS specification there are metadata...", not "is metadata".
Line 392: Suggest rewording
"failed to consistently describe relative to what these pointers should be specified, and the exact syntax to be used."
to:
"failed to consistently describe what these pointers should be relative to."
Line 394: Suggest rewording
"Yet even with perfectly well and consistently defined "relative paths","
to:
"Yet even with consistently defined "relative paths","
Having gone through the entire proposal, I find it well thought out. However, when I started reading, I became confused on line 407:
bids:<dataset-name>:/absolute/path/within/dataset
because I didn't understand what `<dataset-name>` could be if I wanted to specify a file within the dataset. It became clear later that this was the "local" `bids::/`.
Could we put a brief sentence here explaining that for local paths `<dataset-name>` will be empty and other cases will be explained in the examples?
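To illustrate the point about the empty `<dataset-name>`, here is a minimal Python sketch that splits a BIDS URI of the form quoted above into its two parts. The names `BidsUri` and `parse_bids_uri` are hypothetical and not part of the proposal or of any existing tool; the only assumption about the syntax is the `bids:<dataset-name>:/path` shape shown in the PR.

```python
from typing import NamedTuple


class BidsUri(NamedTuple):
    """Hypothetical container for the two variable parts of a BIDS URI."""
    dataset_name: str  # empty string means "the current (local) dataset"
    path: str          # absolute path within that dataset, starts with "/"


def parse_bids_uri(uri: str) -> BidsUri:
    """Split 'bids:<dataset-name>:/absolute/path' into name and path."""
    scheme, sep, rest = uri.partition(":")
    if scheme != "bids" or not sep:
        raise ValueError(f"not a BIDS URI: {uri!r}")
    dataset_name, sep, path = rest.partition(":")
    if not sep or not path.startswith("/"):
        raise ValueError(f"malformed BIDS URI: {uri!r}")
    return BidsUri(dataset_name, path)


local = parse_bids_uri("bids::/sub-01/func/sub-01_task-rest_bold.nii.gz")
named = parse_bids_uri("bids:src:/sub-01/func/sub-01_task-rest_bold.nii.gz")
```

In this sketch `local.dataset_name` is the empty string, which is the "local" `bids::/` case, while `named.dataset_name` is `"src"`.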
@robertoostenveld I believe I have addressed all your open points (thanks for the review) - could you please verify and either raise new points or approve of the changes?
If you can't do it now, an estimated date would also be helpful :-)
Here are some fiddly comments. I will write up an additional post with my more general perspective.
and the BIDS URIs for the raw dataset (excluding `/derivatives`) MUST be resolved with respect to `/dataset_description.json`.

However if a `derivatives/` *folder* is nested under a raw dataset and only the raw dataset has a `dataset_description.json` file,
that `derivatives/` *folder* is not considered a BIDS dataset (see [Storage of Derivated Datasets](#storage-of-derived-datasets)),
Suggested change:
- that `derivatives/` *folder* is not considered a BIDS dataset (see [Storage of Derivated Datasets](#storage-of-derived-datasets)),
+ that `derivatives/` *folder* is not considered a BIDS dataset (see [Storage of Derived Datasets](#storage-of-derived-datasets)),
It may just as well be a (sub-)directory containing one or more arbitrary files.

In the case where a derivatives dataset is nested under a raw dataset and both have a `dataset_description.json` file,
the BIDS URIs within the nested derivatives dataset MUST be resolved with respect to `/derivatives/dataset_description.json`,
Generally we don't treat `<bids>/derivatives/` itself as a derivative dataset but as a container for derivative datasets.
Suggested change:
- the BIDS URIs within the nested derivatives dataset MUST be resolved with respect to `/derivatives/dataset_description.json`,
+ the BIDS URIs within the nested derivatives dataset MUST be resolved with respect to `/derivatives/<derivative>/dataset_description.json`,
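One way a tool might apply the resolution rule under discussion is to walk up from a file to the nearest directory that contains a `dataset_description.json`. The sketch below is only an assumption about how that could be implemented; `find_dataset_root` is a hypothetical helper, and "nearest enclosing directory" is an interpretation of the quoted text, not wording from the PR.

```python
from pathlib import Path
from typing import Optional


def find_dataset_root(file_path: Path) -> Optional[Path]:
    """Return the nearest ancestor directory holding dataset_description.json.

    A file in /derivatives/<derivative>/ resolves to that derivative dataset
    when it has its own dataset_description.json; otherwise the search
    continues upward and ends at the raw dataset root (or None).
    """
    for directory in Path(file_path).resolve().parents:
        if (directory / "dataset_description.json").is_file():
            return directory
    return None
```

With this approach a `derivatives/` folder without its own `dataset_description.json` falls through to the raw dataset, which matches the case described in the quoted diff.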
However if a `derivatives/` *folder* is nested under a raw dataset and only the raw dataset has a `dataset_description.json` file,
that `derivatives/` *folder* is not considered a BIDS dataset (see [Storage of Derivated Datasets](#storage-of-derived-datasets)),
and all BIDS URIs MUST be resolved with respect to the `/dataset_description.json` file.
The latter situation is then comparable to referring to data in a `sourcedata/` or `code/` folder.
The contents of `sourcedata/`, `code/` and unspecified derivatives are not specified, so it does not make sense to me to provide an interpretation for BIDS URIs found within them. We can't stop people from using them, or a tool from doing its best to find an appropriate BIDS root to resolve relative to, but this feels out-of-scope.
However sometimes it may be convenient to refer to files that are outside of a given dataset but on the same host.
BIDS URIs allow for specifying such locations,
but such specifications are by definition not portable in that the BIDS URIs break when the host changes.
When sharing a BIDS dataset, the dataset curator MUST make sure that all BIDS URIs are portable.
What does this mean for the validator? Should the validator reject relative paths, have a relative-path rejecting mode that something like OpenNeuro can use, or something else?
@@ -34,7 +34,7 @@ sub-01/func/sub-01_task-rest_desc-preproc_bold.json

```JSON
{
-    "RawSources": ["sub-01/func/sub-01_task-rest_bold.nii.gz"]
+    "RawSources": ["bids::/sub-01/func/sub-01_task-rest_bold.nii.gz"]
```
`RawSources` are in a source dataset. We probably want something like:
Suggested change:
- "RawSources": ["bids::/sub-01/func/sub-01_task-rest_bold.nii.gz"]
+ "RawSources": ["bids:src:/sub-01/func/sub-01_task-rest_bold.nii.gz"]
Using forward-slash separated paths is
[DEPRECATED](/02-common-principles.html#definitions).
The value of this field MUST be shared only by the images meant to be used
as inputs for the estimation of a
This is a unique string, not a URI. It's just an identifier so that you can do things like:
fieldmap = layout.get(B0FieldIdentifier="myidentifier", extension=".nii.gz")
MUST be a
[BIDS URI][/02-common-principles.html#bids-uri-pointing-to-files-within-and-outside-of-bids-datasets],
using forward-slash separated paths is
[DEPRECATED](/02-common-principles.html#definitions).
Again, identifier, not a URI.
applies_to = layout.get(B0FieldSource="myidentifier", extension=".nii.gz")
Apologies for not getting back to this sooner, as I see significant discussion has happened. I would like to push back on a particular concept here, but I recognize that this may have been settled to others' satisfaction. If that's the case, please use a thumbs-down (:-1:) on my post to indicate that you don't want to reconsider this. Please use a thumbs-up (:+1:) to indicate that you agree with my proposal, and a confused (😕) to indicate that you would like to re-discuss. I will leave it to @sappelhoff to decide whether this can be summarily dismissed, accepted, or re-litigated.
I would prefer In practice, I don't see an obvious way to enforce this distinction for subdirectories short of requiring that the target be validated, but I think we should not encourage this. We present a few use cases, where I would recommend alternative practices:
I would instead encourage users to use a regular URI such as https://github.com/spm/spm12/raw/r7771/canonical/single_subj_T1.nii. Alternately, to ensure access, one could copy
I would treat this the same, actually. I would either use a remote URI (https://github.com/mgxd/brainplot/raw/master/brainplot/Conte69_Atlas/Conte69.R.midthickness.32k_fs_LR.surf.gii) or encourage people to copy it into To put a fine point on it, I see BIDS URIs as solving two problems:
Dataset links seemed like an elegant solution as it allowed us to unify both within and between, while simultaneously making wholesale reorganization tractable via editing a single dictionary at the root of the dataset. Using a URI schema allows us to further unify with external resources because now we can say "Any reference to a file is a URI", which is easily validated. The indirection is less clearly needed for unrelated datasets or external resources, and so I would rather not introduce it. Additionally, to me As a final, minor point, I also worry that diluting the meaning of
thanks a lot for the thorough look, Chris. I'll deal with this once I am back from vacation. All others who want to help: Please read the post and "vote" as requested :-)
i think the proposal really comes down to this statement from @effigies above. but, the two examples right after point to non-bids-dataset references (spm, conte69). i concur with the above sentiment that i thought the intent of this PR was to refer to files in bids datasets rather than general resources, which then gets into provenance, parameters of operations, etc. if the latter is to be considered, how would this relate to
Having read through all of the comments and thinking about it, I basically agree with comments of @effigies. Is it correct that this proposal is:
I do have a question though... what is a BIDS dataset? I realized that I am not completely clear on what the minimum requirements are for a directory to be considered "BIDS". Is there a strictly enforced rule about what sub-directories must be or are allowed to be under the top-level directory? What is the minimum requirement for files in the root?
To commenters/reviewers on this PR, I have opened #918 as an alternative that addresses my concerns. In the significant scale-back of scope, it's possible that I took out too much clarifying text that was added previously. I would appreciate your review over there. Pinging especially @nicholst @robertoostenveld @satra @VisLab, who previously reviewed.
This is a quite extensive PR... is it possible to get it rendered? (Sorry if I should know the URL magic!)
@nicholst I would look at #918, which is intended to replace this one and is much more compact. It's rendered here: https://bids-specification--918.org.readthedocs.build/en/918/ For reference, this one is rendered here: https://bids-specification--820.org.readthedocs.build/en/820/ You can find these links as the readthedocs.org check (bottom line of this screenshot):
closing this, as #918 has completely taken over and all discussion seems resolved.
This PR implements a suggestion by @effigies to solve several long-standing issues with how we refer to files within, outside of, and remote to a current BIDS dataset. (see: https://github.com/bids-standard/bids-specification/projects/6)
These fields are affected:
IntendedFor
SpatialReference
AssociatedEmptyRoom
B0FieldIdentifier
B0FieldSource
BasedOn
RawSources
Sources
Specifically, @effigies suggests (and I agree) to use a URI scheme like
bids:<dataset-name>:/absolute/path/within/dataset
to refer to data. `<dataset-name>`s are declared as name-to-location mappings in a new `DatasetLinks` object in `dataset_description.json`.
The whole proposal is fleshed out in the changes of this PR in the "Common Principles" section.
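As a rough illustration of the name-to-location mapping, the following Python sketch resolves a BIDS URI against a `DatasetLinks` dictionary. It is not taken from the PR: `resolve_bids_uri` is a hypothetical helper, the link value `"../.."` and the paths are invented for the example, and treating `DatasetLinks` values as plain location strings that can be prefixed to the path is an assumption.

```python
def resolve_bids_uri(uri: str, dataset_root: str, dataset_links: dict) -> str:
    """Map 'bids:<dataset-name>:/path' to a location string.

    An empty dataset name refers to the current dataset (dataset_root);
    any other name is looked up in the DatasetLinks mapping.
    """
    scheme, _, rest = uri.partition(":")
    name, sep, path = rest.partition(":")
    if scheme != "bids" or not sep or not path.startswith("/"):
        raise ValueError(f"malformed BIDS URI: {uri!r}")
    if name == "":
        base = dataset_root
    elif name in dataset_links:
        base = dataset_links[name]
    else:
        raise KeyError(f"dataset name {name!r} is not declared in DatasetLinks")
    return base.rstrip("/") + path


# Hypothetical dataset_description.json content of a derivatives dataset.
description = {"Name": "example derivatives", "DatasetLinks": {"src": "../.."}}
links = description["DatasetLinks"]

within = resolve_bids_uri("bids::/sub-01/anat/sub-01_T1w.nii.gz", "/data/deriv", links)
source = resolve_bids_uri("bids:src:/sub-01/func/sub-01_task-rest_bold.nii.gz", "/data/deriv", links)
```

Because every named reference goes through the one dictionary, moving the source dataset would only require editing the `"src"` entry rather than every metadata field that points into it.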
I also tried to make all other adjustments (examples, metadata field descriptions, etc.) so that you can get an impression of what will be touched by these changes. I may have missed something; please help me find everything.
This is my interpretation of @effigies' proposal, so please correct me where I went wrong, Chris. And all other reviewers: Please help us to make this a good revision to consistently refer to files in BIDS!
This is intended to close several issues:
closes #790
closes #471
closes #718
#757 (or at least, pave the way for it to be closed)