Specification of relative or absolute path in IntendedFor and SpatialReference is not consistent #471
Reading a bit further, I see another |
Thanks for bringing this up, @robertoostenveld. As another data point, I can see why
Then I have enough information to find the files. Any more specific path I might give will be very fragile if I organize my derivatives anywhere besides in the I would suggest that we try to normalize on "path, not beginning with |
uff, just ran into it -- it is indeed quite messy ATM in how paths are "prescribed" and there is no consistency across or even within (IntendedFor of mri vs eeg/meg) fields! I think we should look into harmonizing it across fields etc.
would conflict with current use of IntendedFor -- should not start with What about, if starts with
and follows POSIX convention ( FWIW, as of now (v1.4.0-373-g2f0f61b)
so it doesn't follow any of the conventions ;-) PS I wonder if we should come up with a new label such as |
👍 |
I don't get what you mean by
other than that, sounds good. But isn't Robert's original point also about referring (for example) from a derivatives dataset to a source (raw) dataset - even when that source dataset is not even on disk? Say I have an isolated derivatives-only dataset in which I place some What to use for that? Something like |
One problem is referring within one dataset, where I would for now say that the raw bids dataset is one, and a derived dataset (that might be placed inside the raw dataset) would be another. Another challenge is how to refer to files in other datasets, where it gets confusing as to what defines "another" dataset. For example in
I am not sure whether people would consider derivative1 a different dataset from rawbids, but derivative2 and 3 (*) would certainly be different ones. And once they are different datasets, you can move them separately (on the same filesystem, or between servers) and put them together in different ways (or using datalad with subtrees or subvolumes). *) this is where I would put them, so that I can protect the rawbids directory by setting it to read-only. This also helps with my/our archiving and sharing policy. I would personally also put sourcedata next to rawbids, and not inside it. |
Somewhere I have a half-written comment that I intended to post months ago, but I think I've hit the point where it seems more efficient to start over than to find that tab. There are three cases where we want to refer to other files:
URIs are trivially permissible, and take the form
I would require all paths to be relative to the root of some dataset, except when backwards compatibility requires us to permit that use case. Elaboration of proposed URI follows:
Within-dataset references
To refer to a file within the dataset, I would have two options:
This would match @yarikoptic's notion that starting with a
Between-dataset references
Now I need some way to find
{
  ...
  "DatasetLinks": {
    "deriv1": "file://derivatives/derivative1",  # Within dataset
    "deriv2": "file://../derivative2",  # Local to a collection of datasets intended to be bundled together
    "deriv3": "file:///path/to/derivative3",  # Absolute on host
    "deriv4": "https://mydatasethost.com/look/at/this/dataset.git"  # Remote
  }
}
Now there's no guarantee that a link persists; if I'm a piece of software creating a derivative of a dataset that doesn't declare a DOI that I can reference, the best I can do is refer to it by the absolute path on the computer I'm running on. But a dataset curator can update Does this feel like something like the right direction? |
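The registry idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference form `bids:<name>:<relative-path>`, the function name `resolve_reference`, and the example file names are hypothetical and are not taken from the comment, whose exact URI form is not visible here. The sketch shows why a small registry is convenient: relocating a dataset means editing one entry, and every reference that uses that name follows.

```python
# Hypothetical registry, modelled on the "DatasetLinks" example above.
DATASET_LINKS = {
    "deriv1": "file://derivatives/derivative1",
    "deriv3": "file:///path/to/derivative3",
}


def resolve_reference(reference, links):
    """Resolve an assumed 'bids:<name>:<relative-path>' reference.

    The 'bids:' form is an assumption made for this sketch only.
    """
    scheme, name, rel_path = reference.split(":", 2)
    if scheme != "bids":
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if name not in links:
        raise KeyError(f"no DatasetLinks entry named {name!r}")
    # Join the registered dataset location and the dataset-relative path.
    return links[name].rstrip("/") + "/" + rel_path.lstrip("/")


ref = "bids:deriv3:sub-01/anat/sub-01_T1w.nii.gz"
before = resolve_reference(ref, DATASET_LINKS)

# A curator relocates the dataset by editing a single registry entry.
DATASET_LINKS["deriv3"] = "https://mydatasethost.com/look/at/this/dataset.git"
after = resolve_reference(ref, DATASET_LINKS)
```

The reference string itself never changes; only the registry entry does.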
absolutely, thanks a lot for taking the time and writing this up! Why would you reserve
+1
+1 cc @adam2392 |
I like the direction, but think that this might confuse the "scheme" with the "authority". BIDS is just a format description, not an actual implementation of a storage system. Hence the
is not unique, since BIDS does not specify a unique namespace. E.g., we could both have a dataset with the same identifier ("study1") on our respective local computers. If it would be I think that
would be uniquely and persistently resolvable, as would
where the doi has been replaced by the corresponding (but not persistent) http web address. (Note that you cannot download from there directly, so not an ideal example). Idem, we could conceive something like
that would resolve to
or some corresponding datalad or S3 address specification. |
Minor note: / is allowed in doi suffix, so in doi:10.34973/37n0-yc51/sub-A2002/anat/sub-A2002_T1w.nii you can't really tell where doi stops and path begins. |
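The ambiguity can be made concrete with a short Python sketch. The helper name `candidate_splits` is hypothetical; it simply enumerates every way the string above could be divided into a DOI and a trailing path, assuming only that the DOI prefix ends at the first slash.

```python
def candidate_splits(identifier):
    """Return every (doi, path) pair consistent with a 'doi:' string."""
    body = identifier[len("doi:"):] if identifier.startswith("doi:") else identifier
    # The DOI prefix (e.g. "10.34973") ends at the first slash.
    prefix, _, rest = body.partition("/")
    parts = rest.split("/")
    splits = []
    # The suffix may itself contain slashes, so any boundary is possible.
    for i in range(1, len(parts) + 1):
        doi = prefix + "/" + "/".join(parts[:i])
        path = "/".join(parts[i:])
        splits.append((doi, path))
    return splits


splits = candidate_splits("doi:10.34973/37n0-yc51/sub-A2002/anat/sub-A2002_T1w.nii")
for doi, path in splits:
    print(repr(doi), repr(path))
```

Without outside knowledge of which DOI is registered, nothing in the string itself selects one of these candidates.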
What I'm proposing is:
If we want to add an I'm not sure I would support adding authorities, though, because links between datasets are often quite dense, i.e., if I'm referencing an external dataset in one place, I'm probably referencing it in another. The canonical example in my mind is a derivative where each derivative data file may point back to one or more source files. If for some reason I need to change the dataset location, the operation would be large and potentially error-prone. Having a small registry that is updated seems to fit the use case better, IMO.
The namespaces would be defined within the dataset via the
I do not believe DOIs allow extending the path as you describe, at least in general. DOI is a |
could be made into ad-hoc
{
  "dataset": "doi:10.34973/37n0-yc51",
  "path": "sub-A2002/anat/sub-A2002_T1w.nii"
} |
DOIs indeed do not specify how one would extend the path if a DOI were to refer to a filesystem. Also, DOIs don't have to refer to a filesystem but could refer to a PDF, and then you might want to extend it with page numbers or line numbers instead. Assuming the DOI points to a filesystem, we still have the issue that @yarikoptic mentions, which is that in my example the DNS hostnames also do not allow the specification of an identifier on the server. Nevertheless, https://github.com/bids-standard/bids-specification/blob/master/README.md still works, because it breaks down into So my observation is more that URIs in general consist of a triplet. Your proposal consists of a pair, like |
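As an illustration of the triplet, Python's standard `urllib.parse.urlsplit` separates the GitHub URL mentioned above into scheme, authority (which it calls `netloc`), and path. This is only a demonstration of generic URI structure, not a proposal for how BIDS should parse references.

```python
from urllib.parse import urlsplit

url = "https://github.com/bids-standard/bids-specification/blob/master/README.md"
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # github.com
print(parts.path)    # /bids-standard/bids-specification/blob/master/README.md
```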
I agree that decoupling is the best approach, and I would implement what you have as
The authority is Apologies if the above points were already clear. I may not be understanding your objections. |
More and more issues are raised connected to this problem, and I expect this to become worse in the future - so the earlier we solve it (properly), the better. I think Chris' proposal is in a good state to:
@effigies do you have time to prepare an initial PR in the next days? Alternatively, I can give it a go and you can review whether I accurately represented your proposal. Let me know. |
@sappelhoff Please feel free to take a pass at it. My attention is pretty divided at the moment. |
While reading the common derivatives draft as the rendered version of #265, I noticed that for SpatialReference it uses an absolute path:
whereas in the existing MRI examples there is
and in the EEG examples there is again
This made me realize that the specification is not clear w.r.t. references to other files. E.g. in the SpatialReference case (which is in the derivative folder): would that point to a file in the derivative itself, or could it also point to a file at a higher level (e.g. the raw level)? In this case I know it points within the derivative dataset since it contains the desc keyword (which is not defined for raw). But had it been /sub-01/anat/sub-01_T1w.nii.gz, then I would have said that it must have been pointing to a file in the raw dataset. The specification of a relative path, and the specification of what specifically it is relative to, might avoid confusion.
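A small Python sketch shows why stating the anchor matters. The directory names `/data/rawbids` and `derivatives/derivative1` are hypothetical locations chosen for illustration, and `resolve` is a helper defined here, not part of any BIDS tool. Under plain POSIX rules a leading slash discards the anchor entirely, while the same string without the slash lands on different files depending on which dataset root it is resolved against.

```python
import posixpath


def resolve(anchor_dir, reference):
    """Resolve a reference against an anchor directory using POSIX rules."""
    return posixpath.normpath(posixpath.join(anchor_dir, reference))


raw_root = "/data/rawbids"  # hypothetical raw dataset root
deriv_root = "/data/rawbids/derivatives/derivative1"  # hypothetical derivative root

# With a leading slash, POSIX joining ignores the anchor altogether.
absolute_style = resolve(raw_root, "/sub-01/anat/sub-01_T1w.nii.gz")

# Without it, the result depends entirely on the chosen anchor.
in_raw = resolve(raw_root, "sub-01/anat/sub-01_T1w.nii.gz")
in_deriv = resolve(deriv_root, "sub-01/anat/sub-01_T1w.nii.gz")

print(absolute_style)
print(in_raw)
print(in_deriv)
```

The three results name three different files, so a specification has to say both whether a leading slash is allowed and which directory a relative path is anchored to.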