clarification for derivatives of derivatives #345

Closed
CPernet opened this issue Oct 11, 2019 · 7 comments

CPernet (Collaborator) commented Oct 11, 2019

I've been discussing the issue of creating derivatives from already derived data.

An example of that would be:

sub-001
sub-002
derivatives/fmripreprocess01/sub-001/sub-001_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_preproc_bold.nii

From there, say someone else extends the pipeline to make connectivity matrices; these could be stored as:

derivatives/fmripreprocess01/sub-001/sub-001_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-001/sub-001_task-XX_conndata-network_connectivity.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_conndata-network_connectivity.nii

but since it is now a different pipeline, and according to BEP003:

Each pipeline has a dedicated directory under which it stores all of its outputs

derivatives/fmripreprocess01_connect01/sub-001/sub-001_task-XX_conndata-network_connectivity.nii
derivatives/fmripreprocess01_connect01/sub-002/sub-002_task-XX_conndata-network_connectivity.nii
  • The question is: can we store this without also needing to store the sub-00X_task-XX_preproc_bold.nii files (with provenance well tracked, obviously)? See the sketch below.
  • Also, some users suggested derivatives/derivatives/etc. instead, but that seems to contradict the BEP003 idea of a derivative pipeline.
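
For the provenance question, here is a minimal sketch (not from the original post) of how the downstream pipeline could record where its inputs came from in its dataset_description.json. The paths are hypothetical, and the exact field names depend on the version of the derivatives draft you target:

```python
import json
from pathlib import Path

# Hypothetical dataset_description.json for the connectivity pipeline,
# recording which dataset its inputs came from. Exact key names
# (PipelineDescription vs. GeneratedBy, SourceDatasets) depend on the
# version of the derivatives spec you follow.
description = {
    "Name": "connect01 connectivity matrices",
    "BIDSVersion": "1.2.1",
    "PipelineDescription": {"Name": "connect01"},
    "SourceDatasets": [{"URL": "../fmripreprocess01"}],
}

out = Path("derivatives/fmripreprocess01_connect01/dataset_description.json")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(description, indent=2))
```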
jadesjardins commented

I feel that nesting derivatives allows for a cascade of processing steps where the input to each derivative comes from the sub-### folders of the parent directory. This way the derivative hierarchy can provide some of the provenance for the analysis trajectory. This also allows for shared starting points for multiple analyses (a single preprocessing derivative can be used as the input for multiple subsequent analysis procedures).

We have an example of how we like to work with EEG in this way here: https://github.com/BUCANL/bids-examples/tree/face13_nest/eeg_face13

eeg_face13/sub-###

... contains the root project data files that are the inputs from which the derivative files are produced at ...

eeg_face13/derivatives/BIDS-Lossless-EEG/sub-###

... The code used to generate these derivative data files and the execution logs are located, respectively, at:

eeg_face13/derivatives/BIDS-Lossless-EEG/code
eeg_face13/derivatives/BIDS-Lossless-EEG/log

These derivative files are preprocessed with annotations used to remove artifacts. Next we want to purge the artifacts and segment the data. Because the segmenting is performed on the derivative data (e.g. eeg_face13/derivatives/BIDS-Lossless-EEG/sub-###) and not the root data (eeg_face13/sub-###), I feel that the segmentation derivative should be nested inside the preprocessed derivative folder, such that the output data are located at:

eeg_face13/derivatives/BIDS-Lossless-EEG/derivatives/BIDS-Seg-Face13-EEGLAB/sub-###

Each of these derivatives (nested or not) can be used independently (e.g. I could simply copy the segmentation derivative if I wanted to try to replicate a result, or I could copy the preprocessed derivative if I wanted to do a new segmentation from the same input data). The full provenance of the resulting data files, however, requires the full derivative hierarchy up to the root of the project (I think that this is OK and good).

CPernet (Collaborator, Author) commented Oct 22, 2019

What do MRI people do? @yarikoptic @tyarkoni

Keeping the raw data (i.e. not talking about sharing preprocessed data), would you do:
  • derivatives/preprocessed and derivatives/stats (but stats depends on preprocessed), or
  • derivatives/preprocessed and derivatives/preprocessed/derivatives/stats (nesting allows sharing/using almost directly from any level)?

While nesting seems easier, the spec doesn't specify this (see posts above).

tyarkoni commented Oct 22, 2019

I think either approach is compliant. Every derivatives dataset must be a fully compliant BIDS dataset, which implies that you can nest derivatives. Note that the spec also supports a sourcedata/ folder, which would be another way to specify where the sources are (e.g., via symlink).
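
As an illustration of the sourcedata/ option (a sketch, not from the thread; the paths and layout are assumptions), a downstream derivative dataset could link back to the upstream pipeline it consumed with a relative symlink:

```python
from pathlib import Path

# Hypothetical layout: derivatives/preprocessed is the upstream pipeline,
# derivatives/stats is the downstream dataset that wants to record its source.
downstream = Path("derivatives/stats")

sourcedata = downstream / "sourcedata"
sourcedata.mkdir(parents=True, exist_ok=True)

# Relative link derivatives/stats/sourcedata/preprocessed -> derivatives/preprocessed,
# so the source is discoverable without duplicating the preprocessed files.
link = sourcedata / "preprocessed"
if not link.exists():
    link.symlink_to(Path("..", "..", "preprocessed"), target_is_directory=True)
```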

tyarkoni reopened this Oct 22, 2019
CPernet (Collaborator, Author) commented Oct 23, 2019

OK, so when nesting, would you be OK with:
/derivatives/preprocessed
/derivatives/preprocessed/derivatives/stats

@sappelhoff @effigies can you foresee any issues with the validator using flat derivatives vs nesting?

tyarkoni commented

The first level under derivatives/ is supposed to be used for the pipeline name or description, and every directory under derivatives/ must be its own valid BIDS project. Those are the only relevant constraints, I believe (but I haven't looked back at the spec carefully). So in your example, I think you could have either derivatives/preprocessed and derivatives/stats or derivatives/preprocessed/derivatives/stats. But you could not have derivatives/preprocessed/stats. And in either case, both the preprocessed and stats directories would have to be valid BIDS datasets by themselves.
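
A rough way to check that constraint is to look for a dataset_description.json in each first-level directory under derivatives/. This is only a sketch with a hypothetical layout; a real check would use the bids-validator:

```python
from pathlib import Path

def list_pipeline_dirs(bids_root):
    """Yield (pipeline_dir, has_description) for each first-level directory
    under <bids_root>/derivatives/. dataset_description.json is only a
    minimal marker of a standalone dataset, not full validation."""
    deriv = Path(bids_root) / "derivatives"
    if not deriv.is_dir():
        return
    for pipeline in sorted(p for p in deriv.iterdir() if p.is_dir()):
        yield pipeline, (pipeline / "dataset_description.json").is_file()

for pipeline, ok in list_pipeline_dirs("."):
    status = "looks like a standalone dataset" if ok else "missing dataset_description.json"
    print(f"{pipeline}: {status}")
```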

Note though that the above assumes the derivatives/ root is inside an existing BIDS dataset. This isn't required, and if you were to avoid nesting and instead place your derivative datasets somewhere else in the filesystem (e.g., under /derivatives/), then you could, e.g., have separate /derivatives/preprocessed/type1/ and /derivatives/preprocessed/stats/ directories. If you needed to link back to the source datasets for provenance purposes, you could put symlinks in sourcedata/. This is probably the most flexible approach, and the only downside is that tools that scan the raw BIDS dataset won't know about its derivatives.

effigies (Collaborator) commented

👍 To what @tyarkoni said, although it is not true that all subdirectories of derivatives/ must be valid derivatives datasets. From the current draft:

Nothing in this specification should be interpreted to disallow the storage/distribution of non-compliant derivatives of BIDS datasets. In particular, if a BIDS dataset contains a derivatives/ sub-directory, the contents of that directory may be a heterogeneous mix of BIDS Derivatives datasets and non-compliant derivatives.

But the overall point that each BIDS-Derivatives dataset is a BIDS dataset, and thus may contain sourcedata/ or derivatives/ subdirectories as usual, stands. At this point, it's a question of overall strategy for how you would like to associate related datasets, and BIDS is flexible here.
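
If you do choose nesting, enumerating the chain of pipelines is just a recursive walk over derivatives/ directories. A sketch, assuming layouts like those discussed above:

```python
from pathlib import Path

def walk_derivative_chain(root, chain=()):
    """Recursively yield tuples of nested pipeline names, e.g.
    ('preprocessed',) then ('preprocessed', 'stats'). Only inspects
    directory names, not dataset contents."""
    deriv = Path(root) / "derivatives"
    if not deriv.is_dir():
        return
    for pipeline in sorted(p for p in deriv.iterdir() if p.is_dir()):
        yield chain + (pipeline.name,)
        yield from walk_derivative_chain(pipeline, chain + (pipeline.name,))

for chain in walk_derivative_chain("."):
    print(" -> ".join(chain))
```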

CPernet pushed a commit that referenced this issue Oct 23, 2019
#345  discuss splitting a pipeline into multiple derivatives with/without nesting
CPernet (Collaborator, Author) commented Oct 23, 2019

OK, thanks for clarifying, guys. I made a PR into the common derivatives branch of the spec.

CPernet closed this as completed Oct 23, 2019
effigies added a commit that referenced this issue Oct 30, 2019
specify further the pipeline following #345