clarification for derivatives of derivatives #345

Closed
CPernet opened this issue Oct 11, 2019 · 7 comments

CPernet (Collaborator) commented Oct 11, 2019

I've been discussing the issue of creating derivatives from already derived data.

An example of that would be:

sub-001
sub-002
derivatives/fmripreprocess01/sub-001/sub-001_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_preproc_bold.nii

From there, say someone else extends the pipeline to make connectivity matrices; these could be stored as:

derivatives/fmripreprocess01/sub-001/sub-001_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-001/sub-001_task-XX_conndata-network_connectivity.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_preproc_bold.nii
derivatives/fmripreprocess01/sub-002/sub-002_task-XX_conndata-network_connectivity.nii

but since it is now a different pipeline, and according to BEP003:

Each pipeline has a dedicated directory under which it stores all of its outputs

derivatives/fmripreprocess01_connect01/sub-001/sub-001_task-XX_conndata-network_connectivity.nii
derivatives/fmripreprocess01_connect01/sub-002/sub-002_task-XX_conndata-network_connectivity.nii
  • The question is: can we store this without also needing to store the sub-00X_task-XX_preproc_bold.nii files (with provenance well tracked, obviously)? See the sketch below.
  • Also, some users suggested derivatives/derivatives/etc. instead, but that seems to contradict the BEP003 idea of a derivative pipeline.
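
For the provenance question, here is a minimal sketch (not from the original post) of how the downstream pipeline could record where its inputs came from in its dataset_description.json. The paths are hypothetical, and the exact field names depend on the version of the derivatives draft you target:

```python
import json
from pathlib import Path

# Hypothetical dataset_description.json for the connectivity pipeline,
# recording which dataset its inputs came from. Exact key names
# (PipelineDescription vs. GeneratedBy, SourceDatasets) depend on the
# version of the derivatives spec you follow.
description = {
    "Name": "connect01 connectivity matrices",
    "BIDSVersion": "1.2.1",
    "PipelineDescription": {"Name": "connect01"},
    "SourceDatasets": [{"URL": "../fmripreprocess01"}],
}

out = Path("derivatives/fmripreprocess01_connect01/dataset_description.json")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(description, indent=2))
```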
jadesjardins commented

I feel that nesting derivatives allows for a cascade of processing steps where the input to each derivative comes from the sub-### folders of the parent directory. This way the derivative hierarchy can provide some of the provenance for the analysis trajectory. This also allows for shared starting points for multiple analyses (a single preprocessing derivative can be used as the input for multiple subsequent analysis procedures).

We have an example of how we like to work with EEG in this way here: https://github.com/BUCANL/bids-examples/tree/face13_nest/eeg_face13

eeg_face13/sub-###

... contains the root project data files that are the inputs from which the derivative files are produced at ...

eeg_face13/derivatives/BIDS-Lossless-EEG/sub-###

... The code used to generate these derivative data files and the execution logs are located, respectively, at:

eeg_face13/derivatives/BIDS-Lossless-EEG/code
eeg_face13/derivatives/BIDS-Lossless-EEG/log

These derivative files are preprocessed with annotations used to remove artifacts. Next we want to purge the artifacts and segment the data. Because the segmenting is performed on the derivative data (e.g. eeg_face13/derivatives/BIDS-Lossless-EEG/sub-###) and not the root data (eeg_face13/sub-###), I feel that the segmentation derivative should be nested inside the preprocessed derivative folder, such that the output data are located at:

eeg_face13/derivatives/BIDS-Lossless-EEG/derivatives/BIDS-Seg-Face13-EEGLAB/sub-###

Each of these derivatives (nested or not) can be used independently (e.g. I could simply copy the segmentation derivative if I wanted to try to replicate a result, or I could copy the preprocessed derivative if I wanted to do a new segmentation from the same input data). The full provenance of the resulting data files, however, requires the full derivative hierarchy up to the root of the project (I think that this is OK and good).

CPernet (Collaborator, Author) commented Oct 22, 2019

What do MRI people do? @yarikoptic @tyarkoni

Keeping the raw data (i.e. not talking about sharing preprocessed data), would you do:
  • derivatives/preprocessed and derivatives/stats (but stats depends on preprocessed), or
  • derivatives/preprocessed and derivatives/preprocessed/derivatives/stats (nesting allows sharing/using almost directly from any level)?

While nesting seems easier, the spec doesn't specify this (see posts above).

tyarkoni commented Oct 22, 2019

I think either approach is compliant. Every derivatives dataset must be a fully compliant BIDS dataset, which implies that you can nest derivatives. Note that the spec also supports a sourcedata/ folder, which would be another way to specify where the sources are (e.g., via symlink).
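
As an illustration of the sourcedata/ option (a sketch, not from the thread; the paths and layout are assumptions), a downstream derivative dataset could link back to the upstream pipeline it consumed with a relative symlink:

```python
from pathlib import Path

# Hypothetical layout: derivatives/preprocessed is the upstream pipeline,
# derivatives/stats is the downstream dataset that wants to record its source.
downstream = Path("derivatives/stats")

sourcedata = downstream / "sourcedata"
sourcedata.mkdir(parents=True, exist_ok=True)

# Relative link derivatives/stats/sourcedata/preprocessed -> derivatives/preprocessed,
# so the source is discoverable without duplicating the preprocessed files.
link = sourcedata / "preprocessed"
if not link.exists():
    link.symlink_to(Path("..", "..", "preprocessed"), target_is_directory=True)
```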

tyarkoni reopened this Oct 22, 2019
CPernet (Collaborator, Author) commented Oct 23, 2019

OK, so when nesting, would you be OK with:
/derivatives/preprocessed
/derivatives/preprocessed/derivatives/stats

@sappelhoff @effigies can you foresee any issues with the validator using flat derivatives vs nesting?

tyarkoni commented

The first level under derivatives/ is supposed to be used for the pipeline name or description, and every directory under derivatives/ must be its own valid BIDS project. Those are the only relevant constraints, I believe (but I haven't looked back at the spec carefully). So in your example, I think you could have either derivatives/preprocessed and derivatives/stats or derivatives/preprocessed/derivatives/stats. But you could not have derivatives/preprocessed/stats. And in either case, both the preprocessed and stats directories would have to be valid BIDS datasets by themselves.
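
A rough way to check that constraint is to look for a dataset_description.json in each first-level directory under derivatives/. This is only a sketch with a hypothetical layout; a real check would use the bids-validator:

```python
from pathlib import Path

def list_pipeline_dirs(bids_root):
    """Yield (pipeline_dir, has_description) for each first-level directory
    under <bids_root>/derivatives/. dataset_description.json is only a
    minimal marker of a standalone dataset, not full validation."""
    deriv = Path(bids_root) / "derivatives"
    if not deriv.is_dir():
        return
    for pipeline in sorted(p for p in deriv.iterdir() if p.is_dir()):
        yield pipeline, (pipeline / "dataset_description.json").is_file()

for pipeline, ok in list_pipeline_dirs("."):
    status = "looks like a standalone dataset" if ok else "missing dataset_description.json"
    print(f"{pipeline}: {status}")
```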

Note though that the above assumes the derivatives/ root is inside an existing BIDS dataset. This isn't required, and if you were to avoid nesting and instead place your derivative datasets somewhere else in the filesystem (e.g., under /derivatives/), then you could, e.g., have separate /derivatives/preprocessed/type1/ and /derivatives/preprocessed/stats/ directories. If you needed to link back to the source datasets for provenance purposes, you could put symlinks in sourcedata/. This is probably the most flexible approach, and the only downside is that tools that scan the raw BIDS dataset won't know about its derivatives.

effigies (Collaborator) commented

👍 To what @tyarkoni said, although it is not true that all subdirectories of derivatives/ must be valid derivatives datasets. From the current draft:

Nothing in this specification should be interpreted to disallow the storage/distribution of non-compliant derivatives of BIDS datasets. In particular, if a BIDS dataset contains a derivatives/ sub-directory, the contents of that directory may be a heterogeneous mix of BIDS Derivatives datasets and non-compliant derivatives.

But the overall point that each BIDS-Derivatives dataset is a BIDS dataset, and thus may contain sourcedata/ or derivatives/ subdirectories as usual, stands. At this point, it's a question of overall strategy for how you would like to associate related datasets, and BIDS is flexible here.
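
If you do choose nesting, enumerating the chain of pipelines is just a recursive walk over derivatives/ directories. A sketch, assuming layouts like those discussed above:

```python
from pathlib import Path

def walk_derivative_chain(root, chain=()):
    """Recursively yield tuples of nested pipeline names, e.g.
    ('preprocessed',) then ('preprocessed', 'stats'). Only inspects
    directory names, not dataset contents."""
    deriv = Path(root) / "derivatives"
    if not deriv.is_dir():
        return
    for pipeline in sorted(p for p in deriv.iterdir() if p.is_dir()):
        yield chain + (pipeline.name,)
        yield from walk_derivative_chain(pipeline, chain + (pipeline.name,))

for chain in walk_derivative_chain("."):
    print(" -> ".join(chain))
```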

CPernet pushed a commit that referenced this issue Oct 23, 2019
#345  discuss splitting a pipeline into multiple derivatives with/without nesting
CPernet (Collaborator, Author) commented Oct 23, 2019

OK, thanks for clarifying, guys. I made a PR into the common derivatives branch of the spec.

CPernet closed this as completed Oct 23, 2019
effigies added a commit that referenced this issue Oct 30, 2019
specify further the pipeline following #345