[SCHEMA] add physio to schema for EEG, MEG, iEEG, PET, ASL, DWI #894

Merged
merged 9 commits into from
Feb 1, 2022

Remi-Gau
Collaborator

Fixes bids-standard/bids-validator#1412

This updates the schema so that all modalities support the physio and stim suffixes.

This might need an update counterpart on the validator side.

@Remi-Gau
Collaborator Author

pinging our schema master @tsalo & BEP lead @mnoergaard @melanieganz @patsycle @HenkMutsaerts

@HenkMutsaerts
Collaborator

Nice, the only response I can think of, is why we don't declare this once in the code if this should be the same across all scantypes? Would that be more readable?

@Remi-Gau
Collaborator Author

I think we can definitely think of different ways the schema could be "refactored" (and I am usually not a big fan of duplication), but I am not sure we are there yet. Though we could easily get there if we apply this to anat.

But whatever is decided should be done in a separate PR IMHO

@tsalo
Member

tsalo commented Oct 14, 2021

the only response I can think of, is why we don't declare this once in the code if this should be the same across all scantypes? Would that be more readable?

@HenkMutsaerts at the moment, the valid entities for physio files are not the same across datatypes. If we can figure out how to implement a "match" (see #620), then it would definitely be feasible to define them once. Unfortunately, we have no explicit link between associated files like physio acquired with imaging data and their "target" files within the schema, so we need to define all of the entities applicable to each datatype's physio files separately.

Member

@tsalo tsalo left a comment

The only problem I see is that the new files are missing from the new datatypes' filename templates throughout the specification. I think the associated macro calls need to be updated to include the physio and stim suffixes.

EDIT: Also, if we do end up supporting anat physio, we'll be looking at a bunch of new groups- one for each of the existing anatomical MRI suffix groups for which concurrent physio would be possible.
EDIT 2: Although... physio files shouldn't include entities like echo or part, unless they come from different acquisitions, which can happen with qMRI file collections... this is a confusing one.

@Remi-Gau
Collaborator Author

The only problem I see is that the new files are missing from the new datatypes' filename templates throughout the specification. I think the associated macro calls need to be updated to include the physio and stim suffixes.

Ha indeed. Will update those as well.
This might make things a bit redundant with the content of the template section in the physio page but I think it makes it clearer to the reader how physio file names should be constructed.

EDIT: Also, if we do end up supporting anat physio, we'll be looking at a bunch of new groups- one for each of the existing anatomical MRI suffix groups for which concurrent physio would be possible.

I would suggest applying this to anat in another PR, because a) this will be trickier, and b) this is a rare use case (or maybe I am missing something).

@Remi-Gau
Collaborator Author

The filename template creation is failing for MEG.

The recording is not appended as the last entity.

sub-<label>/
    [ses-<label>/]
        meg/
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>]_meg.<extension>
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>]_meg.json
            sub-<label>[_ses-<label>][_task-<label>][_acq-<label>][_space-<label>]_markers.mrk
            sub-<label>[_ses-<label>][_task-<label>][_acq-<label>][_space-<label>]_markers.sqd
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>]_events.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>]_events.tsv
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_recording-<label>][_proc-<label>][_split-<index>]_physio.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_recording-<label>][_proc-<label>][_split-<index>]_physio.tsv.gz
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_recording-<label>][_proc-<label>][_split-<index>]_stim.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_recording-<label>][_proc-<label>][_split-<index>]_stim.tsv.gz

This seems to be because recording is not set to be the last of all raw entities, which is what I would expect.
https://github.com/bids-standard/bids-specification/blob/master/src/schema/rules/entities.yaml#L20

@tsalo any reason this should NOT be the case?

@tsalo
Member

tsalo commented Oct 18, 2021

@Remi-Gau It looks like proc and split come after recording in those rules, and that seems to be reflected in the filename templates, so it must be an issue with how the entities themselves are organized. On the bright side, it doesn't look like proc, split, and space can appear with recording in any valid filenames at the moment, so we could reorder those entities if we want. Since space is a mixed raw/derivative entity (raw for MEG/EEG/iEEG and derivative for other datatypes), I think we still want that at the end, but I believe we could reorder the entities to be proc --> split --> recording --> space without being backwards-incompatible. WDYT?

It does seem strange that proc is allowed for physio data. Shouldn't it denote multiple versions of the same data processed using different devices or algorithms? Other entities that indicate multiple versions from the same run are dropped for physio data (echo, for example). Then again, fMRI physio allows rec, which seems strange in the same way.

Also, should split be included? Since the MEG data are separated based on file size, are physio files going to be split up in the same way?
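
To make the proposed reordering concrete, here is a minimal sketch of how the entity order determines where recording lands in a filename. The two order lists are assumptions read off the templates and comments in this thread; they are not copied from entities.yaml, and build_filename is a hypothetical helper, not part of any BIDS tool.

```python
# Assumed orders for illustration only (not copied from the BIDS schema).
# CURRENT_ORDER mirrors the failing MEG template above, where recording
# appears before proc and split.
CURRENT_ORDER = ["sub", "ses", "task", "acq", "run", "recording", "proc", "split"]
# PROPOSED_ORDER follows the suggestion: proc --> split --> recording --> space.
PROPOSED_ORDER = ["sub", "ses", "task", "acq", "run", "proc", "split", "recording", "space"]


def build_filename(entities, suffix, extension, order):
    """Join the entities that are present, following the given order."""
    parts = [f"{key}-{entities[key]}" for key in order if key in entities]
    return "_".join(parts) + f"_{suffix}{extension}"


entities = {
    "sub": "01",
    "task": "rest",
    "run": "1",
    "proc": "sss",
    "split": "01",
    "recording": "ecg",
}

current = build_filename(entities, "physio", ".tsv.gz", CURRENT_ORDER)
proposed = build_filename(entities, "physio", ".tsv.gz", PROPOSED_ORDER)
print(current)
print(proposed)
```

With the assumed current order, recording is emitted in the middle of the name; with the proposed order it comes after proc and split, which is what the physio templates expect.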

@Remi-Gau
Collaborator Author

@Remi-Gau It looks like proc and split come after recording in those rules, and that seems to be reflected in the filename templates, so it must be an issue with how the entities themselves are organized. On the bright side, it doesn't look like proc, split, and space can appear with recording in any valid filenames at the moment, so we could reorder those entities if we want. Since space is a mixed raw/derivative entity (raw for MEG/EEG/iEEG and derivative for other datatypes), I think we still want that at the end, but I believe we could reorder the entities to be proc --> split --> recording --> space without being backwards-incompatible. WDYT?

Yes, that was exactly my thought too.

It does seem strange that proc is allowed for physio data. Shouldn't it denote multiple versions of the same data processed using different devices or algorithms? Other entities that indicate multiple versions from the same run are dropped for physio data (echo, for example). Then again, fMRI physio allows rec, which seems strange in the same way.

Also, should split be included? Since the MEG data are separated based on file size, are physio files going to be split up in the same way?

Tempted to say that if we have rec for func data, then we should also have proc for MEG.

For split: I am not as sure.

Pinging several of the people discussing these types of simultaneous recordings in issue #86

@CPernet @dorahermes @robertoostenveld

@CPernet
Collaborator

CPernet commented Oct 20, 2021

If split, we should have scans.tsv files and those must match, is that right? Seems OK to me.
(I still think having to scan through datasets and scans.tsv files to figure out which ones are truly simultaneous acquisitions is rubbish, btw.)

@Remi-Gau
Collaborator Author

I still think having to scan through datasets and scans.tsv files to figure out which ones are truly simultaneous acquisitions is rubbish, btw.

You mean simultaneous for EEG and fMRI for example, right? For physio I would assume that the recording MUST be simultaneous between physio and the other datatype it matches.

@CPernet
Collaborator

CPernet commented Oct 20, 2021

Yes, physio is assumed to be simultaneous, but it does not have to be.
An example: record the galvanic response to images, then do an imaging session.

@Remi-Gau
Collaborator Author

For split: I am not as sure.

Pros of matching the file name fully:

  • it is transparent how the matching is done
  • it is consistent with the way physio behaves for other data types and with how it has been applied so far
  • we don't have to add an "except for MEG files with a split entity" exception to the physio page

Cons:

  • leads to data duplication: same content of a physio file appears under 2 names.

Am I missing something?

As a bids maintainer I tend to prefer full matching. But it is not a hill I would die on.

@tsalo
Member

tsalo commented Oct 20, 2021

I think it really comes down to whether we can consider the entity "within-acquisition" or "between-acquisition". Entities denoting data from the same recording, like echo and part, should not be included in the physio filenames. The same rule applies to events files. I think there must have been a decision point in the past where the maintainers had to decide if events and physio files would be duplicated across echoes or not, but instead they chose to leverage the inheritance principle. I lean toward that, but the big issues for me are (1) rec's behavior doesn't seem consistent with that idea, and (2) I don't know how split should be treated.

Based on the following line from the Scans file section of the spec, I'd assume that split should be treated like echo:

Each neural recording file SHOULD be described by exactly one row. Some recordings consist of multiple parts, that span several files, for example through echo-, part-, or split- entities. Such recordings MUST be documented with one row per file.
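
As an illustration of the "within-acquisition" idea, the sketch below derives a physio filename from an imaging filename by dropping the entities that only distinguish files within one recording. Treating split the same way as echo and part is the assumption under discussion here, not settled behavior, and physio_name is a hypothetical helper rather than an existing BIDS function.

```python
# Entities assumed to be "within-acquisition" for this sketch. Including
# "split" reflects the assumption that it should be treated like echo.
WITHIN_ACQUISITION = {"echo", "part", "split"}


def physio_name(imaging_filename, recording=None):
    """Return the physio filename shared by all files of one recording."""
    stem = imaging_filename.split(".", 1)[0]       # drop the extension(s)
    *entity_parts, _suffix = stem.split("_")       # last chunk is the suffix
    kept = [
        part for part in entity_parts
        if part.split("-", 1)[0] not in WITHIN_ACQUISITION
    ]
    if recording is not None:
        kept.append(f"recording-{recording}")
    return "_".join(kept) + "_physio.tsv.gz"


print(physio_name("sub-01_task-rest_run-1_echo-2_bold.nii.gz"))
print(physio_name("sub-01_task-rest_split-01_meg.fif", recording="ecg"))
```

Under this assumption, both split files of one MEG recording map to a single physio file, so nothing has to be duplicated or cut in two.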

sappelhoff
sappelhoff previously approved these changes Nov 14, 2021
Member

@sappelhoff sappelhoff left a comment

Conceptually this PR LGTM, didn't dive too deep into the discussion on simultaneous recordings. Once the merge conflict is resolved, I think this PR is mergeable.

@Remi-Gau
Collaborator Author

I am still unsure about how to deal with this split case for physio files with MEG.

People having to split their acquired physio data in 2 to match the content of the MEG feels like an extra curation step and potential source of error.

Also, as mentioned by @tsalo, the events.tsv files are supposed to follow a <match> pattern but don't have the proc and split entities.

This is the template for MEG that would come out of merging this PR as is.

sub-<label>/
    [ses-<label>/]
        meg/
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>]_meg.<extension>
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>]_meg.json
            sub-<label>[_ses-<label>]_acq-<calibration>_meg.dat
            sub-<label>[_ses-<label>]_acq-<crosstalk>_meg.fif
            sub-<label>[_ses-<label>][_task-<label>][_acq-<label>][_space-<label>]_markers.mrk
            sub-<label>[_ses-<label>][_task-<label>][_acq-<label>][_space-<label>]_markers.sqd
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>]_events.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>]_events.tsv
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>][_recording-<label>]_physio.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>][_recording-<label>]_physio.tsv.gz
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>][_recording-<label>]_stim.json
            sub-<label>[_ses-<label>]_task-<label>[_acq-<label>][_run-<index>][_proc-<label>][_split-<index>][_recording-<label>]_stim.tsv.gz

@sappelhoff
Member

sappelhoff commented Nov 15, 2021

Right - I agree with @tsalo's opinion from #894 (comment)

I think it really comes down to whether we can consider the entity "within-acquisition" or "between-acquisition". Entities denoting data from the same recording, like echo and part, should not be included in the physio filenames. The same rule applies to events files. I think there must have been a decision point in the past where the maintainers had to decide if events and physio files would be duplicated across echoes or not, but instead they chose to leverage the inheritance principle.

He points out two issues:

(1) rec's behavior doesn't seem consistent with that idea, and (2) I don't know how split should be treated.

re: 1) --> no idea

re: 2) --> Just looked into this ... are we being inconsistent in the spec? 😬

split --> rows in scans.tsv ⚡ inconsistent?

UPDATE: I simply misunderstood this (see comments below). There is no inconsistency.

The (MEG-centric) documentation of the split entity and MEG split files both say:

If there are multiple parts of a recording and the optional scans.tsv is provided, remember to list all files separately in scans.tsv and that the entries for the acq_time column in scans.tsv MUST all be identical, as described in Scans file.

Yet the scans section says:

Each neural recording file SHOULD be described by exactly one row. Some recordings consist of multiple parts, that span several files, for example through echo-, part-, or split- entities. Such recordings MUST be documented with one row per file.

🤕

MNE-BIDS writes one row per "split" (instead of one row per neural recording, even if there are multiple "splits"), see: https://github.com/mne-tools/mne-bids/blob/c85dafaaac0ff79127b9513e130c566bc2313e0e/mne_bids/write.py#L458-L474 or try it yourself:

# %%
# do first:
# pip install mne
# pip install mne-bids
import os

import mne
import mne_bids
import numpy as np

# %%
# Create raw
nchannels = 100
ntimes = int(1e7)
data = np.random.randn(nchannels, ntimes)

info = mne.create_info(ch_names=[f"{i}" for i in range(nchannels)],
                       sfreq=1000, ch_types="mag")

raw = mne.io.RawArray(data, info)

# %%
# Save raw to FIFF
raw_fname = os.path.join(os.path.expanduser("~"), "Desktop", "test_raw.fif")
raw.save(raw_fname)
raw = mne.io.read_raw_fif(raw_fname)


# %%
# Save raw to BIDS
root = os.path.join(os.path.expanduser("~"), "Desktop", "split_test_bids")
os.makedirs(root, exist_ok=True)
bids_path = mne_bids.BIDSPath(subject="1", session="1", task="rest", root=root)
mne_bids.write_raw_bids(raw, bids_path)

# %%

@sappelhoff sappelhoff dismissed their stale review November 15, 2021 10:12

inconsistency came up

@sappelhoff
Member

or am I misunderstanding the above ☝️ This part from scans.tsv

Such recordings MUST be documented with one row per file

I am confused now whether "a recording" is the combination of all splits ... OR each split file.

@tsalo
Member

tsalo commented Dec 14, 2021

I am confused now whether "a recording" is the combination of all splits ... OR each split file

@sappelhoff the recording is the combination of all splits, but you must log each file separately within the scans.tsv file.
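
A small sketch of that reading: one recording, several split files, one scans.tsv row per file, all sharing the same acq_time. The filenames and timestamp are made up for illustration, and scans_rows and to_tsv are hypothetical helpers, not MNE-BIDS functions.

```python
import csv
import io


def scans_rows(split_files, acq_time):
    """One row per split file; every row carries the recording's acq_time."""
    return [{"filename": name, "acq_time": acq_time} for name in split_files]


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["filename", "acq_time"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example: a single MEG recording saved as two split files.
files = [
    "meg/sub-01_task-rest_split-01_meg.fif",
    "meg/sub-01_task-rest_split-02_meg.fif",
]
rows = scans_rows(files, "2021-10-20T10:00:00")
tsv_text = to_tsv(rows)
print(tsv_text)
```

The two rows describe a single recording; only the filename differs between them.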

@Remi-Gau
Collaborator Author

suggestion for this PR:

we get all the non MEG related things in with version 1.7 and we have an open PR left to only deal with the MEG physio aspect of this.

@sappelhoff
Member

Coming back to this ...

we get all the non MEG related things in with version 1.7 and we have an open PR left to only deal with the MEG physio aspect of this.

generally I agree.

BUT: is the only "MEG related thing" that is open the question of what to do with the split entity for physio and stim files?

If yes, then I think @tsalo's comment from #894 (comment) is valid (now that my misunderstanding above has also been cleared up, thanks btw):

Based on the following line from the Scans file section of the spec, I'd assume that split should be treated like echo:

@Remi-Gau
Collaborator Author

is the only "MEG related thing" that is open the question of what to do with the split entity for physio and stim files?

If yes, then I think @tsalo's comment from #894 (comment) is valid (now that my misunderstanding above has also been cleared up, thanks btw):

Based on the following line from the Scans file section of the spec, I'd assume that split should be treated like echo:

I think this is the only MEG specific issue.

Happy to make split for MEG behave the way split does for MRI.

@sappelhoff sappelhoff requested a review from tsalo January 23, 2022 15:16
@Remi-Gau Remi-Gau closed this Jan 25, 2022
@Remi-Gau Remi-Gau deleted the add_physio_to_schema branch January 25, 2022 21:47
@Remi-Gau Remi-Gau restored the add_physio_to_schema branch January 25, 2022 21:48
@Remi-Gau Remi-Gau reopened this Jan 25, 2022
@codecov

codecov bot commented Jan 30, 2022

Codecov Report

Merging #894 (c8320c2) into master (17ca52d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     bids-standard/bids-specification#894   +/-   ##
=======================================
  Coverage   36.16%   36.16%           
=======================================
  Files           8        8           
  Lines         788      788           
=======================================
  Hits          285      285           
  Misses        503      503           


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f888291...c8320c2. Read the comment docs.

Member

@tsalo tsalo left a comment

LGTM!

@sappelhoff sappelhoff added this to the 1.7.0 milestone Feb 1, 2022
@sappelhoff sappelhoff merged commit a277b87 into bids-standard:master Feb 1, 2022
@sappelhoff sappelhoff added the exclude-from-changelog This item will not feature in the automatically generated changelog label Aug 7, 2022
@Remi-Gau Remi-Gau deleted the add_physio_to_schema branch August 17, 2022 07:22
Collaborator Author

task not supported

Labels
exclude-from-changelog This item will not feature in the automatically generated changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SCHEMA] physio and stim not included for EEG, MEG, iEEG and PET
6 participants