Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json files for EEG, MEG, iEEG can't be "augmented" like those for BOLD? #1044

Closed
Remi-Gau opened this issue Aug 20, 2020 · 6 comments
Closed

Comments

@Remi-Gau
Copy link
Contributor

Posted on neurostars but this topic might be too niche so reposting here.


TL;DR
I have 2 questions:

  • why is it not allowed to add extra-info in certain json files? I don't remember that this is mention in the specifications
  • why is this behavior not also seen for the bold json that also contains extra info?

Working on some "dummy" data for a side project and it creates this BIDS dataset (all the .nii, .edf and other types of data actually empty).

├── CHANGES
├── dataset_description.json
├── README
└── sub-001
    └── ses-001
        ├── beh
        │   ├── sub-001_ses-001_task-easyTargetPractice_run-001_beh.json
        │   └── sub-001_ses-001_task-easyTargetPractice_run-001_beh.tsv
        ├── eeg
        │   ├── sub-001_ses-001_task-targetPractice_run-001_eeg.edf
        │   └── sub-001_ses-001_task-targetPractice_run-001_eeg.json
        ├── func
        │   ├── sub-001_ses-001_task-rest_acq-newTYpe_ce-test_dir-yPos_rec-fastRecon_run-001_echo-1_bold.json
        │   ├── sub-001_ses-001_task-rest_acq-newTYpe_ce-test_dir-yPos_rec-fastRecon_run-001_echo-1_bold.nii.gz
        │   ├── sub-001_ses-001_task-testtask_run-001_bold.json
        │   ├── sub-001_ses-001_task-testtask_run-001_bold.nii.gz
        │   ├── sub-001_ses-001_task-testtask_run-001_events.json
        │   ├── sub-001_ses-001_task-testtask_run-001_events.tsv
        │   └── sub-001_ses-001_task-testtask_run-001_stim.tsv.gz
        ├── ieeg
        │   ├── sub-001_ses-001_task-implantedTargetPractice_run-001_ieeg.edf
        │   └── sub-001_ses-001_task-implantedTargetPractice_run-001_ieeg.json
        └── meg
            ├── sub-001_ses-001_task-magneticTargetPractice_run-001_meg.fif
            └── sub-001_ses-001_task-magneticTargetPractice_run-001_meg.json

When I try to run the command line bids-validator I get this

        1: [ERR] Empty files not allowed. (code: 99 - EMPTY_FILE)
                ./sub-001/ses-001/beh/sub-001_ses-001_task-easyTargetPractice_run-001_beh.tsv
                ./sub-001/ses-001/eeg/sub-001_ses-001_task-targetPractice_run-001_eeg.edf
                ./sub-001/ses-001/ieeg/sub-001_ses-001_task-implantedTargetPractice_run-001_ieeg.edf
                ./sub-001/ses-001/meg/sub-001_ses-001_task-magneticTargetPractice_run-001_meg.fif

Just to make my life easier on the validation side while playing around, I ask the validator to ignore the following in .bidsignore.

*_beh.tsv
*_eeg.edf
*_ieeg.edf
*_meg.fif
*_bold.nii.gz

Obviously as I now have json files with no data counterpart I now get this:

2: [ERR] A json sidecar file was found without a corresponding data file (code: 90 - SIDECAR_WITHOUT_DATAFILE)
                ./sub-001/ses-001/beh/sub-001_ses-001_task-easyTargetPractice_run-001_beh.json
                ./sub-001/ses-001/eeg/sub-001_ses-001_task-targetPractice_run-001_eeg.json
                ./sub-001/ses-001/func/sub-001_ses-001_task-rest_acq-newTYpe_ce-test_dir-yPos_rec-fastRecon_run-001_echo-1_bold.json
                ./sub-001/ses-001/func/sub-001_ses-001_task-testtask_run-001_bold.json
                ./sub-001/ses-001/ieeg/sub-001_ses-001_task-implantedTargetPractice_run-001_ieeg.json
                ./sub-001/ses-001/meg/sub-001_ses-001_task-magneticTargetPractice_run-001_meg.json

But more surprisingly I get this:

1: [ERR] Invalid JSON file. The file is not formatted according the schema. (code: 55 - JSON_SCHEMA_VALIDATION_ERROR)
                ./sub-001/ses-001/eeg/sub-001_ses-001_task-targetPractice_run-001_eeg.json
                        Evidence:  should NOT have additional properties
                ./sub-001/ses-001/ieeg/sub-001_ses-001_task-implantedTargetPractice_run-001_ieeg.json
                        Evidence:  should NOT have additional properties
                ./sub-001/ses-001/meg/sub-001_ses-001_task-magneticTargetPractice_run-001_meg.json
                        Evidence:  should NOT have additional properties

And this is what one of those json contains:

{
    "Instructions": "do this",
    "PowerLineFrequency": 50,
    "SamplingFrequency": 2400,
    "SoftwareFilters": "n/a",
    "TaskName": "implanted Target Practice",
    "iEEGReference": "Cz",
    "extraInfo": {
        "nestedExtraInfo": "something extra"
    }
}
@sappelhoff
Copy link
Member

Just to make my life easier on the validation side while playing around, I ask the validator to ignore the following in .bidsignore.

instead of bidsignoring your empty data files themselves, you could ignore the validator error on empty data files, similar to how we do it in the bids-examples repo:


The issue you report is due to the JSON schemas in the validator against which the content in your actual JSON files gets valiated. For many of these schemas we have a line: "additionalProperties": false

see:

why is it not allowed to add extra-info in certain json files?

To the best of my knowledge, this is to prevent users from adding potentially undocumented and/or confusing data to the JSON files.

I don't remember that this is mention in the specifications

we can re-evaluate this policy and should document the outcome (and then adjust the validator schemas, or "fix" the one for BOLD in case we stay with the current policy). What are your arguments to support additional properties in JSON files?

why is this behavior not also seen for the bold json that also contains extra info?

somebody forgot to add "additionalProperties": false

see:

{
"$schema": "http://json-schema.org/draft-06/schema#",
"type": "object",
"properties": {
"CogAtlasID": {
"type": "string"
},
"CogPOID": {
"type": "string"
},
"EchoTime": {
"type": "number",
"exclusiveMinimum": 0
},
"EffectiveEchoSpacing": {
"type": "number",
"exclusiveMinimum": 0
},
"PhaseEncodingDirection": {
"type": "string",
"enum": ["i", "j", "k", "i-", "j-", "k-"]
},
"RepetitionTime": {
"type": "number",
"exclusiveMinimum": 0
},
"SliceEncodingDirection": {
"type": "string",
"enum": ["i", "j", "k", "i-", "j-", "k-"]
},
"SliceTiming": {
"type": "array",
"items": {
"type": "number",
"minimum": 0
}
},
"TaskName": {
"type": "string"
}
}
}

@sappelhoff
Copy link
Member

sappelhoff commented Sep 3, 2020

To the best of my knowledge, this is to prevent users from adding potentially undocumented and/or confusing data to the JSON files.

another argument that I forgot to document here is, that preventing "additionalProperties" allows for stronger validation. Imagine "additionalProperties" is True, and somebody acidentally adds a field TaskNqme instead of TaskName: In that case there will not be an error, because the validator thinks that TaskNqme is just an additional property, and not a typo of an expected property.

(however, this could be solved made less harmful by always warning as soon as an additional property is detected)


Also, in the maintainers call there have been some voices in favor of "additionalProperties", as it is currently allowed in BOLD.JSON.

The argument was generally that "it doesn't hurt us if people want to document their data".

For this argument, my counter-argument would be:

  • if we do NOT allow this kind of "do whatever you want" documentation, then users will have more incentive to raise issues and bring up discussions on how to add official support for what they want to document
  • at the expense of being annoying, this situation will probably lead to more engagement and to more consistent (and better) documentation of datasets

I am overall against additionalProperties: true, but I'd be fine to leave it as is for BOLD.JSON ...

@sappelhoff
Copy link
Member

@Remi-Gau I think I know now why additionalProperties MUST be true in the BOLD.JSON schema.

Unlike the Ephys schemas, the MRI schemas are not fully specified. That is, there are several "jsons" that do not even have a schema and the only MRI json schema there is (bold.json) is underspecified in terms of JSON fields it could take on according to the spec.

We are currently making a big overhaul of requirement levels and datatypes for all json fields in bids-standard/bids-specification#605

I expect that after that work is done, we'd want to improve the MRI schemas as well

cc @effigies @tsalo @rwblair

@Remi-Gau
Copy link
Contributor Author

hey @sappelhoff

Sorry for not replying sooner. I was busy on another PR. 😉

Thanks a lot for all those explanations: really helps understanding the issue better.

I think my initial approach was pretty much along the "laissez-faire" approach you described as "it doesn't hurt us if people want to document their data" above.

But you make good points and I see how this could lead to some problems and how we hope the current "additionalProperties: false" default should lead to more users engagement.

I still believe that this default should be mentioned in the BIDS specification because this is not entirely clear (well it was not to me so I suspect it might not be to some others): if we want to nudge people towards engaging I think it cannot hurt to be upfront and in your face about it, no?

In the mean time this default could definitely be added to the BIDS starter kit.

@sappelhoff
Copy link
Member

if we want to nudge people towards engaging I think it cannot hurt to be upfront and in your face about it, no?

good point 👍

@Remi-Gau
Copy link
Contributor Author

Will close this for now as it has been opened in the bids specs repo and I suspect this is where the discussion will happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants