[SCHEMA] Reorganize schema #609

tsalo · 2020-09-16T17:10:04Z

Closes #603. This does not currently include updates to the schema-related code, but we can use it to discuss the changes to the schema itself.

entities.yaml specifies order, while actual definitions get their own files.

Some fields will probably be general, while others will be specific to the item class.

tsalo · 2020-09-16T17:13:09Z

@satra @yarikoptic I'd especially like to get your thoughts on this structure.

tsalo · 2020-09-16T17:24:04Z

src/schema/items/T1w.yaml

+  - .nii.gz
+  - .nii
+  - .json
+entities:


I think this is the main thing that @yarikoptic will object to with this structure. If we have a single file for each suffix, for example, we will have a very large number of files to update any time we change the specification's supported entities.

This is especially true for anatomical MRI scans, in which many suffixes have the same rules.

On the other hand, I think there's a lot of benefit in having suffix-specific definitions at least. Perhaps we could start with the unfortunate duplication, and then try an inheritance-like procedure as discussed in #588. For example, there could be "parent" YAML files with entities and extensions specified that the "child" YAMLs (like T1w.yaml) could inherit from. These parent files would correspond to the subgroups we currently have in our datatype YAML files. Does that make sense?

But why to bring in entities and extensions in here?

It is not per se "item"'s property, in particular whenever it might be both a suffix and a datatype (ref: Separate schema item definitions from layout rules #603 (comment))

I think records in items/ should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier

What if we keep them ("extensions", "entities") where/as they are (i.e. datatypes/anat.yaml for T1w). Then files in items/ could just provide a file for every "name" (well -- "altname") mentioned in the datatypes/*yaml file in any section, thus just centralizing their description etc:

Even extensions could get their class (i.e. "extension") and we could have a bunch of (hidden ;)) files (like .nii.gz.yaml) which would provide description etc

was your comment a reply to mine? github seems placed it above mine -- just want to make sure

#609 (comment) was before #609 (comment).

I am inclined to think that the entities and extensions are properties of the items. It's just that we need to treat instances where one term refers to two difference classes (e.g., meg as a datatype and suffix) as separate items. There should be a datatype-meg and a suffix-meg, for example. While that is my preference, though, I'm willing to defer to folks with more experience in this area.

I think records in items/ should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier

I don't think that's entirely possible. For example, entities are going to have features that are distinct from other classes, such as whether they take an index or a label, as well as their "entity-key". Metadata fields, at minimum, need "units". I think those are features of the actual items, rather than how they interact with other classes.

What if we keep them ("extensions", "entities") where/as they are (i.e. datatypes/anat.yaml for T1w). Then files in items/ could just provide a file for every "name" (well -- "altname") mentioned in the datatypes/*yaml file in any section, thus just centralizing their description etc:

Well, on the bright side we can always change things later, so I'm willing to give it a try. I can start with the filename, the description, and a "display-name" (per @effigies suggestion) for each of the suffixes. If we want to ultimately add the entities, extensions, and possibly metadata fields later on, that should be easy enough.

eh the call ended while I was staring at this PR and thought to may be chat about it further (sorry for being slow with replies on github): I think if we make this items/ folder into terms/ it could be pretty much what BIDS_Terms folder is (just nicer yamls, from which jsonlds would be generated) with just with descriptions tuned up to not provide any "BIDS formatting" specifics, and rather describing the actual generic concept (subject, session, MR enhancement contrast agent, etc). Then it would be for schema "upstairs" (in entities/ etc) to use the terms provided under terms/, so any "word" used in the schema has clear mapping to a term.
It would be ok for the same term to be used in different contexts (datatype/ folder vs a filename _suffix). If there is a need to provide a "concept" of a composite term + context where it is used, I think we (or whoever needs them) could automatically "mint" them, at the level of composition they need. So there could be a concept of sub-*/[ses-*/]<datatype-term>/ corresponding to a "BIDS folder with data for a single acquisition session of a single subject", thus reflecting layout limitation of BIDS (i.e. we cannot have a folder with eeg data for all subjects).
I think it also boils down to "contexts" to provide mapping if there is no 1-to-1 mapping. E.g. (hypothetical since no concrete case comes to mind ATM; and may be not good here since IMHO the example term is a composition of modality + concept): If there was an _nchan-<number> entity to signal number of channels in the file, then it would correspond to either ECGChannelCount or EEGChannelCount etc depending on the context -- datatype for which it is listed. So we would then need to be able somehow to say that entity points to different terms based on some other level (datatype ATM). For that we could provision that any entity listing does not only list its "use" (optional vs required; may be there is a better name) but also a "term". E.g. datatypes/eeg.yaml could have

... entities: sub: required ses: optional task: required acq: optional run: optional nchan: use: optional term: EEGChannelCount

and allow similar gimmick for any place where term could be used and we need to remap depending on the context.

To me, it is not yet totally clear on where we would need such "composite" terms. Since even in the case of EEGChannelCount, it is sameAs a generic "http://purl.org/nidash/nidm#NumberOfChannels", so I think overall it might be a utopia to fight combinatorics at the level of terms ;), and we should get away from it.

yarikoptic

Initial feedback

yarikoptic · 2020-09-16T18:30:29Z

src/schema/items/recording.yaml

+  More than one continuous recording file can be included (with different
+  sampling frequencies).
+  In such case use different labels.
+  For example: `_recording-contrast`, `_recording-saturation`.


unrelated to this PR, I wonder if we should just reserve examples: key in the records to contain a list of possible examples? That would de-mix description from examples, and allow to harmonize rendering (ATM it could be "For example:", "e.g.", etc)

I'd like to avoid incorporating examples into the schema files since a lot of the examples used throughout the specification for different entities are section-specific. For example, acq has different examples for anatomical MRI and functional MRI in the text.

I think you're right, though, that the examples should be removed from the entity definitions. We should clean up those definitions.

I wrote a response nodding in agreement but then realized -- we are talking about the super dooper machine readable filenaming standard! If we have ANY examples we would anyways need to parse and validate them to be legit or specify them as records ({'datatype': 'anat', 'subject': '01', 'acq': 'blah', 'suffix: 'T1w', 'extension': '.nii.gz'}) to start with and render when presented (as sub-{subject}/{datatype}/...) since we know entities order etc. Then any section of the bids specification "manual" could just limit examples to a specific filter (datatype: anat), and provide only matching examples. But then I also see a benefit from documentation providing examples across different datatypes for a given entity in data type agnostic section ;)

but anyways -- it is indeed a separate topic for some future work and indeed they just need to be removed from descriptions.

The filename format rendering will be generated in the specification text in #610. The examples I'm referring to are more text-based, like "acq may refer to something like different resolutions of T1w scans" in the Anatomical MRI section, while in the MEG _photos section it might say that acq is good for distinguishing "acquisition of different photos of the same face". Those kinds of examples are highly section-specific and (IMHO) shouldn't be in the schema.

yarikoptic · 2020-09-16T18:54:14Z

src/schema/items/T1w.yaml

+  - .nii.gz
+  - .nii
+  - .json
+entities:


But why to bring in entities and extensions in here?

It is not per se "item"'s property, in particular whenever it might be both a suffix and a datatype (ref: Separate schema item definitions from layout rules #603 (comment))

I think records in items/ should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier

What if we keep them ("extensions", "entities") where/as they are (i.e. datatypes/anat.yaml for T1w). Then files in items/ could just provide a file for every "name" (well -- "altname") mentioned in the datatypes/*yaml file in any section, thus just centralizing their description etc:

Even extensions could get their class (i.e. "extension") and we could have a bunch of (hidden ;)) files (like .nii.gz.yaml) which would provide description etc

yarikoptic · 2020-09-16T18:55:00Z

src/schema/items/acq.yaml

@@ -0,0 +1,21 @@
+---
+name: Acquisition


am I catching it right that filename pretty much specifies the altname (#588)?

If @effigies' idea is accepted then yes I would rename these files to use the altnames instead.

Also make markers and headshape into auxmodalities.

Related to bids-standard#593.

yarikoptic · 2020-09-25T20:10:58Z

src/schema/items/echo.yaml

+name: Echo
+class: entity
+description: |
+  Multi-echo data MUST be split into one file per echo.


oh, that is a good example where we have not just a description of a term (EchoTime which could be relevant for both entity AND sidecar file), but the description incorporating BIDS specific that it is REQUIRED (as an entity) in cases of multi-echo recording. So now I wonder if indeed descriptions in terms should have some reflection of being BIDS-Terms ;) or we just provide descriptions like this at the level of entity etc definitions (which would also describe format etc) to compliment generic description of the term/concept ("The echo time (TE) for the acquisition, specified in seconds").

yeap yeap -- the latter: term description should be generic. entity etc description would compliment with additional BIDS specific information for that "level" (term vs datatype vs suffix)! LD people then could get a proper list of "composite" terms (e.g. entity-EchoTime and sidecar-EchoTime, although that one would unlikely have any custom description) with "composite" descriptions thus not conflicting or duplicating anything!

oh wonderful - and validators then could validate that various instances of the term (e.g. in sidecar and filename entity) correspond since they do come from the same term, thus not hardcoding any kind of association!

The one I'm stuck on right now is "modality". Under Common Principles, modality has a very nice new definition:

Modality: the category of brain data recorded by a file. For MRI data, different pulse sequences are considered distinct modalities, such as T1w, bold or dwi. For passive recording techniques, such as EEG, MEG or iEEG, the technique is sufficiently uniform to define the modalities eeg, meg and ieeg. When applicable, the modality is indicated in the suffix. The modality may overlap with, but should not be confused with the data type.

On the other hand, the modality entity (mod-<label>) has its own definition, where it refers to the corresponding modality for the file. The Common Principles definition really doesn't work for the entity, but I don't like the idea of making the principle definition less informative.

I don't think there's a good way of making one definition for both. WDYT?

Why do you think it doesn't work for the entity, because of When applicable, the modality is indicated in the suffix?

Note: The only use of _mod- ATM is actually to absorb the suffix in MRI data whenever suffix becomes _defacemask, so if it works for suffix (e.g. T1w), it works for the entity IMHO.

The full definition of "modality" just doesn't seem relevant to the corresponding entity. At least it provides too much information that might confuse people. I don't want to cut it down though. I think the level of detail is necessary to understand the main concept defined in Common principles.

I'm just going to drop Common principles from the schema for now. Since the focus on this PR is separating context from description and adding the BIDS terms, I think we can just circle around to it later.

yarikoptic · 2020-09-25T20:40:11Z

src/schema/items/task.yaml

+  Each task has a unique label that MUST only consist of letters and/or
+  numbers (other characters, including spaces and underscores, are not
+  allowed).
+  Those labels MUST be consistent across subjects and sessions.


Similarly to -echo entity above, this description is not description of the "task" term, but again for the restriction bids imposes, and specifically to the "entity's label". So there could be a term "Task" describing what the heck task as a term in general is about. Then entity-task (composite) is currently what TaskName is currently. but IMHO the one which is a composition of a task as a BIDS entity which imposes additional restrictions. And I do not see ATM any "relationship" to be established between TaskName and TaskDescription -- that they aren't completely independent one from another. Most likely there should be 3 terms (Task, TaskName, TaskDescription) where the latter two are properties of a Task.

tsalo · 2020-09-29T18:14:05Z

Auxiliary data types/modalities are proving to be very difficult to manage in #610, so I'm leaning toward adopting a purely top-down organization with this PR. I think we can adjust as needed after.

EDIT: To clarify, this means removing the auxdatatypes/auxmodalities folder and simply adding those suffixes as new groups to the individual files in datatypes/.

This shifts to a pure top-down approach for the moment.

@nqueder

I used the jsonld version. Thanks to @nqueder.

…ification into ref/schema-reorg

tsalo · 2021-03-17T18:37:23Z

This is way out of date and combines too many changes. I've opened #762, which may be a better way to add the metadata files at least. We can discuss reorganizing the schema itself in another issue/PR.

tsalo added 2 commits September 16, 2020 13:05

Initial reorganization.

9b03fc4

entities.yaml specifies order, while actual definitions get their own files.

Clarify YAML structure

b5af5ba

Some fields will probably be general, while others will be specific to the item class.

Add example suffix file.

709351d

tsalo commented Sep 16, 2020

View reviewed changes

yarikoptic reviewed Sep 16, 2020

View reviewed changes

tsalo added 4 commits September 17, 2020 18:52

Draft other class examples.

594b9e9

Change auxdatatypes to auxmodalities.

036ce51

Also make markers and headshape into auxmodalities.

Add Common principles.

5f53f35

Rename modalities.yaml.

6b83898

Related to bids-standard#593.

yarikoptic reviewed Sep 25, 2020

View reviewed changes

tsalo mentioned this pull request Sep 28, 2020

[INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components #616

Merged

tsalo added 2 commits September 29, 2020 14:11

items --> terms

97a62e0

Add auxmodalities.

3587875

tsalo added 9 commits September 29, 2020 14:43

Move aux modalities into main datatypes.

98e5263

This shifts to a pure top-down approach for the moment.

Copy over terms from BIDS_Terms.

d286048

I used the jsonld version. Thanks to @nqueder.

Update datatypes.

1e6dc8f

Rename some more.

bf55e72

Drop common principles.

74cd786

Merge terms and remove OPTIONAL/REQUIRED/etc.

60f3cc2

Merge branch 'master' into ref/schema-reorg

52d965e

Update and add more terms.

3aa1324

Merge branch 'ref/schema-reorg' of https://github.com/tsalo/bids-spec…

047d037

…ification into ref/schema-reorg

This was referenced Oct 4, 2020

[ENH] Add part entity for complex-valued data #424

Merged

Add metadata and conditional logic tsalo/bids-specification#2

Closed

tsalo changed the title ~~[INFRA] Reorganize schema~~ [SCHEMA] Reorganize schema Oct 16, 2020

tsalo mentioned this pull request Dec 31, 2020

Add Required/Recommended Metadata info to Schema #699

Open

tsalo closed this Mar 17, 2021

tsalo mentioned this pull request Feb 1, 2022

[SCHEMA, FIX] Remove mentions of auxdatatypes #996

Merged

tsalo added schema Issues related to the YAML schema representation of the specification. Patch version release. schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release. labels Apr 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SCHEMA] Reorganize schema #609

[SCHEMA] Reorganize schema #609

tsalo commented Sep 16, 2020

tsalo commented Sep 16, 2020

tsalo Sep 16, 2020

tsalo Sep 16, 2020

tsalo Sep 16, 2020

yarikoptic Sep 16, 2020

yarikoptic Sep 16, 2020

tsalo Sep 16, 2020

tsalo Sep 17, 2020

yarikoptic Sep 25, 2020

yarikoptic left a comment

yarikoptic Sep 16, 2020

tsalo Sep 16, 2020 •

edited

Loading

yarikoptic Sep 25, 2020

tsalo Sep 28, 2020

yarikoptic Sep 16, 2020

yarikoptic Sep 16, 2020

tsalo Sep 16, 2020 •

edited

Loading

yarikoptic Sep 25, 2020 •

edited

Loading

yarikoptic Sep 25, 2020 •

edited

Loading

yarikoptic Sep 25, 2020

tsalo Sep 28, 2020

yarikoptic Sep 28, 2020

tsalo Sep 29, 2020

tsalo Oct 2, 2020

yarikoptic Sep 25, 2020

tsalo commented Sep 29, 2020 •

edited

Loading

tsalo commented Mar 17, 2021

[SCHEMA] Reorganize schema #609

[SCHEMA] Reorganize schema #609

Conversation

tsalo commented Sep 16, 2020

tsalo commented Sep 16, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yarikoptic left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsalo Sep 16, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsalo Sep 16, 2020 • edited Loading

Choose a reason for hiding this comment

yarikoptic Sep 25, 2020 • edited Loading

Choose a reason for hiding this comment

yarikoptic Sep 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsalo commented Sep 29, 2020 • edited Loading

tsalo commented Mar 17, 2021

tsalo Sep 16, 2020 •

edited

Loading

tsalo Sep 16, 2020 •

edited

Loading

yarikoptic Sep 25, 2020 •

edited

Loading

yarikoptic Sep 25, 2020 •

edited

Loading

tsalo commented Sep 29, 2020 •

edited

Loading