[INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components #616

Merged 4 commits on Oct 1, 2020

effigies
Collaborator

As proposed in #588 (comment), roughly agreed on, and not (yet) argued against.

This uses the names settled on in bids-standard/pybids#489 (comment) to allow pybids to query in a somewhat consistent manner.

Closes #585.
Closes #588.

jbpoline and others added 3 commits September 18, 2020 09:59
some of the "name" dont match pybids - proposing to add a key for this
change pybids_name to altname
@satra
Collaborator

satra commented Sep 18, 2020

@effigies - this really depends on how you think about the relation between pieces of information. i fear that the current formulation may handcuff future changes.

current proposal

pybids_variable: 
    bids_key: bids_name

as opposed to:

common_concept:
    preferred_variable_name: variable_name
    bids_key: bids_spec_name

so the current proposal ties the schema between two very specific things. everything can be reworked especially since the current yaml files don't form a formal general spec of any kind.

@effigies
Collaborator Author

Could you make your suggestion concrete?

@effigies
Collaborator Author

effigies commented Sep 28, 2020

I'm skeptical that we're going to find this case. We have four variable sources: entities (alphanumeric), TSV column headers (snake_case), metadata fields (CamelCase when not matching columns), and a couple of special structural variables (datatype, suffix). The entities are the only ones shortened enough to need renaming to reduce confusion. For columns and metadata fields, as a layout consumer, I can't really see a case for the name not to match the standard.

(On phone. Sorry for breaking thread.)
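
The four naming styles listed in this comment can be illustrated with a rough Python sketch. The regular expressions and the `candidate_sources` helper are assumptions made for illustration; the specification does not define them, and a short lowercase name can fit more than one style.

```python
import re

# Structural variables named in the comment above.
STRUCTURAL = {"datatype", "suffix"}

# Assumed patterns for each naming style; not taken from the specification.
PATTERNS = {
    "entity": re.compile(r"[a-z0-9]+"),                         # alphanumeric
    "tsv_column": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),  # snake_case
    "metadata_field": re.compile(r"(?:[A-Z][a-z0-9]*)+"),        # CamelCase
}


def candidate_sources(name):
    """Return every variable source whose naming style `name` fits."""
    if name in STRUCTURAL:
        return ["structural"]
    return [src for src, pat in PATTERNS.items() if pat.fullmatch(name)]


print(candidate_sources("acq_time"))     # TSV column header
print(candidate_sources("IntendedFor"))  # metadata field
print(candidate_sources("ce"))           # ambiguous: entity or one-word column
```

Under these assumed patterns, only short lowercase names such as `ce` are ambiguous, which is consistent with the point that entities are the names most in need of a longer, less confusing label.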

@satra
Collaborator

satra commented Sep 28, 2020

just to clarify. my issue is not really about the key, it's about saying that the key is a pybids variable name and no other variable names could be possible in the future. if pybids chooses to use the key as a variable name, that's fine, but let's not call the key a pybids variable name.

@tsalo
Member

tsalo commented Sep 28, 2020

@satra I agree with that. I prefer to consider that top-level term a "common concept" (in single-word format), as you said above. Once we reorganize the schema in #609, I think these top-level names will instead be the filenames for split term-specific YAML files anyway, so it will be fairly distinct from a "variable name", even though the names will be good choices for variable names.
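
A minimal Python sketch of what keying entities by concept name could look like to a consumer. The `ENTITIES` dictionary is a hypothetical excerpt that keeps only an `entity` field (the short form used in filenames, for example `ce` for `ceagent`); `short_to_concept` and `parse_filename` are illustrative helpers, not part of the schema or of pybids, and splitting the extension at the first period is an assumption.

```python
# Hypothetical excerpt: concept name -> schema entry (only `entity` shown).
ENTITIES = {
    "subject": {"entity": "sub"},
    "session": {"entity": "ses"},
    "task": {"entity": "task"},
    "ceagent": {"entity": "ce"},
    "run": {"entity": "run"},
}


def short_to_concept(entities):
    """Invert the mapping: filename component -> concept name."""
    return {spec["entity"]: name for name, spec in entities.items()}


def parse_filename(filename, entities=ENTITIES):
    """Split a filename into concept-keyed entities, suffix, and extension."""
    # Assumption: the extension starts at the first period.
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    lookup = short_to_concept(entities)
    parsed = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        concept = lookup.get(key, key)  # unknown keys pass through unchanged
        if concept in parsed:
            raise ValueError(f"entity {key!r} appears more than once")
        parsed[concept] = value
    parsed["suffix"] = suffix
    parsed["extension"] = f".{extension}" if extension else ""
    return parsed


print(parse_filename("sub-01_task-rest_eeg.edf"))
```

The consumer only ever sees the concept names (`subject`, `task`, `ceagent`), while the short filename components stay confined to the `entity` field, so a tool is free to use the keys as variable names without the schema calling them that.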

@effigies effigies changed the title SCHEMA: Declare entities by variable names, add entity field for filename components SCHEMA: Declare entities by concept names, add entity field for filename components Sep 28, 2020
@effigies
Collaborator Author

Modified the title, as I think that's the only actual change required? Please let me know if there's a more substantial change needed.

@tsalo
Member

tsalo commented Oct 1, 2020

@effigies do you know why the RTD CI is still registering as pending?

@effigies
Collaborator Author

effigies commented Oct 1, 2020

Probably a temporary glitch on GitHub or RTD. Closing and reopening sometimes restarts CI...

@effigies effigies closed this Oct 1, 2020
@effigies effigies reopened this Oct 1, 2020
@effigies
Collaborator Author

effigies commented Oct 1, 2020

Passed this time.

@tsalo
Member

tsalo commented Oct 1, 2020

I'm not sure if we want to apply the five-day rule to schema-related changes, but if so then we can merge this on Monday, I think. Pending any new concerns from the community, of course.

@effigies
Collaborator Author

effigies commented Oct 1, 2020

I would consider this more on the order of infrastructure, and provisional at that. Also, the last commit was 13 days ago, so I'd say go ahead.

@tsalo
Member

tsalo commented Oct 1, 2020

Sounds good to me! I'll merge now.

@tsalo tsalo merged commit 3978a1d into bids-standard:master Oct 1, 2020
@sappelhoff
Member

> I would consider this more on the order of infrastructure, and provisional at that. Also, the last commit was 13 days ago, so I'd say go ahead.

agreed.

@effigies effigies deleted the rf/entity-names branch October 2, 2020 11:31
@sappelhoff sappelhoff changed the title SCHEMA: Declare entities by concept names, add entity field for filename components [INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components Oct 7, 2020
yarikoptic added a commit to yarikoptic/bids-specification that referenced this pull request Dec 18, 2021
Text is using both 'file name' and 'filename' pretty much to the equal amount
ATM (see git grep outputs below).  Code uses 'filename', and wikipedia has
https://en.wikipedia.org/wiki/Filename and prefers to use 'filename' in it. So
I decided to harmonize  into  'filename'.

	$> git grep  'file name' | grep '\.md' | grep -v MACRO | nl
		 1	src/02-common-principles.md:1.  **File extension** - a portion of the file name after the left-most
		 2	src/02-common-principles.md:are compulsory. For example a particular file name format is required when
		 3	src/02-common-principles.md:saved under a particular file name specified in the standard. This standard
		 4	src/02-common-principles.md:A file name consists of a chain of *entities*, or key-value pairs, a *suffix* and an
		 5	src/02-common-principles.md:`subject`, the file name MUST begin with the string `sub-<label>_ses-<label>`.
		 6	src/02-common-principles.md:If the `session` level is omitted in the folder structure, the file name MUST begin
		 7	src/02-common-principles.md:key/value pair MUST also be included as part of the file names themselves.
		 8	src/02-common-principles.md:produces a human readable file name, such as `sub-01_task-rest_eeg.edf`.
		 9	src/02-common-principles.md:It is evident from the file name alone that the file contains resting state
		10	src/02-common-principles.md:Entities within a file name MUST be unique.
		11	src/02-common-principles.md:For example, the following file name is not valid because it uses the `acq`
		12	src/02-common-principles.md:label, but must be included in file names (similarly to other key names).
		13	src/02-common-principles.md:meaning of file names and setting requirements on their contents or metadata.
		14	src/02-common-principles.md:to suppress warnings or provide interpretations of your file names.
		15	src/03-modality-agnostic-files.md:This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
		16	src/04-modality-specific-files/05-task-events.md:Where `<matches>` corresponds to task file name. For example:
		17	src/04-modality-specific-files/06-physiological-and-other-continuous-recordings.md:In the template file names, the `<matches>` part corresponds to task file name
		18	src/05-derivatives/02-common-data-types.md:is used to prevent clashing with the original file name.
		19	src/06-longitudinal-and-multi-site-studies.md:and [file names](02-common-principles.md#file-name-structure)
		20	src/99-appendices/03-hed.md:screen or the file name of the stimulus image.
		21	src/99-appendices/03-hed.md:       "LongName": "Stimulus file name",
		22	src/99-appendices/04-entity-table.md:[file name structure](../02-common-principles.md#file-name-structure),
		23	src/99-appendices/06-meg-file-formats.md:that not only the file names, but also the internal file pointers will be
		24	src/99-appendices/09-entities.md:[file name structure](../02-common-principles.md#file-name-structure).
		25	src/CHANGES.md:-   \[FIX] Clarify use of session entity in file names [bids-standard#532](bids-standard#532) ([Moo-Marc](https://github.com/Moo-Marc))
		26	src/CHANGES.md:-   \[FIX] Specify marker file names for KIT data (MEG) [bids-standard#62](bids-standard#62) ([monkeyman192](https://github.com/monkeyman192))
		27	src/CHANGES.md:-   Added missing `sub-<participant_label>` in behavioral data file names.
		28	src/pregh-changes.md:-   Added missing `sub-<participant_label>` in behavioral data file names.

	$> git grep 'filename' | grep '\.md' | grep -v MACRO | nl
		 1	CONTRIBUTING.md:Make sure that all filename format templates, entity tables, and entity definitions are correct
		 2	src/02-common-principles.md:(with the same filename as the `.nii[.gz]` file, but with a `.json` extension).
		 3	src/03-modality-agnostic-files.md:      "filename": ("REQUIRED", "There MUST be exactly one row for each file."),
		 4	src/03-modality-agnostic-files.md:filename	acq_time
		 5	src/04-modality-specific-files/02-magnetoencephalography.md:which saves the MEG sensor coil positions in a separate file with two possible filename extensions  (`.sqd`, `.mrk`).
		 6	src/05-derivatives/01-introduction.md:    status. Any modification of raw files must use a modified filename that does
		 7	src/05-derivatives/01-introduction.md:    not conflict with the raw filename. Further, any files created as part of a
		 8	src/05-derivatives/01-introduction.md:    derivative dataset must not match a permissible filename of a valid raw
		 9	src/05-derivatives/01-introduction.md:    dataset. Stated equivalently, if any filename in a derivative dataset has a
		10	src/05-derivatives/01-introduction.md:-   Each Derivatives filename MUST be of the form:
		11	src/05-derivatives/01-introduction.md:    `source_entities` MUST be the entire source filename, with the omission of
		12	src/05-derivatives/01-introduction.md:    the source suffix and extension. One exception to this rule is filename
		13	src/05-derivatives/01-introduction.md:-   There is no prohibition against identical filenames in different derived
		14	src/05-derivatives/03-imaging.md:filename.
		15	src/99-appendices/04-entity-table.md:specification, and establishes a common order within a filename.
		16	src/99-appendices/08-coordinate-systems.md:The `scanner` coordinate system is implicit and assumed by default if the derivative filename does not define **any** `space-<label>`.
		17	src/99-appendices/11-qmri.md:filenames will remain the same; however, the optional metadata (third column) may
		18	src/CHANGES.md:-   \[SCHEMA] Use macro for filename templates in file collections appendix [bids-standard#787](bids-standard#787) ([tsalo](https://github.com/tsalo))
		19	src/CHANGES.md:-   \[FIX] Accidentally swapped Neuromag/Elekta/MEGIN cross-talk & fine-calibration filename extensions [bids-standard#621](bids-standard#621) ([hoechenberger](https://github.com/hoechenberger))
		20	src/CHANGES.md:-   \[INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components [bids-standard#616](bids-standard#616) ([effigies](https://github.com/effigies))
		21	src/CHANGES.md:-   \[FIX] Common principles: Fix filename in inheritance principle [bids-standard#261](bids-standard#261) ([Lestropie](https://github.com/Lestropie))
		22	src/CHANGES.md:-   \[FIX] Example for IntendedFor was missing session indicator in the filename [bids-standard#129](bids-standard#129) ([yarikoptic](https://github.com/yarikoptic))
		23	src/schema/README.md:the entity tables, entity definitions, filename templates, and metadata tables.
		24	src/schema/README.md:-   `entities.yaml`: Entities (key/value pairs in folder and filenames).
		25	src/schema/README.md:This file contains a dictionary in which each entity (key/value pair in filenames) is defined.
		26	src/schema/README.md:they appear in filenames _and_ their full names.
		27	src/schema/README.md:For example, the key for the "Contrast Enhancing Agent" entity, which appears in filenames as `ce-<label>`,
		28	src/schema/README.md:since many entities (such as `ce`) have very short filename elements.
		29	src/schema/README.md:The `entity` field is the entity as it appears in filenames. For example, the `entity` for `ceagent` is `ce`.
		30	src/schema/README.md:Given that all entities appear in filenames, they should all be strings and the `type` field should always be `string`.
		31	src/schema/README.md:For example, `run` should have an index, so a valid key-value pair in a filename would be `run-01`.
		32	src/schema/README.md:Keys are the filenames (without file extensions),
		33	src/schema/README.md:-   `datatypes/*.yaml`: Files in the `datatypes` folder contain information about valid filenames within a given datatype.
		34	src/schema/README.md:    Each dictionary contains a list of suffixes, entities, and file extensions which may constitute a valid BIDS filename.
		35	src/schema/README.md:-   `entities.yaml`: This file simply defines the order in which entities, when present, MUST appear in filenames.
		36	src/schema/README.md:Each dictionary corresponds to a group of suffixes that have the same rules regarding filenames.
		37	src/schema/README.md:**NOTE**: The order in which entities appear in these dictionaries does not reflect how they should appear in filenames.
		38	src/schema/README.md:This file contains a list of entities in the order in which they must appear in filenames.
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "bash -c 'git grep -l '\"'\"'file name'\"'\"' | grep '\"'\"'\\.md'\"'\"' | grep -v MACRO | xargs sed -i -e '\"'\"'s,file name,filename,g'\"'\"''",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
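
The recorded command can be approximated in Python as follows. This is a sketch of the same idea, not the tooling the commit used: `select_paths` mimics the `grep '\.md' | grep -v MACRO` filters on file paths, and `harmonize_text` mimics the `sed` substitution; both function names are hypothetical.

```python
def select_paths(paths):
    """Keep paths that contain '.md' and do not contain 'MACRO'."""
    return [p for p in paths if ".md" in p and "MACRO" not in p]


def harmonize_text(text):
    """Replace every literal 'file name' with 'filename' (case-sensitive)."""
    return text.replace("file name", "filename")


print(select_paths(["src/CHANGES.md", "tools/example.py"]))
print(harmonize_text("Entities within a file name MUST be unique."))
print(harmonize_text("[file names](02-common-principles.md#file-name-structure)"))
```

Because the match is a plain substring, plural forms such as "file names" are rewritten as well, while hyphenated anchors such as `#file-name-structure` and capitalized "File name" are left untouched.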
sappelhoff pushed a commit that referenced this pull request Dec 21, 2021
@tsalo tsalo mentioned this pull request Apr 21, 2022
Successfully merging this pull request may close these issues.

Add altname or preferred variable name field to entities.yaml in schema