Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] clarify no blank and duplicated headers in TSV #1116

Merged
merged 4 commits into from
Jul 12, 2022

Conversation

sappelhoff
Copy link
Member

closes #1105

Technically, one could argue that this is a backward incompatible change, because we never explicitly said that blank or duplicated headers in TSV files are forbidden.

However, invoking common sense™, I think this would be a fair change.

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this clarification is needed to support error-free downstream processing. I don't see a serious problem with this being a breaking change as there are all kinds of things that are now be validated that were not in the past, so datasets that validated last year to do not validate now. This will stabilize as the specification becomes fully elaborated and the validation fully matches it.

I do have one question about the use of TSV here and in other places. The problem is that there are going to be two kinds of TSV files: tabular files which have column names and timeseries files which are also TSV files but do not have column names in the file, but in the JSON.

I think this should be written to refer to tabular files and a similar requirement be placed on the column names in the JSON files.

@sappelhoff
Copy link
Member Author

I think this should be written to refer to tabular files and a similar requirement be placed on the column names in the JSON files.

good point, thanks. I added this in ea9a3a4

@sappelhoff sappelhoff added this to the 1.7.1 milestone Jun 8, 2022
@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Jun 9, 2022

I think this should be written to refer to tabular files and a similar requirement be placed on the column names in the JSON files.

So basically the json associated to tsv.gz should have the following content for each column?

"LongName": "OPTIONAL",
"Description": ( "RECOMMENDED", "The description of the column.", ),
"Levels": "RECOMMENDED",
"Units": "RECOMMENDED",
"TermURL": "RECOMMENDED",
"HED": "OPTIONAL"

That seems fair to me. Well maybe except for Levels because most of the tvz.gz are for physio / continuous data .

Copy link
Collaborator

@Remi-Gau Remi-Gau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@sappelhoff
Copy link
Member Author

So basically the json associated to tsv.gz should have the following content for each column?

This is how I understood it all the while 🤔

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Jun 9, 2022

Still requires to be somewhat reconciled with the current format of typical content of json for physio that has the column wise info in an array of strings.

{
   "SamplingFrequency": 100.0,
   "StartTime": -22.345,
   "Columns": ["cardiac", "respiratory", "trigger"],
   "cardiac": {
       "Units": "mV"
   }
}

@sappelhoff
Copy link
Member Author

@Remi-Gau I am unsure what you want me to do :-)

@Remi-Gau
Copy link
Collaborator

@Remi-Gau I am unsure what you want me to do :-)

At the moment, nothing.

This can be merged but if more as a foot note that we need to specify things more if people want to use the following in tsv.gz:

  • "LongName"
  • "Description"
  • "Levels"
  • "TermURL"
  • "HED"

If the example in spec,

{
   "SamplingFrequency": 100.0,
   "StartTime": -22.345,
   "Columns": ["cardiac", "respiratory", "trigger"],
   "cardiac": {
       "Units": "mV"
   }

is meant to say that you can have something like this:

{
   "SamplingFrequency": 100.0,
   "StartTime": -22.345,
   "Columns": ["cardiac", "respiratory", "trigger"],
   "cardiac": {
       "Description":  "continuous pulse measurement",
       "Units": "mV",
       "TermURL": "some URL",
       "HED": "..."
   },
   "respiratory": {
       "Units": "mV",
       "TermURL": "some URL"
   },

then is was not clear to me. And also not mentioned in the schema.

Also unclear where to specify the additional metadata, if they vary for each column:

https://bids-specification.readthedocs.io/en/latest/04-modality-specific-files/06-physiological-and-other-continuous-recordings.html#other-recommended-metadata-for-physiological-data

Probably good to keep that in mind with the motion and eyetracking BEP incoming.

@sappelhoff
Copy link
Member Author

yes, that's exactly what is meant. I'll try to make this clearer. Thanks

@Remi-Gau
Copy link
Collaborator

OK then maybe expanding the example might be the best.

@sappelhoff
Copy link
Member Author

@Remi-Gau I have restructured the section and tried to make the situation clearer - I'd be happy for a review.

And also not mentioned in the schema.

This is how it currently looks like in the schema:

---
PhysioColumns:
selectors:
- suffix == "physio"
columns:
cardiac: OPTIONAL
respiratory: OPTIONAL
trigger: OPTIONAL
additional_columns: allowed

What would you like to see improved?

Copy link
Collaborator

@Remi-Gau Remi-Gau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.

Couple of times I caught myself thinking "maybe we should mention this" only to keep on reading and realize you had already added it a few lines below.

@VisLab
Copy link
Member

VisLab commented Jul 12, 2022

I have a concern on the physio requiring both Levels and Units. In the past I thought this was an either/or.

@sappelhoff
Copy link
Member Author

I have a concern on the physio requiring both Levels and Units. In the past I thought this was an either/or.

What makes you think that both are simultaneously required at any point? Is there a typo or unclear description somewhere?

@VisLab
Copy link
Member

VisLab commented Jul 12, 2022

I was confused by Remi's explanation above where both Levels and Units are RECOMMENDED. None of the physio examples use both so I guess it is clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Are tabular files in BIDS allowed to have any blank column names?
3 participants