Extra header files for non-compulsory data #9

Open
Lestropie opened this issue Dec 3, 2020 · 2 comments

@Lestropie

The currently proposed structure contains a header.yaml file alongside the compulsory files offsets.uint64 and positions.3.float16.

Whenever additional data are generated and added to the dataset (e.g. the mean FA along each streamline: dps/mean_fa.float32), one may wish to store additional information about how those data were generated, which FA image was used, etc. I think that in such cases, optionally storing an additional file dps/mean_fa.yaml makes the most sense, i.e. changing only the file extension, as is done with JSON files in BIDS. It's worth having this explained and demonstrated in the specification.
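
To make the naming rule concrete, here is a minimal sketch of how a reader could derive the sidecar path from a data file by swapping only the final extension. The helper name `sidecar_path` is hypothetical and not part of any TRX implementation, and the handling of multi-part names such as `color.3.uint8` is an assumption that the specification would need to settle.

```python
from pathlib import Path


def sidecar_path(data_file):
    """Return the optional metadata sidecar path for a data file.

    Assumption: only the last extension (the dtype) is replaced by ".yaml",
    mirroring how BIDS pairs "image.nii" with "image.json".
    """
    return Path(data_file).with_suffix(".yaml")


# Example from this issue: dps/mean_fa.float32 pairs with dps/mean_fa.yaml
mean_fa_sidecar = sidecar_path("dps/mean_fa.float32")

# Hypothetical multi-part name: the dimension component is kept as-is
color_sidecar = sidecar_path("dpv/color.3.uint8")
```

Under this rule the sidecar stays optional: a reader would simply check whether the derived path exists and carry on if it does not.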

@frheault
Collaborator

frheault commented Dec 3, 2020

Oh I like this idea. But now, we will likely need a list of reserved tags for this too 😅

Do you have use cases in mind for this?
I can think of "unit", "scaling", "lookup_table"

However, out of pure curiosity: Do you think that in mrtrix (for example) you would try to 'conserve' this information in the file?
For example, if you subsample a TRX from 1M streamlines to 100k, would the output file keep these?

Personally, the metadata dpv/dps/dpg is very important to keep and follow along in the processing, but this is a bit too optional and could add complexity to reader/writer. I know this has nothing to do with specifications and would be an implementation decision. But I am curious about how you would personally handle such information.

@Lestropie
Author

> But now, we will likely need a list of reserved tags for this too

Very much so. This is precisely the sort of thing that we need to do for the introduction of DWI models into BIDS Derivatives (here, though I've neglected it for quite some time...), and that logic was always going to need to be extended to tractography data; it just looks like that's more likely to happen here than in the BIDS space. I would nevertheless probably advocate for a BIDS style, i.e. CamelCase variable names, as I would for the primary header file as well.

> However, out of pure curiosity: Do you think that in mrtrix (for example) you would try to 'conserve' this information in the file?
> For example, if you subsample a TRX from 1M streamlines to 100k, would the output file keep these?

Things start to get very "fun" in the implementation here, especially with MRtrix3's multi-threaded piping construction. Not looking forward to coding it...

Ideally you'd want any header fields that remain relevant to stay in place, and anything invalidated by the operation to be wiped or marked as invalid. There are a few places where I perform various gymnastics to decide which header fields should or should not be retained following a particular operation, but it's a lot of effort to automate and get right every time.
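
As a rough illustration of that retain-or-wipe idea, the sketch below filters sidecar fields against a per-operation list of fields that the operation invalidates. Everything here is hypothetical: the function name, the operation names and the rule table are invented for the example, and the field names are just the candidates floated earlier in this thread ("unit", "scaling", "lookup_table").

```python
def propagate_fields(fields, operation, invalidated_by):
    """Return the sidecar fields that survive an operation.

    fields:         dict of metadata read from a sidecar file
    operation:      name of the operation being applied (hypothetical)
    invalidated_by: dict mapping operation name -> set of field names
                    that the operation makes invalid
    """
    dropped = invalidated_by.get(operation, set())
    return {key: value for key, value in fields.items() if key not in dropped}


# Hypothetical rule table: rescaling the data invalidates the stored scaling.
RULES = {"rescale": {"scaling"}}

sidecar = {"unit": "mm", "scaling": 1.0, "lookup_table": "viridis"}

after_subsample = propagate_fields(sidecar, "subsample", RULES)  # nothing dropped
after_rescale = propagate_fields(sidecar, "rescale", RULES)      # "scaling" dropped
```

The hard part described above is not this filtering step but maintaining a correct rule table for every operation, which is why a list of reserved tags would help.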

> Personally, the metadata dpv/dps/dpg is very important to keep and follow along in the processing, but this is a bit too optional and could add complexity to reader/writer. I know this has nothing to do with specifications and would be an implementation decision. But I am curious about how you would personally handle such information.

Will probably create a new MRtrix3 issue (separate to MRtrix3/mrtrix3#411) specifically to discuss how to deal with the prospects of this format within our code, as it'll likely be quite different to others. But it's nevertheless worthwhile going through these use cases; e.g. "subsampling a TRX from 1M to 100k streamlines" would be more efficiently handled by defining either a streamline group or a bitwise DPS to define that selection and interface with that, rather than producing a new TRX. In the latter case though, yes, ideally you would want all associated DPV / DPS data to be correspondingly selected and written to output, which requires internal handling of such; this already happens with SIFT2 weights using dedicated code, but TRX necessitates something more general.
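
For the "produce a new TRX" case, a general selection routine might look like the sketch below, which carries per-streamline (DPS) and per-vertex (DPV) data along with the chosen streamlines. It uses plain Python lists rather than memory-mapped arrays, and it assumes that `offsets` holds the start index of each streamline in `positions` with no trailing sentinel; the function name and argument layout are hypothetical and not taken from any existing implementation.

```python
def subsample(offsets, positions, dps, dpv, keep):
    """Select streamlines by index, carrying DPS and DPV data along.

    offsets:   start index of each streamline in `positions` (assumed layout)
    positions: flat list of vertices for all streamlines
    dps:       dict name -> list with one value per streamline
    dpv:       dict name -> list with one value per vertex
    keep:      indices of the streamlines to retain, in output order
    """
    ends = list(offsets[1:]) + [len(positions)]
    new_offsets, new_positions = [], []
    new_dps = {name: [] for name in dps}
    new_dpv = {name: [] for name in dpv}
    for i in keep:
        start, end = offsets[i], ends[i]
        # The new streamline starts where the output vertex list currently ends.
        new_offsets.append(len(new_positions))
        new_positions.extend(positions[start:end])
        for name, values in dps.items():
            new_dps[name].append(values[i])
        for name, values in dpv.items():
            new_dpv[name].extend(values[start:end])
    return new_offsets, new_positions, new_dps, new_dpv


# Three streamlines with 2, 3 and 1 vertices; keep the first and the last.
offsets = [0, 2, 5]
positions = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0), (9, 9, 9)]
dps = {"mean_fa": [0.1, 0.2, 0.3]}
dpv = {"fa": [1, 2, 3, 4, 5, 6]}
result = subsample(offsets, positions, dps, dpv, keep=[0, 2])
```

The cheaper alternative mentioned above, a streamline group or a boolean DPS marking the selection, would avoid copying any of these arrays at all.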
