-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API and DICOM-specific implementations for providing supplemental metadata during conversion #4016
Conversation
Not fully functional, but a place to start connecting bfconvert.
Fiddled with the API a bit to make this work in a more extensible way. Supplying an extra metadata location is no longer tied to DicomWriter specifically, but is in an extra interface that can be implemented by writers that support this feature. Considered implementing bfconvert connectivity via an option in DicomWriter, instead of a new command line argument in bfconvert. Either way would work, but the approach here would allow us to implement a similar feature in other writers later on (if we choose to do so).
...and fix up some minor issues. A simple test like this should now work: $ cat test.json { "BodyPartExamined": { "Value": "BRAIN", "VR": "CS", "Tag": "(0018,0015)" } } $ bfconvert -extra-metadata test.json test.fake test.dcm $ dcdump test.dcm
I tried: echo '{ "PatientID": { "Value": "1234", "VR": "LO", "Tag": "(0010,0020)" } }' >/tmp/crap.json ./bfconvert -extra-metadata /tmp/crap.json -noflat -tilex 256 -tiley 256 -compression JPEG CMU-1.svs /tmp/wsiconverted/crap.dcm But it gave the following warning(s): Ignoring tag Patient ID = 1234 from provider loci.formats.dicom.DicomJSONProvider@3246fb96 and the supplied PatientID value was not present in the output: dckey -k PatientID /tmp/wsiconverted/crap_0_3.dcm |
I think the JSON format is better than the dcdump format for supplying the metadata, and we do not need both. The JSON format to use for this sort of thing is always a challenge, since what is in the standard for DICOMweb is not very user friendly, and what is in dcmqi or my own SetCharacteristicsFromSummary both use only keywords, so they then depend on the conversion tool having a DICOM data dictionary to determine the Data Element Tag and VR. So your approach is a reasonable compromise, though it will be more irritating to use since the caller will have to supply that information. I have no problem adding a BSD-licensed JSON parser dependency. |
How do you see merging or replacing nested metadata working? For example, a common use case is to supply content within SpecimenDescriptionSequence, specifically multiple items of the nested SpecimenPreparationStepContentItemSequence within SpecimenPreparationSequence that describe staining with H&E, fixation and embedding with FFPE, etc. Your default behavior populates a few attributes within SpecimenDescriptionSequence such as the SpecimenUID that is created by the convertor. The simplest solution is probably to allow overwriting the entire sequence. Attached please find an example of some relatively complex nested metadata that describes the sort of thing I normally supply, using the JSON syntax for my own conversion tool example_rms_wsi_metadata.json.zip I agree that preventing overwriting of structural metadata (things likes Rows, Columns) is probably a good idea, though I usually don't prevent that in my own tools, just assume the caller isn't going to do that sort of thing (unless they want to create a deliberately bad object for validator testing). |
I also found that supplying ContainerTypeCodeSequence was ignored - it seems that anything you are populating with default values for standard compliance cannot be overridden (yet). E.g.:
Or maybe I got the syntax wrong, though it didn't complain. Also, how do you plan to allow multiple items in one sequence to be specified? The syntax you describe only seems to allow for one item. |
Each tag contained in a JSON file may now optionally contain a "ResolutionStrategy" property set to "IGNORE", "REPLACE", or "APPEND". IGNORE means that the tag will be ignored if the same tag code has been defined already. REPLACE means that the tag will be used to replace any existing tag with the same code. APPEND means that if there is an existing tag with the same code, the current tag's value will be appended to the pre-existing tag's value array.
…icom-provide-metadata
With a bunch of testing over the last few days, I think the current state of this PR with cc3549e and fb647dd addresses comments so far. For anything defined in JSON that is not a sequence (VR This behavior is now configurable within the JSON, by setting The restriction on overwriting "important" metadata such as An example that demonstrates multiple items in a sequence, and different combinations of
The expected behavior in this example is:
I need to add some unit/integration tests here (which I will do early next week), and then propose to take out of draft status and review for the next minor release. |
The ResolutionStrategy approach looks sound to me ... while it gives the caller the ability to really screw things up if they try, it is also powerful enough to satisfy any use-case. The one thing that would irritate me is the need to specify the tag number (e.g., (0018,0015) for BodyPartExamined). Can you look these up in the data dictionary for standard data elements? That would save the user having to do that and supply it. Only if a (relatively recently added to the standard) keyword is not recognized (error message) should the tag number be required, or for a private data element (for which the caller would also need to know that they need to add a private creator element as well, but that's OK). |
This is a bit lenient as it will ignore case and whitespace, e.g. "Study Date", "StudyDate", and "sTuDydAte " should all resolve to 0008,0020. Also fixes a couple of small discrepancies in the dictionary.
Same would apply to VR, right? I think those should also come from the data dictionary. |
If a VR is not defined, the default is used. If a VR other than the default is defined, a warning will be shown but the user-defined VR will be used.
Also fixes tag/VR lookup for sequences.
Last few commits here add a bunch of unit tests and make It looks like the Maven macos and ubuntu builds are struggling to start (~45 minutes with no indication that the build has actually started), so these may need to be restarted. The Maven Windows builds did pass though, so I'd be surprised if that's a problem with this PR specifically. |
…nvestigation" This reverts commit 09f65ff.
This prevents test run times from depending on the contents of the temp directory. Without this change, some builds failed as the tests didn't finish.
All— Very sorry for the slowness in the response here. I’ve at least partially been the hold up, since in discussions about this PR I keep coming back to the issue of not having a second format where this is implemented. I agree that adding the IExtraMetadataWriter interface is fairly low-impact, but without validating it against another format, it’s hard to know if this interface will be usable elsewhere or if this is more ultimately an implementation detail of the DICOM writer itself. I’ve failed to find time, but I think what would really help us understand the impact of the In the case of OME-TIFF, though, I don’t see how to make use of the new |
It's definitely up to each writer that implements
Yes, that would be down to the input format and the writer implementation.
No, it's entirely up to the implementing writer to say what it allows.
Exactly, yes.
As noted in the PR description, I considered doing exactly this, but thought a more generic bfconvert option would give us the flexibility to use this feature elsewhere if we choose. If making this more DICOM-specific would help to get through review at this point, that's fine; it's not realistic to add and test another writer that implements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I like the concept of being able to attach extra metadata and the API additions here look to be fairly flexible and low impact in its current state. As Josh mentioned, in an ideal world we would have other writer implementations such as OME-TIFF to really be able to test the wider impact, but that is certainly something we can look at implementing in the future. For now having the DICOM implementation alone will provide benefit for end users.
Tested using bfconvert with the new option and some of the sample json provided in the PR. I tested converting both existing DICOM with new extra metadata as well as converting OME-TIFF to DICOM with extra metadata. In both cases converting the file completed successfully, the resulting file could be read and displayed without any exceptions. Inspecting the metadata showed that the additional metadata values were all correct and present.
The resolution strategy also makes sense to me and provides enough power and flexibility to give the user the desired level of control.
Overall the code changes look good from my side and the new tests look to have good coverage. All other builds and tests have remained green with its included so it looks good to merge to me. I will get a follow up docs PR to document the new functionality.
Writers now have the option of implementing the
loci.formats.out.IExtraMetadataWriter
interface, which indicates that the writer can accept additional metadata beyond what might have been in the input data.bfconvert
now includes an-extra-metadata
option, which takes a string that can be passed to the writer if it implementsIExtraMetadataWriter
. In current implementations, that string will be a file path.DicomWriter
now implementsIExtraMetadataWriter
, and can parse metadata from either a .dcdump file (a subset of the output ofdcdump
, seeloci.formats.dicom.DCDumpProvider
) or a .json file in a specific layout (seeloci.formats.dicom.DicomJSONProvider
). The former is meant to be a "quick start" for workflows that need to copy metadata from an existing DICOM dataset. There are almost certainly some gaps in parsing functionality, but some basic examples like this should work:The JSON files are where I would propose to focus further effort. As noted in the header comment in
DicomJSONProvider
, these are loosely based on existing work in https://github.com/QIICR/dcmqi/tree/master/doc/examples, but explicitly define tags and VRs. Some basic examples are:General API, implementation, and overall usability feedback is welcome as always. Some things to consider in particular:
Pixel Spacing
, but allowing e.g.Total Pixel Matrix Rows
to be overwritten is a bad idea.I considered making this more DICOM-specific, using a writer option and the existing
-option
flag inbfconvert
. However, I can imagine that we might want to make use of this feature in other writers in the future. Adding an optional lightweight API seemed like the most flexible path forward.This will require a minor release (due to API updates; this will not affect readers or memo files). Note also that this introduces an
org.json:json
dependency informats-bsd
(similar toformats-gpl
). If that's a problem, can consider other JSON parsers. Opening as a draft PR for now, for 6.14.0 consideration./cc @dclunie, @fedorov