Fake hashes fail validation #3810

Closed
joshmoore opened this issue Apr 13, 2022 · 5 comments · Fixed by tlambert03/ome-types#146

Comments

@joshmoore
Member

When trying to parse this converted fake:

$ cat /tmp/plate.fake.ini
plates=1
plateAcqs=1
plateRows=2
plateCols=2
fields=2

ome_types complains about the validation of the XML:

$ ome_zarr info /tmp/plate.ome.zarr/
WARNING:ome_zarr.io:version mismatch: detected:FormatV02, requested:FormatV04
WARNING:ome_zarr.io:version mismatch: detected:FormatV04, requested:FormatV02
ERROR:ome_zarr_metadata.spec:failed to parse metadata: 8 validation errors for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 1 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 2 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 3 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 4 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 5 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 6 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 7 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
/private/tmp/plate.ome.zarr [zgroup]
 - metadata
   - Plate
   - bioformats2raw
 - data
   - (1, 1, 1, 1024, 1024)

for this XML:

$ xmlindent /tmp/plate.ome.zarr/OME/METADATA.ome.xml
...
            <HashSHA1>
               1234567890ABCDEF1234567890ABCDEF12345678
            </HashSHA1>

from: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#Plane_HashSHA1
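
For context: a SHA-1 digest is 20 bytes, which is 40 characters when written in hexadecimal. XSD measures the length of an `xsd:hexBinary` value in octets, so if the schema type is a `hexBinary` with length 20 (an assumption here, not something quoted from the schema), the 40-character value above is valid and a 20-character limit on the string is too strict. A minimal sketch of that arithmetic, where `HEX40_PATTERN` and `is_hex40` are illustrative names and not part of ome_types or the schema:

```python
import re

# Value emitted for fake planes, as shown in the XML above
FAKE_HASH = "1234567890ABCDEF1234567890ABCDEF12345678"

# Hypothetical pattern: exactly 40 hexadecimal characters
HEX40_PATTERN = re.compile(r"[0-9a-fA-F]{40}")


def is_hex40(value: str) -> bool:
    """Return True if value, ignoring surrounding whitespace, is 40 hex characters."""
    return HEX40_PATTERN.fullmatch(value.strip()) is not None


print(len(FAKE_HASH))                 # number of characters in the string
print(len(bytes.fromhex(FAKE_HASH)))  # number of bytes after hex decoding
print(is_hex40(FAKE_HASH))
```

The `strip()` call is there because the indented XML output places the value on its own line with surrounding whitespace.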

joshmoore added a commit to ome/ome-zarr-metadata that referenced this issue Apr 13, 2022
- skip parsing if plates exist
- temporarily remove HashSHA1 due to parsing error
  (ome/bioformats#3810)
- insert MetadataOnly after Channel for validation
  (glencoesoftware/bioformats2raw#137)
@melissalinkert
Member

Hashes for fake datasets are set here: https://github.com/ome/ome-model/blob/9f1fb5647f3c76473747643808ddb044b7d5ab45/ome-xml/src/main/java/ome/specification/XMLMockObjects.java#L1096

Ideally we'd change XMLMockObjects to use 20 characters for the hash, release ome-model and update the dependency version here. If a fix is needed urgently though, FakeReader could override the HashSHA1 for now.

@joshmoore
Member Author

I'm simply stripping them out for the moment (like I'm injecting MetadataOnly) so no huge rush. I couldn't figure out a valid regex, so we might want to add that to the upstream docs as an example once we do.
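
The stripping workaround described above could look roughly like the following. This is a sketch using only the Python standard library, not the code used in ome-zarr-metadata; the function name `strip_hashes` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"


def strip_hashes(xml_text: str) -> str:
    """Return the OME-XML with every Plane/HashSHA1 element removed."""
    # Keep OME as the default namespace when serializing
    ET.register_namespace("", OME_NS)
    root = ET.fromstring(xml_text)
    # Materialize the planes first so the tree is not modified while iterating
    for plane in list(root.iter(f"{{{OME_NS}}}Plane")):
        for sha in plane.findall(f"{{{OME_NS}}}HashSHA1"):
            plane.remove(sha)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal document, reduced to the elements that matter here
sample = (
    f'<OME xmlns="{OME_NS}"><Image ID="Image:0"><Pixels ID="Pixels:0">'
    '<Plane TheC="0" TheT="0" TheZ="0">'
    "<HashSHA1>1234567890ABCDEF1234567890ABCDEF12345678</HashSHA1>"
    "</Plane></Pixels></Image></OME>"
)
print("HashSHA1" in strip_hashes(sample))
```

Removing the element sidesteps the length check entirely, which is acceptable only because the hash in fake data carries no information.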

@sbesson
Member

sbesson commented Apr 18, 2022

While trying to establish the language variations in preparation for a deprecation (as discussed in ome/ome-model#158 (comment)), I used the following example:

<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
   <Experiment ID="Experiment:0" Type="Photobleaching">
      <Description>Experiment</Description>
   </Experiment>
   <Plate ColumnNamingConvention="number" Columns="1" ExternalIdentifier="External Identifier" ID="Plate:0" Name="Plate Name 0" RowNamingConvention="letter" Rows="1" Status="Plate status" WellOriginX="0.0" WellOriginXUnit="µm" WellOriginY="1.0" WellOriginYUnit="µm">
      <Description>Plate 0 of 1</Description>
      <Well Color="255" Column="0" ExternalDescription="External Description" ExternalIdentifier="External Identifier" ID="Well:0_0_0_0" Row="0" Type="Transfection: done">
         <WellSample ID="WellSample:0_0_0_0_0_0" Index="0" PositionX="0.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" Timepoint="2006-05-04T18:13:51">
            <ImageRef ID="Image:0"/>
         </WellSample>
      </Well>
      <PlateAcquisition EndTime="2006-05-04T18:13:51" ID="PlateAcquisition:0" Name="PlateAcquisition Name 0" StartTime="2006-05-04T18:13:51">
         <Description>PlateAcquisition 0 of 1</Description>
         <WellSampleRef ID="WellSample:0_0_0_0_0_0"/>
      </PlateAcquisition>
   </Plate>
   <Instrument ID="Instrument:0">
      <Microscope LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="Upright"/>
      <Laser FrequencyMultiplication="30" ID="LightSource:0" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
      <Arc ID="LightSource:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="HgXe"/>
      <Filament ID="LightSource:2" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="Halogen"/>
      <LightEmittingDiode ID="LightSource:3" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789"/>
      <Laser FrequencyMultiplication="30" ID="LightSource:4" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
      <Detector AmplificationGain="0.0" Gain="1.0" ID="Detector:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Offset="2.0" SerialNumber="0123456789" Type="CCD" Voltage="100" VoltageUnit="V" Zoom="3.0"/>
      <Objective CalibratedMagnification="1.0" Correction="UV" ID="Objective:0" Immersion="Oil" Iris="true" LensNA="0.5" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" NominalMagnification="1.5" SerialNumber="0123456789" WorkingDistance="1.0" WorkingDistanceUnit="µm"/>
      <FilterSet ID="FilterSet:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
      <Filter ID="Filter:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
         <TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
      </Filter>
      <Filter ID="Filter:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
         <TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
      </Filter>
      <Dichroic ID="Dichroic:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
   </Instrument>
   <Image ID="Image:0" Name="test">
      <Description>Image Description 0</Description>
      <ExperimentRef ID="Experiment:0"/>
      <ImagingEnvironment AirPressure="1.0" AirPressureUnit="mbar" CO2Percent="1.0" Humidity="1.0" Temperature="1.0" TemperatureUnit="°C"/>
      <StageLabel Name="StageLabel" X="1.0" XUnit="reference frame" Y="1.0" YUnit="reference frame" Z="1.0" ZUnit="reference frame"/>
      <Pixels BigEndian="false" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" PhysicalSizeX="1" PhysicalSizeXUnit="µm" PhysicalSizeY="1" PhysicalSizeYUnit="µm" PhysicalSizeZ="1" PhysicalSizeZUnit="µm" SignificantBits="8" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="1" Type="uint8">
         <Channel AcquisitionMode="FluorescenceLifetime" Color="1687603455" ContrastMethod="Brightfield" EmissionWavelength="300.3" EmissionWavelengthUnit="nm" ExcitationWavelength="400.3" ExcitationWavelengthUnit="nm" Fluor="Fluor" ID="Channel:0:0" IlluminationType="Oblique" NDFilter="1.0" Name="Name" PinholeSize="0.5" PinholeSizeUnit="µm" PockelCellSetting="0" SamplesPerPixel="1">
            <LightSourceSettings Attenuation="1.0" ID="LightSource:0" Wavelength="200.2" WavelengthUnit="nm"/>
            <DetectorSettings Binning="2x2" Gain="1.0" ID="Detector:0" Integration="20" Offset="1.0" ReadOutRate="1.0" ReadOutRateUnit="Hz" Voltage="1.0" VoltageUnit="V" Zoom="3.0"/>
            <LightPath>
               <ExcitationFilterRef ID="Filter:1"/>
               <DichroicRef ID="Dichroic:0"/>
               <EmissionFilterRef ID="Filter:0"/>
            </LightPath>
         </Channel>
         <MetadataOnly/>
         <Plane DeltaT="0.1" DeltaTUnit="s" ExposureTime="10.0" ExposureTimeUnit="s" PositionX="1.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" PositionZ="1.0" PositionZUnit="reference frame" TheC="0" TheT="0" TheZ="0">
            <HashSHA1>1234567890ABCDEF1234567890ABCDEF12345678</HashSHA1>
         </Plane>
      </Pixels>
   </Image>
</OME>

Bio-Formats xmlvalid

% ./bftools/xmlvalid out.xml 
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating out.xml
No validation errors found.

Python's xmlschema

>>> import xmlschema
>>> xsd = xmlschema.XMLSchema('http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd')
>>> xsd.validate('out.xml')
>>>

Python ome_types (used above)

>>> import ome_types
>>> ome_types.from_xml('out.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_convenience.py", line 29, in from_xml
    return OME(**d)  # type: ignore
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/model/ome.py", line 137, in __init__
    super().__init__(**data)
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_base_type.py", line 80, in __init__
    super().__init__(**data)
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)

Looking briefly at the ome_types code, I suspect this is related to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L984-L986, which seems to apply the 20-character limit to the ConstrainedStr directly. /cc @tlambert03

This does not invalidate the statement in ome/ome-model#158 (comment) that the value of the Plane.HashSHA1 element is negligible and that we should move towards deprecating this element and removing it from the FakeReader OME-XML representation.

@tlambert03

Happy to change that bit in ome_types. I can't remember the details now, but I did that when trying to update xmlschema to v>1.5; it gave me some annoyances and I guess that's where I ended up. But it would seem to be harmless to remove the constraint?

@sbesson
Member

sbesson commented Apr 18, 2022

You mean a simple no-op class similar to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L981 ? This would work for this particular use case. I assume there is a built-in way to use the xmlschema encoding/decoding capabilities but as mentioned above, this specific element is outdated and we'll likely move towards removing it from the synthetically generated images.
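
As an alternative to a no-op class, the check could be applied to the decoded bytes (20) and not to a 20-character string. The following is a standalone sketch in plain Python; the name `Hex40` appears in this thread, but the implementation here is hypothetical and is not the ome_types code, which builds on pydantic's ConstrainedStr.

```python
class Hex40(str):
    """Hypothetical string type: 40 hex characters encoding 20 bytes."""

    n_bytes = 20  # size of a SHA-1 digest in bytes

    def __new__(cls, value: str) -> "Hex40":
        text = value.strip()
        try:
            raw = bytes.fromhex(text)
        except ValueError as exc:
            raise ValueError(f"not a hexadecimal string: {text!r}") from exc
        # Check both the decoded size and the character count, because
        # bytes.fromhex tolerates whitespace between byte pairs
        if len(raw) != cls.n_bytes or len(text) != 2 * cls.n_bytes:
            raise ValueError(
                f"expected {2 * cls.n_bytes} hex characters, got {len(text)}"
            )
        return super().__new__(cls, text)


value = Hex40("1234567890ABCDEF1234567890ABCDEF12345678")
print(len(value))
```

Under this reading, a 20-character hex string is rejected because it decodes to only 10 bytes.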

joshmoore added a commit to joshmoore/ome-types that referenced this issue Sep 15, 2022
While trying to get the `bioformats2raw.layout` specification
(ome/ngff#112) this raised its head
again. Following the discussion in ome/ome-model#158
changing the size checks to `40` passes.

fix: ome/bioformats#3810
fix: ome/napari-ome-zarr#47 (comment)
tlambert03 pushed a commit to tlambert03/ome-types that referenced this issue Sep 16, 2022
* Update Hex40 definition

While trying to get the `bioformats2raw.layout` specification
(ome/ngff#112) this raised its head
again. Following the discussion in ome/ome-model#158
changing the size checks to `40` passes.

fix: ome/bioformats#3810
fix: ome/napari-ome-zarr#47 (comment)

* Bump to min_length=40
@sbesson sbesson closed this as completed Nov 22, 2023