Importing a dataset from GPKG with multiple XML attachments fails #547

Open
olsen232 opened this issue Jan 20, 2022 · 1 comment

@olsen232
Collaborator

The error message is currently extremely unhelpful:

Traceback (most recent call last):
  File "kart_cli.py", line 4, in <module>
  File "kart\cli.py", line 334, in entrypoint
  File "lib\site-packages\click\core.py", line 829, in __call__
  File "lib\site-packages\click\core.py", line 782, in main
  File "kart\cli.py", line 157, in invoke
  File "lib\site-packages\click\core.py", line 1259, in invoke
  File "lib\site-packages\click\core.py", line 1066, in invoke
  File "lib\site-packages\click\core.py", line 610, in invoke
  File "lib\site-packages\click\decorators.py", line 21, in new_func
  File "kart\init.py", line 355, in import_
  File "kart\fast_import.py", line 349, in fast_import_tables
  File "kart\fast_import.py", line 532, in _import_single_source
  File "kart\fast_import.py", line 549, in write_blobs_to_stream
  File "kart\fast_import.py", line 543, in write_blob_to_stream
TypeError: a bytes-like object is required, not 'list'

We only support one attached XML metadata document, whereas the GPKG spec allows arbitrarily many. Trying to edit and commit a second XML attachment behaves slightly better. Firstly, it gives a better error message:
Sorry, committing more than one XML metadata file is not supported
Secondly, it is less likely to happen. A user is much more likely to import an existing GPKG from some other system that happens to have multiple XML attachments than to add a second attachment in their working copy. And if they do add one, they are more likely to be able to undo what they have done (if all else fails, by running kart reset or similar).
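
The traceback is consistent with a list of XML attachments reaching a writer that only accepts bytes. A minimal sketch of a guard that fails early with a readable message is below. It is hypothetical throughout: `write_blob_to_stream` only borrows the frame name from the traceback, and `metadata_xml_to_blob`, `UnsupportedMetadataError` and the message text are illustrative, not Kart's actual code.

```python
import io
from typing import BinaryIO, List, Union


class UnsupportedMetadataError(Exception):
    """Hypothetical: stands in for whatever error type the CLI reports nicely."""


def write_blob_to_stream(stream: BinaryIO, blob: bytes) -> None:
    # Hypothetical stand-in for the traceback's frame: it accepts bytes only, so
    # passing it a list raises "a bytes-like object is required, not 'list'".
    stream.write(blob)


def metadata_xml_to_blob(xml: Union[str, List[str]]) -> bytes:
    """Hypothetical: reduce a dataset's XML metadata (one string, or a
    non-empty list of strings) to one blob, or fail with a clear message."""
    if isinstance(xml, list):
        if len(xml) > 1:
            raise UnsupportedMetadataError(
                "Sorry, importing more than one XML metadata file is not supported"
            )
        xml = xml[0]  # assumes the list is non-empty
    return xml.encode("utf-8")


# Usage: validate the metadata before it reaches the byte stream.
stream = io.BytesIO()
try:
    write_blob_to_stream(stream, metadata_xml_to_blob(["<a/>", "<b/>"]))
except UnsupportedMetadataError as e:
    print(e)  # Sorry, importing more than one XML metadata file is not supported
```

Checking the shape of the metadata before any bytes are written means the user sees the same kind of message as the commit path, instead of a TypeError from deep inside the import.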

In the GPKGs I have seen, one of the XML attachments is often junk anyway, as in the following example:

First XML attachment:

<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
<qgis version="3.20.3-Odense">
  <identifier></identifier>
  <parentidentifier></parentidentifier>
  <language></language>
  <type></type>
  <title></title>
  <abstract></abstract>
  <contact>
    <name></name>
    <organization></organization>
    <position></position>
    <voice></voice>
    <fax></fax>
    <email></email>
    <role></role>
  </contact>
  <links/>
  <fees></fees>
  <encoding></encoding>
  <crs>
    <spatialrefsys>
      <wkt></wkt>
      <proj4></proj4>
      <srsid>0</srsid>
      <srid>0</srid>
      <authid></authid>
      <description></description>
      <projectionacronym></projectionacronym>
      <ellipsoidacronym></ellipsoidacronym>
      <geographicflag>false</geographicflag>
    </spatialrefsys>
  </crs>
  <extent>
    <spatial minx="0" miny="0" dimensions="2" maxz="0" crs="" maxy="0" minz="0" maxx="0"/>
    <temporal>
      <period>
        <start></start>
        <end></end>
      </period>
    </temporal>
  </extent>
</qgis>

Second XML attachment:

<GDALMultiDomainMetadata>
  <Metadata>
    <MDI key="GPKG_METADATA_ITEM_1">&lt;!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'&gt;
&lt;qgis version="3.20.3-Odense"&gt;
  &lt;identifier&gt;&lt;/identifier&gt;
  &lt;parentidentifier&gt;&lt;/parentidentifier&gt;
  &lt;language&gt;&lt;/language&gt;
  &lt;type&gt;&lt;/type&gt;
  &lt;title&gt;&lt;/title&gt;
  &lt;abstract&gt;&lt;/abstract&gt;
  &lt;contact&gt;
    &lt;name&gt;&lt;/name&gt;
    &lt;organization&gt;&lt;/organization&gt;
    &lt;position&gt;&lt;/position&gt;
    &lt;voice&gt;&lt;/voice&gt;
    &lt;fax&gt;&lt;/fax&gt;
    &lt;email&gt;&lt;/email&gt;
    &lt;role&gt;&lt;/role&gt;
  &lt;/contact&gt;
  &lt;links/&gt;
  &lt;fees&gt;&lt;/fees&gt;
  &lt;encoding&gt;&lt;/encoding&gt;
  &lt;crs&gt;
    &lt;spatialrefsys&gt;
      &lt;wkt&gt;&lt;/wkt&gt;
      &lt;proj4&gt;&lt;/proj4&gt;
      &lt;srsid&gt;0&lt;/srsid&gt;
      &lt;srid&gt;0&lt;/srid&gt;
      &lt;authid&gt;&lt;/authid&gt;
      &lt;description&gt;&lt;/description&gt;
      &lt;projectionacronym&gt;&lt;/projectionacronym&gt;
      &lt;ellipsoidacronym&gt;&lt;/ellipsoidacronym&gt;
      &lt;geographicflag&gt;false&lt;/geographicflag&gt;
    &lt;/spatialrefsys&gt;
  &lt;/crs&gt;
  &lt;extent&gt;
    &lt;spatial minx="0" miny="0" dimensions="2" maxz="0" crs="" maxy="0" minz="0" maxx="0"/&gt;
    &lt;temporal&gt;
      &lt;period&gt;
        &lt;start&gt;&lt;/start&gt;
        &lt;end&gt;&lt;/end&gt;
      &lt;/period&gt;
    &lt;/temporal&gt;
  &lt;/extent&gt;
&lt;/qgis&gt;
</MDI>
  </Metadata>
</GDALMultiDomainMetadata>

In this example, the first XML file happens not to contain any useful information, and the second XML file is just a wrapped-and-escaped copy of the first that needs extra parsing to read. The junkier XML file is also slightly longer in this case, so if we develop a heuristic to decide which XML gets to stay, we can't assume that "longer" means "more informative". We can probably detect the case where >90% of one XML file is identical to the other XML file and the remainder is just boilerplate.
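
One way that detection could be sketched, assuming Python's standard `xml.etree.ElementTree` and `difflib`: unwrap a `GDALMultiDomainMetadata` attachment, compare the embedded document line by line with the other attachments, and drop the wrapper when they are at least 90% similar. The function names and the 0.9 threshold are hypothetical, and this is not necessarily what Kart does. Because the decision rests on unwrapped content rather than length, the longer junk file is the one that goes.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List, Optional


def unwrap_gdal_metadata(xml_text: str) -> Optional[str]:
    """Return the document embedded in a GDALMultiDomainMetadata wrapper,
    or None if xml_text is not such a wrapper."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != "GDALMultiDomainMetadata":
        return None
    mdi = root.find("./Metadata/MDI")
    # ElementTree has already un-escaped &lt; and &gt; in .text,
    # so this is the inner XML document as plain text.
    if mdi is None or not mdi.text:
        return None
    return mdi.text


def _similarity(a: str, b: str) -> float:
    # Compare line by line, ignoring indentation; autojunk=False so that
    # difflib's popularity heuristic does not discard repeated lines.
    a_lines = [line.strip() for line in a.strip().splitlines()]
    b_lines = [line.strip() for line in b.strip().splitlines()]
    return difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False).ratio()


def drop_wrapped_duplicates(xml_docs: List[str], threshold: float = 0.9) -> List[str]:
    """Hypothetical heuristic: drop any GDAL wrapper whose unwrapped content
    is at least `threshold` similar to another attachment; keep the rest."""
    kept: List[str] = []
    for i, doc in enumerate(xml_docs):
        inner = unwrap_gdal_metadata(doc)
        others = xml_docs[:i] + xml_docs[i + 1:]
        if inner is not None and any(_similarity(inner, o) >= threshold for o in others):
            continue  # redundant: same content as another attachment
        kept.append(doc)
    return kept
```

Only wrappers are ever dropped, so if a wrapper and the plain attachment differ by more than the threshold, both are kept and the "more than one XML metadata file" error still applies.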

@olsen232
Collaborator Author

The error message has been improved, and the particular example shown above now drops the second (junk) XML file.
#548
