Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading a layer having multiple properties with the same name #1178

Closed
Paul-Aime opened this issue Jan 6, 2023 · 7 comments
Closed

Reading a layer having multiple properties with the same name #1178

Paul-Aime opened this issue Jan 6, 2023 · 7 comments
Labels
Milestone

Comments

@Paul-Aime
Copy link

Expected behavior and actual behavior.

When trying to read a layer for which multiple properties have the same name, only the last one is loaded in the returned Collection.

This is definitely bad data format, but GDAL correctly displays all the properties even if they have the same name.

QGIS does append a suffix ("prop", "prop_1", "prop_2", ...).

I think the issue comes from python dictionary with constraints on uniqueness of the keys.

I would have expected to at least get a warning, or better to have the same behavior than in QGIS.

Steps to reproduce the problem.

The data comes from the AIRBUS GeoStore.

wget "https://datadoors.intelligence-airbusds.com/export/v1/static/export/export-20230106-11165137.zip"
unzip -o "export-20230106-11165137.zip" -d "geostore_products"

With GDAL I can see all the properties:

$ ogrinfo geostore_products Export
...
geometry: String (254.0)
geometry: Integer64 (18.0)
geometry: Integer64 (18.0)
geometry: Integer64 (18.0)
geometry: Integer64 (18.0)
...

With fiona I can only see one:

import fiona

fpath = "geostore_products"
c = fiona.open(fpath, layer="Export")

# Only one geometry property
print([p for p in c.schema["properties"].keys() if p.startswith("geo")])  # ['geometry']

# And this is not the interesting one
print(c.schema["properties"]["geometry"])  # int:18

Attempt at tracking down the issue

I suspect that the other "geometry" properties were overridden during translation as a python dictionary, since keys must be unique.

I can trace back that this is true starting from the following line in fiona/collection.py::Collection.__init__#L236:

self.session.start(self, **kwargs)

which leads to the issue being located in one of the two following lines in fiona/ogrext.pyx::Session.start (#L572 or #L577)

self.cogr_ds = gdal_open_vector(path_c, 0, drivers, kwargs)        #L572
self.cogr_layer = GDALDatasetGetLayerByName(self.cogr_ds, name_c)  #L577

if #L572 this further comes down to fiona/ogrtext.pyx::gdal_open_vector#L134 with the following call:

cogr_ds = exc_wrap_pointer(
    GDALOpenEx(path_c, flags, <const char *const *>drvs, <const char *const *>open_opts, NULL)
)

So this is either GDALOpenEx or GDALDatasetGetLayerByName, which come from include "gdal.pxi".

I can find them there, but my understanding of the code stops here:

Operating system

Ubuntu 20.04.5 LTS x86_64

Fiona and GDAL version and provenance

$ conda list
# Name                    Version                   Build  Channel
fiona                     1.8.22          py311h3f14cef_5    conda-forge
gdal                      3.6.1           py311hadb6153_2    conda-forge
libgdal                   3.6.1                he31f7c0_2    conda-forge
@sgillies
Copy link
Member

@Paul-Aime thanks for the report! Indeed Fiona doesn't know what to do with properties that have the same name. What format does Airbus use for export and why are they doing this terrible thing?

It would be pragmatic to do what QGIS does.

@Paul-Aime
Copy link
Author

It is .shp file format.

I think the link I provided to get a sample to reproduce the error still works: https://datadoors.intelligence-airbusds.com/export/v1/static/export/export-20230106-11165137.zip

It seems like terrible file format indeed, and there is more since the feature in question is also called "geometry", see above cross referenced issue #2752 in geopandas, but that's another topic.

I agree that QGIS behavior might be pragmatic, it reminds me of the mangle_dupe_cols in pandas.read_csv function.

@sgillies
Copy link
Member

Looks like Pandas is leaning towards standardizing on "name", "name.1", "name.2", etc.

@Paul-Aime
Copy link
Author

Looks like it is deprecated in pandas though, and leaning towards letting the user choose the pattern in the future.

QGIS does "prop", "prop_1", "prop_2" etc., might be more relevant to follow QGIS if the choice is not let to the user (since GIS related).

A quick first step could be to issue a warning when there are multiple features named the same, specifying that only the last one is kept (current behavior).

Then it is to be considered if "mangling" should be automatic, optional but the default, optional but not the default, and if current behavior ("clobbering") is kept in the case where mangling is not done, or if an error is raised.

Considering discussion in geopandas #2752, it seems like pyogrio prefers no mangling at all and let an error arise in geopandas.

@sgillies
Copy link
Member

sgillies commented Jan 26, 2023

Warning as the first step is a great idea! Let's do that for 1.9.1 and then proceed from there.

@sgillies
Copy link
Member

sgillies commented Feb 9, 2023

@Paul-Aime with the changes in #1201 this is what I now see:

fio info --layer Export --indent 2 geostore_products/
WARNING:fiona.ogrext:Field name collision detected, field is skipped: i=2, key='geometry'
WARNING:fiona.ogrext:Field name collision detected, field is skipped: i=3, key='geometry'
WARNING:fiona.ogrext:Field name collision detected, field is skipped: i=4, key='geometry'
WARNING:fiona.ogrext:Field name collision detected, field is skipped: i=5, key='geometry'
{
  "driver": "ESRI Shapefile",
  "schema": {
    "properties": {
      "date": "str:254",
      "geometry": "str:254",
      "indexDat": "str:254",
      "uid": "str:254",
      "xuid": "str:254",
      "XZIncide": "int:18",
      "YZIncide": "int:18",
      "acrossTr": "int:18",
      "alongTra": "int:18",
      "viewingA": "int:18",
      "browseLa": "str:254",
      "browseMe": "str:254",
      "browseSm": "str:254",
      "overlayG": "str:254",
      "overlayM": "str:254",
      "resoluti": "int:18",
      "id": "str:254",
      "provider": "str:254",
      "sunAzimu": "int:18",
      "sunEleva": "int:18",
      "vehicle": "str:254",
      "cloudCov": "int:18",
      "constell": "str:254",
      "groundSt": "str:254",
      "incidenc": "int:18",
      "name": "str:254",
      "orientat": "int:18",
      "snowCove": "int:18",
      "remoteRa": "str:254",
      "archiveN": "str:254",
      "archiveU": "str:254",
      "category": "str:254",
      "productN": "str:254",
      "productU": "str:254",
      "pin": "str:1",
      "browse": "str:1"
    },
    "geometry": "Polygon"
  },
  "crs": "EPSG:4326",
  "crs_wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]",
  "name": "Export",
  "bounds": [
    -1.611233172732394,
    43.38495043561227,
    -1.345365602139011,
    43.57314289987615
  ],
  "count": 2
}

@Paul-Aime
Copy link
Author

Seems great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants