Reading a layer having multiple properties with the same name #1178
Comments
@Paul-Aime thanks for the report! Indeed, Fiona doesn't know what to do with properties that have the same name. What format does Airbus use for export and why are they doing this terrible thing? It would be pragmatic to do what QGIS does.
I think the link I provided for a sample to reproduce the error still works: https://datadoors.intelligence-airbusds.com/export/v1/static/export/export-20230106-11165137.zip It does seem like a terrible file format, and there is more, since the property in question is also called "geometry"; see the cross-referenced issue #2752 in geopandas, but that's another topic. I agree that the QGIS behavior might be pragmatic; it reminds me of the
Looks like pandas is leaning towards standardizing on "name", "name.1", "name.2", etc.
Looks like it is deprecated in pandas though, with a move towards letting the user choose the pattern in the future. QGIS uses "prop", "prop_1", "prop_2", etc.; since this is GIS-related, following QGIS might be more relevant if the choice is not left to the user. A quick first step could be to issue a warning when multiple properties share the same name, specifying that only the last one is kept (the current behavior). It then remains to be decided whether "mangling" should be automatic, optional but the default, or optional but not the default, and whether the current behavior ("clobbering") is kept when mangling is not done or an error is raised instead. Considering the discussion in geopandas #2752, it seems like
Warning as the first step is a great idea! Let's do that for 1.9.1 and then proceed from there.
@Paul-Aime with the changes in #1201 this is what I now see:
Seems great!
Expected behavior and actual behavior.
When reading a layer in which multiple properties have the same name, only the last one is loaded in the returned Collection.
This is definitely a bad data format, but GDAL correctly displays all the properties even if they have the same name.
QGIS appends a suffix ("prop", "prop_1", "prop_2", ...).
I think the issue comes from Python dictionaries requiring unique keys.
I would have expected to at least get a warning, or better, the same behavior as in QGIS.
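To make the expected behavior concrete, here is a minimal sketch of QGIS-style renaming combined with a warning. The helper name `mangle_duplicate_names` is hypothetical and not part of Fiona's API; it assumes the field names are available as a list in layer order.

```python
import warnings


def mangle_duplicate_names(names):
    """Make names unique with QGIS-style suffixes ("prop", "prop_1", ...).

    Hypothetical helper for illustration only; not part of Fiona.
    """
    used = set()       # names already emitted
    counts = {}        # last suffix number tried per original name
    result = []
    duplicates = []
    for name in names:
        candidate = name
        if candidate in used:
            duplicates.append(name)
            n = counts.get(name, 0)
            # Increment the suffix until the candidate is unused, so a
            # pre-existing "prop_1" is never silently overwritten.
            while candidate in used:
                n += 1
                candidate = f"{name}_{n}"
            counts[name] = n
        used.add(candidate)
        result.append(candidate)
    if duplicates:
        warnings.warn(
            f"Duplicate property names were renamed: {sorted(set(duplicates))}",
            stacklevel=2,
        )
    return result


print(mangle_duplicate_names(["geometry", "id", "geometry", "geometry"]))
```

Whether such renaming should be automatic or opt-in is a separate design question; the sketch only shows that no property needs to be dropped.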
Steps to reproduce the problem.
The data comes from the AIRBUS GeoStore.
With GDAL I can see all the properties:
With fiona I can only see one:
Attempt at tracking down the issue
I suspect that the other "geometry" properties were overwritten when the schema was translated into a Python dictionary, since keys must be unique.
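The suspected mechanism can be reproduced in plain Python, independent of Fiona. This is only a sketch: the field names and types below are made up for illustration and are not the actual schema of the Airbus export.

```python
# (name, type) pairs in the order a layer definition might list them.
# These values are invented for illustration.
fields = [("geometry", "str"), ("id", "int"), ("geometry", "float")]

# A dict keeps one entry per key, so a later duplicate overwrites the earlier one.
properties = dict(fields)

print(len(fields))      # number of fields in the layer definition
print(len(properties))  # number of keys that survive in the dict
print(properties)       # "geometry" holds the value of its last occurrence
```

This matches the observed symptom that only the last property with a given name is returned.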
I can trace this back starting from the line at fiona/collection.py::Collection.__init__#L236, which leads to the issue being located in one of two lines in fiona/ogrext.pyx::Session.start (#L572 or #L577).
If it is #L572, this further comes down to a call at fiona/ogrtext.pyx::gdal_open_vector#L134.
So this is either GDALOpenEx or GDALDatasetGetLayerByName, which come from include "gdal.pxi".
I can find them in the GDAL sources, but my understanding of the code stops here:
gcore/gdaldataset.cpp::GDALOpenEx#L3231
gcore/gdaldataset.cpp::GDALDatasetGetLayerByName#L4380
Operating system
Ubuntu 20.04.5 LTS x86_64
Fiona and GDAL version and provenance
$ conda list
# Name      Version    Build              Channel
fiona       1.8.22     py311h3f14cef_5    conda-forge
gdal        3.6.1      py311hadb6153_2    conda-forge
libgdal     3.6.1      he31f7c0_2         conda-forge