-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
geopandas: Mapping int/int64 to int32 for OGR_GMT format #2592
Changes from 1 commit
8cec0c4
eacf45b
f2f10ca
47bc26f
f9b7903
84f87a5
c1b3b43
20891c1
60ba52a
a6d4315
1e05a29
2097add
a4ddcfe
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -126,18 +126,27 @@ def tempfile_from_geojson(geojson): | |||||||
E.g. '1a2b3c4d5e6.gmt'. | ||||||||
""" | ||||||||
with GMTTempFile(suffix=".gmt") as tmpfile: | ||||||||
# pylint: disable=import-outside-toplevel | ||||||||
import geopandas as gpd | ||||||||
|
||||||||
os.remove(tmpfile.name) # ensure file is deleted first | ||||||||
ogrgmt_kwargs = {"filename": tmpfile.name, "driver": "OGR_GMT", "mode": "w"} | ||||||||
try: | ||||||||
# Workaround to map int/int64 to int32 | ||||||||
# https://github.com/geopandas/geopandas/issues/967#issuecomment-842877704 | ||||||||
# https://github.com/GenericMappingTools/pygmt/issues/2497 | ||||||||
schema = gpd.io.file.infer_schema(geojson) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, looking into this Of course, we'll still need to force the OGR_GMT output to int32 in PyGMT, but the code would be much simplified with some upstream fixes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There is already an open PR geopandas/geopandas#2265, but it has been inactive for one year. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ok, and I saw geopandas/geopandas#2464 as well, also inactive 😆 Started a small PR at geopandas/geopandas#2950 to just handle int32, let's see if their test suite passes, I took a look at https://fiona.readthedocs.io/en/latest/manual.html#keeping-schemas-simple and
So not sure if the geopandas default driver (ESRI Shapefile?) supports int32 schema. It it doesn't and the test suite breaks, then we might need to handle everything fully in PyGMT. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's still unclear if there will be any changes in GDAL, GeoPandas, Fiona or GMT, but as we are still using old versions of these packages, I believe we still need these workarounds in the next few years, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the patch in geopandas has been merged at geopandas/geopandas#2950, but will need to wait for geopandas v0.13.3 or newer to be released. Even so, int64 will still need to be converted to int32, so yes, we'll need to keep this workaround for a while. |
||||||||
for col, dtype in schema["properties"].items(): | ||||||||
if dtype in ("int", "int64"): | ||||||||
schema["properties"][col] = "int32" | ||||||||
ogrgmt_kwargs["schema"] = schema | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're going to need to read the schema anyway, should we just use the pure There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds a good idea, although I have no experience with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think we can't. Currently, we use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, let's just stick to using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only write schema if schema["properties"] is not None. Should fix that error about
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's not the root cause of the error. Here is a modified version of the import geopandas as gpd
import shapely
linestring = shapely.geometry.LineString([(20, 15), (30, 15)])
polygon = shapely.geometry.Polygon([(20, 10), (23, 10), (23, 14), (20, 14)])
multipolygon = shapely.geometry.shape(
{
"type": "MultiPolygon",
"coordinates": [
[
[[0, 0], [20, 0], [10, 20], [0, 0]], # Counter-clockwise
[[3, 2], [10, 16], [17, 2], [3, 2]], # Clockwise
],
[[[6, 4], [14, 4], [10, 12], [6, 4]]], # Counter-clockwise
[[[25, 5], [30, 10], [35, 5], [25, 5]]],
],
}
)
# Multipolygon first so the OGR_GMT file has @GMULTIPOLYGON in the header
gdf = gpd.GeoDataFrame(
index=["multipolygon", "polygon", "linestring"],
geometry=[multipolygon, polygon, linestring],
)
schema = gpd.io.file.infer_schema(gdf)
print(gdf)
print(schema)
gdf.to_file("test.gmt", driver="OGR_GMT", mode="w") The output of the two
To understand what geopandas/fiona does, I also added the same two print statements to source code The two print statements in the geopandas source code give:
As you can see, both the In our case, our inferred schema has a empty property dict There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think writing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Setting
Notice the lines with the comments Is there another way to handle this? E.g. cast the 'index' column explicitly to a string type? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This won't always work, because the index property in a GeoDataFrame object can be integers. For example, in the above example, the object can be defined like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Probably just need these two lines, call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah sorry, I thought
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks good. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Patch applied in 2097add. |
||||||||
# Using geopandas.to_file to directly export to OGR_GMT format | ||||||||
geojson.to_file(**ogrgmt_kwargs) | ||||||||
except AttributeError: | ||||||||
# pylint: disable=import-outside-toplevel | ||||||||
# Other 'geo' formats which implement __geo_interface__ | ||||||||
import json | ||||||||
|
||||||||
import fiona | ||||||||
import geopandas as gpd | ||||||||
|
||||||||
with fiona.Env(): | ||||||||
jsontext = json.dumps(geojson.__geo_interface__) | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add a unit test to
pygmt/tests/test_geopandas.py
that uses int/int64 columns. Probably a good idea to check uint/uint64 too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line https://github.com/geopandas/geopandas/blob/35463adda663dfa57c9809865ebbb15448823278/geopandas/io/file.py#L640
For any integer-like data types (like int, int32, int64, unit, uint64 and more), the
out_type
is alwaysint
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, then a unit test with just int64 columns should be fine, no need for parametrized tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 I have never used geopandas and I find it challenging for me to add a geopandas test for this PR. Could you please help with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting a little late for me, but let me post the test example I'm working on:
Couldn't get it to plot though, getting some weird errors like:
Might be another bug? I'll try to come up with an alternative example tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pygmt-2m6nvbqu.gmt.txt
This is the temporary OGR/GMT file generated by
tempfile_from_geojson
(with an extra.txt
suffix so that I can upload it to GitHub).The first few lines are:
Apparently, GMT incorrectly parse the
@null
string in the@Jp
and@Jw
lines as an invalid OGR/GMT code. So it looks like an upstream bug.Edit: opened a bug report in GenericMappingTools/gmt#7712.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oo, didn't notice that
@null
. I see at https://proj.org/en/9.2/usage/transformation.html#the-null-grid that you can specify null, by why would there be an@
symbol?!! A little bit of searching in the https://github.com/OSGeo/PROJ repo shows that this might be valid though?Do we need to report this to GMT, geopandas or PROJ? In the meantime, I can try a projection that doesn't have
@null
in the PROJ string.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
@
means the grid is optional (see https://github.com/OSGeo/PROJ/blob/0372c81511a11afa76e01ebb30adb00bcb950772/NEWS#L2936). Also see the search results of=@
: https://github.com/search?q=repo%3AOSGeo%2FPROJ+%3D%40&type=codeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've added the unit test at commit c1b3b43, and used a projection that doesn't have
@null
, feel free to modify if needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the upstream bug has been fixed.