Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encoding support for shapereader #2231

Closed
wants to merge 1 commit into from

Conversation

smartlixx
Copy link
Contributor

@smartlixx smartlixx commented Aug 8, 2023

Rationale

Based on the discussion of #2041, shapereader should support encoding in order to read shape files with non-default encoding.

Currently, for a shape file with GBK encoding aomen.shp (containing the polygon for Macau)

import cartopy.io.shapereader as shpreader
shp = shpreader.Reader('aomen.shp')
list(shp.records())

will produce error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte

After this PR, the encoding can be specified

import cartopy.io.shapereader as shpreader
shp = shpreader.Reader('aomen.shp', encoding='gbk')
list(shp.records())

and the results are

[<Record: <POLYGON ((113.55 22.217, 113.55 22.214, 113.555 22.206, 113.559 22.206, 113...>, 
{'BIANMA': 820000, 'NAME': '澳门特别行政区', 'ENAME': 'Aomen Tebianxingzhengqu'}, <fields>>]

Implications

None as far as I know

@lgolston
Copy link
Contributor

There was a previous suggestion to pass kwargs upstream, but explicitly writing encoding as done here looks right to me to ensure compatibility between Basic and Fiona readers. This will also help close other open issues #739 and #1282 related to handling encoding.

@@ -131,9 +131,9 @@ class BasicReader:

"""

def __init__(self, filename, bbox=None):
def __init__(self, filename, bbox=None, encoding='utf-8'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that the defaults between the two readers isn't the same here... I think the **kwargs solution referenced by @lgolston would help by only sending along the encoding when a user explicitly sets it without needing to worry about the default between Fiona and Pyshp.

@@ -186,10 +186,10 @@ class FionaReader:

"""

def __init__(self, filename, bbox=None):
def __init__(self, filename, bbox=None, encoding=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __init__(self, filename, bbox=None, encoding=None):
def __init__(self, filename, bbox=None, encoding='utf-8'):

Thinking about the comment from @greglucas, there are a lot of different options since the two libraries do slightly different things (pyshp raises a TypeError if encoding=None is passed; fiona attempts to detect the encoding by default while pyshp does not; and both accept different kwargs other than encoding). I believe the simplest and most compatible change would be to simply make utf-8 the default for both. This also does not seem to lose much functionality for fiona; for example I tried reading a gbk encoded shapefile and while fiona did not raise an error, it also did not actually read it correctly without specifying the encoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should restrict users to only using utf-8 by default. If users had previously installed Fiona and were reading in other encodings, they would have been able to use the data, but if we force the encoding now then that would break them. I guess there is a question of whether we should call one of these libraries our defacto standard and go with that and then manipulate keyword arguments to be compatible with that...

I still think passing **kwargs through is the most flexible and least amount of work, very similar to what the stock_img pr is doing by forwarding them on to the underlying function: #2230

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants