Publish Shapefile with invalid byte sequence #660

Closed
fzadrazil opened this issue Sep 23, 2022 · 5 comments
Labels
enhancement New feature or request

@fzadrazil
Collaborator

The text encoding of the shapefile seems to change during upload to Layman, which results in a DB error. Steps to reproduce the issue:

  1. Upload this data to Layman using the test client
    data.zip
    When opened in QGIS, encoding is set to UTF-8 and attribute values are correct

  2. Publishing fails with this error

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 405, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/code/src/layman/make_celery.py", line 36, in __call__
    return self.run(*args, **kwargs)
  File "/code/src/layman/layer/db/tasks.py", line 63, in refresh_table
    raise LaymanError(err_code, private_data=pg_error)
layman.http.LaymanError: LaymanError code=11 message=Error during import data into DB data=None private_info=b'ERROR 1: COPY statement failed.\nERROR:  invalid byte sequence for encoding "UTF8": 0xc3 0x09\nCONTEXT:  COPY obce_body_5514, line 9\n\n'
  3. Download the input_file from Layman and open it in QGIS
    obce_body_5514.zip

  4. Encoding is set to Win-1250 and field values are broken
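
The broken field values in the last step look like UTF-8 bytes that a reader decoded as Windows-1250. A minimal Python sketch of that effect, assuming the attribute text really is UTF-8 on disk; the sample string and the helper name `misread_as_cp1250` are illustrative and not part of Layman:

```python
def misread_as_cp1250(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1250."""
    return text.encode("utf-8").decode("cp1250")


original = "Bílý list"
broken = misread_as_cp1250(original)
# Each accented letter occupies two bytes in UTF-8, so it shows up as two
# unrelated characters when the bytes are read as a single-byte code page.
print(repr(broken))

# The bytes themselves are untouched, so this kind of damage is reversible.
restored = broken.encode("cp1250").decode("utf-8")
print(restored == original)
```

If this is what happens here, the file content is the same in both cases and only the declared encoding differs.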

@fzadrazil fzadrazil added the bug Something isn't working label Sep 23, 2022
@fzadrazil
Collaborator Author

Data package including cpg file
data2.zip

@jirik
Member

jirik commented Sep 26, 2022

The error is actually correct. The input file contains a byte sequence that is invalid in UTF-8 encoding. The invalid sequence is located e.g. in row OB.569399, column VlajkaText: Bílý list s červeným žerďovým pruhem širokým pětinu délky listu se třemi žlutými vztyčenými lipovými listy nad sebou. V bílém poli modrá vodorovná rozsocha se třemi rameny širokými pětinu šířky listu, krajní ramena vycházej�

The incorrect sequence is visualized as �. It appears in more places in the file.
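
For reference, the pair 0xc3 0x09 from the PostgreSQL error is invalid because 0xc3 opens a two-byte UTF-8 sequence and must be followed by a continuation byte in the range 0x80 to 0xbf, while 0x09 is a plain tab. A minimal Python sketch for locating such sequences in raw bytes; `find_invalid_utf8` is a hypothetical helper, and the sample bytes are constructed for illustration rather than read from the attached file:

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for each undecodable run in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice being decoded
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume right after the offending bytes
    return problems


# "vycházej" followed by a lone lead byte 0xc3 and a tab (0x09)
sample = "vycházej".encode("utf-8") + b"\xc3" + b"\tnext"
print(find_invalid_utf8(sample))
```

Running something like this over the text fields before import would show how many places are affected.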

@jirik
Member

jirik commented Sep 26, 2022

I changed the label from bug to enhancement, because the input file is broken.

@jirik jirik added enhancement New feature or request and removed bug Something isn't working labels Sep 26, 2022
@jirik
Member

jirik commented Sep 29, 2022

This could be solved, e.g., by a shp -> geojson -> iconv -> postgresql pipeline:

ogr2ogr -nlt GEOMETRY --config OGR_ENABLE_PARTIAL_REPROJECTION TRUE -unsetFid -a_srs EPSG:5514 -f GeoJSON /vsistdout/ /layman_data/workspaces/browser/layers/t10/input_file/t10.shp | iconv -c -t utf8 | ogr2ogr -nln layer_1013fb7c_96cb_401f_95b2_964649b8c368 -nlt GEOMETRY --config OGR_ENABLE_PARTIAL_REPROJECTION TRUE -lco SCHEMA=browser -f PostgreSQL -unsetFid "PG:host='postgresql' port='5432' dbname='gis' user='docker' password='docker'" -a_srs EPSG:5514 -lco PRECISION=NO /vsistdin/

Some data types, floating-point numbers, and coordinates at higher decimal places may differ from those produced by a direct shp -> postgresql import.
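
The effect of the `iconv -c` step can be approximated in Python, assuming the input is treated as UTF-8: bytes that do not form a valid sequence are silently dropped. This is only a sketch of the idea, not how Layman handles it; `drop_invalid_utf8` and `mark_invalid_utf8` are hypothetical names:

```python
def drop_invalid_utf8(data: bytes) -> str:
    """Decode as UTF-8 and silently discard bytes that cannot be decoded."""
    return data.decode("utf-8", errors="ignore")


def mark_invalid_utf8(data: bytes) -> str:
    """Decode as UTF-8 and put U+FFFD where bytes cannot be decoded."""
    return data.decode("utf-8", errors="replace")


# A value cut off in the middle of a two-byte character, followed by a tab
raw = "vycházej".encode("utf-8") + b"\xc3" + b"\t"

cleaned = drop_invalid_utf8(raw)
marked = mark_invalid_utf8(raw)
print(repr(cleaned))
print(repr(marked))
```

Either way the damaged character is lost; dropping it just makes the result loadable, while the replacement variant keeps a visible marker of where the data was damaged.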

@index-git
Collaborator

We can force ogr2ogr to export GeoJSON in UTF-8 with the -lco ENCODING=UTF-8 option.

@jirik jirik changed the title from "Encoding of Shapefile field values" to "Publish Shapefile with invalid byte sequence" Oct 3, 2022
@jirik jirik added this to the Future release milestone Oct 3, 2022