Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data error in 1:10m Cultural Vectors Download version 5.0.0 #674

Closed
henrykironde opened this issue Jan 19, 2022 · 10 comments
Closed

Data error in 1:10m Cultural Vectors Download version 5.0.0 #674

henrykironde opened this issue Jan 19, 2022 · 10 comments

Comments

@henrykironde
Copy link

OS: MAC and Linux/Ubuntu

Some of the themes may probably have some errors:
I am not sure what it could be but when trying to dump the data into postgres, the tables below error.

Use shp2pgsql to dump data into Postgres DB

shp2pgsql -d -I -W "utf-8"  -s 4326  "/Users/henrykironde/retriever/10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp" "cultural_10m.ne_10m_admin_1_states_provinces_lakes" | psql -U postgres -d postgres --port 9499 --host localhost

Tables failing

ne_10m_admin_1_states_provinces
ne_10m_admin_1_states_provinces_lakes
ne_10m_admin_0_countries_bdg
ne_10m_admin_2_counties
ne_10m_admin_2_counties_lakes
ne_10m_admin_0_countries_pak

ERROR: length for type varchar must be at least 1
LINE 93: "fclass_iso" varchar(0),

@ImreSamu
Copy link
Collaborator

Thank you for the reports!

Maybe the shp2pgsql program is not working correctly? Can you verify with the latest version?

My suggestion as a workaround: try to use the ~latest ogr2ogr tool. ( https://gdal.org/programs/ogr2ogr.html )
/ this repo is using ogr2ogr for conversion: https://github.com/nvkelso/natural-earth-vector/search?q=ogr2ogr /

./10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp

my quick test:

  • based on this github repo - master
  • debian environments
root@fbcd3ecfaacc:/test# ogr2ogr --version
GDAL 3.2.2, released 2021/03/05

root@fbcd3ecfaacc:/test# ogr2ogr \
    -f Postgresql \
    -lco DIM=2 \
    -lco GEOMETRY_NAME=geometry \
    -lco SCHEMA=public \
    -lco precision=NO \
    -nlt GEOMETRY \
    -overwrite \
    -progress \
    -s_srs EPSG:4326 \
    -t_srs EPSG:4326 \
    PG:"dbname=$PGDATABASE user=$PGUSER host=$PGHOST password=$PGPASSWORD port=$PGPORT" \
    "./natural-earth-vector/10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp"
0...10...20...30...40...50...60...70...80...90...100 - done.

root@fbcd3ecfaacc:/test# psql
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.

test=# \d+ ne_10m_admin_1_states_provinces_lakes
                                                                     Table "public.ne_10m_admin_1_states_provinces_lakes"
+------------+-------------------------+-----------+----------+------------------------------------------------------------------------+----------+-------------+--------------+-------------+
|   Column   |          Type           | Collation | Nullable |                                Default                                 | Storage  | Compression | Stats target | Description |
+------------+-------------------------+-----------+----------+------------------------------------------------------------------------+----------+-------------+--------------+-------------+
| ogc_fid    | integer                 |           | not null | nextval('ne_10m_admin_1_states_provinces_lakes_ogc_fid_seq'::regclass) | plain    |             |              |             |
| featurecla | character varying       |           |          |                                                                        | extended |             |              |             |
| scalerank  | integer                 |           |          |                                                                        | plain    |             |              |             |
| adm1_code  | character varying       |           |          |                                                                        | extended |             |              |             |
| diss_me    | integer                 |           |          |                                                                        | plain    |             |              |             |
| iso_3166_2 | character varying       |           |          |                                                                        | extended |             |              |             |
| wikipedia  | character varying       |           |          |                                                                        | extended |             |              |             |
| iso_a2     | character varying       |           |          |                                                                        | extended |             |              |             |
| adm0_sr    | integer                 |           |          |                                                                        | plain    |             |              |             |
| name       | character varying       |           |          |                                                                        | extended |             |              |             |
| name_alt   | character varying       |           |          |                                                                        | extended |             |              |             |
| name_local | character varying       |           |          |                                                                        | extended |             |              |             |
| type       | character varying       |           |          |                                                                        | extended |             |              |             |
| type_en    | character varying       |           |          |                                                                        | extended |             |              |             |
| code_local | character varying       |           |          |                                                                        | extended |             |              |             |
| code_hasc  | character varying       |           |          |                                                                        | extended |             |              |             |
| note       | character varying       |           |          |                                                                        | extended |             |              |             |
| hasc_maybe | character varying       |           |          |                                                                        | extended |             |              |             |
| region     | character varying       |           |          |                                                                        | extended |             |              |             |
| region_cod | character varying       |           |          |                                                                        | extended |             |              |             |
| provnum_ne | integer                 |           |          |                                                                        | plain    |             |              |             |
| gadm_level | integer                 |           |          |                                                                        | plain    |             |              |             |
| check_me   | integer                 |           |          |                                                                        | plain    |             |              |             |
| datarank   | integer                 |           |          |                                                                        | plain    |             |              |             |
| abbrev     | character varying       |           |          |                                                                        | extended |             |              |             |
| postal     | character varying       |           |          |                                                                        | extended |             |              |             |
| area_sqkm  | integer                 |           |          |                                                                        | plain    |             |              |             |
| sameascity | integer                 |           |          |                                                                        | plain    |             |              |             |
| labelrank  | integer                 |           |          |                                                                        | plain    |             |              |             |
| name_len   | integer                 |           |          |                                                                        | plain    |             |              |             |
| mapcolor9  | integer                 |           |          |                                                                        | plain    |             |              |             |
| mapcolor13 | integer                 |           |          |                                                                        | plain    |             |              |             |
| fips       | character varying       |           |          |                                                                        | extended |             |              |             |
| fips_alt   | character varying       |           |          |                                                                        | extended |             |              |             |
| woe_id     | integer                 |           |          |                                                                        | plain    |             |              |             |
| woe_label  | character varying       |           |          |                                                                        | extended |             |              |             |
| woe_name   | character varying       |           |          |                                                                        | extended |             |              |             |
| latitude   | double precision        |           |          |                                                                        | plain    |             |              |             |
| longitude  | double precision        |           |          |                                                                        | plain    |             |              |             |
| sov_a3     | character varying       |           |          |                                                                        | extended |             |              |             |
| adm0_a3    | character varying       |           |          |                                                                        | extended |             |              |             |
| adm0_label | integer                 |           |          |                                                                        | plain    |             |              |             |
| admin      | character varying       |           |          |                                                                        | extended |             |              |             |
| geonunit   | character varying       |           |          |                                                                        | extended |             |              |             |
| gu_a3      | character varying       |           |          |                                                                        | extended |             |              |             |
| gn_id      | integer                 |           |          |                                                                        | plain    |             |              |             |
| gn_name    | character varying       |           |          |                                                                        | extended |             |              |             |
| gns_id     | integer                 |           |          |                                                                        | plain    |             |              |             |
| gns_name   | character varying       |           |          |                                                                        | extended |             |              |             |
| gn_level   | integer                 |           |          |                                                                        | plain    |             |              |             |
| gn_region  | character varying       |           |          |                                                                        | extended |             |              |             |
| gn_a1_code | character varying       |           |          |                                                                        | extended |             |              |             |
| region_sub | character varying       |           |          |                                                                        | extended |             |              |             |
| sub_code   | character varying       |           |          |                                                                        | extended |             |              |             |
| gns_level  | integer                 |           |          |                                                                        | plain    |             |              |             |
| gns_lang   | character varying       |           |          |                                                                        | extended |             |              |             |
| gns_adm1   | character varying       |           |          |                                                                        | extended |             |              |             |
| gns_region | character varying       |           |          |                                                                        | extended |             |              |             |
| min_label  | double precision        |           |          |                                                                        | plain    |             |              |             |
| max_label  | double precision        |           |          |                                                                        | plain    |             |              |             |
| min_zoom   | double precision        |           |          |                                                                        | plain    |             |              |             |
| wikidataid | character varying       |           |          |                                                                        | extended |             |              |             |
| name_ar    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_bn    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_de    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_en    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_es    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_fr    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_el    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_hi    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_hu    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_id    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_it    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_ja    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_ko    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_nl    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_pl    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_pt    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_ru    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_sv    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_tr    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_vi    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_zh    | character varying       |           |          |                                                                        | extended |             |              |             |
| ne_id      | bigint                  |           |          |                                                                        | plain    |             |              |             |
| name_he    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_uk    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_ur    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_fa    | character varying       |           |          |                                                                        | extended |             |              |             |
| name_zht   | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_iso | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_us  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_fr  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ru  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_es  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_cn  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_tw  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_in  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_np  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_pk  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_de  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_gb  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_br  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_il  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ps  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_sa  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_eg  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ma  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_pt  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ar  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_jp  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ko  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_vn  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_tr  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_id  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_pl  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_gr  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_it  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_nl  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_se  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_bd  | character varying       |           |          |                                                                        | extended |             |              |             |
| fclass_ua  | character varying       |           |          |                                                                        | extended |             |              |             |
| geometry   | geometry(Geometry,4326) |           |          |                                                                        | main     |             |              |             |
+------------+-------------------------+-----------+----------+------------------------------------------------------------------------+----------+-------------+--------------+-------------+
Indexes:
    "ne_10m_admin_1_states_provinces_lakes_pkey" PRIMARY KEY, btree (ogc_fid)
    "ne_10m_admin_1_states_provinces_lakes_geometry_geom_idx" gist (geometry)
Access method: heap

@ImreSamu
Copy link
Collaborator

probably related to a NULL value problems.

"Shapefiles do not support NULL values. DBase 7 supports NULL values; shapefiles are pre-dBase 7. " https://support.esri.com/en/technical-article/000004294

now the shp2pgsql creating some invalid varchar(0) columns.

CREATE TABLE "ne_10m_admin_1_states_provinces_lakes" (gid serial,
"featurecla" varchar(20),
"scalerank" int2,
"adm1_code" varchar(9),
"diss_me" int4,
"iso_3166_2" varchar(8),
"wikipedia" varchar(84),

....

"name_he" varchar(63),
"name_uk" varchar(89),
"name_ur" varchar(103),
"name_fa" varchar(92),
"name_zht" varchar(61),
"fclass_iso" varchar(0),                    <-----------
"fclass_us" varchar(0),                     <-----------
"fclass_fr" varchar(0),                       <-----------
"fclass_ru" varchar(12),
"fclass_es" varchar(0),                      <-----------
"fclass_cn" varchar(18),
"fclass_tw" varchar(12),
"fclass_in" varchar(12),
"fclass_np" varchar(12),
"fclass_pk" varchar(12),
"fclass_de" varchar(0),                      <-----------
"fclass_gb" varchar(0),                      <-----------
"fclass_br" varchar(0),                       <-----------
"fclass_il" varchar(0),                         <-----------
"fclass_ps" varchar(0),                       <-----------

@henrykironde
Copy link
Author

Thank you @ImreSamu, I noticed and was wondering if there is a way the data can be supplied with only fields that contain at least something. This way, we could eliminate having 0 field sizes. In case some new data shows up for a field, you could add that field and update the data version.
I am just brain storming.

@nvkelso
Copy link
Owner

nvkelso commented Jan 19, 2022

Seems like it could be an empty string not a null. Would that fix your import issues?

@henrykironde
Copy link
Author

@nvkelso empty strings would not be a good representation of the data in my opinion. I think it would be nice to keep data as original as possible and to a standard.

@ImreSamu
Copy link
Collaborator

@henrykironde :

brain storming ...

🤔
my quick & dirty - work-around .. not a production quality ; maybe you can improve
the trick is sed "s/varchar(0)/varchar(1)/g"

shp2pgsql -s 4326 -D  \
     -d "./natural-earth-vector/10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp" \
    "ne_10m_admin_1_states_provinces_lakes"  |  sed "s/varchar(0)/varchar(1)/g" | psql

or safer & slower ( with insert )

psql -c "drop table if exists ne_10m_admin_1_states_provinces_lakes;"
shp2pgsql -s 4326 -p "./natural-earth-vector/10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp" "ne_10m_admin_1_states_provinces_lakes" | sed "s/varchar(0)/varchar(1)/g" | psql
shp2pgsql -s 4326 -a "./natural-earth-vector/10m_cultural/ne_10m_admin_1_states_provinces_lakes.shp" "ne_10m_admin_1_states_provinces_lakes" | psql

@gasparesganga
Copy link

Hi. Since this issue has been referenced from another one in my php-shapefile library asking me to come up with my 2 cents, here is my take on the subject.

First of all, for ESRI Shapefiles to be compliant to the standard, they should adhere to dBase III+ specs (yep, even if Wikipedia and other sources report it as dBase IV version, they are wrong. It is actually dBase III+). This means that a zero-sized field is not allowed, since field size must be at least 1.
Thinking carefully about it, a field that allows no data has no meaning at all. Not in dBase nor in any DBMS. One could argue that if no data can be stored in each record for that field, why should it be there?

I understand that the purpose here is to have some NULL values in each row for those fields, in order to keep the same basic structure across all the shapefiles in the dataset. But I think that going for some zero-sized fields breaks the standard too much, causing some major headaches to most libraries and GIS software out there.

Since dBase III+ and as a consequence, shapefiles do not allow NULL values nor nullable fields, but shapefiles are the way to go format for interoperability and data sharing in GIS environments, during the years it's been a common practice for software and libraries to extend dBase specs with some simulated NULL values, simply defining those fields that they intend as nullable as they would normally do and inserting as many asterisks (*) as the field size allow to represent NULL values in rows.

E.g. A CHAR field that allows up to 15 characters would represent NULL like that: ***************

This, of course, is not part of the specs, but it allows the representation of NULL values without butchering the standard and yet producing some standard-complying DBFs. After all, any software that is not capable of interpreting *************** as NULL will simply read it as a string. Surely, issues may arise when that nullable field is of NUMERIC or DATE type, but this is still the lesser evil, since a software that is not able to interprete them as NULL would probably sanificate the values and reading it respectively as 0 and 0001-01-01.
Bottom line is: this is not part of the specs but it most likely does not break anything (in case of CHAR fields it does not break anything 100%).

Different software might handle this differently, using another character instead of *.
In my library, which is being used by quite a few open source and commercial projects worldwide, I handled this giving the programmer the chance to allow simulated NULL values and choose which character should be used (OPTION_DBF_NULL_PADDING_CHAR), in order to make as many people as possible happy :-)


In this specific scenario, I think the best, most elegant and above all non-breaking solution would be to re-define those zero-sized fields as CHAR(1) or CHAR(n) and pad it with * or another character of choice.

@nvkelso
Copy link
Owner

nvkelso commented May 12, 2022

This has been resolved in 5.1.1 with an upstream change to mapshaper tool. I'll finish posting the files tonight.

@nvkelso
Copy link
Owner

nvkelso commented May 12, 2022

More specifically, zero length columns are now 1 char long instead (but still null content).

@nvkelso
Copy link
Owner

nvkelso commented May 13, 2022

Fixed in v5.1.1 and released.

@nvkelso nvkelso closed this as completed May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants