Error loading geographical data with version 7.10.3211 #295

Open
ma-garcia opened this issue Jan 15, 2015 · 37 comments

Comments

@ma-garcia

ma-garcia commented Jan 15, 2015

Hi. I'm running two Virtuoso instances, versions 7.00.3203 and 7.10.3211, on a Linux machine.

I use the Virtuoso JDBC 3 driver to load geographical data like this:

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .
  • In version 7.00.3203, data is loaded properly without problem.
  • In version 7.10.3211, I get the following error:

    RDFGE: RDF box with a geometry RDF type and a non-geometry content

If I remove the asWKT property, leaving just the lat and long properties, the data loads properly in version 7.10.3211.

I have checked the data syntax and it is correct according to GeoSPARQL, so I don't know what the problem is.

This issue is related to issue 274.

@HughWilliams
Collaborator

HughWilliams commented Jan 16, 2015

I have been able to recreate this, even when loading from isql with the ttlp() function, so it is not JDBC related:

SQL> ttlp ('<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>       a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;       <http://www.opengis.net/ont/geosparql#asWKT>               "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#lat>               "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#long>               "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .','','http://geo',0);

*** Error 42000: VD [Virtuoso Server]RDFGE: RDF box with a geometry RDF type and a non-geometry content
in
rdf_box:(BIF),
DB.DBA.TTLP_RL_TRIPLE_L([executable]/ttlpv.sql:255),
rdf_load_turtle:(BIF),
DB.DBA.TTLP_V([executable]/ttlpv.sql:554),
DB.DBA.TTLP([executable]/sparql.sql:2888),
<Top Level>
at line 17 of Top-Level:
ttlp ('<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>       a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;       <http://www.opengis.net/ont/geosparql#asWKT>               "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#lat>               "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;       <http://www.w3.org/2003/01/geo/wgs84_pos#long>               "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .','','http://geo',0)
SQL>

This issue has been reported to development to look into ...

@yonyonson

yonyonson commented Feb 24, 2015

Any progress on this one? I have the following SPARQL failing (with a CRS other than WGS84):

INSERT INTO GRAPH <http://test.delete.me/> {
    <http://test.delete.me/geo> <http://www.opengis.net/ont/geosparql#geometry> "<http://www.opengis.net/def/crs/EPSG/0/25833>POLYGON ((361895.00009999983 7315465.0001000017, 365966.00009999983 7317083.0001000017, 366027.00009999983 7317152.0001000017, 365784.00009999983 7318672.0001000017, 365741.00009999983 7318698.0001000017, 365662.00009999983 7318707.0001000017, 362737.00009999983 7319795.0001000017, 362607.00009999983 7319787.0001000017, 357663.00009999983 7320780.0001000017))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
}

Error is:

Virtuoso 42000 Error RDFGE: RDF box with a geometry RDF type and a non-geometry content

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 INSERT INTO GRAPH <http://test.delete.me/> {
    <http://test.delete.me/geo> <http://www.opengis.net/ont/geosparql#geometry> "<http://www.opengis.net/def/crs/EPSG/0/25833>POLYGON ((361895.00009999983 7315465.0001000017, 365966.00009999983 7317083.0001000017, 366027.00009999983 7317152.0001000017, 365784.00009999983 7318672.0001000017, 365741.00009999983 7318698.0001000017, 365662.00009999983 7318707.0001000017, 362737.00009999983 7319795.0001000017, 362607.00009999983 7319787.0001000017, 357663.00009999983 7320780.0001000017))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
}

I am running 07.20.3212 on Ubuntu 14.04 LTS.

@HughWilliams
Collaborator

@yonyonson: This issue is still to be resolved. I have added your occurrence to the bug report so it can be checked as well ...

@boferri

boferri commented Feb 1, 2016

Any news on this issue? Currently, we are trying to load the RDF dumps from DNB into Virtuoso (stable/7 (docker container) and develop/7 (self-compiled)) without any success so far. We always get the error above (Virtuoso 42000 Error RDFGE: RDF box with a geometry RDF type and a non-geometry content). According to some DNB representatives, an example that includes geo data looks like this (note the reply is in German). Thanks a lot in advance for any help.

@ma-garcia
Author

ma-garcia commented Jun 13, 2016

Hi everyone,

I've upgraded my Virtuoso instance to 07.20.3217 and I still have the same problem. But I've found a way to load the data. Instead of

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .

I have used this other structure:

<http://www.zaragoza.es/api/recurso/geometry/WGS84/41.635338_-0.911101>
      a       <http://www.opengis.net/ont/sf#Point> , <http://www.w3.org/2003/01/geo/wgs84_pos#Point> ;
      <http://www.opengis.net/ont/geosparql#crs>
              <http://www.opengis.net/def/crs/OGC/1.3/CRS84>;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
              "41.635338"^^<http://www.w3.org/2001/XMLSchema#double> ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long>
              "-0.911101"^^<http://www.w3.org/2001/XMLSchema#double> .

My question is whether the previous structure will be supported in future versions.
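For anyone who needs to apply the same rewrite to a larger dataset, the split between the leading CRS IRI and the WKT text can be sketched in Python. This is only an illustration of the workaround described above, not anything Virtuoso provides: the function name `split_wkt_literal` is made up for this example, the WKT text is not validated, and whether it is safe to move or drop the IRI depends on the CRS in question.

```python
import re

# Optional leading <IRI> (the CRS position), optional spaces, then the rest.
_WKT_LITERAL = re.compile(r'^(?:<([^<>\s]+)>\s*)?(.+)$', re.DOTALL)


def split_wkt_literal(lexical):
    """Return (crs_iri_or_None, wkt_text) for a geo:wktLiteral lexical form.

    Hypothetical helper: the WKT text itself is not validated here.
    """
    text = lexical.strip()
    match = _WKT_LITERAL.match(text)
    if match is None:
        raise ValueError("empty wktLiteral: %r" % (lexical,))
    return match.group(1), match.group(2)


crs, wkt = split_wkt_literal(
    "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"
)
print(crs)
print(wkt)
```

The two returned parts correspond to the separate CRS object and the bare `"Point(...)"` literal used in the second structure above.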

@jakubklimek

jakubklimek commented Jan 17, 2017

@HughWilliams I also have the same issue with the current develop/7 (c30a3c7) when the CRS is specified inside the geo:wktLiteral, which is how GeoSPARQL specifies it. Can the Virtuoso processing be turned off somehow? Or can this be fixed?

@HughWilliams
Collaborator

@jakubklimek: GeoSPARQL support is being scheduled for the next Virtuoso 8 release ...

@p1d1d1

p1d1d1 commented Sep 18, 2017

My experience so far with Virtuoso OS 7.2.4 is that:

@nandana

nandana commented Oct 1, 2018

Hi,

I am using OpenLink Virtuoso Server VOS 07.20.3229 and ran into the same problem while loading Wikidata. Is there any workaround for this issue until GeoSPARQL support is implemented?

So based on what I read from @ma-garcia, if I convert

_:x <http://www.opengis.net/ont/geosparql#asWKT>
              "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>  .

to

_:x <http://www.opengis.net/ont/geosparql#crs>
              <http://www.opengis.net/def/crs/OGC/1.3/CRS84>;
      <http://www.opengis.net/ont/geosparql#asWKT>
              "Point(-0.911101 41.635338)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>  .

I should be able to load the data without an issue to Virtuoso, right?

The other option, according to GH-773, is to use a new datatype (declared as a rdfs:subPropertyOf <http://www.opengis.net/ont/geosparql#wktLiteral>), which can prevent Virtuoso from complaining. Is there any better solution for this? It is kind of a showstopper for us!

Best Regards,
Nandana

@HughWilliams
Collaborator

HughWilliams commented Oct 1, 2018

Base GeoSPARQL support was added to the Virtuoso open source develop/7 branch last month, as indicated at --

-- and will soon be added to the stable/7 branch.

Thus I would suggest building Virtuoso and the required Geospatial plugin from the develop/7 branch as detailed in the readme file and test to see if it resolves the problem ...

@p1d1d1

p1d1d1 commented Oct 1, 2018

_:x <http://www.opengis.net/ont/geosparql#crs>

@nandana where is this property coming from?

By the way, I'm on Virtuoso 07.20.3217 and can load geodata (e.g., https://github.com/p1d1d1/p1d1d1.github.io/blob/master/triples/cantons84.nt).
Data type is changed from wktLiteral to virtrdf#Geometry.

@nandana

nandana commented Oct 1, 2018

Hi @p1d1d1 @HughWilliams

I just took that from the previous example. Looking at the actual data I was loading from Wikidata, most statements are like the following and are parsed without any error.

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

wds:Q31138473-FE72609E-8273-4F0C-819E-713F2C4B4C46 
        ps:P625 "Point(8.61946 59.43613)"^^geo:wktLiteral .

But there are a few other statements (related to the coordinates of Mars, I guess) like the following:

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd wikibase:rank wikibase:NormalRank ;
	ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
	psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

They are not parsed correctly and lead to the following error.

SQL> select * from DB.DBA.load_list;
Connected to OpenLink Virtuoso
Driver: 07.20.3229 OpenLink Virtuoso ODBC Driver
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________
/staging/get_test2.ttl                                         https://www.example.org/test2                                                      2           2018.10.1 17:6.7 366785000  2018.10.1 17:6.7 369714000  0           NULL        42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

I have not read the GeoSPARQL spec in detail, but from a quick look at 8.5 Requirements for WKT Serialization (serialization=WKT) (page 34) and other pages, the above representation seems valid, doesn't it? If not, I will have to go back to the Wikidata devs :)

In either case, I can also try with the develop/7 branch.

@TallTed
Collaborator

TallTed commented Oct 1, 2018

@nandana - The described error is expected with VOS prior to the 7.2.6 update, or without the new plugin. Please let us know how things go with the latest develop/7!

@nandana

nandana commented Oct 2, 2018

Thanks @TallTed !

I've installed the latest version from develop/7 and followed the guide to set up GeoSPARQL.

At startup, the plugins appear to load correctly.

Tue Oct 02 2018
19:41:44 { Loading plugin 1: Type `plain', file `wikiv' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   WikiV version 0.6 from OpenLink Software
19:41:44   Support functions for WikiV collaboration tool
19:41:44   SUCCESS plugin 1: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/wikiv.so }
19:41:44 { Loading plugin 2: Type `plain', file `mediawiki' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   MediaWiki version 0.1 from OpenLink Software
19:41:44   Support functions for MediaWiki collaboration tool
19:41:44   SUCCESS plugin 2: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/mediawiki.so }
19:41:44 { Loading plugin 3: Type `plain', file `creolewiki' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   CreoleWiki version 0.1 from OpenLink Software
19:41:44   Support functions for CreoleWiki collaboration tool
19:41:44   SUCCESS plugin 3: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/creolewiki.so }
19:41:44 { Loading plugin 8: Type `plain', file `proj4' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   plain version 3230 from OpenLink Software
19:41:44   Cartographic Projections support based on Frank Warmerdam's proj4 library
19:41:44   SUCCESS plugin 8: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/proj4.so }
19:41:44 { Loading plugin 9: Type `plain', file `geos' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   plain version 3230 from OpenLink Software
19:41:44   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
19:41:44   SUCCESS plugin 9: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/geos.so }
19:41:44 { Loading plugin 10: Type `plain', file `shapefileio' in `/wikidata/virtuoso/lib/virtuoso/hosting'
19:41:44   ShapefileIO version 0.1virt71 from OpenLink Software
19:41:44   Shapefile support based on Frank Warmerdam's Shapelib
19:41:44   SUCCESS plugin 10: loaded from /wikidata/virtuoso/lib/virtuoso/hosting/shapefileio.so }
19:41:44 OpenLink Virtuoso Universal Server
19:41:44 Version 07.20.3230-pthreads for Linux as of Oct  2 2018

but I still get the same error.

SQL> select * from DB.DBA.load_list;
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

/wikidata/data/geo-test.ttl                                                https://www.example.org/test                                                      2           2018.10.2 19:42.15 16983000  2018.10.2 19:42.15 18424000  0           NULL        42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

1 Rows. -- 0 msec.

Here's the content I am loading.

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd a wikibase:BestRank ;
        wikibase:rank wikibase:NormalRank ;
        ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
        psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

Do you see any reason for the error?

@TallTed
Collaborator

TallTed commented Oct 2, 2018

@nandana - It is often helpful to columnize your Turtle, as this can reveal oddness in the data that isn't so easy to see when the text is more tightly spaced.

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd 
    a               wikibase:BestRank ;
    wikibase:rank   wikibase:NormalRank ;
    ps:P625         "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
    psv:P625        wdv:a5d4d0a028e00370b6b216ad3b9b197e .

This appears to me to not be a valid geo:wktLiteral --

"<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral

I think that's meant to be --

"Point(1.13 3.83)"^^geo:wktLiteral

I cannot tell where <http://www.wikidata.org/entity/Q111> belongs here, but it's certainly not within the geo:wktLiteral.

@nandana

nandana commented Oct 2, 2018

It looks odd to me too, but aren't such URIs allowed in the WKT representation of a geometry? For example, here it says

A notable feature is that the CRS URI is concatenated with the WKT string in the literal.

or the following example from the GeoSPARQL spec Sec 8.5

A second example below encodes the same point using <http://www.opengis.net/def/crs/EPSG/0/4326>: a WGS 84 geodetic latitude-longitude spatial reference system (note that this spatial reference system defines a different axis order):
"<http://www.opengis.net/def/crs/EPSG/0/4326>Point(33.95 -83.38)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

This was taken directly from the Wikidata dump. If someone with GeoSPARQL expertise can confirm indeed this is not a valid value for geo:wktLiteral datatype, I can raise the issue in Wikidata.

@TallTed
Collaborator

TallTed commented Oct 3, 2018

@nandana -

I see. You're correct, the GeoSPARQL spec does permit any valid URI to be used in this position (which seems silly, as it could lead to all sorts of nonsense data). I'm not sure why this data is not being accepted.

@pkleef, @IvanMikhailov -- Any comment?


That said, I note that <http://www.wikidata.org/entity/Q111> isn't a URI for a CRS (Coordinate Reference System); it's a URI for the planet Mars, as you'll see if you dereference it yourself — and there are multiple CRS in use for Martian features/locations.

Wikidata don't seem to have figured out how to address this, and their current kludge means only one CRS per celestial body (possibly including Earth; I didn't read deeply enough to be sure about this) in Wikidata, although multiple CRS are already used for locations on Mars, the Moon, and others (though perhaps not on Wikidata)...

This is clearly an evolving space (pardon the pun).

@nandana

nandana commented Oct 4, 2018

Thanks @TallTed !

One quick question related to Virtuoso + GeoSPARQL plugins. When VOS complains about a certain error, is there a way to get an idea about which line it is about (similar to the info given for syntax errors)?

E.g.

06:10:55 PL LOG: File /wikidata/data/1_99/wikidump-000000004.ttl error 42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

These are files with ~24 million lines each and it is impossible to detect where the errors are using manual inspections. I could check for the previous type of errors using grep but this is something else.
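Until the loader reports positions for this error, one way to narrow things down outside Virtuoso is to scan the file for wktLiterals that start with an IRI and print their line numbers. The sketch below is a rough, line-based heuristic and makes several assumptions: each literal sits on one line, the datatype is written either as `geo:wktLiteral` or as the full IRI, and the function name `find_crs_wkt_lines` is invented for this example. It lists candidates only; it cannot tell which of them Virtuoso actually rejected.

```python
import re

# A quoted literal that starts with <IRI> and is typed as a wktLiteral,
# written either with the geo: prefix or with the full datatype IRI.
CRS_WKT = re.compile(
    r'"\s*<([^<>"\s]+)>[^"]*"\^\^'
    r'(?:geo:wktLiteral|<http://www\.opengis\.net/ont/geosparql#wktLiteral>)'
)


def find_crs_wkt_lines(lines):
    """Yield (line_number, leading_iri) for each IRI-prefixed wktLiteral."""
    for number, line in enumerate(lines, start=1):
        for match in CRS_WKT.finditer(line):
            yield number, match.group(1)


sample = [
    'wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd wikibase:rank wikibase:NormalRank ;',
    '\tps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;',
    '\tpsv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .',
]
for number, iri in find_crs_wkt_lines(sample):
    print(number, iri)
```

For a real dump, passing an open file object instead of `sample` streams the file line by line, so the ~24 million lines never need to be held in memory at once.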

@TallTed
Collaborator

TallTed commented Oct 4, 2018

@nandana - I see your point about the error message, and have raised it internally to development. You may want to create an issue specific to that, so we can be sure to notify you when that enhancement is implemented.

@nandana

nandana commented Oct 5, 2018

Thanks @TallTed ! I just created an issue for this.

@p1d1d1

p1d1d1 commented Oct 5, 2018

That Wikidata triple is IMHO not correct GeoSPARQL, since the URI is not a URI for a CRS.
The error in Virtuoso is related, by the way, to the presence of this URI. So far, Virtuoso doesn't support WKT serializations with the CRS URI (PS: this URI is not mandatory).

@p1d1d1

p1d1d1 commented Oct 5, 2018

... and I'd personally avoid having this URI in the geometry serialization. Most geo-libraries won't read this as valid wkt

@pkleef pkleef closed this as completed in 9cb9e2b Oct 9, 2018
@pkleef pkleef reopened this Oct 9, 2018
@pkleef
Collaborator

pkleef commented Oct 9, 2018

When you try to load the following data using the latest version of virtuoso from develop/7:

@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix wdv: <http://www.wikidata.org/value/> .

wds:Q56632442-fbfdcfd3-4b72-0026-1889-4cd40279f4fd wikibase:rank wikibase:NormalRank ;
	ps:P625 "<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)"^^geo:wktLiteral ;
	psv:P625 wdv:a5d4d0a028e00370b6b216ad3b9b197e .

you receive the following error:

Error 22023: VD [Virtuoso Server]TURTLE RDF loader, line 1: GEO11: IRI is not known, consider
registering it via DB.DBA.SYS_V7PROJ4_SR_IRIS (near row 1 col 39 of
'<http://www.wikidata.org/entity/Q111> Point(1.13 3.83)')

The reason as correctly explained by @p1d1d1 is that this IRI does not represent a correct Coordinate Reference System (CRS) URI.

Without a well defined coordinate system, these points are as useful as an amount without specifying the currency.

@pkleef pkleef closed this as completed Oct 9, 2018
@p1d1d1

p1d1d1 commented Oct 9, 2018

@pkleef very nice implementation

@nandana

nandana commented Oct 9, 2018

@pkleef thanks for the patch! I will try it out.

@p1d1d1 I certainly agree with you. I will try to raise the issue in the Wikidata side too!

@TallTed
Collaborator

TallTed commented Oct 10, 2018

@pkleef - It appears to me that 9cb9e2b closes #789, not #295.

@TallTed TallTed reopened this Oct 10, 2018
@TallTed
Collaborator

TallTed commented Oct 10, 2018

@pkleef, @p1d1d1 -

My reading of GeoSPARQL is that any URI may be used as a CRS URI within a wktLiteral -- i.e., there is no reference to a canonical list/registry of acceptable/valid CRS URIs.

This makes some sense, as all possible CRS do not yet exist, so future CRS can be supported without needing to update GeoSPARQL. I would think it better if there were a registry of CRS (similar to IANA MIME registry) and/or relationships between them (such that geodata expressed in "Mars 1979" can be related to "Mars 2000"), but this should not be necessary to load data in either/both/any CRS to an RDF store.

The specific CRS URI should be evaluated only when queries need to compare geodata -- at which point such operations should lead to errors indicating that, for instance, "CRS <http://example.com/mars2000> is not known", or "CRS <http://example.com/mars2000> has no known relation to CRS <http://example.com/mars1979>", or "CRS <http://example.com/mars2000> has no known relation to CRS <http://www.opengis.net/def/crs/OGC/1.3/CRS84>", or the like.

(Yes, using a URI that does not refer to a CRS can lead to nonsense interpretation, just like "15 quatloos" cannot be usefully compared to "USD 15", but "15 quatloos" can be usefully compared to "30 quatloos" whenever that new cryptocurrency is created/defined.)
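The "accept at load time, complain at query time" behaviour proposed above can be illustrated with a small Python sketch. It is purely conceptual and does not describe how Virtuoso works or is implemented: `GeoLiteral`, `check_comparable`, `UnknownCrsRelation`, and the `known_relations` set are all hypothetical names, and the example.com CRS URIs are the placeholder ones used in the comment.

```python
class UnknownCrsRelation(Exception):
    """Raised when two geometries with unrelated CRS are compared."""


class GeoLiteral:
    def __init__(self, crs, wkt):
        # The CRS URI is stored as given; nothing is validated at load time.
        self.crs = crs
        self.wkt = wkt


def check_comparable(a, b, known_relations=frozenset()):
    """Return True if a and b may be compared, else raise UnknownCrsRelation.

    known_relations is a set of frozenset({crs1, crs2}) pairs for which a
    transformation is assumed to exist (hypothetical registry).
    """
    if a.crs == b.crs:
        return True
    if frozenset((a.crs, b.crs)) in known_relations:
        return True
    raise UnknownCrsRelation(
        "CRS <%s> has no known relation to CRS <%s>" % (a.crs, b.crs)
    )


mars_a = GeoLiteral("http://example.com/mars2000", "Point(1.13 3.83)")
mars_b = GeoLiteral("http://example.com/mars2000", "Point(2.5 4.0)")
earth = GeoLiteral("http://www.opengis.net/def/crs/OGC/1.3/CRS84", "Point(-0.911101 41.635338)")

print(check_comparable(mars_a, mars_b))
try:
    check_comparable(mars_a, earth)
except UnknownCrsRelation as error:
    print(error)
```

Both Mars points load and compare with each other; only the cross-CRS comparison fails, and the failure names the two CRS involved.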

@p1d1d1

p1d1d1 commented Oct 11, 2018

@TallTed you're right, according to GeoSPARQL

Valid geo:wktLiterals are formed by concatenating a valid, absolute URI as defined in [RFC 2396], one or more spaces (Unicode U+0020 character) as a separator, and a WKT string as defined in Simple Features

But then it also says:

For geo:wktLiterals, the beginning URI identifies the spatial reference system for the geometry. The OGC maintains a set of CRS URIs under the http://www.opengis.net/def/crs/ namespace

So a URI identifying Mars is not OK for me. As you say, this leads to nonsense data.
I personally don't 100% agree with the OGC approach of putting the URI in the WKT serialization, since such strings are not read as valid WKT by desktop GIS and, I guess, also not by web-mapping libraries (to be tested). I'd have preferred an additional property hasCRS. But this is another story.

@nandana

nandana commented Oct 11, 2018

@p1d1d1 Yes, I agree this might not be the best design decision from GeoSPARQL on how to represent CRS information in RDF. But I am more inclined towards what @TallTed said. Though they are not optimal, I think they are legal according to the current spec. I don't read the second paragraph you cited as saying that the OGC maintains an exclusive set of CRS URIs and that nothing else is accepted.

@TallTed
Collaborator

TallTed commented Oct 11, 2018

@p1d1d1 - As @nandana says, the spec says that the "OGC maintains a set of CRS URIs", it does not say the "OGC maintains the set of CRS URIs" — i.e., there is nothing that says that only URIs from that OGC list are valid CRS URIs.

As things stand, the user is responsible for avoiding nonsense data, which may be painful for them, but this also means that they have the flexibility to adopt any appropriate CRS which may be developed in the future -- regardless of whether the OGC continues to maintain that list, or the OGC endorses/accepts the user's preferred new CRS, etc.

As to whether this is the optimal solution... I think it's a similar conundrum to langtagged string literals. The CRS is necessary to interpret the wktLiteral, just as a langtag is necessary to interpret a text literal string, so it must be an integral part of the coordinate literal string. Doing something new with the literal typing would be problematic, as it's already problematic to handle langtagging, so embedding the CRS within the wktLiteral makes a lot of sense to me.

Maybe there could be some syntactic sugar, such that "all wktLiteral in this serialized data file are based on CRS xyz", but that would lead to copy-and-paste issues when people mix small portions of data from multiple such files without the CRS declaring statements, and they'd be back in today's problematic state...

Desktop GIS and other tools which read this data as invalid are just old -- so they want an older version of the data, which assumed that only one CRS did or ever would exist. Such tools will be updated to understand the new data which recognizes the reality of multiple CRS in active use, and old datasets which are based on any CRS other than the now-declared-default will be updated to include the CRS they're based on, and all will be well again. (Really, all will be well for the first time, as there have long been multiple CRS in use, and data sets which did not declare the CRS in use therein were effectively meaningless, and worse when used in combination with other data sets with undeclared — and frequently different — CRS.)

@nandana

nandana commented Oct 12, 2018

@TallTed @p1d1d1
TL;DR - is there a way to (a) turn off this validation of geo:wktLiteral or (b) make Virtuoso continue parsing a file ignoring the erroneous triple?

Long version:
I want to load the Wikidata dump to Virtuoso to do some performance benchmarks comparing it with other triplestores such as Blazegraph. This issue is blocking me from loading Wikidata into Virtuoso. As a quick hack (while ignoring all GeoSPARQL data for the moment), I have done a grep/sed to remove all CRS URIs from wktLiterals to see if I could load all data after that.

For example, converting the following

wd:Q2267142 wdt:P31 wd:Q1439394 ;
        wdt:P376 wd:Q3303 ;
        wdt:P2824 "2727" ;
        wdt:P625 "<http://www.wikidata.org/entity/Q3303> Point(-358.26 11.3)"^^geo:wktLiteral ;
        wdt:P2386 "+170"^^xsd:decimal ;
        p:P31 wds:Q2267142-B08676C6-584A-41A0-8B62-DB3F7CE635A1 .

to

wd:Q2267142 wdt:P31 wd:Q1439394 ;
        wdt:P376 wd:Q3303 ;
        wdt:P2824 "2727" ;
        wdt:P625 "Point(-358.26 11.3)"^^geo:wktLiteral ;
        wdt:P2386 "+170"^^xsd:decimal ;
        p:P31 wds:Q2267142-B08676C6-584A-41A0-8B62-DB3F7CE635A1 .

Nevertheless, this data is now interpreted against WGS84 and still fails, as the coordinates are out of bounds (i.e., below -180). It seems there is no easy way out of this. Is there a way to load the rest of the data while losing the GeoSPARQL data with other CRSs, or any other alternative for loading Wikidata?
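To see in advance which stripped points would trip over this, a range check can be sketched in Python. It assumes longitude-latitude axis order with longitude in [-180, 180] and latitude in [-90, 90]; the exact limits Virtuoso enforces are not stated in this thread, and `point_within_lon_lat_bounds` is a name made up for this example. Only simple 2D points are handled.

```python
import re

# Matches a 2D WKT point such as "Point(-358.26 11.3)" (case-insensitive).
POINT = re.compile(
    r'^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$', re.IGNORECASE
)


def point_within_lon_lat_bounds(wkt):
    """Return True if a 2D WKT point lies within the assumed lon/lat ranges."""
    match = POINT.match(wkt)
    if match is None:
        raise ValueError("not a 2D WKT point: %r" % (wkt,))
    lon, lat = float(match.group(1)), float(match.group(2))
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


print(point_within_lon_lat_bounds("Point(-358.26 11.3)"))
print(point_within_lon_lat_bounds("Point(8.61946 59.43613)"))
```

The first call reports the Q2267142 point as out of range because of its -358.26 longitude, while the terrestrial point passes.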

@TallTed
Collaborator

TallTed commented Oct 12, 2018

@nandana — I am not immediately aware of a way to switch off these data checks; @IvanMikhailov or @pkleef may have a suggestion.

That said — assuming that all the data you have to load is similar to the above, and that the CRS-including geo:wktLiteral values are predicate+value lines which end with semicolons, you could run a slightly different grep/sed to simply remove those entire lines, leaving only whitespace. If some of these lines are period-ended, or if they might be the first line of an entity's description — so the line starts with the entity's URI — the grep/sed gets more complex ... but still, I think, reasonably doable.
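A minimal Python sketch of the simple case described above, as an alternative to grep/sed: it blanks out only indented predicate+value lines that end with a semicolon and whose value is an IRI-prefixed wktLiteral. Period-ended lines and lines that start with the subject are deliberately left untouched, since removing them would break the surrounding Turtle. The function name `blank_crs_wkt_lines` is hypothetical, and blank lines are kept in place so that line numbers in later error messages still match the original file.

```python
import re

# Indented "predicate literal ;" line whose literal starts with <IRI>
# and is typed as a wktLiteral (prefixed or full datatype IRI).
CRS_WKT_LINE = re.compile(
    r'^\s+\S+\s+"\s*<[^<>"\s]+>[^"]*"\^\^'
    r'(?:geo:wktLiteral|<http://www\.opengis\.net/ont/geosparql#wktLiteral>)'
    r'\s*;\s*$'
)


def blank_crs_wkt_lines(lines):
    """Return (new_lines, count) with matching lines replaced by blank lines."""
    result, blanked = [], 0
    for line in lines:
        if CRS_WKT_LINE.match(line):
            blanked += 1
            # Keep the line slot so line numbers stay stable.
            result.append("\n" if line.endswith("\n") else "")
        else:
            result.append(line)
    return result, blanked


sample = [
    'wd:Q2267142 wdt:P31 wd:Q1439394 ;',
    '        wdt:P376 wd:Q3303 ;',
    '        wdt:P2824 "2727" ;',
    '        wdt:P625 "<http://www.wikidata.org/entity/Q3303> Point(-358.26 11.3)"^^geo:wktLiteral ;',
    '        wdt:P2386 "+170"^^xsd:decimal ;',
    '        p:P31 wds:Q2267142-B08676C6-584A-41A0-8B62-DB3F7CE635A1 .',
]
cleaned, count = blank_crs_wkt_lines(sample)
print(count)
print("\n".join(cleaned))
```

On the Q2267142 example, only the `wdt:P625` line is blanked, and the remaining statements still form valid Turtle because the preceding line keeps its semicolon.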

Thinking further, I might suggest you consider using some different dataset(s) and/or queries for your benchmarking, such as those developed by the LDBC, and written about in the old Virtuoso Blog among other places.

@pfps

pfps commented Nov 21, 2018

I have run into this problem as well, and patched VOS as described in https://community.openlinksw.com/t/non-terrestrial-geo-literals/359
This patch appears to be working fine, but I'm not doing anything with the geo-literals so there may be hidden problems.

@TallTed
Collaborator

TallTed commented Jun 26, 2019

@pkleef, @IvanMikhailov, @kidehen, @openlink -- By not properly fixing this issue such that Virtuoso supports any URI as a valid CRS URI (as the spec requires, as discussed in detail above, particularly at 1 and 2), possibly among other fixes, we are leading/causing people like @asanchez75 to disable a significant chunk of data validation code, and @pfps to comment out a smaller selection, which may cause significant problems down the line -- not only by preferring non-OpenLink-branches of VOS.

Virtuoso is a DBMS. Virtuoso should be managing the data people want to manage, not "protecting" people by refusing to manage data we don't like (i.e., geodata based on non-terrestrial or nonsensical or simply unknown CRS systems).

TallTed added a commit to TallTed/virtuoso-opensource that referenced this issue Jul 23, 2019
* Added timeout to the sparql query

* Fixed i18n issues in fct

* Added support for IRIs from RDFviews

* Added statistics about users of IRI as subject or object in graph to Metadata page

* Commented out test code

* Fixed use RDFa instead of microdata

* Commented out debug lines

* Fixed issues using current prefixes for DBpedia in demos

* Fixed issues in b3s_get_types on weird data

* Fixed issues with /describe page behind a (ssl) proxy

* Added preview snippet for embedded content

* Added support for optionally tagging links as 'nofollow'

* Fixed missing css

* Fixed move usual proxyIRI-handling above embedded fragments

* Fixed issues with https in /describe content negotiation Alternates and Location headers

* Fixed use https:// to work well with secure pages

* Fixed issues loading insecure content into secure page

* Added guard against shortening already short URLs

* Added check to make sure page is in range 0 .. last

* Fixed disable input fields to goto specific page if there is only a single page

* Added function urilbl_ac_init_state to reflect state of Entity Data generation

* Fixed typo

* Fixed missing call to onchange so page refreshes when Show x rows selector changes

* Fixed redirect URL encoding after sponging

* Fixed URL encoding in pager

* Fixed bad URL encoding in tabs

* Disabled unwanted 'extra' decoding

* Fixed use TTLP_V

* Fixed bad check

* Fixed i18n issue

* Fixed do counts separately instead of using UNION to avoid issue with expected/generated columns error

* Fixed error handling

* Fixed issue "Value of ANY type column too long" with long shapes

* Updated VAD version

* Renamed binsrc/b3s to binsrc/fct

* Revert "Fixed use TTLP_V"

This reverts commit 33568d4.

* Fixed missing variable declaration

* Moved binsrc/yacutia binsrc/conductor

* Fixed issue building with --debug-mtx

* Updated version to 7.2.6-rc1

* Updated NEWS and ChangeLog in preparation for upcoming release

* Fixed insufficient localization of errors on loading Turtle

Closes: openlink#295

* Fixed GPFs on outermost-level empty shape (they're impossible on SHP and WKT/WKB reads but can be represented in EWKT)

* Fixed version info

* Fixed initialization when proj data is not available on the filesystem

* Fixed version number

* Fixed issue with PATCH command on collections

* Added support for ',meta' for collections

* Fixed issue with SSL HTTP authentication

* Fixed missing cert argument in calls to DAV_AUTHENTICATE_SSL

* Fixed check function names for invalid characters

* Fixed merge error trying to call ACL_VALIDATE

* Fix issues leaving too many open statement handles in server side JDBC/ODBC call

* Updated JDBC driver version

* Rebuild JDBC providers

* Added missing header for ',meta' collections

* Moved search controls in the main pane

* Fixed issue linking against openssl 1.0.1 or older

* Fixed use function to set SSL accept state

* Fixed missing EXPORT for proj4 plugin

* Fixed check for LibreSSL version

* Fixed portability issue with Alpine and other Linux versions that do not use glibc

* Fixed portability issue using a C++ compiler

* Update ODBC driver for support UTF16

* Fix SQLConnectW for UTF16

* Fix CLIw.c

* Fixed stack overwrite

* Fixed portability issue on Mac OS X

* Fixed issues converting from SQL_C_CHAR to SQL_BINARY

* Optimized OPTIONS method for WebDAV collections

* Fixed output content requested by 'Accept: application/ld+json; profile='https://www.w3.org/ns/activitystreams' header

* Fixed issue with creating LDP collection data

* Fixed some issues based on the LDP testsuite execution

http://w3c.github.io/ldp-testsuite/

* Fixed issue creating LDP collection data

* Fixed issue with WebID selection

* Updated Conductor VAD version

* Fixed typo

* Added support for property paths in federated SPARQL queries

closes issue openlink#734

* Updated Copyright to include 2019

* Fix Jena3 provider for support Jena 3.10.0

* Rebuild Jena3 provider

* Added separate links for text/n3 and text/turtle

* Updated Memento timegate URL

* Fixed issue with link concat order

* Fix issues with Connection Option CHARSET=UTF-8

* Fixed issue with UTF-16 on SQLGetDiagRec

* Fixed ODBC error string prefix

* method VirtuosoConnection.isConnectionLost() was updated for use Statement instead of PreparedStatement

* Updated JDBC driver to version 3.112

* Rebuild JDBC drivers

* Updated version to 114

* Removed deprecated Virtuoso v5 DDL statements

* Fixed typo

* Added new bif function bif_boxint_range_arg

* Fixed snprintf into buffers that are too short to always fit the output

* Fixed issue with freeing fields

* Fixed missing error check in sslt_qst_get

* Fixed handling of DTD validation error in dtd_compile

* Fixed misleading indentation

* Added extra checks to blob test programs

* Fixed issue getting .ttl file with curl

* Fixed dav browser page size

* Fixed issue with WebDAV HTTP Response Headers

* Fixed issues with LDP implementation

* Fixed memory leak due to bad allocation of GEO_POINTLIST shapes

* Fixed missing args for MALLOC_DEBUG

* Added support for JSON output

* Fixed issue with checkpoint_interval(-1)

* Added new function scheduler_interval()

* Disable both checkpoint and scheduler when starting bulkload

* Added missing headers for rdf-editor to browse DAV tree

* Fixed keep existing options

* Fixed use fully qualified names

* Fixed use fully qualified names

* Fixed login when conductor is behind a proxy

* Fixed use fully qualified names

* Fixed compiler warnings

* Fixed do not try to run ttlp_v in atomic mode

* Backported default_geo_type stub for r2rml

* Fixed issue editing HTML and XSLT files with Chrome browser

* Updated Conductor VAD version

* Updated FCT VAD version

* Fixed issue returning the correct client IP when using a proxy

* Fixed remove box around uid and gid arguments

* Fixed initialize struct

* Fixed debug lines

* Fixed return correct GEO type

* Fixed issue with r2rml generating empty columns section

* Fixed potential issue with cartesian product in R2RML

* Fixed br to be wellformed

* Fix issue with handle RDFFormat.BINARY in method RepositoryConnection.add()

* Updated RDF4J version

* Rebuild RDF4J provider

* Fixed backslash escaped characters in QNames of SPARQL queries. (Bug16599)

* Bug fixes

 - fixed count bug
 - fixed filter bug

* Fixed missing variable declaration

* Fixed missing grant

* Fixed use new API calls for sponger

* Fixed clear http session

* Fixed use params for permalink

* Fixed issues editing soap services

* Update for ODBC driver for better integration with iODBC DM for support autoswitch between UCS4/UTF16 modes

* Fix issue with insert XMLLiteral

* Updated version of Jena 3 provider

* Rebuild Jena 3 provider

* Disable Unix related block to fix compilation errors on Windows

* Disabled txn check for backup dump

Closes: openlink#578

* Fixed perform +restore-crash-dump in foreground

* Fixed windows build issues for VS 2017

* Fixed typo

* Fixed wrong percent range set

* Fixed issue that enabled virtual paths content to be shown with PROPFIND command

* Fixed issue with PATCH command and sparql updates

* Fixed issue with ',acl' files

* Fixed searching with tag conditions in DynaRes DETs

* Added new optimizations for WebDAV

* Added new column COL_FULL_PATH
* Added support to move lost collections to '/DAV/.lost+found/' collection.
* Added triggers to check the collection hierarchy before updates.
* Added performance improvements for some often used functions.
* Added additional checks for some API calls.
* Updated triggers and procedures to use the new column COL_FULL_PATH
* Fixed issue in conductor showing folder content after rename.
* Fixed issue with ID of DET collections and optimize DAV_SEARCH_ID
* Removed unused columns
* Removed old v5 schema updates

* Updated DAV_DIR_LIST API function to support limit and order parameters

* Added new option PublicDebug to virtuoso.ini

* New functions bif_st_n_points() and bif_st_point_n()

* Disabled triggers generation for RDF view referencing SQL views

* Conductor - updated default suggested ASK statement by IMAP filters.

* Fixed issue with registry page - row exceeded max length

* Fixed some warnings and simplified some pages by removing unneeded table tags

* Fixed https based client

* do not check server certificate when 'insecure' flag is set
* register CA certificates when db:key is used

* Fixed showing the path by virtual paths used by WebDAV browser

* Fixed support for large limits for sizes of transactions

* Fixed issue with timezone and scheduler view

* Added icon support for some DET folder types

* Added support for unfoldable internal functions in execution plan

* Fixed syntax of '__tag of XXX' to support it in signed literals

Added support for
        __tag of dictionary reference
        __tag of stream
        __tag of XXX handle

* Replacing "magic numbers" of DV_xxx in SQL code with readable "__tag of".

* Fixed issue with autocommit mode

When row auto commit mode revert the non-txn insert flag back after vec
insert. Otherwise sequence will be logged separately and txn rb image
will grow a lot

* Fixed issue with DML inside DDL statement

* Fixed bug with dates without timezone by PROPFIND command

* Fixed issue with xslt file used by WebDAV skin

* Fixed issue with dates without timezone

* Removed duplicate procedures which already exist in binary

* Fixed issue with Accept header and *,meta files

* Fixed missing files in build list

* Updated list of extension to icon relations for WebDAV

* Fixed issue mapping LDP enabled folders and PUT command

* Fixed issue with HTTP commands in CalDAV and CardDAV protocols

* Removed DAV_SEARCH_SOME_ID

* Fixed issue deleting folder under version control

* Added WebDAV max browser lines parameter

* Fixed issue with bad filter in search function

* Removed duplicate procedures used for validation

* Fixed validation of boolean values

* Fixed issue with popup WebDAV folder selection in filters

* Fixed presentation and working with files under version control

* Fixed issue with bad URL in address bar by WebDAV browser

* Added autofocus for the most used forms in WebDAV browser

* Fixed labels

* Added initial support for onBehalfOf

* Added support for Escape key to close the WebDAV browser

* Removed deprecated functions

* Updated Conductor VAD version

* Fixed issue with grouping: name could be dv_symbol

* Added option to view CalDAV and CardDAV resources

* Fixed getting listen interface for FTP server

* Fixed issue in testsuite

* Fixed issue in http testsuite

405 (Method is not allowed) - MKCOL can only be executed on an unmapped URL

* Fixed issue with VXML

The file:// must be used for local path (e.g. starting from www root)

* Fixed issue with buffer size

* Added parameter for ses params function

* Updated FCT vad version

* Fixed issue when login into ODS

Closes: 652

* Updated ODS Framework VAD version

* Fixed windows portability issue

* Added support for insecure flag to SSL based soap call

* Fixed compiler warnings

* Fixed issue reading large blobs

* Fixed issue with db_to_str_place

* Fixed compiler warning

* Added extra debug macros

* Fixed compare canonical dtp for nvarchar for vec ssl casts

* Fixed small issues in testsuite

* Fixed issue building MALLOC_DEBUG

* Fixed issue creating URIs by DAV meta subsystem.

* Fixed output of PROPFIND command

* Fixed issue returning correct value for WebDAV versioning

* Fixed issue creating user's IRIs

* Fixed issue checking auto version feature for resource

* Added function to determine resource extension

* Added qt_record error file and option to skip comments

* Fixed issues in testsuite

* Fixed make sure http headers are not send twice

* Fixed portability issue

* Updated Conductor VAD version

* Added support for mkstemp

* Fixed check for cli_session as bootstrap and scheduler do not have it

* Fixed missing argument

* Fixed dav option in crawler to get links

* Added support for reporting current rss status

* Fixed clear the association between target and carts list

* Fixed syntax in export of crawler target

* Updated Conductor version

* Fix issues with POINTZ

* Updated version of JDBC driver

* Rebuild JDBC drivers

* Fixed comment

* Fixed issue with POST for application/sparql-query (w3)

@pfps

pfps commented Mar 27, 2020

I've just run into this again. Is there a plan for finally fixing this bug in Virtuoso?

@TallTed
Collaborator

TallTed commented Dec 3, 2020

@pkleef, @IvanMikhailov, @kidehen, @openlink, @HughWilliams --

Any update on this issue?

@katarinarak

Hi, I just ran into this issue and wonder why it is still not fixed?
