RDF Literal "1"^^xsd:boolean should _not_ coerce to True #847

Closed
ashleysommer opened this issue Sep 20, 2018 · 5 comments
Labels
bug Something isn't working parsing Related to a parsing. SPARQL

@ashleysommer
Contributor

ashleysommer commented Sep 20, 2018

I encountered this issue while working on pySHACL.
Specifically, this bug is causing a failure in one of the tests in the standard data-shapes-test-suite: uniqueLang-002-shapes.ttl.
This test relies on "1"^^xsd:boolean being an invalid Literal, so testing this Literal for equality against RDF True should report that they are not equal.

A simple code recreation:

import rdflib
from rdflib.namespace import XSD
fail_bool = rdflib.Literal("1", datatype=XSD.boolean)
true_bool = rdflib.Literal("true", datatype=XSD.boolean)
print("value: {} , datatype: {} ".format(
    str(fail_bool._value), str(fail_bool.datatype)))
try:
    assert not (fail_bool == true_bool),\
        "\"1\" should not equal \"true\""
except AssertionError as a:
    print("assertion fail: \n{}".format(str(a)))

This is a more complete example:
https://gist.github.com/ashleysommer/87f0b9660a71de380889f98745af2f74

I've tracked down the problem to this line in the XSDToPython map:

URIRef(_XSD_PFX + 'boolean'): lambda i: i.lower() in ['1', 'true'],

lambda i: i.lower() in ['1', 'true']
should be changed to
lambda i: i.lower() == 'true'
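
To make the difference concrete, here is a minimal standalone sketch that copies the two lambdas out of the map and runs them on the four lexical forms discussed in this issue. The names old_converter and new_converter are only for illustration and do not exist in rdflib; nothing here imports rdflib.

```python
# Standalone copies of the two converters quoted above.
# The variable names are illustrative, not rdflib identifiers.
old_converter = lambda i: i.lower() in ['1', 'true']  # entry quoted from XSDToPython
new_converter = lambda i: i.lower() == 'true'         # replacement proposed above

# Compare how each converter treats the lexical forms in question.
for lexical in ["true", "false", "1", "0"]:
    print(repr(lexical), old_converter(lexical), new_converter(lexical))
```

With the proposed lambda, "1" no longer converts to the same Python value as "true", which is what makes the equality check in the reproduction above come out as not equal.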

@gromgull
Member

Thanks @ashleysommer! Could you make a PR, and I'll merge it?

@ashleysommer
Contributor Author

Ok. I will today.

ashleysommer added a commit to ashleysommer/rdflib that referenced this issue Oct 18, 2018
@ashleysommer ashleysommer mentioned this issue Oct 18, 2018
@joernhees joernhees added this to the rdflib 5.0.0 milestone Oct 27, 2018
@joernhees joernhees added SPARQL parsing Related to a parsing. bug Something isn't working labels Oct 27, 2018
@white-gecko
Member

I think this is wrong. As pointed out in #913, the XML Schema Definition Language (XSD) defines the lexical representation of boolean as:

booleanRep ::= 'true' | 'false' | '1' | '0'

This is also mentioned in w3c/data-shapes#98. As I understand it, w3c/data-shapes#98 is about a specific case in SHACL validation, not the general question of how to deal with xsd:boolean.
RDFlib should stick to the correct standards here.

#856, or rather f547599, should be reverted, and a fix according to #913 should be introduced.

@ashleysommer
Contributor Author

ashleysommer commented Apr 5, 2020

@white-gecko
Ok, I'm going to have to look further into this.

I initially wrote this issue (and PR #856) from the perspective of the python SHACL implementation.

There are spec-driven test files used to implement unit tests for all SHACL implementations. One of these tests relies on the backing RDF implementation treating the Literal "1"^^xsd:boolean as an "Invalid Literal", so that comparing it for equality with the valid literal "true"^^xsd:boolean returns False (not equal).

rdflib doesn't have any concept of an "Invalid Literal" (which might be something to think about as a feature down the track). So the only way to get that SHACL unit test to pass with rdflib's rudimentary Literal handling was to patch the parsing of "1" with datatype=xsd:boolean so that it gives a Python value of False.
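
As a rough illustration of the trade-off described above, the sketch below contrasts the patched conversion with a hypothetical converter that reports an invalid lexical form separately. Both function names are invented for this example and are not rdflib APIs; using None as the "invalid" marker is an assumption made only for the sketch.

```python
from typing import Optional


def patched_boolean(lexical: str) -> bool:
    """The patched behaviour: anything other than 'true' becomes False."""
    return lexical.lower() == "true"


def strict_boolean(lexical: str) -> Optional[bool]:
    """Hypothetical alternative: None marks a lexical form treated as invalid."""
    if lexical == "true":
        return True
    if lexical == "false":
        return False
    return None  # assumption: None stands in for an "Invalid Literal"


for lexical in ["true", "false", "1"]:
    print(repr(lexical), patched_boolean(lexical), strict_boolean(lexical))
```

Under the patch, "1" is unequal to "true" but converts to the same Python value as "false"; a separate invalid marker would keep those two cases apart.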

Note, I think there's some confusion here about the distinction between a Typed Literal, and a typeless literal.

For example, for typeless literals all of these are valid:

  • <myuri> ex:hasPet true
  • <myuri> ex:hasPet false
  • <myuri> ex:hasPet 1
  • <myuri> ex:hasPet 0
    (Except the last two will coerce to xsd:integer when parsed, not xsd:boolean).

Whereas typed literals are like the following:

  • <myuri> ex:hasPet "true"^^xsd:boolean
  • <myuri> ex:hasPet "false"^^xsd:boolean
  • <myuri> ex:hasPet "1"^^xsd:boolean
  • <myuri> ex:hasPet "0"^^xsd:boolean
    Where the last two are what this thread is about.

@white-gecko
That XSD spec you linked is specifically for the XML world, and I'm not sure all of the rules in there apply to the RDF world. I think the use of xsd in RDF (even when serialized in rdf+xml format) is a subset of the XML-xsd spec.

This document is specific to RDF: https://www.w3.org/TR/swbp-xsch-datatypes/#boolean. However, it seems to agree with you that "1"^^xsd:boolean is not an invalid literal.

@white-gecko
Member

Thank you for the pointer to W3C Working Group Note XML Schema Datatypes in RDF and OWL. It states explicitly:

Boolean is a datatype with value space {true,false}, lexical space {"true", "false","1","0"} and lexical-to-value mapping {"true"→true, "false"→false, "1"→true, "0"→false}. "true"^^xsd:boolean is a typed literal, while "true" is a plain literal.
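
The lexical-to-value mapping quoted above can be written down directly as a lookup table. This is a minimal sketch of that mapping only, not rdflib's implementation; the names BOOLEAN_LEXICAL_TO_VALUE and boolean_from_lexical are hypothetical, and raising ValueError for forms outside the lexical space is an assumption of the sketch.

```python
# The four lexical forms and their values, exactly as listed in the quote.
BOOLEAN_LEXICAL_TO_VALUE = {
    "true": True,
    "false": False,
    "1": True,
    "0": False,
}


def boolean_from_lexical(lexical: str) -> bool:
    """Map a lexical form to its boolean value, per the quoted mapping."""
    try:
        return BOOLEAN_LEXICAL_TO_VALUE[lexical]
    except KeyError:
        # Assumption for this sketch: forms outside the lexical space raise.
        raise ValueError("not in the xsd:boolean lexical space: %r" % lexical)


for lexical in ["true", "false", "1", "0"]:
    print(repr(lexical), boolean_from_lexical(lexical))
```

Under this mapping, "1" and "true" denote the same value, which is the reading argued for in this comment.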

Regarding parsing and serialization, I think we should keep the RDFlib data model abstract from the serialization formats. We should not mix the interpretation of Turtle abbreviations with the interpretation of lexical values.

More comments in #913.

4 participants