Graph_diff fails on presence of unrelated blank nodes #1797

Open
aucampia opened this issue Apr 9, 2022 Discussed in #1610 · 0 comments
Labels
bug Something isn't working

aucampia (Member) commented Apr 9, 2022

This tracks #1294, which was a bug but was incorrectly converted to a discussion (#1610).

The xfail test for this is here:

```python
import unittest
from typing import Set, Tuple
from unittest import expectedFailure

from rdflib import BNode, Graph, Literal
from rdflib.compare import to_canonical_graph
from rdflib.namespace import FOAF, RDF
from rdflib.term import Node

# Local stand-in for the triple-set type alias used in rdflib's test suite.
_TripleSet = Set[Tuple[Node, Node, Node]]


class TestConsistency(unittest.TestCase):
    @expectedFailure
    def test_consistent_ids(self) -> None:
        """
        This test verifies that `to_canonical_graph` creates consistent
        identifiers for blank nodes even when the graph changes.

        It does this by creating two triple sets `g0_ts` and `g1_ts`
        and then first creating a canonical graph with only the first
        triple set (cg0), and then a canonical graph with both triple
        sets (cg1), and then confirming that the triples in cg0 are a
        subset of the triples in cg1.

        This will fail if `to_canonical_graph` does not generate
        consistent identifiers for blank nodes when the graph changes.
        This property is essential for `to_canonical_graph` to
        be useful for diffing graphs.
        """
        bnode = BNode()
        g0_ts: _TripleSet = {
            (bnode, FOAF.name, Literal("Golan Trevize")),
            (bnode, RDF.type, FOAF.Person),
        }
        bnode = BNode()
        g1_ts: _TripleSet = {
            (bnode, FOAF.name, Literal("Janov Pelorat")),
            (bnode, RDF.type, FOAF.Person),
        }

        g0 = Graph()
        g0 += g0_ts
        cg0 = to_canonical_graph(g0)
        # set(graph) collects a graph's triples; the original test uses the
        # test-suite helper GraphHelper.triple_set() here.
        cg0_ts = set(cg0)

        g1 = Graph()
        g1 += g0_ts  # cg1 is built from BOTH triple sets, per the docstring
        g1 += g1_ts
        cg1 = to_canonical_graph(g1)
        cg1_ts = set(cg1)

        assert cg0_ts.issubset(
            cg1_ts
        ), "canonical triple set cg0_ts should be a subset of canonical triple set cg1_ts"
```
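The property this test checks can be illustrated without rdflib. The sketch below is a toy model, not rdflib's canonicalization algorithm: triples are plain string tuples, blank nodes are strings prefixed with `_:`, and the helpers `whole_graph_labels` and `local_labels` are hypothetical. When a blank node's canonical label also depends on a description of the whole graph, adding the unrelated `g1_ts` triples renames the blank node from `g0_ts`, so the subset check fails; a label derived only from the node's own triples survives the change.

```python
import hashlib
from typing import Callable, Dict, Set, Tuple

# Toy model: terms are strings; blank nodes are strings prefixed with "_:".
Triple = Tuple[str, str, str]
Labeller = Callable[[Set[Triple]], Dict[str, str]]


def blank_nodes(triples: Set[Triple]) -> Set[str]:
    return {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}


def describe(bnode: str, triples: Set[Triple]) -> str:
    # A blank node's own outgoing (predicate, object) pairs, in sorted order.
    return repr(sorted((p, o) for s, p, o in triples if s == bnode))


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def whole_graph_labels(triples: Set[Triple]) -> Dict[str, str]:
    # Hypothetical: each label also hashes a description of every blank node
    # in the graph, so adding unrelated triples renames existing blank nodes.
    bnodes = blank_nodes(triples)
    context = repr(sorted(describe(b, triples) for b in bnodes))
    return {b: "_:" + digest(context + describe(b, triples)) for b in bnodes}


def local_labels(triples: Set[Triple]) -> Dict[str, str]:
    # Hypothetical: each label hashes only the blank node's own description.
    return {b: "_:" + digest(describe(b, triples)) for b in blank_nodes(triples)}


def canonicalize(triples: Set[Triple], labeller: Labeller) -> Set[Triple]:
    labels = labeller(triples)
    return {(labels.get(s, s), p, labels.get(o, o)) for s, p, o in triples}


# Same data as the test: two unrelated people, each a blank node.
g0_ts = {
    ("_:a", "foaf:name", '"Golan Trevize"'),
    ("_:a", "rdf:type", "foaf:Person"),
}
g1_ts = {
    ("_:b", "foaf:name", '"Janov Pelorat"'),
    ("_:b", "rdf:type", "foaf:Person"),
}

for labeller in (whole_graph_labels, local_labels):
    cg0 = canonicalize(g0_ts, labeller)
    cg1 = canonicalize(g0_ts | g1_ts, labeller)
    print(labeller.__name__, cg0.issubset(cg1))
# whole_graph_labels False
# local_labels True
```

A purely local label like this is not a general canonicalization: two blank nodes with identical descriptions would get the same label and merge, and links between blank nodes are ignored. That trade-off is presumably why real canonicalization algorithms consider more of the graph, and it is what makes keeping identifiers stable under unrelated changes non-trivial.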

I likely will never have time to rewrite this properly, but I will refer to this issue to keep track of it.

Discussed in #1610

Originally posted by rob-metalinkage April 23, 2021
If both graphs have multiple blank nodes, graph_diff fails to detect matches. It succeeds when there is only a single blank node.

Code:

```python
from pprint import pprint

from rdflib import Graph
from rdflib.compare import graph_diff

g1t = """@prefix : <http://test.org#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:C1  a          owl:Class ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:minCardinality  "1"^^xsd:int ;
                           owl:onProperty      rdfs:label
                         ] ;
        rdfs:label   "C1"@en ."""

# g2 is g1 plus a second class with its own restriction blank node.
g2t = g1t + """ :C2  a          owl:Class ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:minCardinality  "1"^^xsd:int ;
                           owl:onProperty      rdfs:comment
                         ] ;
        rdfs:label   "C2"@en ."""

g1 = Graph()
g2 = Graph()
g1.parse(data=g1t, format="turtle")
g2.parse(data=g2t, format="turtle")

in_both, in_first, in_second = graph_diff(g1, g2)
print("should be empty as all in g1 is in g2")
pprint(in_first.serialize(format="turtle"))

in_both, in_first, in_second = graph_diff(g1, g1)
print("should be empty as all in g1 is in g1")
pprint(in_first.serialize(format="turtle"))
```


Output (`in_first` holds the triples found only in the first graph, g1; since g2 is g1 plus extra triples, it should always be empty):

```
should be empty as all in g1 is in g2
(b'@prefix ns1: <http://www.w3.org/2002/07/owl#> .\n@prefix rdfs: <http://ww'
 b'w.w3.org/2000/01/rdf-schema#> .\n@prefix xsd: <http://www.w3.org/2001/XML'
 b'Schema#> .\n\n<http://test.org#C1> rdfs:subClassOf [ a ns1:Restriction ;\n '
 b'           ns1:minCardinality "1"^^xsd:int ;\n            ns1:onProperty '
 b'rdfs:label ] .\n\n')

should be empty as all in g1 is in g1
b'\n'
```
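For context: in the rdflib source, `graph_diff` canonicalizes each graph with `to_canonical_graph` and then compares the two results with set intersection and differences (`cg1 * cg2`, `cg1 - cg2`, `cg2 - cg1`). So if the restriction blank node under `:C1` gets one canonical label in g1 and a different one in g2, every triple touching it lands in `in_first`. The sketch below models this with plain Python sets and a hypothetical `toy_canonical` whose labels depend on the whole graph (a toy, not rdflib's actual algorithm), using the triples from the report.

```python
import hashlib
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def toy_canonical(triples: Set[Triple]) -> Set[Triple]:
    # Hypothetical stand-in for to_canonical_graph: each blank node ("_:"
    # prefix) is relabelled with a hash of its own description plus a
    # description of all blank nodes in the graph, so labels depend on
    # the whole graph.
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}

    def describe(b: str) -> str:
        return repr(sorted((p, o) for s, p, o in triples if s == b))

    context = repr(sorted(describe(b) for b in bnodes))
    labels = {
        b: "_:" + hashlib.sha256((context + describe(b)).encode("utf-8")).hexdigest()[:12]
        for b in bnodes
    }
    return {(labels.get(s, s), p, labels.get(o, o)) for s, p, o in triples}


def toy_graph_diff(g1: Set[Triple], g2: Set[Triple]):
    # Same shape as graph_diff: canonicalize both sides, then set operations.
    cg1, cg2 = toy_canonical(g1), toy_canonical(g2)
    return cg1 & cg2, cg1 - cg2, cg2 - cg1


g1 = {
    (":C1", "rdf:type", "owl:Class"),
    (":C1", "rdfs:label", '"C1"@en'),
    (":C1", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:minCardinality", '"1"^^xsd:int'),
    ("_:r1", "owl:onProperty", "rdfs:label"),
}
g2 = g1 | {
    (":C2", "rdf:type", "owl:Class"),
    (":C2", "rdfs:label", '"C2"@en'),
    (":C2", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:minCardinality", '"1"^^xsd:int'),
    ("_:r2", "owl:onProperty", "rdfs:comment"),
}

in_both, in_first, in_second = toy_graph_diff(g1, g2)
print(sorted(p for _, p, _ in in_first))
# ['owl:minCardinality', 'owl:onProperty', 'rdf:type', 'rdfs:subClassOf']
print(toy_graph_diff(g1, g1)[1])
# set()
```

This matches the pattern in the reported output: the ground triples about `:C1` match, and exactly the `rdfs:subClassOf` link plus the three restriction triples are reported as only in g1, while diffing a graph against itself stays empty because canonicalization of a fixed graph is deterministic.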