-
Notifications
You must be signed in to change notification settings - Fork 173
SPARQL_Bottom_Up_Semantics
Preface: In the 1.5.2 release we’ve implemented a couple of fixes regarding issues related to SPARQL’s bottom-up evaluation approach and associated variable scoping problems. If you encounter regressions with some of your queries after upgrading to 1.5.2, this Wiki page may help you identify ill-designed queries that are not in line with the official SPARQL 1.1 semantics.
One of the most tricky parts in SPARQL semantics is its bottom-up semantics: informally speaking, bottom-up semantics means that subqueries and nested groups are logically evaluated first. This semantics has some interesting (and sometimes unintuitive) implications for the visibility of variables in subgroups and subqueries. This help page explains the basic concepts and sketches situations in which the bottom-up semantics may lead to unexpected query results.
Let’s start out with a very simple dataset that we will use throughout the upcoming examples:
<http://example.com/Alice> a <http://example.com/Person> .
<http://example.com/Flipper> a <http://example.com/Animal> .
This is, we have two triples, one stating the Alice is a person and the other one stating that Flipper is an Animal. To illustrate the basic idea of SPARQL evaluation, let’s start out with a simple query:
SELECT ?s ?personType WHERE {
BIND(<http://example.com/Person> AS ?personType)
?s a ?o
FILTER(?o=?personType)
}
Q1 binds the URI <http://example.com/Person> to variable ?personType and extracts all instances of this type. Naively following the W3C’s semantics, evaluation of this triple pattern proceeds as follows:
1. Evaluate the BIND expression, which gives us a single, so-called mapping:
{ ?personType -> <http://example.com/Person> }
2. Evaluate the triple pattern ?s a ?o
against the data, which gives
us two mappings, namely
{ ?s -> <http://example.com/Alice>, ?o -> <http://example.com/Person> },
{ ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal> }
The interesting point here is that, when strictly following the
semantics as defined by the W3C, this pattern is evaluated independently
from the previous expression, the join is performed in a subsequent step
(see 3. below). This is, of course, not a very efficient approach, so
for the sake of performance query engines such as Blazegraph typically
choose to “inject” the binding obtained from 1. into the triple pattern,
evaluating the pattern ?s a <
http://example.com/Person>
instead.
3. The two partial results are then “joined” together, combining those mappings that share the same bindings for variable ?o. In our case, the mapping from 1. joins with both mappings from 2.. The join gives us the following two mappings:
{ ?s -> <http://example.com/Alice>, ?o -> <http://example.com/Person>, ?personType -> <http://example.com/Person> },
{ ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal>, ?personType -> <http://example.com/Person> }
4. In the next step, we apply the FILTER expression, retaining those mappings for which variable ?o and ?personType coincide, namely
{ ?s -> <http://example.com/Alice>, ?o -> <http://example.com/Person>, ?personType -> <http://example.com/Person> }
Note that, as defined in the SPARQL standard, FILTER expressions are applied after evaluating a join group, so we could safely move the FILTER expression right in the beginning without changing the evaluation process (and hence, without changing the result)
5. Finally, as per SELECT clause, we project on variables ?s and ?personType to get the final query result. Written in tabular form, here’s the outcome:
?s | ?personType
--------------------------------------------------------
<http://example.com/Alice> | <http://example.com/Person>
As you may know, SPARQL allows to nest expressions into subgroups. Let’s now modify our example and look at the following query:
SELECT ?s ?personType WHERE {
BIND(<http://example.com/Person> AS ?personType)
{
?s a ?o
FILTER(?o=?personType)
}
}
The only change here are the { }
brackets - they put the triple
pattern and the FILTER expression into a separate, nested group. Now the
bottom-up evaluation of SPARQL becomes relevant, since the inner group
is evaluated first:
1. Evaluate the BIND expression, which gives us the mapping (as before):
{ ?type -> <http://example.com/Person> }
2. We next turn towards an evaluation of the subgroup that is enclosed by the inner { } brackets - following a bottom-up paradigm, this evaluation is done independently from the result computed in 1. 2a. As before in step 2., we evaluate the triple pattern “?s a ?o” against the data, which gives us two mappings, namely
{ ?s -> <http://example.com/Alice>, ?o -> <http://example.com/Person> },
{ ?s -> <http://example.com/Flipper>, ?o -> <http://example.com/Animal> }
2b. Now, the FILTER (?o=?type) is applied on the two mappings from 2a. And here’s the problem: these two mappings do not contain a binding for variable ?type (because it is not in scope). In such cases, the SPARQL semantic defines that the FILTER condition evaluates to an “error”, in which case the FILTER rejects all mappings. After step 2, we thus end up with no mapping.
3. Finally, the single mapping from 1. is joined with the empty result from 2., so we end up with the empty result.
While the query from before may look unnecessarily complex and somewhat constructed, the following example is probably much closer to situations as they might show up in practical applications:
SELECT ?person ?nonPerson ?type WHERE {
BIND(<http://example.com/Person> AS ?type)
{
?person a ?o
FILTER(?o=?type)
}
UNION
{
?nonPerson a ?o
FILTER(?o!=?type)
}
}
In this UNION query, the BIND expression is used to introduce a variable binding that is intended to be reused in the two parts of the UNION : the idea is to extract all instances of type <http://example.com/Person> in the first part of the UNION, and all the others in the second part of the UNION.
You may be surprised that this query does not work as expected: the two blocks of the UNION open up new scopes, in which the ?type variable is not known. For the same reasons as in the example before, both FILTER expression evaluate to error and we end up with the empty result. One way to fix this is by (redundantly) pushing the BIND into the two parts of the UNION:
SELECT ?person ?nonPerson ?type WHERE {
{
BIND(<http://example.com/Person> AS ?type)
?person a ?o
FILTER(?o=?type)
}
UNION
{
BIND(<http://example.com/Person> AS ?type)
?nonPerson a ?o
FILTER(?o!=?type)
}
}
This query will give us results as expected, namely:
?person | ?nonPerson | ?type
------------------------------------------------------------------------------------------
<http://example.com/Alice> | | <http://example.com/Person>
| <http://example.com/Flipper> | <http://example.com/Animal>
The problem sketched above is not restricted to the usage of variables in FILTER expressions. Similar issues may pop up whenever using variables in nodes that “consume” variables without matching them against the dataset. In particular, using a triple pattern with a variable in an inner scope is not a problem: the variables are matched against the dataset independently from the outside, and will be joined with the outside part at a later point in time. But when using SPARQL 1.1 constructs such as instance BIND or VALUES, you may run into the same problems as sketched before by means of the FILTER expression. Look at the following query, which aims at extracting all persons (first part of the UNION) and all animals (second part of the UNION):
SELECT ?s ?type WHERE {
BIND("http://example.com/" AS ?typeBase)
{
BIND(URI(CONCAT(?typeBase,"Person")) AS ?type)
?s a ?o
FILTER(?o=?type)
}
UNION
{
BIND(URI(CONCAT(?typeBase,"Animal")) AS ?type)
?s a ?o
FILTER(?o=?type)
}
}
The problem is essentially the same: we bind variable ?typeBase outside. In the inner UNION blocks, we try to bind variable ?type based on ?typeBase - but the latter is not in scope here. Hence, the query returns the empty result.
The same can happen with VALUES clauses, as illustrated by the following variation:
SELECT ?s ?type WHERE {
{
BIND(URI(CONCAT(?typeBase,"Person")) AS ?type)
?s a ?o
FILTER(?o=?type)
}
UNION
{
BIND(URI(CONCAT(?typeBase,"Animal")) AS ?type)
?s a ?o
FILTER(?o=?type)
}
} VALUES (?typeBase) { ("http://example.com/") }
Again, ?typeBase (which is bound in the outer VALUES clause) is not visible when computing the BINDs for variable ?type, and hence the query will fail to produce results.
As shortly discussed before, strictly following the bottom-up semantics is typically not a good choice when it comes to evaluation performance. In fact, for many patterns it is well-known that top-down evaluation, where we inject mappings from outside into inner subgroups, can lead to significant speedups. The good news is that, for a broad range of SPARQL queries, bottom-up and top-down evaluation yield the same results. For instance, this holds for the class of SPARQL queries built only from triple patterns connected through “.” (so-called conjunctive queries).
Blazegraph exploits such situations to rewrite queries in such a way that they can be evaluated top-down where possible. There are various techniques and tricks that are implemented for this purpose: in many cases, subgroups can just be flattened out without changing semantics, FILTERs variables in subgroups that are known to be unbound can be renamed to avoid clashes, etc. With the fixes in 1.5.2 we’ve ruled out various inconsistencies between Blazegraph and the official W3C spec.
Although the examples above were somewhat artifically designed to illustrate the issues arising in the context of SPARQL’s bottom-up semantics, we’ve repeatedly observed ill designed queries of this style. We hope that the thoughts in this post help our readers to avoid the common pitfalls and write better, standard-compliance SPARQL in the future.