Cartesian product? more Optionals/Unions? #6

Open
VladimirAlexiev opened this issue Aug 31, 2018 · 4 comments
Labels
question Further information is requested

Comments

@VladimirAlexiev

Hi! Interesting work, but I have some doubts. Assume this query (where appearance is a string, for simplicity):

hero {
  name
  appearance
}
  • doesn't this suffer badly from Cartesian product syndrome? If a hero has 3 names and 5 appearances, won't this return 15 rows?
  • shouldn't it use more Optionals or Unions? If a hero has no appearances, I still want to get his name.

I think both of these woes can be remedied by using UNION, e.g.:

{?hero ex:name ?name}
union
{?hero ex:appearance ?appearance}
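To make the row counts concrete, here is a small in-memory simulation, a sketch rather than real SPARQL evaluation: plain dicts stand in for solution mappings, the values n1 to a5 are hypothetical, and the two UNION branches are assumed to bind separate name and appearance variables.

```python
from itertools import product


def join_rows(names, appearances):
    """Joined pattern: every name is paired with every appearance (Cartesian product)."""
    return [{"name": n, "appearance": a} for n, a in product(names, appearances)]


def union_rows(names, appearances):
    """UNION: each branch contributes its own rows; the other variable stays unbound."""
    return [{"name": n} for n in names] + [{"appearance": a} for a in appearances]


names = ["n1", "n2", "n3"]                    # hypothetical: 3 values of ex:name
appearances = ["a1", "a2", "a3", "a4", "a5"]  # hypothetical: 5 values of ex:appearance

print(len(join_rows(names, appearances)))   # 15 = 3 * 5
print(len(union_rows(names, appearances)))  # 8 = 3 + 5
print(len(join_rows(names, [])))            # 0: the name is lost without appearances
print(len(union_rows(names, [])))           # 3: the name survives
```

The joined pattern grows with the product of the value counts, the UNION with their sum, and only the UNION keeps a hero's name when he has no appearances.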

You could introspect field characteristics in the schema to optimize the query.

  • if a field is required (!), you don't need to wrap it in OPTIONAL/UNION
  • if a field is not an array ([...]), you don't need to use UNION

So the above SPARQL is the worst case (both fields are optional arrays).
Your README shows the best case (both are required singletons).

If name is a required singleton but appearance is an optional singleton, the query could be:

{?hero ex:name ?name}
optional {?hero ex:appearance ?appearance}
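A minimal sketch of how the two rules above could drive query generation, assuming a schema that exposes, per field, whether it is non-null (!) and whether it is a list ([...]). The `Field` type, the `build_where` function and the `ex:` predicates are hypothetical. Singleton fields are grouped into one block (optional ones wrapped in OPTIONAL), and each list field gets its own UNION branch; as an assumption, list fields go into UNION branches even when marked !, since a required list still multiplies rows when joined with another list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    name: str        # GraphQL field name, reused as the SPARQL variable name
    predicate: str   # hypothetical predicate, e.g. "ex:name"
    required: bool   # non-null in the schema ("!")
    is_list: bool    # list type in the schema ("[...]")


def build_where(subject: str, fields: List[Field]) -> str:
    """Build a WHERE body: one block for singletons, one UNION branch per list field."""
    singles: List[str] = []
    branches: List[str] = []
    for f in fields:
        triple = f"{subject} {f.predicate} ?{f.name} ."
        if f.is_list:
            branches.append(f"{{ {triple} }}")          # own branch: no Cartesian product
        elif f.required:
            singles.append(triple)                      # required singleton: plain pattern
        else:
            singles.append(f"OPTIONAL {{ {triple} }}")  # optional singleton
    if singles:
        branches.insert(0, "{ " + " ".join(singles) + " }")
    return "\nUNION\n".join(branches)


# Required singleton name, optional singleton appearance (the second case above).
print(build_where("?hero", [
    Field("name", "ex:name", required=True, is_list=False),
    Field("appearance", "ex:appearance", required=False, is_list=False),
]))
# Worst case: both fields are optional lists.
print(build_where("?hero", [
    Field("name", "ex:name", required=False, is_list=True),
    Field("appearance", "ex:appearance", required=False, is_list=True),
]))
```

With 5 singleton fields and 3 list fields, this yields one block of 5 patterns plus 3 UNION branches. The sketch does not handle a singleton block containing only OPTIONAL patterns, where ?hero would stay unbound in that branch.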

Complex questions:

  • How to group patterns with the same/different characteristics
  • How to trace characteristics down the object hierarchy: maybe you need some subquery nesting?
@rubensworks
Owner

doesn't this suffer badly from Cartesian product syndrome? If a hero has 3 names and 5 appearances, won't this return 15 rows?

You're right, this will indeed return 15 rows in this case, because internally the query is translated into a SPARQL query like the following:

SELECT ?hero_name ?hero_appearance WHERE {
  _:b1 <hero> ?hero.
  ?hero <name> ?hero_name.
  ?hero <appearance> ?hero_appearance.
}

This is exactly why I also created a package that converts SPARQL JSON results into trees: https://github.com/rubensworks/sparqljson-to-tree.js
It works well together with this script and can compact results like these.
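A rough sketch of the compaction idea, not the sparqljson-to-tree API: it takes results in the standard SPARQL 1.1 JSON results format, assumes (as in the query above) that variable names encode the field path with _, and merges all rows into one nested dict of de-duplicated values. It assumes a single root object, and the `compact` name is hypothetical.

```python
def compact(results: dict) -> dict:
    """Merge flat SPARQL JSON result rows into a nested dict of unique values per field."""
    tree: dict = {}
    for binding in results["results"]["bindings"]:
        for var, term in binding.items():
            *parents, leaf = var.split("_")      # "hero_name" -> ["hero"], "name"
            node = tree
            for key in parents:
                node = node.setdefault(key, {})  # walk or create intermediate objects
            values = node.setdefault(leaf, [])
            if term["value"] not in values:      # drop duplicates caused by the product
                values.append(term["value"])
    return tree


names = ["n1", "n2", "n3"]
appearances = ["a1", "a2", "a3", "a4", "a5"]
results = {
    "head": {"vars": ["hero_name", "hero_appearance"]},
    "results": {"bindings": [
        {"hero_name": {"type": "literal", "value": n},
         "hero_appearance": {"type": "literal", "value": a}}
        for n in names for a in appearances
    ]},
}
print(len(results["results"]["bindings"]))  # 15 rows in
print(compact(results))
# {'hero': {'name': ['n1', 'n2', 'n3'], 'appearance': ['a1', 'a2', 'a3', 'a4', 'a5']}}
```

Note that the engine still computes and transfers all 15 rows; compaction only shrinks them afterwards.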

shouldn't it use more Optionals or Unions? If a hero has no appearances, I still want to get his name.

You're right, optional fields are not supported at the moment. Currently, the only optional fields are those wrapped in a fragment (... type { ... }). I intend to give the @optional directive a special meaning, so that the fields it annotates are wrapped inside an OPTIONAL.

You could introspect field characteristics in the schema to optimize the query.

I intentionally ignore anything related to the GraphQL schema in this project. The assumption is that there is no GraphQL server, and therefore no GraphQL schema either. Instead, we use the JSON-LD context as a stand-in for the GraphQL schema.

But you're right: if we had the GraphQL schema, there are a couple of things that could be improved.
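A minimal sketch of the translation described earlier in this thread, assuming the GraphQL selection is already parsed into a nested dict (a leaf field maps to None) and that the JSON-LD context is reduced to a plain dict from field names to IRIs. `to_sparql` and these input shapes are hypothetical and not this package's actual implementation; with bare placeholder IRIs it reproduces the SELECT query shown above.

```python
from typing import Dict, List, Optional

Selection = Dict[str, Optional["Selection"]]


def to_sparql(selection: Selection, context: Dict[str, str]) -> str:
    """Turn a nested field selection into one basic graph pattern (no OPTIONAL/UNION)."""
    variables: List[str] = []
    patterns: List[str] = []

    def walk(subject: str, sel: Selection, prefix: str) -> None:
        for field, sub in sel.items():
            var = f"{prefix}_{field}" if prefix else field  # path-based variable name
            patterns.append(f"  {subject} <{context[field]}> ?{var}.")
            if sub is None:
                variables.append(f"?{var}")                  # leaf: projected variable
            else:
                walk(f"?{var}", sub, var)                    # nested object: recurse

    walk("_:b1", selection, "")
    return f"SELECT {' '.join(variables)} WHERE {{\n" + "\n".join(patterns) + "\n}"


context = {"hero": "hero", "name": "name", "appearance": "appearance"}  # placeholder IRIs
print(to_sparql({"hero": {"name": None, "appearance": None}}, context))
```

Because every field becomes a plain triple pattern in one group, the output has exactly the Cartesian-product behavior discussed in this issue.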

How to group patterns with the same/different characteristics

You should be able to use fragments for this.

How to trace characteristics down the object hierarchy: maybe you need some subquery nesting?

I'm not sure what you mean by this exactly, could you give an example?

@rubensworks rubensworks added the question Further information is requested label Aug 31, 2018
@VladimirAlexiev
Author

Compacting after a combinatorial explosion doesn't sound like a scalable approach. You'd better avoid the explosion in the first place.

GQL fields are optional by default: mandatory fields have the "!" annotation. So I think it's important to take into account the Optional case.

I think that GQL servers without a schema are very few, and can be considered deficient (not sure whether GQL makes the schema mandatory, but most reasonable servers have it anyway).

By "grouping patterns" I mean that, e.g., if you have 5 single fields and 3 multiple fields, you can use one query with 5 patterns, plus 3 UNION queries.
By "down the hierarchy" I mean how you would do this grouping when you have a hierarchy of nested objects. Maybe by nesting subqueries? But SPARQL subquery semantics are pretty bad: subqueries are supposed to be evaluated first...

@rubensworks
Owner

GQL fields are optional by default: mandatory fields have the "!" annotation. So I think it's important to take into account the Optional case.

That's an interesting remark. I aimed to be compatible with SPARQL's default non-optional semantics, while GraphQL fields are indeed optional by default. I'll include this in my future research.

I think that GQL servers without a schema are very few, and can be considered deficient (not sure whether GQL makes the schema mandatory, but most reasonable servers have it anyway).

Agreed. This package however does not aim to work with GraphQL servers, but with SPARQL endpoints/engines. I'm also investigating the (dis)advantages of (not) including GraphQL schemas in this conversion.

By "grouping patterns" I mean that, e.g., if you have 5 single fields and 3 multiple fields, you can use one query with 5 patterns, plus 3 UNION queries.
By "down the hierarchy" I mean how you would do this grouping when you have a hierarchy of nested objects. Maybe by nesting subqueries? But SPARQL subquery semantics are pretty bad: subqueries are supposed to be evaluated first...

Interesting. I haven't considered query optimization at this stage. My assumption was that query engines would be able to optimize the query themselves. But indeed, a couple of things are already possible at this level.

@VladimirAlexiev
Author

This package however does not aim to work with GraphQL servers, but with SPARQL endpoints/engines.

Then your package should provide a GQL schema. The disadvantages of not including one are numerous: it can't be used with GraphiQL, queries can't be validated, etc.

"Grouping patterns": it's not about query optimization, it's about issuing a correct query. If you ask for a Cartesian product, you will get it, and SPARQL servers cannot optimize it away.


2 participants