Add query batching capabilities to the schema stitching layer #524

Closed
michaelstaib opened this issue Jan 23, 2019 · 3 comments

michaelstaib commented Jan 23, 2019

Batching

Introduction

The current schema stitching layer sends requests to the remote schemas as they appear. This can be problematic, since we will run into the same n+1 issues that we know from database calls. With this issue we will introduce a new batching layer that will be hidden behind the IRemoteQueryClient.

Rewriting the Query

The IRemoteQueryClient is the way to query a remote schema. Each IRemoteQueryClient instance represents one remote schema. The stitching layer delegates parts of a query that it receives to a remote schema.

We now want the IRemoteQueryClient to act like a DataLoader: it should merge pending requests into one request and send that one to the remote schema as a batch. There are a few things to consider here:

  • The batch size has to be configurable
    This is important since the remote schema might have a max allowed complexity.

  • We have auto-generated remote requests and we have requests written by developers themselves.
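
To illustrate the configurable batch size, here is a minimal sketch in Python (not the stitching layer's own language). The name chunk_requests and the max_batch_size parameter are hypothetical and only show the idea of cutting the queue of pending requests into bounded batches before merging each one:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_requests(pending: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `pending`, each at most `max_batch_size` long.

    Hypothetical helper: every yielded slice would be merged into one remote request.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(pending), max_batch_size):
        yield list(pending[start:start + max_batch_size])


# Three pending requests with a batch size of two give two remote calls.
batches = list(chunk_requests(["foo", "bar", "baz"], max_batch_size=2))
```

A count-based limit is the simplest option; it only approximates a complexity limit on the remote schema, since requests can differ a lot in cost.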

Let us say we have three requests against one remote schema. The first and second requests are auto-generated by the stitching layer.
The third request is written by a developer.

Request 1:

query foo($global: String, $arg_var: String) @__hc_auto {
  a(a: $global) {
    b(b: $arg_var) {
      c
      ...abc
    }
  }
}

fragment abc on C {
  d
}

Request 2:

query bar($global: String, $arg_var: String) @__hc_auto {
  b: a(a: $global) {
    b(b: $arg_var) {
      c
    }
  }
  c: a(a: $global) {
    b(b: $arg_var) {
      c
    }
  }
}

Request 3:

query baz($a: String $b: String) {
  d(a: $a) {
    e(b: $b) {
      ... def
    }
  }
}

fragment def on E {
  f {
    ... abc
  }
}

fragment abc on F {
  g
}

Request 1 and request 2 are basically branches of the original query, whereas the developer request might be something completely different.

Variables from the original request are not rewritten; they are merged into the new request as they are. If requests 1 and 2 both use the variable '$global' from the original request, then we declare this variable only once in the merged request and leave its name unchanged. Variables that are defined by the user or generated by the stitching engine are rewritten with a name prefix that identifies the request they stem from.
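
As a rough sketch of this variable rule (Python, with hypothetical names; the request prefix format "__req_1" is taken from the merged query in this issue, everything else is an assumption):

```python
from typing import Dict, Iterable, Set


def rename_variables(
    declared: Iterable[str],
    global_names: Set[str],
    request_prefix: str,
) -> Dict[str, str]:
    """Map each variable of one request to its name in the merged request.

    Variables of the original request (`global_names`) keep their name,
    all other variables get the request prefix.
    """
    renamed: Dict[str, str] = {}
    for name in declared:
        if name in global_names:
            renamed[name] = name  # shared variable, declared once in the merged request
        else:
            renamed[name] = f"{request_prefix}_{name}"
    return renamed


req_1 = rename_variables(["global", "arg_var"], {"global"}, "__req_1")
req_3 = rename_variables(["a", "b"], {"global"}, "__req_3")
```

Because global variables map to themselves in every request, taking the union of all mapped names yields each global variable exactly once.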

In order to avoid field collisions and in order to be able to pick the result apart we have to apply field aliases to the root fields. Like with local variables we will combine the request prefix with the response name in the following way: {requestPrefix}_{responseName}.

The response name is the alias name of a field if the alias name is specified; otherwise the response name is the field name.
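
The alias rule can be written down in a few lines. This is only a sketch in Python with hypothetical function names, not code from the stitching layer:

```python
from typing import Optional


def response_name(field_name: str, alias: Optional[str] = None) -> str:
    """The alias if one is specified, otherwise the field name."""
    return alias if alias else field_name


def merged_alias(request_prefix: str, field_name: str, alias: Optional[str] = None) -> str:
    """Build the root field alias as {requestPrefix}_{responseName}."""
    return f"{request_prefix}_{response_name(field_name, alias)}"


# Request 1 selects `a` without an alias, request 2 selects `b: a(...)`.
alias_req_1 = merged_alias("__req_1", "a")
alias_req_2 = merged_alias("__req_2", "a", alias="b")
```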

Lastly, fragment definitions from the original request are not rewritten and are merged as they are. Fragment definitions from user-defined queries are rewritten to carry the request prefix, in the same way that root field aliases are rewritten.

query merged($global: String $__req_1_arg_var: String $__req_2_arg_var: String $__req_3_a: String $__req_3_b: String) {
  __req_1_a: a(a: $global) {
    b(b: $__req_1_arg_var) {
      c
      ...abc
    }
  }

  __req_2_b: a(a: $global) {
    b(b: $__req_2_arg_var) {
      c
    }
  }

  __req_2_c: a(a: $global) {
    b(b: $__req_2_arg_var) {
      c
    }
  }

  __req_3_d: d(a: $__req_3_a) {
    e(b: $__req_3_b) {
      ... __req_3_def
    }
  }
}

fragment abc on C {
  d
}

fragment __req_3_def on E {
  f {
    ... __req_3_abc
  }
}

fragment __req_3_abc on F {
  g
}

Handling the Response

Errors

Field errors that have the path property defined will be delegated to the response of their request, since the first path element tells us which request the error belongs to.

Errors that do not have the path property defined will be delegated to just one of the results, so that they are not reported multiple times.

If the remote schema returns only errors and no data, then we will send exceptions to the result tasks.
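
A sketch of the two routing rules for errors, in Python with hypothetical names. Two details are assumptions that the text above does not fix: errors without a path go to the first request, and the prefix is stripped from the first path element so that the path matches the original request again:

```python
from typing import Any, Dict, List

Error = Dict[str, Any]


def route_errors(errors: List[Error], request_prefixes: List[str]) -> Dict[str, List[Error]]:
    """Assign every error of the merged response to exactly one request."""
    routed: Dict[str, List[Error]] = {prefix: [] for prefix in request_prefixes}
    for error in errors:
        path = error.get("path")
        owner = None
        if path:
            root = str(path[0])
            # Match on "prefix_" so that "__req_1" does not claim "__req_10_x".
            owner = next((p for p in request_prefixes if root.startswith(p + "_")), None)
        if owner is None:
            # No usable path: report the error once, on the first request (assumption).
            routed[request_prefixes[0]].append(error)
            continue
        restored = dict(error)
        restored["path"] = [root[len(owner) + 1:]] + list(path[1:])
        routed[owner].append(restored)
    return routed


errors = [
    {"message": "boom", "path": ["__req_2_b", "b"]},
    {"message": "global failure"},
]
routed = route_errors(errors, ["__req_1", "__req_2"])
```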

Data

The data can be easily divided by using the root response name since we have used request aliases.
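
Splitting the data could look roughly like this. It is a Python sketch with a hypothetical split_data helper and assumes the merged response data is a plain dictionary keyed by the prefixed root response names:

```python
from typing import Any, Dict, List


def split_data(data: Dict[str, Any], request_prefixes: List[str]) -> Dict[str, Dict[str, Any]]:
    """Divide merged response data into one result per request prefix."""
    results: Dict[str, Dict[str, Any]] = {prefix: {} for prefix in request_prefixes}
    for key, value in data.items():
        for prefix in request_prefixes:
            marker = prefix + "_"
            if key.startswith(marker):
                # Drop the prefix to restore the original response name.
                results[prefix][key[len(marker):]] = value
                break
    return results


merged_data = {
    "__req_1_a": {"b": {"c": "x"}},
    "__req_2_b": {"b": {"c": "y"}},
    "__req_2_c": {"b": {"c": "z"}},
}
per_request = split_data(merged_data, ["__req_1", "__req_2"])
```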

Extensions

For now we will ignore any extension data.

@michaelstaib
Member Author

#341

@michaelstaib
Member Author

This one is now implemented and will be included with 0.8.0-preview.1

@michaelstaib
Member Author

We opted not to mark operations with @__hc_auto, since we would have to parse the query to get this information.

We are now using the request properties instead and added a property IsAutoGenerated.

We could make the merged queries smaller, but that would lead to a more complex rewriter, so for now we are living with the slightly larger queries and leave it to the remote schema to optimize them.

Also, we might want to be able to deactivate batching or fix the batch size. Or maybe in the future we want to have a fixed batch complexity.
