Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer and cache calculation of Engine reporting signature. #2670

Merged
merged 3 commits into from
May 22, 2019

Conversation

abernix
Copy link
Member

@abernix abernix commented May 13, 2019

This moves the Apollo Engine Reporting signature calculation (which the
Apollo Cloud uses to aggregate similar operations) to perform
asynchronously, after the response has been returned to the client, and also
caches the signature for use by additional operations which match the same
queryHash (which is an exact SHA-256 hex digest of the operation body
received from the client).

While the signature calculation is relatively quick on small operations, with
sustained load and more complex operations, this recurring calculation can
be more costly.

As a bit of validation to the success of this change, on a very basic
performance benchmark, using a schema with 1000 String fields (i.e. a
type Query with Field_1: String through Field_1000: String) and an
incoming operation which selects from all 1000 of those fields (i.e
Field_1), this showed quite an improvement:

┌───────────────┬───────────────┬───────────────┐
│               │ Before        │ After         │
├───────────────┼───────────────┼───────────────┤
│ Percentile, % │ Resp. Time, s │ Resp. Time, s │
├───────────────┼───────────────┼───────────────┤
│           0.0 │         0.076 │         0.024 │
│          50.0 │         0.342 │          0.14 │
│          90.0 │         0.388 │         0.161 │
│          95.0 │          0.41 │         0.164 │
│          99.0 │         0.444 │          0.17 │
│          99.9 │         0.486 │         0.177 │
│         100.0 │         0.487 │         0.196 │
└───────────────┴───────────────┴───────────────┘

Of course, this is a relatively simple example and still includes actual
GraphQL execution, but the win factor is certainly there. Other tests with
more dynamic fields and a higher cardinality of unique operations also
showed improvement, though extremely high cardinality of complex operations
(which have more expensive execution characteristics) certainly made the
win factor less pronounced.

It's not surprising that this calculation is a bit expensive since it
requires a number of normalization steps. Luckily, we can let this all
happen after the request is sent to the client and it's highly cacheable.

By default, we'll use a relatively small cache which should store a large
number of operations, and to avoid introducing configuration surface area
right now without much evidence as to the need, we'll simply log a message
periodically if we're ejecting operations.

@abernix abernix requested a review from martijnwalraven as a code owner May 13, 2019 17:24
@abernix abernix requested a review from jbaxleyiii May 13, 2019 17:24
This moves the Apollo Engine Reporting signature calculation (which the
Apollo Cloud uses to aggregate similar operations) to perform
asynchronously, after the response has been returned to the client, and also
caches the signature for use by additional operations which match the same
`queryHash` (which is an exact SHA-256 hex digest of the operation body
received from the client).

While the signature calculation is relatively quick on small operations, with
sustained load and more complex operations, this recurring calculation can
be more costly.

As a bit of validation to the success of this change, on a very basic
performance benchmark, using a schema with 1000 `String` fields (i.e.  a
`type Query` with `Field_1: String` through `Field_1000: String`) and an
incoming operation which selects from all 1000 of those fields (i.e
`Field_1`), this showed quite an improvement:

```
┌───────────────┬───────────────┬───────────────┐
│               │ Before        │ After         │
├───────────────┼───────────────┼───────────────┤
│ Percentile, % │ Resp. Time, s │ Resp. Time, s │
├───────────────┼───────────────┼───────────────┤
│           0.0 │         0.076 │         0.024 │
│          50.0 │         0.342 │          0.14 │
│          90.0 │         0.388 │         0.161 │
│          95.0 │          0.41 │         0.164 │
│          99.0 │         0.444 │          0.17 │
│          99.9 │         0.486 │         0.177 │
│         100.0 │         0.487 │         0.196 │
└───────────────┴───────────────┴───────────────┘
```

Of course, this is a relatively simple example and still includes actual
GraphQL execution, but the win factor is certainly there.  Other tests with
more dynamic fields and a higher cardinality of unique operations also
showed improvement, though extremely high cardinality of complex operations
(which have more expensive execution characteristics) certainly made the
_win_ factor less pronounced.

It's not surprising that this calculation is a bit expensive since it
requires a number of normalization steps.  Luckily, we can let this all
happen after the request is sent to the client and it's highly cacheable.

By default, we'll use a relatively small cache which should store a large
number of operations, and to avoid introducing configuration surface area
right now without much evidence as to the need, we'll simply log a message
periodically if we're ejecting operations.
@abernix abernix force-pushed the abernix/a-e-r-signature-calcs branch from b135a17 to eb3bc7b Compare May 13, 2019 17:59
 - apollo-cache-control@0.7.0-alpha.0
 - apollo-engine-reporting@1.2.0-alpha.0
 - apollo-server-azure-functions@2.6.0-alpha.0
 - apollo-server-cloud-functions@2.6.0-alpha.0
 - apollo-server-cloudflare@2.6.0-alpha.0
 - apollo-server-core@2.6.0-alpha.0
 - apollo-server-express@2.6.0-alpha.0
 - apollo-server-fastify@2.6.0-alpha.0
 - apollo-server-hapi@2.6.0-alpha.0
 - apollo-server-integration-testsuite@2.6.0-alpha.0
 - apollo-server-koa@2.6.0-alpha.0
 - apollo-server-lambda@2.6.0-alpha.0
 - apollo-server-micro@2.6.0-alpha.0
 - apollo-server-plugin-base@0.5.0-alpha.0
 - apollo-server-plugin-response-cache@0.2.0-alpha.0
 - apollo-server-testing@2.6.0-alpha.0
 - apollo-server@2.6.0-alpha.0
 - apollo-tracing@0.7.0-alpha.0
 - graphql-extensions@0.7.0-alpha.0
@@ -226,7 +301,7 @@ export class EngineReportingAgent<TContext = any> {
this.reportSize >=
(this.options.maxUncompressedReportSize || 4 * 1024 * 1024)
) {
this.sendReportAndReportErrors();
await this.sendReportAndReportErrors();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of this await shouldn't change execution dynamics (and I haven't changed what sendReportAndReportErrors does) since addTrace itself is never awaited, but the nested un-awaited Promise seemed a bit too wily for my liking.

@abernix abernix added this to the Release 2.6.0 milestone May 13, 2019
@abernix
Copy link
Member Author

abernix commented May 13, 2019

This has been published into an alpha:

Successfully published:
 - apollo-engine-reporting@1.2.0-alpha.0
 - apollo-server-azure-functions@2.6.0-alpha.0
 - apollo-server-cloud-functions@2.6.0-alpha.0
 - apollo-server-cloudflare@2.6.0-alpha.0
 - apollo-server-core@2.6.0-alpha.0
 - apollo-server-express@2.6.0-alpha.0
 - apollo-server-fastify@2.6.0-alpha.0
 - apollo-server-hapi@2.6.0-alpha.0
 - apollo-server-koa@2.6.0-alpha.0
 - apollo-server-lambda@2.6.0-alpha.0
 - apollo-server-micro@2.6.0-alpha.0
 - apollo-server@2.6.0-alpha.0

}: {
queryHash: string;
operationName: string;
documentAST?: DocumentNode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is always either a document AST or a string, should this be a single DocumentNode | string argument?

// calculating it asynchronously. The `addTrace` method will
// `await` the `signature` if it's a Promise, prior to putting it
// on the stack of traces to deliver to the cloud.
const cachedSignature = await this.signatureCache.get(queryHash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the signature calculated per document or per operation? If the latter, we should probably also use operationName as part of the key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cache-tch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed via 5e7084a. Thank you!

lastSignatureCacheWarn = new Date();
console.warn(
[
'This server is processing a high number of unique operations. ',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a nice touch!

@abernix abernix force-pushed the abernix/a-e-r-signature-calcs branch from 5e7084a to b55da84 Compare May 22, 2019 15:38
@abernix abernix merged commit 2360ab1 into release-2.6.0 May 22, 2019
@abernix abernix deleted the abernix/a-e-r-signature-calcs branch May 22, 2019 16:39
abernix added a commit that referenced this pull request May 20, 2020
Also, add a comment indicating the intent.  The `logger` in this case will
be a request-specific logger when available or the general "Engine" logger
otherwise.

Related: #2670
abernix added a commit that referenced this pull request May 20, 2020
Also, add a comment indicating the intent.  The `logger` in this case will
be a request-specific logger when available or the general "Engine" logger
otherwise.

Related: #2670
abernix added a commit that referenced this pull request May 20, 2020
Also, add a comment indicating the intent.  The `logger` in this case will
be a request-specific logger when available or the general "Engine" logger
otherwise.

Also adjust recently introduced test to accurately return a `Promise` in its
mock.

Related: #2670
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants