Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(@apollo/gateway) Call unref on the federated gateway's setInterval. #3105

Merged
merged 2 commits into from
Jul 30, 2019

Conversation

abernix
Copy link
Member

@abernix abernix commented Jul 29, 2019

When the Apollo Gateway is configured in so-called "managed mode", it uses a
setInterval timer to periodically check for updates from the Apollo Graph
Manager (also known as Apollo Engine).

However, since the Timer's unref method wasn't invoked, the timer would
continue to place tasks on the event loop, even when the server had been
requested to be shutdown. By calling unref, Node.js switches the timer to
an internal timer which doesn't block the event-loop.

While we could alternatively use a process.on event to detect when a
shutdown signal had been received and invoke the Gateway's stop method
accordingly, this should be sufficient for this particular implementation.
(As opposed to the implementation in apollo-engine-reporting which does
exactly this since it needs to ensure that all queued traces are transmitted
to the cloud prior to process exit.

abernix added 2 commits July 30, 2019 01:28
…rval`.

When the Apollo Gateway is configured in so-called "managed mode", it uses a
`setInterval` timer to periodically check for updates from the Apollo Graph
Manager (also known as Apollo Engine).

However, since the Timer's `unref` method wasn't invoked, the timer would
continue to place tasks on the event loop, even when the server had been
requested to be shutdown.  By calling `unref`, Node.js switches the timer to
an internal timer which doesn't block the event-loop.

While we could alternatively use a `process.on` event to detect when a
shutdown signal had been received and invoke the Gateway's `stop` method
accordingly, this should be sufficient for this particular implementation.
(As opposed to the implementation in `apollo-engine-reporting`  which does
exactly this since it needs to ensure that all queued traces are transmitted
to the cloud prior to process exit.
@abernix abernix added this to the Release 2.8.x milestone Jul 29, 2019
@abernix abernix merged commit 73019e9 into master Jul 30, 2019
@abernix abernix deleted the abernix/unref-setInterval branch July 30, 2019 21:34
abernix added a commit to apollographql/federation that referenced this pull request Sep 4, 2020
…llographql/apollo-server#3105)

* fix(@apollo/gateway) Call `unref` on the federated gateway's `setInterval`.

When the Apollo Gateway is configured in so-called "managed mode", it uses a
`setInterval` timer to periodically check for updates from the Apollo Graph
Manager (also known as Apollo Engine).

However, since the Timer's `unref` method wasn't invoked, the timer would
continue to place tasks on the event loop, even when the server had been
requested to be shutdown.  By calling `unref`, Node.js switches the timer to
an internal timer which doesn't block the event-loop.

While we could alternatively use a `process.on` event to detect when a
shutdown signal had been received and invoke the Gateway's `stop` method
accordingly, this should be sufficient for this particular implementation.
(As opposed to the implementation in `apollo-engine-reporting`  which does
exactly this since it needs to ensure that all queued traces are transmitted
to the cloud prior to process exit.

* Add CHANGELOG.md for apollographql/apollo-server#3105.

Apollo-Orig-Commit-AS: apollographql/apollo-server@73019e9
glasser added a commit to apollographql/federation that referenced this pull request Feb 9, 2021
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to `unref` the polling timeout to allow the Node process to exit if it's
the only thing left on the event loop, instead of encouraging users of
`ApolloGateway` to be responsible and call `stop()` themselves. While this may
be reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the `ApolloGateway` to be the
responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can
trigger events in plugins. So in the recently-released Apollo Server v2.20,
`ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in
most cases, the missing `unref` shouldn't keep the process running, as long as
you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts
of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility, `ApolloGateway` detects if it appears to be connected
to Apollo Server older than v2.18. If so, it continues to do the old
unref().  If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a `stop()` call should be pretty straightforward. (If we learn that this
change really is devastating, we can always go back to an unconditional
timer.unref() later.)

Fixes #4428.
glasser added a commit to apollographql/federation that referenced this pull request Feb 9, 2021
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to `unref` the polling timeout to allow the Node process to exit if it's
the only thing left on the event loop, instead of encouraging users of
`ApolloGateway` to be responsible and call `stop()` themselves. While this may
be reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the `ApolloGateway` to be the
responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can
trigger events in plugins. So in the recently-released Apollo Server v2.20,
`ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in
most cases, the missing `unref` shouldn't keep the process running, as long as
you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts
of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility, `ApolloGateway` detects if it appears to be connected
to Apollo Server older than v2.18. If so, it continues to do the old
unref().  If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a `stop()` call should be pretty straightforward. (If we learn that this
change really is devastating, we can always go back to an unconditional
timer.unref() later.)

Fixes apollographql/apollo-server#4428.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant