fix(core): Consider timeout in shutdown an error #8050

Merged
merged 1 commit into from
Dec 18, 2023

Conversation

tomi
Contributor

@tomi tomi commented Dec 15, 2023

Summary

If the process doesn't shut down within the time limit, exit with an error code.

  1. Conceptually, something timing out is an error.
  2. On a successful exit we close the DB connection gracefully. On an exit timeout we would rather not do that, since it waits for any active connections to close and could block the exit.
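
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `shutdownWithTimeout`, `stopWork`, `closeDb`, and `ShutdownResult` are hypothetical names, and the function returns the exit code instead of calling `process.exit` so that it stays easy to test.

```typescript
// Hypothetical result type: the exit code the caller would hand to process.exit.
type ShutdownResult = { exitCode: 0 | 1; dbClosed: boolean };

// Races the shutdown work against a timer (all names here are assumptions).
async function shutdownWithTimeout(
  stopWork: () => Promise<void>,
  closeDb: () => Promise<void>,
  timeoutMs: number,
): Promise<ShutdownResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  try {
    const outcome = await Promise.race([
      stopWork().then(() => 'done' as const),
      timedOut,
    ]);

    if (outcome === 'timeout') {
      // A timeout counts as an error. Skip the graceful DB close, because
      // waiting for active connections could block the exit.
      return { exitCode: 1, dbClosed: false };
    }

    // Clean shutdown: close the DB connection gracefully, then report success.
    await closeDb();
    return { exitCode: 0, dbClosed: true };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

A caller would typically pass the returned `exitCode` to `process.exit` once the promise settles.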

Related tickets and issues

Review / Merge checklist

  • PR title and summary are descriptive. Remember, the title automatically goes into the changelog. Use (no-changelog) otherwise. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.

    A bug is not considered fixed, unless a test is added to prevent it from happening again.
    A feature is not complete without tests.

@tomi tomi requested a review from netroy December 15, 2023 15:42
@@ -439,7 +439,7 @@ export const schema = {
env: 'QUEUE_RECOVERY_INTERVAL',
},
gracefulShutdownTimeout: {
doc: 'How long should n8n wait for running executions before exiting worker process',
Member

should we move this to a generic config variable, instead of having a worker specific one? it would be a breaking change, but we can add it in BREAKING-CHANGES.md. Since it will affect a very small minority of users, I think it should be okay.
@krynble WDYT?

Contributor

I was wondering whether it would make sense to have different timeouts by process type. Say 10 seconds for webhooks, 1 minute for mains and 2 minutes for workers, depending on the use case. Is it overkill?

If we use this approach, then we could have one "global default" value of 30 seconds, but timers could be set separately.
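
As a rough sketch of that idea (not something implemented in this PR), a per-process lookup with a global fallback could look like this. `ProcessType`, `resolveShutdownTimeout`, and the override map are hypothetical names. The numbers are the ones suggested above.

```typescript
// Hypothetical process types, following the ones mentioned in this comment.
type ProcessType = 'webhook' | 'main' | 'worker';

// Global default of 30 seconds, as proposed above.
const GLOBAL_DEFAULT_SECONDS = 30;

// Returns the per-process override if one is set, else the global default.
function resolveShutdownTimeout(
  processType: ProcessType,
  overrides: Partial<Record<ProcessType, number>> = {},
): number {
  return overrides[processType] ?? GLOBAL_DEFAULT_SECONDS;
}

// Example overrides in seconds: 10 s webhooks, 1 min mains, 2 min workers.
const exampleOverrides: Partial<Record<ProcessType, number>> = {
  webhook: 10,
  main: 60,
  worker: 120,
};
```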

Do we have any specific goals we're trying to achieve with this change, other than treating this exit as a crash instead of a success? I'm fine with moving to a single global timeout as long as we continue supporting the old value for some time, until users have a chance to change it, while sending a warning to the console.

I think relying solely on the breaking changes log isn't ideal.

Contributor Author

@tomi tomi Dec 18, 2023

There are a couple of reasons behind this change. The first is that, conceptually, something timing out is an error. The second is that on a successful exit we close the DB connection gracefully. On an exit timeout we would rather not do that, since it waits for any active connections to close and could block the exit. I updated the PR description to explain the reasoning better.

IMO it would make sense to have a more generic config for this. I don't think we have to create separate vars for separate process types, as they are separate deployments anyway and have different env vars. You can then define a different exit timeout value for each one separately.

I can create a separate PR for adding a more generic config variable.

Contributor

Understood - makes sense. Thanks for clarifying.

About the generic variable, I strongly suggest we continue supporting the existing name to avoid breaking existing deployments, while warning users that the current environment variable is deprecated and should simply be renamed, similar to what we did when deprecating MySQL.
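
A minimal sketch of such a fallback, under stated assumptions: `N8N_GRACEFUL_SHUTDOWN_TIMEOUT` is the name listed in the 1.22.0 changelog, while `LEGACY_WORKER_SHUTDOWN_TIMEOUT` is a hypothetical placeholder for the existing worker-specific variable, whose name is not shown in this thread. The 30-second default and the function name are also assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Name taken from the 1.22.0 changelog entry.
const NEW_NAME = 'N8N_GRACEFUL_SHUTDOWN_TIMEOUT';
// Hypothetical placeholder: the real legacy name is not shown in this thread.
const OLD_NAME = 'LEGACY_WORKER_SHUTDOWN_TIMEOUT';

// Prefers the new variable, falls back to the deprecated one with a warning,
// and otherwise returns the (assumed) default in seconds.
function readShutdownTimeout(
  env: Env,
  warn: (message: string) => void,
  defaultSeconds = 30,
): number {
  const current = env[NEW_NAME];
  if (current !== undefined) return Number(current);

  const legacy = env[OLD_NAME];
  if (legacy !== undefined) {
    warn(`${OLD_NAME} is deprecated; rename it to ${NEW_NAME}.`);
    return Number(legacy);
  }

  return defaultSeconds;
}
```

In a real process the call might look like `readShutdownTimeout(process.env, console.warn)`, so existing deployments keep working while the console shows the rename hint.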

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Dec 15, 2023
Contributor

✅ All Cypress E2E specs passed

cypress bot commented Dec 18, 2023

Passing run #3375 ↗︎

0 305 5 0 Flakiness 0

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 tomi 🗃️ e2e/*
Project: n8n Commit: 02c788cc52
Status: Passed Duration: 06:29 💡
Started: Dec 18, 2023 8:35 AM Ended: Dec 18, 2023 8:42 AM

Review all test suite changes for PR #8050 ↗︎

@tomi tomi merged commit 4cae976 into master Dec 18, 2023
28 checks passed
@tomi tomi deleted the fix-consider-timeout-in-shutdown-an-error branch December 18, 2023 08:53
@github-actions github-actions bot mentioned this pull request Dec 21, 2023
ivov added a commit that referenced this pull request Dec 21, 2023
# [1.22.0](https://github.com/n8n-io/n8n/compare/n8n@1.21.0...n8n@1.22.0) (2023-12-21)


### Bug Fixes

* **core:** Close db connection gracefully when exiting
([#8045](#8045))
([e69707e](e69707e))
* **core:** Consider timeout in shutdown an error
([#8050](#8050))
([4cae976](4cae976))
* **core:** Do not display error when stopping jobless execution in
queue mode ([#8007](#8007))
([8e6b951](8e6b951))
* **core:** Fix shutdown if terminating before hooks are initialized
([#8047](#8047))
([6ae2f5e](6ae2f5e))
* **core:** Handle multiple termination signals correctly
([#8046](#8046))
([67bd8ad](67bd8ad))
* **core:** Initialize queue once in queue mode
([#8025](#8025))
([53c0b49](53c0b49))
* **core:** Prevent axios from force setting a form-urlencoded
content-type ([#8117](#8117))
([bba9576](bba9576))
* **core:** Remove circular references before serializing executions in
public API ([#8043](#8043))
([989888d](989888d))
* **core:** Restore workflow ID during execution creation
([#8031](#8031))
([c5e6ba8](c5e6ba8))
* **core:** Use relative imports for dynamic imports in
SecurityAuditService ([#8086](#8086))
([785bf99](785bf99))
* **core:** Stop binary data restoration from preventing execution from
finishing ([#8082](#8082))
([5ffff1b](5ffff1b))
* **editor:** Add back credential `use` permission
([#8023](#8023))
([329e5bf](329e5bf))
* **editor:** Cleanup Executions page component
([#8053](#8053))
([2689c37](2689c37))
* **editor:** Disable auto scroll and list size check when clicking on
executions ([#7983](#7983))
([fcb8b91](fcb8b91))
* **editor:** Ensure execution data overrides pinned data when copying
in executions view ([#8009](#8009))
([1d1cb0d](1d1cb0d))
* **editor:** Fix copy/paste issue when switch node is in workflow
([#8103](#8103))
([4b86926](4b86926))
* **editor:** Make keyboard shortcuts more strict; don't accept extra
Ctrl/Alt/Shift keys ([#8024](#8024))
([8df49e1](8df49e1))
* **editor:** Show credential share info only to appropriate users
([#8020](#8020))
([b29b4d4](b29b4d4))
* **editor:** Turn off executions list auto-refresh after leaving the
page ([#8005](#8005))
([e3c363d](e3c363d))
* **editor:** Update image sizes in template description not to be full
width always ([#8037](#8037))
([63a6e7e](63a6e7e))
* **ActiveCampaign Node:** Fix pagination issue when loading tags
([#8017](#8017))
([1943857](1943857))
* **HTTP Request Node:** Do not create circular references in HTTP
request node output ([#8030](#8030))
([5b7ea16](5b7ea16))
* Upgrade axios to address CVE-2023-45857
([#7713](#7713))
([64eb9bb](64eb9bb))


### Features

* Add option to `returnIntermediateSteps` for AI agents
([#8113](#8113))
([7806a65](7806a65))
* **core:** Add config option to prefer GET request over LIST when using
Hashicorp Vault ([#8049](#8049))
([439a22d](439a22d))
* **core:** Add N8N_GRACEFUL_SHUTDOWN_TIMEOUT env var
([#8068](#8068))
([614f488](614f488))
* **editor:** Add lead enrichment suggestions to workflow list
([#8042](#8042))
([36a923c](36a923c))
* **editor:** Finalize workers view
([#8052](#8052))
([edfa784](edfa784))
* **editor:** Gracefully ignore invalid payloads in postMessage handler
([#8096](#8096))
([9d22c7a](9d22c7a))
* **editor:** Upgrade frontend tooling to address a few vulnerabilities
([#8100](#8100))
([19b7f1f](19b7f1f))
* **Filter Node:** Overhaul UI by adding the new filter component
([#8016](#8016))
([3d53052](3d53052))
* **Respond to Webhook Node:** Overhaul with improvements like returning
all items ([#8093](#8093))
([32d397e](32d397e))


### Performance Improvements

* **editor:** Improve canvas rendering performance
([#8022](#8022))
([b780436](b780436))

Co-authored-by: ivov <ivov@users.noreply.github.com>
@janober
Member

janober commented Dec 21, 2023

Got released with n8n@1.22.0

Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team Released

4 participants