fix(core): Consider timeout in shutdown an error #8050
If the process doesn't shut down within the time limit, exit with an error code.
@@ -439,7 +439,7 @@ export const schema = {
		env: 'QUEUE_RECOVERY_INTERVAL',
	},
	gracefulShutdownTimeout: {
		doc: 'How long should n8n wait for running executions before exiting worker process',
Should we move this to a generic config variable instead of having a worker-specific one? It would be a breaking change, but we can document it in BREAKING-CHANGES.md. Since it will affect only a very small minority of users, I think it should be okay.
@krynble WDYT?
I was wondering whether it would make sense to have different timeouts by process type: say 10 seconds for webhooks, 1 minute for mains and 2 minutes for workers, depending on the use case. Is it overkill?
If we use this approach, we could have one "global default" value of 30 seconds, with timers that can be set separately.
Do we have any specific goals we're trying to achieve with this change, other than turning this exit into a crash instead of a successful exit? I'm fine with moving to a single global timeout as long as we continue supporting the old value for some time, until users have a chance to change it, while sending a warning to the console.
I think relying solely on the breaking changes log isn't ideal.
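To make the per-process idea concrete, here is a rough sketch of how the lookup could work. It is only an illustration: `ProcessType`, `resolveTimeoutSeconds`, and the `overrides` map are hypothetical names, not existing n8n code, and the numbers are the ones floated above.

```typescript
// Hypothetical process types, named after the deployments mentioned above.
type ProcessType = 'main' | 'worker' | 'webhook';

// Returns the per-process override when one is set, otherwise the global default.
// The 30-second default is the value suggested in this comment, not a shipped setting.
function resolveTimeoutSeconds(
	processType: ProcessType,
	overrides: Partial<Record<ProcessType, number>> = {},
	globalDefault = 30,
): number {
	return overrides[processType] ?? globalDefault;
}

// Example overrides using the numbers from this comment.
const exampleOverrides: Partial<Record<ProcessType, number>> = {
	webhook: 10,
	worker: 120,
};

const workerTimeout = resolveTimeoutSeconds('worker', exampleOverrides); // override applies
const mainTimeout = resolveTimeoutSeconds('main', exampleOverrides); // falls back to the default
```

Using `??` rather than `||` means an explicit override of `0` is respected instead of silently falling back to the default.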
There are a couple of reasons behind this change. First, conceptually, something timing out is an error. Second, on a successful exit we close the DB connection gracefully. On an exit timeout we would rather not do that, since it waits for any active connections to close and could block the exit. I updated the PR description with better reasoning.
IMO it would make sense to have a more generic config for this. I don't think we have to create separate vars for separate process types, as they are separate deployments anyway and have different env vars. You can then define a different exit timeout value for each one separately.
I can create a separate PR for adding a more generic config variable.
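For illustration, the behavior described here can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `shutdownWithTimeout`, `exitCodeFor`, and `stopWork` are hypothetical names, and the actual shutdown logic in n8n may be structured differently.

```typescript
type ShutdownResult = 'completed' | 'timed-out';

// Races the (hypothetical) stopWork callback against a timer and reports which finished first.
async function shutdownWithTimeout(
	stopWork: () => Promise<void>,
	timeoutMs: number,
): Promise<ShutdownResult> {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const timedOut = new Promise<ShutdownResult>((resolve) => {
		timer = setTimeout(() => resolve('timed-out'), timeoutMs);
	});
	const completed = stopWork().then((): ShutdownResult => 'completed');
	try {
		return await Promise.race([completed, timedOut]);
	} finally {
		// Clear the timer so a finished shutdown does not keep the event loop alive.
		if (timer !== undefined) clearTimeout(timer);
	}
}

// A timeout maps to a non-zero exit code; a clean shutdown maps to 0.
function exitCodeFor(result: ShutdownResult): number {
	return result === 'completed' ? 0 : 1;
}
```

A caller would close the DB connection only when the result is `'completed'`, and skip that step on `'timed-out'` so waiting for active connections cannot block the exit, then pass `exitCodeFor(result)` to `process.exit`.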
Understood - makes sense. Thanks for clarifying.
About the generic variable, I strongly suggest we continue supporting the existing name to avoid breaking existing deployments, while warning users that the current environment variable has been deprecated and should simply be renamed, similar to how we handled deprecating MySQL.
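A rough sketch of that fallback, under some assumptions: `readTimeoutSeconds` is a hypothetical helper, `OLD_WORKER_TIMEOUT` is a placeholder for the existing variable name (not the real one), and the parsing rules are my own guess at reasonable behavior. `N8N_GRACEFUL_SHUTDOWN_TIMEOUT` is the generic name that appears in the 1.22.0 release notes.

```typescript
type Env = Record<string, string | undefined>;

// Reads the new variable first, then the deprecated one (with a warning), then a fallback.
function readTimeoutSeconds(
	env: Env,
	newName: string,
	deprecatedName: string,
	fallbackSeconds: number,
	warn: (message: string) => void = console.warn,
): number {
	// Treat unset, empty, negative, and non-numeric values as "not configured" (an assumption).
	const parse = (raw: string | undefined): number | undefined => {
		if (raw === undefined || raw.trim() === '') return undefined;
		const value = Number(raw);
		return Number.isFinite(value) && value >= 0 ? value : undefined;
	};

	const current = parse(env[newName]);
	if (current !== undefined) return current;

	const legacy = parse(env[deprecatedName]);
	if (legacy !== undefined) {
		warn(`${deprecatedName} is deprecated; rename it to ${newName}.`);
		return legacy;
	}
	return fallbackSeconds;
}
```

When both variables are set, the new name wins silently, so users who have already migrated see no warning.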
# [1.22.0](https://github.com/n8n-io/n8n/compare/n8n@1.21.0...n8n@1.22.0) (2023-12-21)

### Bug Fixes

* **core:** Close db connection gracefully when exiting ([#8045](#8045)) ([e69707e](e69707e))
* **core:** Consider timeout in shutdown an error ([#8050](#8050)) ([4cae976](4cae976))
* **core:** Do not display error when stopping jobless execution in queue mode ([#8007](#8007)) ([8e6b951](8e6b951))
* **core:** Fix shutdown if terminating before hooks are initialized ([#8047](#8047)) ([6ae2f5e](6ae2f5e))
* **core:** Handle multiple termination signals correctly ([#8046](#8046)) ([67bd8ad](67bd8ad))
* **core:** Initialize queue once in queue mode ([#8025](#8025)) ([53c0b49](53c0b49))
* **core:** Prevent axios from force setting a form-urlencoded content-type ([#8117](#8117)) ([bba9576](bba9576))
* **core:** Remove circular references before serializing executions in public API ([#8043](#8043)) ([989888d](989888d))
* **core:** Restore workflow ID during execution creation ([#8031](#8031)) ([c5e6ba8](c5e6ba8))
* **core:** Use relative imports for dynamic imports in SecurityAuditService ([#8086](#8086)) ([785bf99](785bf99))
* **core:** Stop binary data restoration from preventing execution from finishing ([#8082](#8082)) ([5ffff1b](5ffff1b))
* **editor:** Add back credential `use` permission ([#8023](#8023)) ([329e5bf](329e5bf))
* **editor:** Cleanup Executions page component ([#8053](#8053)) ([2689c37](2689c37))
* **editor:** Disable auto scroll and list size check when clicking on executions ([#7983](#7983)) ([fcb8b91](fcb8b91))
* **editor:** Ensure execution data overrides pinned data when copying in executions view ([#8009](#8009)) ([1d1cb0d](1d1cb0d))
* **editor:** Fix copy/paste issue when switch node is in workflow ([#8103](#8103)) ([4b86926](4b86926))
* **editor:** Make keyboard shortcuts more strict; don't accept extra Ctrl/Alt/Shift keys ([#8024](#8024)) ([8df49e1](8df49e1))
* **editor:** Show credential share info only to appropriate users ([#8020](#8020)) ([b29b4d4](b29b4d4))
* **editor:** Turn off executions list auto-refresh after leaving the page ([#8005](#8005)) ([e3c363d](e3c363d))
* **editor:** Update image sizes in template description not to be full width always ([#8037](#8037)) ([63a6e7e](63a6e7e))
* **ActiveCampaign Node:** Fix pagination issue when loading tags ([#8017](#8017)) ([1943857](1943857))
* **HTTP Request Node:** Do not create circular references in HTTP request node output ([#8030](#8030)) ([5b7ea16](5b7ea16))
* Upgrade axios to address CVE-2023-45857 ([#7713](#7713)) ([64eb9bb](64eb9bb))

### Features

* Add option to `returnIntermediateSteps` for AI agents ([#8113](#8113)) ([7806a65](7806a65))
* **core:** Add config option to prefer GET request over LIST when using Hashicorp Vault ([#8049](#8049)) ([439a22d](439a22d))
* **core:** Add N8N_GRACEFUL_SHUTDOWN_TIMEOUT env var ([#8068](#8068)) ([614f488](614f488))
* **editor:** Add lead enrichment suggestions to workflow list ([#8042](#8042)) ([36a923c](36a923c))
* **editor:** Finalize workers view ([#8052](#8052)) ([edfa784](edfa784))
* **editor:** Gracefully ignore invalid payloads in postMessage handler ([#8096](#8096)) ([9d22c7a](9d22c7a))
* **editor:** Upgrade frontend tooling to address a few vulnerabilities ([#8100](#8100)) ([19b7f1f](19b7f1f))
* **Filter Node:** Overhaul UI by adding the new filter component ([#8016](#8016)) ([3d53052](3d53052))
* **Respond to Webhook Node:** Overhaul with improvements like returning all items ([#8093](#8093)) ([32d397e](32d397e))

### Performance Improvements

* **editor:** Improve canvas rendering performance ([#8022](#8022)) ([b780436](b780436))

Co-authored-by: ivov <ivov@users.noreply.github.com>
Got released with |