[UA] Tight worker loop can cause high CPU usage #60950
Pinging @elastic/es-ui (Team:Elasticsearch UI)
x-pack/plugins/upgrade_assistant/server/lib/reindexing/worker.ts
The worker scheduler should only sleep when it cannot process any in-progress operations. Additionally, logic has been added to handle queued operations that have been waiting in the queue for a long time and may still be seen as falling within a small window of time by workers that do not have the credentials to process those reindex operations.
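The "sleep only when nothing could be processed" rule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in worker.ts: the ReindexOp shape, its status values, and the runPass, schedule, and sleep helpers are all hypothetical, and processOp stands in for whatever step tries to advance an operation (returning false when, for example, this instance lacks the credentials).

```typescript
// Hypothetical operation shape; the real plugin stores reindex operations differently.
interface ReindexOp {
  id: string;
  status: "queued" | "inProgress" | "completed";
}

// Hypothetical: resolves to true if the operation was advanced by this instance.
type ProcessFn = (op: ReindexOp) => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// One scheduler pass: try every in-progress op and report whether any advanced.
async function runPass(ops: ReindexOp[], processOp: ProcessFn): Promise<boolean> {
  let madeProgress = false;
  for (const op of ops) {
    if (op.status !== "inProgress") continue;
    if (await processOp(op)) madeProgress = true;
  }
  return madeProgress;
}

// Runs up to maxPasses passes and sleeps for paddingMs only after a pass in
// which nothing could be processed. Returns the number of sleeps taken.
async function schedule(
  getOps: () => ReindexOp[],
  processOp: ProcessFn,
  paddingMs: number,
  maxPasses: number
): Promise<number> {
  let sleeps = 0;
  for (let pass = 0; pass < maxPasses; pass++) {
    const ops = getOps();
    // Stop once there is no in-progress work left.
    if (!ops.some((op) => op.status === "inProgress")) break;
    const progressed = await runPass(ops, processOp);
    if (!progressed) {
      sleeps++;
      await sleep(paddingMs); // back off instead of spinning
    }
  }
  return sleeps;
}
```

An instance that keeps making progress never pays the padding cost; only an instance that is stuck (every processOp call returns false) backs off between passes.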
@elasticmachine merge upstream
LGTM! Tested locally and works as expected.
) {
  // TODO: This tight loop needs something to relax potentially high CPU demands so this padding is added.
  // This scheduler should be revisited in future.
  await new Promise(res => setTimeout(res, WORKER_PADDING_MS));
nit: Do you mind using resolve instead of res? I first read it as response (which I always shorten to res! 😄)
💚 Build Succeeded
* Added worker padding to save some CPU
* Updated comments
* Update worker scheduler and add a new util
  The worker scheduler should only sleep when it cannot process any in-progress operations. Additionally, logic has been added to handle queued operations that have been waiting in the queue for a long time and may still be seen as falling within a small window of time by workers that do not have the credentials to process those reindex operations.
* res 👉🏻 resolve

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Summary
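In Upgrade Assistant, when multiple Kibana instances share an Elasticsearch cluster, the reindex worker loop can consume a lot of CPU under certain conditions, notably when an instance keeps encountering a reindex operation that it lacks the user credentials to advance.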
How to reproduce on master
In x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_handler.ts, comment out the following line:
credentialStore.set(reindexOp, headers);
This simulates a Kibana instance that does not have the user credentials required to advance the reindex operation, which is the key to triggering the performance bug.
Solution
The simplest solution was to add padding to the worker loop in the form of a simulated sleep.
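A rough way to see why the padding matters is to count loop iterations over a fixed time window for an instance that can never make progress. In this hypothetical sketch, hasCredentials always returns false to mimic the reproduction step of commenting out credentialStore.set; the sleep helper, countIterations, and the timings are illustrative assumptions, not values taken from the plugin.

```typescript
// Hypothetical stand-in: always false, like an instance whose credential
// store has no entry for the reindex operation.
const hasCredentials = (): boolean => false;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs a simplified worker loop for windowMs and returns how many iterations ran.
async function countIterations(windowMs: number, paddingMs: number): Promise<number> {
  const deadline = Date.now() + windowMs;
  let iterations = 0;
  while (Date.now() < deadline) {
    iterations++;
    if (hasCredentials()) {
      // A real worker would advance the reindex operation here.
      continue;
    }
    if (paddingMs > 0) {
      // Padding: give the event loop (and the CPU) a break.
      await sleep(paddingMs);
    } else {
      // No padding: only yields to microtasks, so this is effectively a busy loop.
      await Promise.resolve();
    }
  }
  return iterations;
}
```

Without padding, the loop spins for the entire window and also starves timers; with a 10 ms pad it runs on the order of once per pad interval.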
Additional
There was also a small potential issue with queued items that could still be seen as stale (see #60770). Workers that lack the credentials to update the reindex operation now double-check queued operations.
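One possible reading of that double-check is sketched below: before treating a queued operation as stale, a worker re-reads the operation's latest state and checks again, so it does not act on an old copy that another instance has since touched. The QueuedOp fields, the staleness window, and the isStale and confirmStale helpers are hypothetical; the actual logic in the PR may differ.

```typescript
// Hypothetical shape; field names are illustrative, not the plugin's schema.
interface QueuedOp {
  id: string;
  status: "queued" | "inProgress";
  updatedAt: number; // epoch ms of the last update by any instance
}

// An op looks stale when nothing has touched it within staleWindowMs.
function isStale(op: QueuedOp, now: number, staleWindowMs: number): boolean {
  return now - op.updatedAt > staleWindowMs;
}

// Double-check: only report a queued op as stale if both the copy we already
// hold and a freshly fetched copy agree.
function confirmStale(
  op: QueuedOp,
  fetchLatest: (id: string) => QueuedOp, // hypothetical re-read of the op
  now: number,
  staleWindowMs: number
): boolean {
  if (op.status !== "queued" || !isStale(op, now, staleWindowMs)) return false;
  const latest = fetchLatest(op.id); // second look at the current state
  return latest.status === "queued" && isStale(latest, now, staleWindowMs);
}
```

A single check against a cached copy can misjudge an operation that another instance has just refreshed; the second read narrows that window.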