[ResponseOps]: error creating event log index template at startup #134098
Comments
Pinging @elastic/response-ops (Team:ResponseOps)
Looks like it's happening with the alias creation as well; I haven't seen it for the index creation, though. I happened to notice that some of this code changed in "Elasticsearch client: no longer default to using meta: true".
Looking at the implementation, it turns out that if we can't create the template, we check (again) to see whether it exists, since the error we were getting back to indicate it didn't exist wasn't something we could easily inspect. So this is really not good; the second check should have caught this. The relevant code is kibana/x-pack/plugins/event_log/server/es/cluster_client_adapter.ts, lines 175 to 206 in fd4b8e3.
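For illustration, here is a minimal sketch of that create-then-recheck pattern, written against the Elasticsearch JS client directly; the template name, pattern, and settings are placeholders, not the actual cluster_client_adapter code.

```ts
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });

// Sketch only: try to create an index template, and if the create fails,
// re-check whether the template already exists (for example, because
// another Kibana instance won the race) before treating it as an error.
async function ensureIndexTemplate(name: string): Promise<void> {
  try {
    await es.indices.putIndexTemplate({
      name,
      index_patterns: [`${name}-*`],
      template: { settings: { number_of_shards: 1 } },
    });
  } catch (err) {
    // The create error's message isn't something we can reliably inspect,
    // so ask Elasticsearch again whether the template is there.
    const exists = await es.indices.existsIndexTemplate({ name });
    if (!exists) {
      throw new Error(
        `error creating index template ${name}: ${(err as Error).message}`
      );
    }
    // The template exists after all; treat the failed create as benign.
  }
}
```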
Found a different problem, but in the same place; it seems like we can address it when we fix this issue. I'm seeing a few of these in some logs:
The old "user mixing templates into 'system' indices". Unavoidable, for cases where a user uses such a broad template pattern. Seems like we can at least bump the priority of our own template. We'll have to see what other indices used by the stack set theirs to, I think either 100 or 1000. Not sure if we can do better. Moving to a datastream would probably fix this (but not sure), and that may be a big task, but worthy of consideration, I think. I'm not sure if we can be more precise in our pattern, but that |
Resolves elastic#134098. Adds retry logic to the initialization of Elasticsearch resources when Kibana starts up. Recently this seems to have become a more noticeable error: race conditions occur where two Kibanas initializing a new stack version will race to create the event log resources. We believe we'll see the end of these issues with some retries, chunked around the four resource-creating sections of the initialization code. We're using [p-retry][] (which uses [retry][]) to do an exponential backoff starting at 2s, then 4s, 8s, 16s, with 4 retries (so 5 actual attempted calls). Some randomness is added, since there's a race on.

[p-retry]: https://github.com/sindresorhus/p-retry#p-retry
[retry]: https://github.com/tim-kos/node-retry#retry
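A minimal sketch of that retry shape using p-retry; the wrapped function is a placeholder for one of the four resource-creation steps, not the actual event log initializer.

```ts
import pRetry from 'p-retry';

// Placeholder for one chunk of the initialization (template, alias, index,
// ILM policy, ...); the real code calls the Elasticsearch client here.
async function createEventLogResources(): Promise<void> {
  // ...
}

export async function initializeWithRetry(): Promise<void> {
  // 4 retries means up to 5 attempted calls, with exponential backoff
  // starting at roughly 2s (then 4s, 8s, 16s) plus some jitter, since two
  // Kibana instances may be racing to create the same resources.
  await pRetry(createEventLogResources, {
    retries: 4,
    minTimeout: 2000,
    factor: 2,
    randomize: true,
    onFailedAttempt: (error) =>
      console.warn(
        `event log init attempt ${error.attemptNumber} failed: ${error.message}`
      ),
  });
}
```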
Kibana version: 8.3.0
Describe the bug:
Noticed this is happening intermittently during our kbn-alert-load daily runs:
Not great. Appears to be a race condition here:
kibana/x-pack/plugins/event_log/server/es/init.ts, lines 191 to 202 in 7bfcb52
I think we'll want to check the other resource creation bits as well, and perhaps when we fix this, we can also fix #127029, which is kinda related - some kind of a timing issue we should be able to work around by refactoring the EL initialization (include the mapping when we create the index, not just the template).
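A hedged sketch of that refactoring idea (the index name and mappings below are illustrative, not the event log's actual mapping): create the concrete index with its mappings supplied inline, and treat a resource_already_exists_exception as another instance having won the race.

```ts
import { Client, errors } from '@elastic/elasticsearch';

const es = new Client({ node: 'http://localhost:9200' });

async function createInitialIndexIfNotExists(index: string): Promise<void> {
  try {
    await es.indices.create({
      index,
      // Supplying the mappings at creation time means a successful create
      // can never leave behind an index without the expected mapping.
      mappings: {
        properties: {
          '@timestamp': { type: 'date' },
          message: { type: 'text' },
        },
      },
    });
  } catch (err) {
    const alreadyExists =
      err instanceof errors.ResponseError &&
      (err.body as any)?.error?.type === 'resource_already_exists_exception';
    if (alreadyExists) {
      // Another Kibana created the index first; nothing more to do.
      return;
    }
    throw err;
  }
}
```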