Update alert documents when the write index changes #110788
x-pack/plugins/rule_registry/server/utils/create_lifecycle_executor.ts
e10635e to 626cdf4
⏳ Build in-progress, with failures
History
To update your PR or re-run it, just comment with: |
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui) |
💛 Build succeeded, but was flaky
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @mgiota |
I tested the two scenarios above and here are the findings: Scenario 1 ✅ Scenario 2 🐞 The old alert got written to the new index as well. This is a new bug that we spotted, and it happens because the ILM policy deleted the old indices after rollover. UPDATE |
Isn't the whole point of this bugfix to ensure that it's being written to the old index again? Otherwise the duplication I described in the issue would occur, which we are trying to avoid. Maybe I'm misunderstanding. The video and the screenshots. Which is happening when? |
@weltenwort I updated my comment above as per our discussion. I hope it is more clear now for other reviewers. And yes the expected behavior is for old alerts to keep being written in the old index again after a rollover. |
Hm, seems like we have some edge cases to smooth out. I'll take a look asap. |
So after some collaborative investigation we realized this fix uncovered a different problem related to the ILM policy associated with the alerting indices by default. We'll track and resolve that separately. 😌 |
Here's the new ticket #111029. It should also solve the issue with the empty reason field for recovered alerts I pasted above |
Functionality looks good 👍 However, just to throw a spanner in the works, I didn't see this behaviour:
In both scenarios 1 and 2 I didn't have my old index deleted. Checking the code now.
Test results
Scenario 1
Scenario 2
(Timestamps show the updates continuing after rollover without deleting the old index) |
@Kerry350 I was also not able to reproduce the bug in Scenario 2. It looks like ILM unconditionally deleted the indices for me when I was testing. I will update the description. Thanks for checking it out so thoroughly |
@Kerry350 I will enable automerge, since I will be off for a couple of hours. |
👍 I'll keep an eye too in case anything goes wrong. |
LGTM 👍
* first draft (work in progress)
* add back missing await
* disable require_alias flag only when we update
* cleanup
I approved before noticing the |
Thanks for the review! I think auto-backport also works retroactively. |
LGTM 👍
indexName
  ? { index: { _id: event[ALERT_UUID]!, _index: indexName, require_alias: false } }
  : { index: { _id: event[ALERT_UUID]! } },
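The conditional above chooses the bulk action header per alert. Here is a minimal, self-contained sketch of that choice, assuming a hypothetical `buildBulkHeader` helper and a simplified event type. Neither comes from the PR, and `ALERT_UUID` is stubbed with an assumed value for illustration only:

```typescript
// Stub for the ALERT_UUID field-name constant used in the snippet above.
// The value is an assumption for illustration only.
const ALERT_UUID = "kibana.alert.uuid";

// Simplified event shape (hypothetical; the real event carries many more fields).
type AlertEvent = Record<string, string | undefined>;

interface BulkIndexHeader {
  index: { _id: string; _index?: string; require_alias?: boolean };
}

// Hypothetical helper: builds the action line of a bulk `index` operation.
function buildBulkHeader(event: AlertEvent, indexName?: string): BulkIndexHeader {
  const id = event[ALERT_UUID];
  if (id === undefined) {
    throw new Error(`event is missing ${ALERT_UUID}`);
  }
  return indexName
    ? // Known alert: target its concrete index, which is not an alias,
      // so the alias requirement is switched off for this item only.
      { index: { _id: id, _index: indexName, require_alias: false } }
    : // New alert: no per-item index, so the request-level target applies.
      { index: { _id: id } };
}

const ongoing = buildBulkHeader(
  { [ALERT_UUID]: "abc" },
  ".internal.alerts-observability.logs.alerts-default-000001"
);
const fresh = buildBulkHeader({ [ALERT_UUID]: "def" });
```

In this sketch only the item that updates an existing alert carries `require_alias: false`, which matches the commit note "disable require_alias flag only when we update".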
This looks good to me, I'd just like to ask if you thought about using `create` + `update` instead of `index` and decided to keep `index` for both operations. It looks like this executor collects the full set of document fields, so it's safer to use `index` (create or replace the whole doc).
Yes, we considered `update` but figured using `index` gives tighter control over the full document content (e.g. allows for removal of fields). We might refactor to not fetch the full content and use `update` in the future, but it looked like too much of a change for a late 7.15.0 bug-fix.
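To make the trade-off in this thread concrete, here is a small in-memory model of the two write semantics. It does not call Elasticsearch: `applyIndex` and `applyUpdate` are hypothetical helpers, the field names are invented, and the partial-document update is modeled as a shallow merge, which is enough for flat fields:

```typescript
type Doc = Record<string, unknown>;

// `index` semantics: the stored document is replaced by the new one.
function applyIndex(_existing: Doc | undefined, next: Doc): Doc {
  return { ...next };
}

// Partial-document `update` semantics, modeled here as a shallow merge:
// fields that are absent from the partial document are left in place.
function applyUpdate(existing: Doc, partial: Doc): Doc {
  return { ...existing, ...partial };
}

// Illustrative documents (field names are assumptions, not the alert schema).
const stored: Doc = { status: "active", reason: "CPU above threshold" };
const next: Doc = { status: "recovered" };

const afterIndex = applyIndex(stored, next);
const afterUpdate = applyUpdate(stored, next);
```

Under this model `afterIndex` no longer has a `reason` field, while `afterUpdate` still carries the old one. That is the "removal of fields" control mentioned above.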
Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync. |
* first draft (work in progress)
* add back missing await
* disable require_alias flag only when we update
* cleanup
Co-authored-by: mgiota <giota85@gmail.com>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Fixes #110519
📝 Summary
The solution proposed in the above ticket, removing `require_alias: true`, didn't work, because the mappings wouldn't be installed correctly. This is because of the current logic, where resources are installed only when the bulk operation fails: https://github.com/elastic/kibana/blob/master/x-pack/plugins/rule_registry/server/rule_data_client/rule_data_client.ts#L139. If the flag is missing and we would like to index some new data, resources wouldn't be installed and the alerts table wouldn't render any data.
So the fix was to:
`require_alias: true`
How to test
For both scenarios here's the command you can use in Dev tools to get the alerts that are being indexed
Scenario 1 (an ongoing alert gets written in old index after rollover)
.internal.alerts-observability.logs.alerts-default-000001
POST .alerts-observability.logs.alerts-default/_rollover
GET .alerts-observability.logs.alerts-default
.internal.alerts-observability.logs.alerts-default-000001
Scenario 2 (an ongoing alert SHOULD be written in the old index after rollover and after a new rule type was created)
.internal.alerts-observability.logs.alerts-default-000001
POST .alerts-observability.logs.alerts-default/_rollover
GET .alerts-observability.logs.alerts-default
.internal.alerts-observability.logs.alerts-default-000002
.internal.alerts-observability.logs.alerts-default-000001
=> We just spotted a bug 🐞 🐛 where the ILM policy deleted the old indices after rollover and the old alert got indexed in .internal.alerts-observability.logs.alerts-default-000002 instead (see [RAC] Alert ILM policy shouldn't delete old indices after rollover #111029).
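When running the scenarios, it can help to confirm which backing index is the write index after the rollover. The sketch below reads that from the body returned by `GET .alerts-observability.logs.alerts-default`. `findWriteIndex` is a hypothetical helper, the response type covers only the `aliases` part, and the sample object is hand-written rather than captured output:

```typescript
// Only the part of the GET response this sketch needs: each backing index
// lists its aliases, optionally flagged with `is_write_index`.
interface IndexInfo {
  aliases: Record<string, { is_write_index?: boolean }>;
}
type GetIndexResponse = Record<string, IndexInfo>;

// Hypothetical helper: returns the backing index flagged as write index
// for the given alias, or undefined if none is flagged.
function findWriteIndex(response: GetIndexResponse, alias: string): string | undefined {
  for (const [indexName, info] of Object.entries(response)) {
    if (info.aliases[alias]?.is_write_index === true) {
      return indexName;
    }
  }
  return undefined;
}

const alias = ".alerts-observability.logs.alerts-default";

// Hand-written sample of the state expected after one rollover.
const sample: GetIndexResponse = {
  ".internal.alerts-observability.logs.alerts-default-000001": {
    aliases: { [alias]: { is_write_index: false } },
  },
  ".internal.alerts-observability.logs.alerts-default-000002": {
    aliases: { [alias]: { is_write_index: true } },
  },
};

const writeIndex = findWriteIndex(sample, alias);
```

With the fix in place, an ongoing alert is expected to keep landing in the -000001 index even though the write index reported here is -000002.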