"event not processed by enough 'sink' nodes" error affecting vault.audit.log_request_failure metric #25549

Closed
KKatsnel opened this issue Feb 21, 2024 · 6 comments
Labels
core/audit, core/telemetry, core (Issues and Pull-Requests specific to Vault Core), enhancement, feature-request


@KKatsnel

Describe the bug
After upgrading to Vault 1.15.5, we are still observing "event not processed by enough 'sink' nodes" errors in our Vault log. #23871 partially fixed the problem, but it only addresses audit responses, not audit requests. There is a similar issue, #24376, but it is closed and nobody has answered the questions asked there.
Is it possible to ignore this error and avoid incrementing the vault_audit_log_request_failure metric? It is an important metric for us, so we don't want to disable the alert on it.

error: 1 error occurred: * event not processed by enough 'sink' nodes
@message: failed to audit request
path: auth/token/revoke-self
@level: error
@timestamp: Feb 20, 2024 @ 20:40:02.113
...

To Reproduce
Steps to reproduce are provided in #23871.

Expected behavior
The vault.audit.log_request_failure metric is not incremented in this case.

Environment:

  • Vault Server Version (retrieve with vault status): 1.15.5
  • Vault CLI Version (retrieve with vault version): 1.15.5
  • Server Operating System/Architecture: Linux/amd64
@lboynton
Contributor

FYI, this issue can be worked around by setting the environment variable VAULT_AUDIT_DISABLE_EVENTLOGGER=true, which makes Vault use the audit logging behaviour from Vault 1.14.x and earlier. However, this environment variable is being removed in Vault 1.16 (#24764).

This will prevent us from upgrading to 1.16.

peteski22 added the core/audit, core/telemetry, core, enhancement and feature-request labels on Mar 6, 2024
@peteski22

Hi @lboynton, thanks for the report.

We have this on our radar to look at internally, but we can't commit to if or when it will be addressed.

The frustration is understandable: Vault didn't fail this way before (with errors in the server logs and telemetry) but does now, and that's not good. It's also an awkward one, though, because the audit system isn't wrong in what it's reporting: it cannot audit because the context has timed out.

This issue report (and everything connected to it) has been linked up internally, so you'll be aware when we have progress to report.

@Kasama
Contributor

Kasama commented Mar 19, 2024

Hello. We are also facing this issue when VAULT_AUDIT_DISABLE_EVENTLOGGER=true is not set. Here's a graph of the audit log metrics on Vault 1.15.6 before and after setting VAULT_AUDIT_DISABLE_EVENTLOGGER=true.

It seems that the new eventlogger makes many more requests, causes higher latency, and occasionally fails with the "event not processed by enough 'sink' nodes" error.

[Graph: audit log metrics on Vault 1.15.6 before and after setting VAULT_AUDIT_DISABLE_EVENTLOGGER=true]

@lboynton
Contributor

I guess #26616 is a potential fix.

@peteski22

Just an update for folks: we're actively looking at this one at the moment, in a few different ways. One or more of those might end up making it out into the wild. Thanks for your patience.

@peteski22

OK, #26616 should close the issue. There is another, linked ticket for us to discuss internally, which may show up later. 😄
