"event not processed by enough 'sink' nodes" affecting audit requests #24376

lboynton · 2023-12-05T18:04:29Z

Describe the bug
After upgrading to Vault 1.15.3 I am still observing "event not processed by enough 'sink' nodes" errors in our Vault log. #23871 has partially fixed the problem; however, it only addresses audit responses and not audit requests. Is there a similar fix to apply for audit requests too?

2023-12-05T15:49:28.249Z [ERROR] core: failed to audit request: path=sys/leases/renew
  error=
  | 1 error occurred:
  | \t* event not processed by enough 'sink' nodes
  | 
  
2023-12-05T15:49:33.158Z [ERROR] secrets.system.system_072cc7dc: lease renewal failed: lease_id=database/creds/readonly/weDQJArsWMkCUwVBf7zzOVwg
  error=
  | failed to renew entry: resp: (*logical.Response)(nil) err: 1 error occurred:
  | \t* timeout: context canceled
  |

To Reproduce
TBC - I'm going to see if I can reproduce it in a clean dev Vault install.

Expected behavior
No error message.

Environment:

Vault Server Version (retrieve with vault status): 1.15.3
Vault CLI Version (retrieve with vault version): 1.15.3
Server Operating System/Architecture: MacOS arm64/Linux amd64


miagilepner · 2023-12-07T10:26:03Z

Hi, I'm talking to the team to try to determine and clarify the correct behavior here. This was something that we explicitly didn't change as part of #23871 because if the request context is canceled before the request can be audited, no state will have changed in Vault.

marcboudreau · 2023-12-12T19:12:13Z

Hi 👋,
I can provide a bit of context to explain the observed behaviour. Starting in Vault 1.15.0, a different framework is used to publish audit log records (for both request and response) to their various Audit Devices (which is what 'sink' refers to in the error). The new framework returns an error if the request deadline, tracked by the context, has been exceeded, whereas the old framework did not check the deadline before writing the audit record. The API response is the same in both cases: an error is returned because the request took too long to service. The difference is that in 1.15.x the response audit record is missing from the Audit Devices and the server logs include the error above, whereas before 1.15.0 a response audit record showing the error returned to the API client was written to the Audit Devices and no error appeared in the server logs.

marcboudreau · 2023-12-13T00:46:01Z

Quick update: as of Vault 1.15.4, a change was introduced to avoid this exact issue. A new context with a fresh timeout of 5 seconds is used to publish the response audit log record to the Audit Devices, so Vault should now behave as it did prior to 1.15.0 when the context deadline is reached while processing a request: the response audit log is written and shows the error that was sent back to the client.

marcboudreau · 2023-12-14T18:27:40Z

I'm going to close this Issue since the undesirable behaviour has been addressed already in Vault 1.15.4.

lboynton · 2023-12-14T21:19:23Z

A partial fix in #23871 landed in Vault 1.15.3; I'm not aware of any changes in 1.15.4 that fix this.

> Hi, I'm talking to the team to try to determine and clarify the correct behavior here. This was something that we explicitly didn't change as part of #23871 because if the request context is canceled before the request can be audited, no state will have changed in Vault.

Would it be possible to simply ignore the error and avoid incrementing the vault_audit_log_request_failure metric? It should be possible to use this metric to know when an audit device fails, however it is misleading here as the audit device is not failing.

KKatsnel · 2024-02-06T15:52:32Z

@marcboudreau Hi,
We've just faced this issue after upgrading to 1.15.5, and there is no answer to the question from the previous message: would it be possible to ignore the error and avoid incrementing the vault_audit_log_request_failure metric?
It is quite an important metric, so we don't want to disable an alert for it.

nilune · 2024-02-14T14:15:43Z

Same issue with the metric on 1.15.5.

aphorise · 2024-06-17T10:27:29Z

This issue is likely resolved since Vault 1.16.3 as per the notes on issue: #25549 & CHANGELOG.md notes:

core/audit: Audit logging a Vault request/response will now use a minimum 5 second context timeout. If the existing context deadline occurs later than 5s in the future, it will be used, otherwise a new context, separate from the original will be used. [GH-26616]

aphorise · 2024-10-04T10:57:24Z

A correction to the earlier comment:

~~This issue is likely resolved since Vault 1.16.3 as per the notes on issue: #25549 & CHANGELOG.md notes:~~

Fixes were included in releases up to version 1.16.10 (for other use cases); the best version to target, and the one likely to resolve all known occurrences, is Vault 1.16.10 or higher.

miagilepner added the core/audit label Dec 5, 2023

lboynton mentioned this issue Dec 5, 2023

Audit: logging a request uses a separate 5 second timeout #24377

Closed

lboynton added a commit to lboynton/vault that referenced this issue Dec 5, 2023

Audit: logging a request uses a separate 5 second timeout

3fc3f7b

fixes hashicorp#24376

marcboudreau closed this as completed Dec 14, 2023

KKatsnel mentioned this issue Feb 21, 2024

"event not processed by enough 'sink' nodes" error affecting vault.audit.log_request_failure metric #25549

Closed
