Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ACL errors consistently when blocking query timeout is reached. #20876

Merged
merged 5 commits into from
Mar 22, 2024

Conversation

ishustava
Copy link
Contributor

@ishustava ishustava commented Mar 18, 2024

Please see #20790 for reproduction steps.

Currently, when a client starts a blocking query and an ACL token expires within that time, Consul will return an "ACL not found" error with a 403 status code. However, sometimes if an ACL token is invalidated at the same time as the query's deadline is reached, Consul will instead return an empty response with a 200 status code.

This is because of the events being executed:

  1. Client issues a blocking query request with timeout t.
  2. ACL is deleted.
  3. Server detects a change in ACLs and force closes the gRPC stream.
  4. Client resubscribes with the same token and resets its state (view).
  5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change to ACLs has occured. If the server sees that there was an error due to ACL change, it will re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.

This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
@ishustava ishustava marked this pull request as ready for review March 19, 2024 17:44
@ishustava ishustava requested a review from a team as a code owner March 19, 2024 17:44
@ishustava ishustava force-pushed the ishustava/NET-8431 branch 2 times, most recently from 3b87e11 to 13739ee Compare March 19, 2024 22:48
@ishustava ishustava merged commit d747b51 into main Mar 22, 2024
89 of 91 checks passed
@ishustava ishustava deleted the ishustava/NET-8431 branch March 22, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blocking query of the health API "/health/service/<service>" may return empty nodes for non-empty services.
2 participants