Handle access tokens that expire after authentication stage #104893

Closed
azasypkin opened this issue Jul 8, 2021 · 4 comments · Fixed by #122155
Labels
bug Fixes for quality problems that affect the customer experience Feature:Security/Authentication Platform Security - Authentication impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. loe:small Small Level of Effort research Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!


@azasypkin
Member

azasypkin commented Jul 8, 2021

The authentication stage is the only stage in the request lifecycle at which Kibana can properly handle expired access tokens, either by refreshing them or by re-initiating authentication. This approach has served us reasonably well in the past, but we can do better.

There are cases when Kibana needs more time to process a user request and hence reuses the access token multiple times. The longer the access token is used after the authentication stage, the higher the chance that it expires in the middle of request processing. If this happens, Kibana may return a 401 error that triggers a logout (see flow №3 in the Current flows section).

There are plenty of ways to tackle this, for example:

  • Core can provide an extension point that the Security plugin can hook into to handle 401 errors returned by the Elasticsearch client before they are thrown to the consumer. Here we can potentially refresh the token or re-initiate authentication, and ask Core to retry the request. This only applies to scoped clients; internal clients don't need this functionality.
  • On the client side, we can trigger a logout only if the 401 error is returned by the authentication layer (via a special response header?).
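A minimal sketch of the second option, assuming the authentication layer marks its own 401 responses with a dedicated header. The header name `x-kbn-authentication-failure` and the `shouldLogout` helper are hypothetical; the issue only suggests that a special response header might be used.

```typescript
// Hypothetical marker header set by the authentication layer on its own 401s.
const AUTHC_FAILURE_HEADER = 'x-kbn-authentication-failure';

interface HttpErrorResponse {
  status: number;
  headers: Record<string, string | undefined>;
}

/** Returns true only for 401s explicitly produced by the authentication layer. */
function shouldLogout(response: HttpErrorResponse): boolean {
  if (response.status !== 401) {
    return false;
  }
  // HTTP header names are case-insensitive, so normalise before comparing.
  const marker = Object.entries(response.headers).find(
    ([name]) => name.toLowerCase() === AUTHC_FAILURE_HEADER
  );
  return marker !== undefined && marker[1] === 'true';
}
```

With this check, a 401 that an app merely forwards from Elasticsearch would surface as an ordinary error instead of logging the user out.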

I'm leaning towards the first option, but any other ideas and suggestions are welcome.


The diagramming tool: https://mermaid.live/

Current flows

  1. [Happy Path] SAML/OpenID Connect/Kerberos/PKI/Token flow - Access token is valid
Source

sequenceDiagram
    autonumber
    Client->>Kibana (HTTP): User request
    Kibana (HTTP)-->>Kibana (Security): Can request be authenticated?
    Kibana (Security)-->>Kibana (Elasticsearch Client): Does request have an associated user session?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here it is { ... access token ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Is the access token still valid?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here's the user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)->>Kibana (App): Handle request
    Kibana (App)->>Kibana (Elasticsearch Client): App request (with *auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)->>Kibana (App): App response
    Kibana (App)->>Kibana (HTTP): Done, here's the response
    Kibana (HTTP)->>Client: Response


  2. [Happy Path] SAML/OpenID Connect/Kerberos/Token/PKI flow - Access token is expired (BEFORE authc stage), but refresh token is valid (there is no refresh token in the case of PKI; we reuse the peer certificate instead)
Source

sequenceDiagram
    autonumber
    Client->>Kibana (HTTP): User request
    Kibana (HTTP)-->>Kibana (Security): Can request be authenticated?
    Kibana (Security)-->>Kibana (Elasticsearch Client): Does request have an associated user session?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here it is { ... access token ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Is the access token still valid?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 401 Unauthorized
    Kibana (Elasticsearch Client)-->>Kibana (Security): No, the token seems to be expired
    Kibana (Security)-->>Kibana (Elasticsearch Client): [SAML/OIDC/Kerberos/Token] Can the token be refreshed?<br>[PKI] Can the *peer certificate* be exchanged for a new token pair?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here are the *new* tokens and user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Update session with *new* access and refresh tokens
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Done
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)->>Kibana (App): Handle request
    Kibana (App)->>Kibana (Elasticsearch Client): App request (with *auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)->>Kibana (App): App response
    Kibana (App)->>Kibana (HTTP): Done, here's the response
    Kibana (HTTP)->>Client: Response


  3. [Unhappy Path] SAML/OpenID Connect/Kerberos/PKI/Token flow - Access token is expired (AFTER authc stage), refresh token isn't used
Source

sequenceDiagram
    autonumber
    Client->>Kibana (HTTP): User request
    Kibana (HTTP)-->>Kibana (Security): Can request be authenticated?
    Kibana (Security)-->>Kibana (Elasticsearch Client): Does request have an associated user session?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here it is { ... access token ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Is the access token still valid?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here's the user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)->>Kibana (App): Handle request
    Kibana (App)->>Kibana (Elasticsearch Client): App request (with *auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 401 Unauthorized
    Kibana (Elasticsearch Client)->>Kibana (App): 401 Unauthorized: Token is expired 
    Kibana (App)->>Kibana (HTTP): Failed, here's the error
    Note over Kibana (HTTP),Kibana (App): The app can handle this error in one of two ways:<br> 1. Return the original 401 and *force the user to log out (the most frequent way)*<br/> 2. Return 50x and let a subsequent request refresh the token at the authc stage (steps 2-15 in the refresh flow)
    Kibana (HTTP)->>Client: Error response


Proposed flows

  1. [Happy Path] SAML/OpenID Connect/Kerberos/PKI/Token flow - Access token is expired (AFTER authc stage), refresh token is valid (or the peer certificate, in the case of PKI)
Source

sequenceDiagram
    autonumber
    Client->>Kibana (HTTP): User request
    Kibana (HTTP)-->>Kibana (Security): Can request be authenticated?
    Kibana (Security)-->>Kibana (Elasticsearch Client): Does request have an associated user session?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here it is { ... access token ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Is the access token still valid?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here's the user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)->>Kibana (App): Handle request
    Kibana (App)->>Kibana (Elasticsearch Client): App request (with *auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 401 Unauthorized
    Kibana (Elasticsearch Client)-->>Kibana (HTTP): 401 Unauthorized: Token is expired
    Note over Kibana (HTTP),Kibana (Elasticsearch Client): ⚠️ We should distinguish this case from the case when Kibana (Security) itself receives a `401 Unauthorized` 
    Kibana (HTTP)-->>Kibana (Security): Can request be re-authenticated?
    Kibana (Security)-->>Kibana (Security): Check if the 401 is caused by an expired token.
    Kibana (Security)-->>Kibana (Elasticsearch Client): Can the token be refreshed?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here are the *new* tokens and user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Update session with *new* access and refresh tokens
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Done
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)-->>Kibana (Elasticsearch Client): Retry app request (with *new auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)->>Kibana (App): App response
    Kibana (App)->>Kibana (HTTP): Done, here's the response
    Kibana (HTTP)->>Client: Response


  2. [Unhappy Path] SAML/OpenID Connect/Kerberos/Token/PKI flow - Both access and refresh tokens are expired (AFTER authc stage)
Source

sequenceDiagram
    autonumber
    Client->>Kibana (HTTP): User request
    Kibana (HTTP)-->>Kibana (Security): Can request be authenticated?
    Kibana (Security)-->>Kibana (Elasticsearch Client): Does request have an associated user session?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here it is { ... access token ... }
    Kibana (Security)-->>Kibana (Elasticsearch Client): Is the access token still valid?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 200 OK
    Kibana (Elasticsearch Client)-->>Kibana (Security): Yes, here's the user info { ... name, roles ... }
    Kibana (Security)-->>Kibana (HTTP): Yes, store *auth* HTTP headers
    Kibana (HTTP)->>Kibana (App): Handle request
    Kibana (App)->>Kibana (Elasticsearch Client): App request (with *auth* HTTP headers)
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 401 Unauthorized
    Kibana (Elasticsearch Client)-->>Kibana (HTTP): 401 Unauthorized: Token is expired
    Note over Kibana (HTTP),Kibana (Elasticsearch Client): ⚠️ We should distinguish this case from the case when Kibana (Security) itself receives a `401 Unauthorized` 
    Kibana (HTTP)-->>Kibana (Security): Can request be re-authenticated?
    Kibana (Security)-->>Kibana (Security): Check if the 401 is caused by an expired token.
    Kibana (Security)-->>Kibana (Elasticsearch Client): Can the token be refreshed?
    Kibana (Elasticsearch Client)-->>Kibana (Elasticsearch Client): Elasticsearch: 400 Bad Request
    Kibana (Elasticsearch Client)-->>Kibana (Security): No, user needs to re-login<br>[SAML/OIDC] Via external Identity Provider<br>[Token] Via Kibana login page<br>[PKI] Via new valid certificate<br>[Kerberos] Via SPNEGO handshake
    Note over Kibana (Security),Kibana (Elasticsearch Client): There is no need to invalidate the current session,<br>it will be invalidated by the subsequent request
    Kibana (Security)-->>Kibana (HTTP): No, re-authentication requires user's intervention
    Kibana (HTTP)-->>Kibana (Elasticsearch Client): Return *original* error
    Kibana (Elasticsearch Client)->>Kibana (App): 401 Unauthorized: Token is expired 
    Kibana (App)->>Kibana (HTTP): Failed, here's the error
    Note over Kibana (HTTP),Kibana (App): The app can handle this error in one of two ways:<br> 1. Return the original 401 and *force the user to log out (the most frequent way)*<br/> 2. Return 50x and let a subsequent request refresh the token at the authc stage (steps 2-15 in the refresh flow)
    Kibana (HTTP)->>Client: Error response


@azasypkin azasypkin added Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! Feature:Security/Authentication Platform Security - Authentication Feature:New Feature New feature not correlating to an existing feature label labels Jul 8, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@legrego
Member

legrego commented Jul 12, 2021

The first option sounds more holistic to me, but I think it'd be worth drawing a diagram first to make sure we understand what the flow would look like, and how we would handle all possible outcomes at each step of the request lifecycle.

@azasypkin
Member Author

> but I think it'd be worth drawing a diagram first to make sure we understand what the flow would look like, and how we would handle all possible outcomes at each step of the request lifecycle.

Yep, good point; there can be plenty of different "sub-flows" and outcomes.

@azasypkin
Member Author

azasypkin commented Aug 4, 2021

Even though the issue isn't new, we see more and more customers hit by it, and it may escalate quickly, especially since Kibana in Cloud relies on token-based authentication (via SAML) by default. That's why we'd like to investigate possible solutions and understand the required effort as early as possible.

I met with @mshustov earlier today to discuss the options we have. Even though the first proposal from #104893 (comment) is definitely not the simplest change, it still sounds like the most reasonable one: introduce a new extension point in Core that would allow the Security plugin to try to handle 401 errors thrown by the Elasticsearch client before we return them to consumers (plugins).

At this point it's not completely clear what the ideal API would look like, but I'll try to outline what we know so far:

When should we try to handle 401 errors?

  • Only when "scoped" Elasticsearch client is used. Elasticsearch client that acts on behalf of the internal Kibana user doesn't require any special handling.
  • Only when the request to Elasticsearch carries the Authorization header provided by the AuthenticationHandler (the one registered via the registerAuth hook). If consumers override the client's Authorization header with a FakeRequest or similar, we shouldn't interfere.
  • Only when the request isn't made by the Security plugin itself (otherwise we can get into infinite recursion 🙂). This will probably be solved automatically by the solution for the previous point, since we always override the Authorization header in the Security plugin. If not, maybe we'd need a special TransportRequestOptions option?
  • Only when authentication failure handler is registered (see details in the next section).
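The conditions above can be expressed as a single predicate that Core would evaluate before invoking the authentication failure handler. This is a sketch under assumptions: `OutgoingEsRequestContext`, its fields, and `shouldAttemptReauthentication` are hypothetical and only model the information each condition needs.

```typescript
// Hypothetical view of what Core would know about an outgoing Elasticsearch request.
interface OutgoingEsRequestContext {
  /** 'scoped' clients act on behalf of the end user, 'internal' ones on behalf of Kibana. */
  clientKind: 'scoped' | 'internal';
  /** Authorization header actually attached to the Elasticsearch request, if any. */
  authorizationHeader?: string;
  /** Authorization header produced by the handler registered via registerAuth, if any. */
  authHandlerAuthorizationHeader?: string;
  /** Whether the Security plugin itself issued the request. */
  issuedBySecurityPlugin: boolean;
  /** Whether an authentication failure handler has been registered. */
  hasAuthenticationFailureHandler: boolean;
}

function shouldAttemptReauthentication(ctx: OutgoingEsRequestContext): boolean {
  return (
    // 1. Only scoped clients; the internal Kibana user needs no special handling.
    ctx.clientKind === 'scoped' &&
    // 2. Only if the request still carries the registerAuth-provided header,
    //    i.e. the consumer didn't override it (e.g. with a FakeRequest).
    ctx.authHandlerAuthorizationHeader !== undefined &&
    ctx.authorizationHeader === ctx.authHandlerAuthorizationHeader &&
    // 3. Never for the Security plugin's own requests, to avoid infinite recursion.
    !ctx.issuedBySecurityPlugin &&
    // 4. Only if there is a handler that can actually deal with the failure.
    ctx.hasAuthenticationFailureHandler
  );
}
```

Comparing the attached header with the one produced by the `registerAuth` handler is just one possible way to detect overrides; the explicit `issuedBySecurityPlugin` flag stands in for the TransportRequestOptions idea in case header comparison alone isn't enough.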

What functionality would Security plugin need from Core to handle 401 errors?

  • The Security plugin would need an extension point it can use to "subscribe" to 401 errors in the scoped Elasticsearch client.
  • The Security plugin should get access to the original user request that the Elasticsearch client was scoped to. We need to pass this request to Core's CookieSessionStorageFactory, which we receive at the setup stage, to retrieve the session associated with the user's request.
  • The Security plugin should get access to the authentication error itself; it would help us understand whether it makes sense to try to re-authenticate the request.
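To decide whether re-authentication is worth attempting, the Security plugin could inspect that error. A heuristic sketch, assuming Elasticsearch signals a rejected access token with a `WWW-Authenticate: Bearer ... error="invalid_token"` challenge (the error code defined for bearer tokens in RFC 6750); `EsClientError` and `isInvalidTokenError` are hypothetical names, not Kibana APIs.

```typescript
// Simplified, hypothetical view of an error thrown by the scoped Elasticsearch client.
interface EsClientError {
  statusCode?: number;
  headers?: Record<string, string | undefined>;
}

function isInvalidTokenError(error: EsClientError): boolean {
  if (error.statusCode !== 401 || error.headers === undefined) {
    return false;
  }
  // HTTP header names are case-insensitive, so match WWW-Authenticate in any casing.
  const challenges = Object.entries(error.headers)
    .filter(([name]) => name.toLowerCase() === 'www-authenticate')
    .map(([, value]) => value ?? '');
  // A Bearer challenge carrying error="invalid_token" suggests the access token itself
  // was rejected (for example, because it expired), so a refresh attempt may help.
  return challenges.some(
    (value) => /\bBearer\b/i.test(value) && value.includes('error="invalid_token"')
  );
}
```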

In semi-pseudo-code, it might look something like this:

core.xxxxxx.onAuthenticationFailure(async (error, request, toolkit) => {
  const reauthenticationResult = await authenticator.reauthenticate(request, error);
  if (reauthenticationResult.succeeded()) {
    // This updates Core's internal authentication state, the authentication
    // headers it will attach to every subsequent scoped request to Elasticsearch
    // and potentially authentication response headers.
    return toolkit.authenticated({
      state: reauthenticationResult.user,
      requestHeaders: reauthenticationResult.authHeaders,
      responseHeaders: reauthenticationResult.authResponseHeaders,
    });
  }

  // Doesn't feel like we need `customError`/`rejected`. If re-authentication
  // cannot be performed, we just return original `401` error to the consumer.
  // We also don't need to handle cases that require redirects.
  return toolkit.notHandled();
});
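For experimentation, the handler body from the semi-pseudo-code can be made self-contained by stubbing what it depends on. Everything here (`toolkit`, `Authenticator`, `ReauthenticationResult`, `createAuthenticationFailureHandler`) is a hypothetical stand-in; `core.xxxxxx.onAuthenticationFailure` is left out because its location in Core is not settled in this proposal.

```typescript
type AuthHeaders = Record<string, string>;

interface AuthenticatedResult {
  type: 'authenticated';
  state: unknown;
  requestHeaders: AuthHeaders;
  responseHeaders?: AuthHeaders;
}
interface NotHandledResult {
  type: 'notHandled';
}
type AuthenticationFailureResult = AuthenticatedResult | NotHandledResult;

// Hypothetical toolkit mirroring `toolkit.authenticated` / `toolkit.notHandled`.
const toolkit = {
  authenticated: (params: Omit<AuthenticatedResult, 'type'>): AuthenticatedResult => ({
    type: 'authenticated',
    ...params,
  }),
  notHandled: (): NotHandledResult => ({ type: 'notHandled' }),
};
type AuthenticationFailureToolkit = typeof toolkit;

interface ReauthenticationResult {
  succeeded(): boolean;
  user?: unknown;
  authHeaders?: AuthHeaders;
  authResponseHeaders?: AuthHeaders;
}

interface Authenticator {
  reauthenticate(request: unknown, error: unknown): Promise<ReauthenticationResult>;
}

function createAuthenticationFailureHandler(authenticator: Authenticator) {
  return async (
    error: unknown,
    request: unknown,
    t: AuthenticationFailureToolkit
  ): Promise<AuthenticationFailureResult> => {
    const result = await authenticator.reauthenticate(request, error);
    if (result.succeeded() && result.authHeaders !== undefined) {
      // Core would store these headers and attach them to subsequent scoped requests.
      return t.authenticated({
        state: result.user,
        requestHeaders: result.authHeaders,
        responseHeaders: result.authResponseHeaders,
      });
    }
    // Re-authentication isn't possible: let Core return the original 401.
    return t.notHandled();
  };
}
```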

What is Core supposed to do to handle 401 errors?

At this point, we think Core might need to intercept 401 errors thrown by the scoped Elasticsearch client (provided all conditions from "When should we try to handle 401 errors?" are met) and invoke the authentication failure handler. The authentication failure handler will let Core know whether it could re-authenticate the request. If so, Core can update its internal authentication state and retry the request with the updated credentials. If the re-authentication attempt fails or cannot be performed, Core can just return the original 401 error to the consumer.
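The intercept-and-retry behaviour described above might be sketched as a wrapper around a single scoped call. All names are hypothetical and none are real Kibana Core APIs; the issue considers hooking this logic into `KibanaTransport` rather than a standalone function.

```typescript
type AuthHeaders = Record<string, string>;

type FailureHandlerResult =
  | { type: 'authenticated'; requestHeaders: AuthHeaders }
  | { type: 'notHandled' };

type AuthenticationFailureHandler = (error: unknown) => Promise<FailureHandlerResult>;

/** Hypothetical per-request authentication state Core keeps for a scoped client. */
interface ScopedAuthState {
  headers: AuthHeaders;
}

async function callWithReauthentication<T>(
  auth: ScopedAuthState,
  call: (headers: AuthHeaders) => Promise<T>,
  onAuthenticationFailure?: AuthenticationFailureHandler
): Promise<T> {
  try {
    return await call(auth.headers);
  } catch (err) {
    const statusCode = (err as { statusCode?: number } | null | undefined)?.statusCode;
    // Only 401s are candidates, and only when a failure handler is registered.
    if (statusCode !== 401 || onAuthenticationFailure === undefined) {
      throw err;
    }
    const result = await onAuthenticationFailure(err);
    if (result.type === 'notHandled') {
      // Re-authentication failed or cannot be performed: surface the original 401.
      throw err;
    }
    // Update the stored headers so later calls for this request reuse the new
    // credentials, then retry once; a second failure propagates unchanged.
    auth.headers = result.requestHeaders;
    return call(auth.headers);
  }
}
```

Retrying only once keeps a persistently rejected token from looping, and rethrowing the original error keeps the `notHandled` case indistinguishable from today's behaviour for consumers.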

Unresolved questions

  • [Resolved] I hadn't yet validated whether this approach would work for Kerberos, but was going to figure it out. We use refresh tokens for Kerberos that are valid for 24 hours, so the chance that we need to perform SPNEGO in the middle of the request handling flow is relatively small, and we might want to require "re-login" in this case to keep the solution simple.
  • It's not completely clear yet whether it'd be possible to intercept the Elasticsearch client's errors transparently to consumers. I played a bit with the recently added KibanaTransport, which serves as a custom Transport for all our Elasticsearch client instances, and it worked well in my very basic test, but it would require more thorough research to know whether it's a feasible extension point.

@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Aug 4, 2021
@jportner jportner added bug Fixes for quality problems that affect the customer experience and removed Feature:New Feature New feature not correlating to an existing feature label labels Sep 29, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Sep 29, 2021
@legrego legrego added impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Oct 5, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:medium Medium Level of Effort labels Dec 14, 2021