Hash token values for storage #41792
This commit changes how access tokens and refresh tokens are stored in the tokens index.

Access token values are now hashed before being stored in the id field of the `user_token` and before becoming part of the token document id. Refresh token values are hashed before being stored in the token field of the `refresh_token`. The tokens are hashed without a salt value since these are v4 UUID values that have enough entropy themselves. Both rainbow table attacks and offline brute-force attacks are impractical.

As a side effect of this change and in order to support multiple concurrent refreshes as introduced in elastic#39631, upon refreshing an <access token, refresh token> pair, the superseding access token and refresh token values are stored in the superseded token doc, encrypted with a key that is derived from the superseded refresh token. As such, subsequent requests to refresh the same token in the predefined time window will return the same superseding access token and refresh token values, without hitting the tokens index (as this only stores hashes of the token values). AES in GCM mode is used for encrypting the token values, and the key derivation from the superseded refresh token uses a small number of iterations as it needs to be quick.

For backwards compatibility reasons, the new behavior is only enabled when all nodes in a cluster are on the required version, so that old nodes can cope with the token values in a mixed cluster during a rolling upgrade.
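As an illustration of the unsalted hashing described above, here is a minimal sketch. The description does not name the hash function or the encoding, so SHA-256 with hex output is an assumption, and `TokenHashSketch` and `hashToken` are hypothetical names rather than the code in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class TokenHashSketch {

    // Assumption: SHA-256 and hex encoding; the PR text does not state which hash is used.
    static String hashToken(String token) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A v4 UUID carries 122 random bits, which is why no salt is added.
        String accessToken = UUID.randomUUID().toString();
        System.out.println("value that would be stored: " + hashToken(accessToken));
    }
}
```

Because the hash is deterministic, a token presented by a client can be hashed again and looked up directly by the stored value.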
Pinging @elastic/es-security
x-pack/plugin/core/src/main/resources/security-index-template-7.json
@@ -157,11 +158,12 @@
* Cheat Sheet</a> and the <a href="https://pages.nist.gov/800-63-3/sp800-63b.html#sec5">
* NIST Digital Identity Guidelines</a>
*/
private static final int ITERATIONS = 100000;
static final int TOKEN_SERVICE_KEY_ITERATIONS = 100000;
static final int TOKENS_ENCRYPTION_KEY_ITERATIONS = 1024;
The key is derived from a random string, so we don't need many iterations; we want this to be quick and not computationally expensive.
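A rough sketch of such a low-iteration derivation follows. The algorithm name `PBKDF2WithHmacSHA512`, the 128-bit key size, and the class and method names are assumptions made for illustration; only the constant name and the iteration count of 1024 come from the diff above.

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

final class TokenKeyDerivationSketch {

    // Value taken from the diff above; everything else in this class is hypothetical.
    static final int TOKENS_ENCRYPTION_KEY_ITERATIONS = 1024;

    // Derives an AES key from the superseded refresh token and a per-document salt.
    static SecretKeySpec deriveKey(String refreshToken, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(
            refreshToken.toCharArray(), salt, TOKENS_ENCRYPTION_KEY_ITERATIONS, 128);
        try {
            // Assumption: PBKDF2 with HMAC-SHA512 and a 128-bit output.
            byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512")
                .generateSecret(spec)
                .getEncoded();
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }
}
```

A low count is defensible here only because the input is a high-entropy random token rather than a human-chosen password.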
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java
} else {
// The token was created in a < VERSION_ACCESS_TOKENS_UUIDS cluster so we need to decrypt it to get the tokenId
if (in.available() < MINIMUM_BASE64_BYTES) {
logger.debug("invalid token, smaller than [{}] bytes", MINIMUM_BASE64_BYTES);
if (in.available() < LEGACY_MINIMUM_BYTES) {
This is already base64-decoded, so we don't need to check MINIMUM_BASE64_BYTES.
// In AES GCM we cannot reuse the same IV. We predictably generate the IV for the second decryption instead of
// storing an extra field, since it doesn't have to be unpredictable, just not reused with the same key.
byte[] iv2 = new byte[iv.length];
System.arraycopy(iv, iv.length / 2, iv2, 0, iv.length / 2);
This has no particular cryptographic use; we just need a predictable way to get a new IV for the second encryption.
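To make the effect of those two lines concrete, the sketch below wraps them in a hypothetical helper (`SecondIvSketch.deriveSecondIv` is not a name from the PR). The second half of the original IV becomes the first half of the new one, and the remaining bytes stay zero.

```java
final class SecondIvSketch {

    // Same two statements as in the diff, wrapped in a helper for illustration.
    static byte[] deriveSecondIv(byte[] iv) {
        byte[] iv2 = new byte[iv.length];
        // Copy the second half of iv into the first half of iv2; the rest remains zero.
        System.arraycopy(iv, iv.length / 2, iv2, 0, iv.length / 2);
        return iv2;
    }
}
```

For a 12-byte IV holding the values 1 through 12, the derived IV is 7, 8, 9, 10, 11, 12 followed by six zero bytes. With a randomly generated first IV, the two values coincide only if the first IV's second half equals its first half and that half is all zeros, which is vanishingly unlikely.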
Overall it looks like a simplification, axing the `superseded_by` reference, which is very welcome!
I see that in addition to the hashed token we have to store it in encrypted format so that we can return the plaintext on concurrent refreshes. I think that's alright as well.
I would like to propose another simplification that I believe will lessen the implementation complexity, even though it will require another iteration. I am thinking of completely removing the `refresh_token` and using the `access_token` as a refresh token. That is, the `refresh_token` is removed from the token document; the response will still include a `refresh_token` field, but getting the token doc from the `refresh_token` would use the same hash and `get_by_id`. I think this way we avoid storing two pairs of hashes and encrypted formats, and use only one.
What do you think? Do you think it would help make this PR leaner?
"type": "binary"
},
"superseding_encryption_salt": {
"type": "binary"
I would move `superseding_encryption_iv` and `superseding_encryption_salt` as data in the refresh token, the same way the standalone `access_tokens` were encrypted before.
The idea is to minimize the number of mapped fields, and also "hide" the implementation details.
if (false == refreshTokenVersion.onOrAfter(VERSION_TOKENS_INDEX_INTRODUCED)
|| unencodedRefreshToken.length() != TOKEN_ID_LENGTH) {
logger.debug("Decoded refresh token [{}] with version [{}] is invalid.", unencodedRefreshToken, refreshTokenVersion);
if (refreshToken.length() == HASHED_TOKEN_LENGTH) {
When is this if branch considered?
This is considered during IdP-initiated SAML logout, when we need to find the relevant tokens based on the token metadata (which contains the SAML NameID) and then invalidate those explicitly. See https://github.com/elastic/elasticsearch/blob/9beb31fd3c5a8323cb08cc524f1a2268e9c72c24/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/action/saml/TransportSamlInvalidateSessionAction.java
Thanks for the 👀 @albertzaharovits
Could you give it another try to explain the above for my benefit? I'm not sure I can follow this.
We discussed this in person with @albertzaharovits and concluded that
|
I haven't reviewed the tests yet, but since you're planning a couple of changes I'll submit my main comments now.
I'm starting to get concerned about the number of places we're passing around `Tuple<String,String>`, and it doesn't always mean the same thing.
Sometimes the 2nd string is a plain refresh token, sometimes it's a hashed refresh token. Maybe there are other combinations; it's hard to tell.
I don't have a preferred solution, but I think types are important, and the code is getting harder to follow because we're not clearly defining those types.
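One possible direction, sketched here purely for illustration since the reviewer states no preferred solution, is to give each meaning its own small type. All class and field names below are hypothetical and do not appear in the PR.

```java
import java.util.Objects;

final class TokenTypesSketch {

    /** A refresh token in the form handed back to the API caller. */
    static final class PlainRefreshToken {
        final String value;

        PlainRefreshToken(String value) {
            this.value = Objects.requireNonNull(value, "value");
        }
    }

    /** A refresh token in the hashed form kept in the tokens index. */
    static final class HashedRefreshToken {
        final String value;

        HashedRefreshToken(String value) {
            this.value = Objects.requireNonNull(value, "value");
        }
    }

    /** Stands in for an untyped Tuple<String, String> when returning newly created tokens. */
    static final class CreatedTokens {
        final String serializedAccessToken;
        final PlainRefreshToken refreshToken;

        CreatedTokens(String serializedAccessToken, PlainRefreshToken refreshToken) {
            this.serializedAccessToken = Objects.requireNonNull(serializedAccessToken, "serializedAccessToken");
            this.refreshToken = Objects.requireNonNull(refreshToken, "refreshToken");
        }
    }
}
```

With distinct types, passing a hashed value where a plain one is expected becomes a compile-time error rather than something a reader has to trace by hand.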
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java
} else {
// prior versions are not version-prepended, as nodes on those versions don't expect it.
// Such nodes might exist in a mixed cluster during a rolling upgrade.
listener.onResponse(new Tuple<>(userToken, plainRefreshToken));
listener.onResponse(new Tuple<>(versionedAccessToken, plainRefreshToken));
Is this change correct? The comment implies it should not be versioned.
Yes, this is correct. The comment refers to the refresh token (too many versions and possibilities to keep track of). I've updated the comment.
- Move superseding encrypted token data in a separate object
- Concatenate tokens before encryption and split after decryption so that we only have 1 crypto operation and one less field in the mapping
- Add javadoc
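The concatenate-then-encrypt step in the list above could look roughly like the following sketch. The `|` separator, the 12-byte IV, the 128-bit tag, and every name here are assumptions for illustration; the PR text confirms only that AES in GCM mode is used and that the two tokens are joined before a single crypto operation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class SupersedingTokensSketch {

    static final int TAG_BITS = 128;      // assumed GCM tag length
    static final String SEPARATOR = "|";  // assumed delimiter, must not occur in the access token

    // Encrypts both superseding tokens with one AES-GCM operation.
    static byte[] encrypt(SecretKeySpec key, byte[] iv, String accessToken, String refreshToken) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            String joined = accessToken + SEPARATOR + refreshToken;
            return cipher.doFinal(joined.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts and splits back into { accessToken, refreshToken }.
    static String[] decrypt(SecretKeySpec key, byte[] iv, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            String joined = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
            int split = joined.indexOf(SEPARATOR);
            return new String[] { joined.substring(0, split), joined.substring(split + 1) };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Encrypting once means a single IV is needed per document, so no second IV has to be derived or stored.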
All instances used are Tuples containing the serialized access token and the serialized refresh token, as these will be returned to the caller of our APIs. I've updated the javadoc where applicable to denote this.
@tvernum this is ready for a review round
In line 1420 we return a hashedRefreshToken, but I think the other cases are all unhashed (but versioned, unless in BWC mode).
LGTM, with a few minor nits.
Some tests are failing after the introduction of elastic#41792. Relates to elastic#42267 and elastic#42289.
This commit changes how access tokens and refresh tokens are stored
in the tokens index and is a followup to #39631.
Access token values are now hashed before being stored in the id
field of the `user_token` and before becoming part of the token
document id. Refresh token values are hashed before being stored
in the token field of the `refresh_token`. The tokens are hashed
without a salt value since these are v4 UUID values that have
enough entropy themselves. Both rainbow table attacks and offline
brute-force attacks are impractical.
As a side effect of this change and in order to support multiple
concurrent refreshes as introduced in #39631, upon refreshing an
<access token, refresh token> pair, the superseding access token
and refresh tokens values are stored in the superseded token doc,
encrypted with a key that is derived from the superseded refresh
token. As such, subsequent requests to refresh the same token in
the predefined time window will return the same superseding access
token and refresh token values, without hitting the tokens index
(as this only stores hashes of the token values). AES in GCM
mode is used for encrypting the token values and the key
derivation from the superseded refresh token uses a small number
of iterations as it needs to be quick.
For backwards compatibility reasons, the new behavior is only
enabled when all nodes in a cluster are in the required version
so that old nodes can cope with the token values in a mixed
cluster during a rolling upgrade.
Resolves #40765