Hash token values for storage #41792

jkakavas · 2019-05-03T13:12:54Z

This commit changes how access tokens and refresh tokens are stored
in the tokens index and is a follow-up to #39631.

Access token values are now hashed before being stored in the id
field of the user_token and before becoming part of the token
document id. Refresh token values are hashed before being stored
in the token field of the refresh_token. The tokens are hashed
without a salt value since these are v4 UUID values that have
enough entropy themselves. Both rainbow table attacks and offline
brute-force attacks are impractical.
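
As a rough illustration of the unsalted hashing described above, here is a minimal sketch. It assumes SHA-256 and URL-safe Base64 encoding; the class name `TokenHashSketch` and the method `hashToken` are hypothetical and are not taken from `TokenService`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.UUID;

// Hypothetical sketch, not the TokenService implementation.
final class TokenHashSketch {

    // Hash a token value without a salt; the random token itself carries the entropy.
    static String hashToken(String token) {
        try {
            // Assumption: SHA-256 is used as the digest.
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(token.getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 without padding keeps the value usable inside a document id.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // UUID.randomUUID() returns a v4 UUID with 122 random bits.
        String accessToken = UUID.randomUUID().toString();
        String storedValue = hashToken(accessToken);
        // Lookup: hash the presented token again and compare it with the stored value.
        System.out.println(storedValue.equals(hashToken(accessToken)));
    }
}
```

Because the hash is deterministic, the hashed value can double as a lookup key, which is what allows it to become part of the document id.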

As a side effect of this change and in order to support multiple
concurrent refreshes as introduced in #39631, upon refreshing an
<access token, refresh token> pair, the superseding access token
and refresh token values are stored in the superseded token doc,
encrypted with a key that is derived from the superseded refresh
token. As such, subsequent requests to refresh the same token in
the predefined time window will return the same superseding access
token and refresh token values, without hitting the tokens index
(as this only stores hashes of the token values). AES in GCM
mode is used for encrypting the token values and the key
derivation from the superseded refresh token uses a small number
of iterations as it needs to be quick.
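
The encryption step described above could look roughly like the following sketch. It assumes PBKDF2 with HMAC-SHA512 for key derivation, a 128-bit AES key, a 12-byte IV, and 1024 iterations (the value of `TOKENS_ENCRYPTION_KEY_ITERATIONS` in this PR's diff). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, not the TokenService implementation.
final class SupersedingTokenCryptoSketch {
    static final int ITERATIONS = 1024; // small on purpose: the input is a random token, not a password
    static final int KEY_BITS = 128;    // assumption
    static final int TAG_BITS = 128;    // GCM authentication tag length

    // Derive an AES key from the superseded refresh token and a stored salt.
    static SecretKey deriveKey(String refreshToken, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(refreshToken.toCharArray(), salt, ITERATIONS, KEY_BITS);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512").generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    static byte[] encrypt(String plaintext, String refreshToken, byte[] salt, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(refreshToken, salt), new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
    }

    // Throws a GeneralSecurityException if the key is wrong or the ciphertext was tampered with.
    static String decrypt(byte[] ciphertext, String refreshToken, byte[] salt, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(refreshToken, salt), new GCMParameterSpec(TAG_BITS, iv));
        return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }
}
```

Only a caller that presents the superseded refresh token can re-derive the key, so the stored ciphertext is of no use to someone who can only read the tokens index.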

For backwards compatibility reasons, the new behavior is only
enabled when all nodes in a cluster are on the required version
so that old nodes can cope with the token values in a mixed
cluster during a rolling upgrade.
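
The version gate can be pictured with the sketch below. The numeric version ids, the constant, and the method name are all hypothetical; the point is only that the decision is based on the oldest node in the cluster.

```java
import java.util.List;

// Hypothetical sketch of gating the new storage format on the oldest node in the cluster.
final class HashedTokenGateSketch {
    // Hypothetical id of the first version that understands hashed token values.
    static final int FIRST_VERSION_WITH_HASHED_TOKENS = 70200;

    // The new behavior is enabled only when every node is on or after the required version.
    static boolean useHashedTokens(List<Integer> nodeVersionIds) {
        int oldest = nodeVersionIds.stream()
            .mapToInt(Integer::intValue)
            .min()
            .orElseThrow(() -> new IllegalArgumentException("cluster has no nodes"));
        return oldest >= FIRST_VERSION_WITH_HASHED_TOKENS;
    }
}
```

During a rolling upgrade the oldest node keeps the check false, so tokens continue to be written in the format that old nodes understand.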

Resolves #40765

elasticmachine · 2019-05-03T13:12:56Z

Pinging @elastic/es-security

x-pack/plugin/core/src/main/resources/security-index-template-7.json

jkakavas · 2019-05-03T13:17:36Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

@@ -157,11 +158,12 @@
     * Cheat Sheet</a> and the <a href="https://pages.nist.gov/800-63-3/sp800-63b.html#sec5">
     * NIST Digital Identity Guidelines</a>
     */
-    private static final int ITERATIONS = 100000;
+    static final int TOKEN_SERVICE_KEY_ITERATIONS = 100000;
+    static final int TOKENS_ENCRYPTION_KEY_ITERATIONS = 1024;


The key is derived from a random string, so we don't need many iterations; we want this to be quick and not computationally expensive.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

jkakavas · 2019-05-03T13:21:46Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

            } else {
                // The token was created in a < VERSION_ACCESS_TOKENS_UUIDS cluster so we need to decrypt it to get the tokenId
-                if (in.available() < MINIMUM_BASE64_BYTES) {
-                    logger.debug("invalid token, smaller than [{}] bytes", MINIMUM_BASE64_BYTES);
+                if (in.available() < LEGACY_MINIMUM_BYTES) {


This is already base64-decoded, so we don't need to check MINIMUM_BASE64_BYTES.

jkakavas · 2019-05-03T13:25:36Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

+        // In AES GCM we cannot reuse the same IV. We predictably generate the IV for the second decryption instead of
+        // storing an extra field, since it doesn't have to be unpredictable, just not reused with the same key.
+        byte[] iv2 = new byte[iv.length];
+        System.arraycopy(iv, iv.length / 2, iv2, 0, iv.length / 2);


This has no particular cryptographic purpose; we just need a predictable way to get a new IV for the second encryption.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

albertzaharovits

Overall it looks like a simplification, axing the superseded_by reference, which is very welcome!

I see that in addition to the hashed token we have to store it in encrypted format so that we can return the plaintext on concurrent refreshes. I think that's alright as well.

I would like to propose another simplification that I believe will lessen the implementation complexity, even though it will require another iteration. I am thinking of completely removing the refresh_token and using the access_token as a refresh token. That is, after removing the refresh_token from the token document, the response will still include a refresh_token field, but getting the token doc from the refresh_token would use the same hash and get_by_id. I think this way we avoid storing two pairs of hashes and encrypted formats, and use only one.

What do you think? Do you think it would help make this PR leaner?

albertzaharovits · 2019-05-05T19:55:06Z

x-pack/plugin/core/src/main/resources/security-index-template-7.json

+              "type": "binary"
+            },
+            "superseding_encryption_salt": {
+              "type": "binary"


I would move superseding_encryption_iv and superseding_encryption_salt as data in the refresh token, the same way the standalone access_tokens were encrypted before.
The idea is to minimize the number of mapped fields, and also "hide" the implementation details.

albertzaharovits · 2019-05-05T20:05:45Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

-                if (false == refreshTokenVersion.onOrAfter(VERSION_TOKENS_INDEX_INTRODUCED)
-                        || unencodedRefreshToken.length() != TOKEN_ID_LENGTH) {
-                    logger.debug("Decoded refresh token [{}] with version [{}] is invalid.", unencodedRefreshToken, refreshTokenVersion);
+            if (refreshToken.length() == HASHED_TOKEN_LENGTH) {


When is this if branch considered?

This is considered during IdP-initiated SAML logout, when we need to find the relevant tokens based on the token metadata (which contains the SAML NameID) and then invalidate them explicitly. See https://github.com/elastic/elasticsearch/blob/9beb31fd3c5a8323cb08cc524f1a2268e9c72c24/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/action/saml/TransportSamlInvalidateSessionAction.java

jkakavas · 2019-05-06T06:44:36Z

Thanks for the 👀 @albertzaharovits

I would like to propose another simplification, that I believe will lessen the implementation complexity, even though it will require another iteration. I am thinking of completely removing the refresh_token and use the access_token as a refresh token. That is, removing the refresh_token in the token document, the response will include a refresh_token field, but getting the token doc from the refresh_token would use the same hash and get_by_id . I think this way we avoid storing two pairs of hashes and encrypted formats, and use only one.

Could you give it another try to explain the above for my benefit? I'm not sure I can follow this.

jkakavas · 2019-05-07T05:07:16Z

We discussed this in person with @albertzaharovits and concluded that:

- There is no benefit in removing the refresh token, and doing so could make us diverge too much from the OAuth2 spec.
- A superseding object (not indexed) will be introduced in the mapping to hold all extra fields.
- We will attempt to minimize the number of newly introduced fields (i.e. concatenate superseding_access_token and superseding_refresh_token into one field before encryption and split the values after decryption).
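
The single-field idea from the last point can be sketched as follows. The separator character and all names here are assumptions for illustration; the encryption itself would happen between `join` and `split`.

```java
// Hypothetical sketch: store both superseding tokens as one value so only one crypto operation is needed.
final class SupersedingTokensSketch {
    // Assumption: the separator never appears in a token value.
    private static final String SEPARATOR = "|";

    // Called before encryption.
    static String join(String accessToken, String refreshToken) {
        if (accessToken.contains(SEPARATOR) || refreshToken.contains(SEPARATOR)) {
            throw new IllegalArgumentException("token values must not contain the separator");
        }
        return accessToken + SEPARATOR + refreshToken;
    }

    // Called after decryption; returns { accessToken, refreshToken }.
    static String[] split(String joined) {
        int index = joined.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("decrypted value does not contain two tokens");
        }
        return new String[] { joined.substring(0, index), joined.substring(index + 1) };
    }
}
```

Encrypting one joined value also means a single IV and salt are enough, which is what removes a field from the mapping.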

tvernum

I haven't reviewed the tests yet, but since you're planning a couple of changes I'll submit my main comments now.

I'm starting to get concerned about the number of places we're passing around Tuple<String,String> and it doesn't always mean the same thing.
Sometimes the 2nd string is a plain refresh token, sometimes it's a hashed refresh token. Maybe there are other combinations, it's hard to tell.
I don't have a preferred solution, but I think types are important, and the code is getting harder to follow because we're not clearly defining those types.
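
One possible direction, purely as an illustration and not something proposed in this review, is to give each flavour of token its own small type so the compiler tells them apart. Every name below is hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch: distinct types instead of Tuple<String, String>.
final class TokenTypesSketch {

    /** A refresh token exactly as it is returned to the API caller. */
    static final class PlainRefreshToken {
        final String value;

        PlainRefreshToken(String value) {
            this.value = Objects.requireNonNull(value);
        }
    }

    /** A refresh token in the form that is stored in the tokens index. */
    static final class HashedRefreshToken {
        final String value;

        HashedRefreshToken(String value) {
            this.value = Objects.requireNonNull(value);
        }
    }

    /** What a create or refresh call hands back to the caller. */
    static final class CreatedTokens {
        final String accessToken;
        final PlainRefreshToken refreshToken;

        CreatedTokens(String accessToken, PlainRefreshToken refreshToken) {
            this.accessToken = Objects.requireNonNull(accessToken);
            this.refreshToken = Objects.requireNonNull(refreshToken);
        }
    }

    // Accepts only the hashed form, so passing a plain token here is a compile error.
    static String refreshTokenQueryValue(HashedRefreshToken token) {
        return token.value;
    }
}
```

With types like these, a method signature documents which form of the token it expects, at the cost of a few extra wrapper classes.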

x-pack/plugin/core/src/main/resources/security-index-template-7.json

...a/org/elasticsearch/xpack/security/action/oidc/TransportOpenIdConnectAuthenticateAction.java

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

tvernum · 2019-05-07T05:22:02Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

                                    } else {
                                        // prior versions are not version-prepended, as nodes on those versions don't expect it.
                                        // Such nodes might exist in a mixed cluster during a rolling upgrade.
-                                        listener.onResponse(new Tuple<>(userToken, plainRefreshToken));
+                                        listener.onResponse(new Tuple<>(versionedAccessToken, plainRefreshToken));


Is this change correct? The comment implies it should not be versioned.

Yes, this is correct. The comment refers to the refresh token (too many versions and possibilities to keep track of). I've updated the comment.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

jkakavas · 2019-05-10T10:52:29Z

I'm starting to get concerned about the number of places we're passing around Tuple<String,String> and it doesn't always mean the same thing.

All instances used are Tuples containing the serialized access token and the serialized refresh token, as these will be returned to the caller of our APIs. I've updated the javadoc where applicable to denote this.

jkakavas · 2019-05-16T16:06:10Z

@tvernum this is ready for a review round

tvernum · 2019-05-20T03:27:02Z

All instances used are Tuples containing the serialized access token and the serialized refresh token,

In line 1420 we return a hashedRefreshToken, but I think the other cases are all unhashed (but versioned, unless in BWC mode).

tvernum

LGTM, with a few minor nits.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

.../plugin/security/src/test/java/org/elasticsearch/xpack/security/authc/TokenServiceTests.java

jkakavas added 3 commits May 3, 2019 13:40

Merge remote-tracking branch 'origin/master' into hash-tokens-storage

acaa14a

fix bug and tests

aa00d4a

jkakavas added >enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v8.0.0 v7.2.0 labels May 3, 2019

jkakavas requested review from tvernum and albertzaharovits May 3, 2019 13:12

adjust a couple more tests

7ca60c5

albertzaharovits reviewed May 5, 2019

tvernum reviewed May 7, 2019

jkakavas added 6 commits May 9, 2019 10:08

address feedback

373cb3a

Address feedback

263bdd5

- Move superseding encrypted token data in a separate object
- Concatenate tokens before encryption and split after decryption so that we only have 1 crypto operation and one less field in the mapping
- Add javadoc

fix BWC tests

41400e0

Merge remote-tracking branch 'origin/master' into hash-tokens-storage

abfc096

re-enable handle of hashed refresh tokens when searching for token do…

449660a

…cuments

Checkstyle violations

f2459d3

jkakavas requested a review from tvernum May 10, 2019 10:52

jkakavas requested a review from albertzaharovits May 10, 2019 10:52

Merge remote-tracking branch 'origin/master' into hash-tokens-storage

adab59d

tvernum approved these changes May 20, 2019

address review

8454d57

jkakavas merged commit 307bc17 into elastic:master May 20, 2019

jkakavas added the backport pending label May 20, 2019

jkakavas mentioned this pull request May 20, 2019

[BACKPORT] Hash token values for storage (#41792) #42220

Merged

jkakavas removed the backport pending label May 20, 2019

talevy added a commit to talevy/elasticsearch that referenced this pull request May 21, 2019

mute failing filerealm hash caching tests

2c7a045

some tests are failing after the introduction of elastic#41792. relates elastic#42267 and elastic#42289.

talevy mentioned this pull request May 21, 2019

mute failing filerealm hash caching tests due to sha256 #42304

Merged

talevy added a commit that referenced this pull request May 21, 2019

mute failing filerealm hash caching tests (#42304)

8907dc9

some tests are failing after the introduction of #41792. relates #42267 and #42289.

talevy added a commit that referenced this pull request May 21, 2019

mute failing filerealm hash caching tests (#42304)

39fbed1

some tests are failing after the introduction of #41792. relates #42267 and #42289.

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019

mute failing filerealm hash caching tests (elastic#42304)

e59d8e2

some tests are failing after the introduction of elastic#41792. relates elastic#42267 and elastic#42289.

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
