Limit user to single concurrent auth per realm #30794
Conversation
This commit reworks the way our realms perform caching in order to limit each principal to a single ongoing authentication per realm. In other words, this means that multiple requests made by the same user will not trigger more than one authentication attempt at a time if no entry has been stored in the cache. If an entry is present in our cache, there is no restriction on the number of concurrent authentications performed for this user. This change enables us to limit the load we place on an external system like an LDAP server and also preserve resources such as CPU on expensive operations such as BCrypt authentication. Closes elastic#30355
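For readers new to this area, the change boils down to a single-flight pattern: the cache holds one future per principal, the first request for an uncached principal performs the single real authentication attempt and completes the future, and any concurrent requests for the same principal simply listen for that shared result. The sketch below shows the idea with plain java.util.concurrent types; SingleFlightAuthenticator, doAuthenticate and the callback shapes are illustrative stand-ins, not the realm's actual classes.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Illustrative stand-in for the realm's caching logic, not the actual implementation.
final class SingleFlightAuthenticator<C, U> {

    private final ConcurrentMap<String, CompletableFuture<U>> inFlight = new ConcurrentHashMap<>();
    // The expensive call (LDAP bind, BCrypt check, ...) completes the future it is given.
    private final BiConsumer<C, CompletableFuture<U>> doAuthenticate;

    SingleFlightAuthenticator(BiConsumer<C, CompletableFuture<U>> doAuthenticate) {
        this.doAuthenticate = doAuthenticate;
    }

    void authenticate(String principal, C credentials, Consumer<U> onSuccess, Consumer<Throwable> onFailure) {
        final CompletableFuture<U> created = new CompletableFuture<>();
        CompletableFuture<U> shared = inFlight.putIfAbsent(principal, created);
        if (shared == null) {
            // This request won the race: it performs the one real authentication attempt.
            shared = created;
            doAuthenticate.accept(credentials, created);
            created.whenComplete((user, e) -> {
                if (e != null) {
                    // Failed attempts are evicted so that a later request can retry.
                    inFlight.remove(principal, created);
                }
            });
        }
        // Winner and concurrent waiters alike attach to the same shared result.
        shared.whenComplete((user, e) -> {
            if (e != null) {
                onFailure.accept(e);
            } else {
                onSuccess.accept(user);
            }
        });
    }
}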
Pinging @elastic/es-security
I like the approach, but I have a suggested change.
    listeners.clear();
}

private void notifyListener(Tuple<ActionListener<V>, ExecutorService> tuple) {
It feels weird for this to take a tuple rather than two arguments, since sometimes we construct the Tuple just to pass it as an argument and have it decomposed again.
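For illustration only, the two-argument shape could look something like the stripped-down stand-in below; Consumer stands in for ActionListener, and the class is not the listenable future under review.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Stand-in sketch: notifyListener takes the listener and executor directly,
// so callers never build a Tuple just to have it decomposed again.
final class ListenableValue<V> {
    private final List<Runnable> pending = new ArrayList<>();
    private V value;
    private boolean done;

    synchronized void addListener(Consumer<V> listener, ExecutorService executor) {
        if (done) {
            notifyListener(listener, executor);
        } else {
            pending.add(() -> notifyListener(listener, executor));
        }
    }

    synchronized void onResponse(V v) {
        value = v;
        done = true;
        pending.forEach(Runnable::run);
        pending.clear();
    }

    private void notifyListener(Consumer<V> listener, ExecutorService executor) {
        final V result = value;
        executor.execute(() -> listener.accept(result));
    }
}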
private volatile boolean done = false;
private final List<Tuple<ActionListener<V>, ExecutorService>> listeners = new ArrayList<>();

public void addListener(ActionListener<V> listener, ExecutorService executor) {
I think this needs javadoc - if only to point to the explanation in the class javadoc.
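Something along these lines would probably do; the wording is only a suggestion:

/**
 * Registers a listener to be notified on the given executor once this future completes.
 * If the future has already completed, the listener is notified immediately.
 * See the class-level javadoc for how this is used to collapse concurrent authentication
 * attempts for the same principal into a single in-flight attempt.
 */
public void addListener(ActionListener<V> listener, ExecutorService executor) {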
    cache.invalidate(token.principal(), future);
    authenticateWithCache(token, listener);
} else if (authResult.isAuthenticated()) {
    if (createdAndStartedFuture) {
The use of createdAndStartedFuture here worries me.
We need to handle the two cases (directly authenticated vs loaded from cache) differently, but this variable is a couple of steps removed from that, and I fear that something like a race condition somewhere in the cache could mean that it is true while the user was not actually authenticated with the credentials from token.
Perhaps we could set an AtomicReference to the User in the listener for doAuthenticate and check whether the User being pulled from the cache matches. Then we're no longer testing "did we create the future" but "did we actually authenticate this User object".
e.g.
doAuthenticate(token, ActionListener.wrap(result -> {
    if (result.isAuthenticated()) {
        final User user = result.getUser();
        authenticatedUser.set(user);
        final boolean wasAuthenticatedWithThisToken = tuple.v2() == authenticatedUser.get();
        handleResult(future, wasAuthenticatedWithThisToken, token, tuple, listener);
    }
}, listener::onFailure));
@tvernum thanks for the suggestion. I implemented the suggestion slightly differently. Let me know what you think.
LGTM
I have pushed e888467 and 935c6c7 to fix a minor test bug.
thanks @albertzaharovits
if (tuple != null) {
    final UserWithHash userWithHash = tuple.v2();
    final boolean performedAuthentication = createdAndStartedFuture.get() && userWithHash != null &&
        tuple.v2().user == authenticatedUser.get();
@jaymode I think tuple.v2().user == authenticatedUser.get() is redundant (or there's a deeper reasoning that eludes me). Can you please clarify why this check is necessary?
This was introduced in response to my feedback above
That link doesn't seem to work correctly.
It's the 3rd comment in my review above.
Thanks @tvernum. I understand, it's always better to test for the actual thing, if you can, instead of a surrogate.
But while it makes you feel surer about the whole, it makes me feel less so. The whole thing seems too gnarly: handleResult takes both createdAndStartedFuture and performedAuthentication as parameters. I also think we can do without createdAndStartedFuture.
I will open a small, non-blocking issue for this refactor.
@tvernum I am going to concede on this one, for now 😭 .
I could not find a satisfactory refactoring that maintains the current behavior across all branches. Specifically, I was looking to minimize the authenticationResult.isAuthenticated() checks, and to split the listener into two types: one for the thread making the network call and one for the stalled threads. In the end I got a huge branching monster no better than the present one. 😢
After elastic#30794, our caching realms limit each principal to a single auth attempt at a time. This prevents hammering of external servers but can cause a significant performance hit when requests need to go through a realm that takes a long time to attempt to authenticate in order to get to the realm that actually authenticates. In order to address this, this change will propagate failed results to listeners if they use the same set of credentials that the authentication attempt used. This does prevent these stalled requests from retrying the authentication attempt but the implementation does allow for new requests to retry the attempt.
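Conceptually, the propagation rule amounts to the check sketched below; the types and names are illustrative stand-ins, not the realm's actual classes, and the "different credentials retry" branch reflects the described behavior rather than the exact code.

import java.util.Arrays;
import java.util.function.Consumer;

// Stand-in sketch: a stalled request only inherits the failure if it presented
// the same credentials as the request whose attempt just failed.
final class FailedAuthPropagation {

    record Credentials(String principal, char[] password) {
        boolean matches(Credentials other) {
            return principal.equals(other.principal) && Arrays.equals(password, other.password);
        }
    }

    static void onFailedAttempt(Credentials waiting, Credentials attempted,
                                Consumer<String> failWaiting, Runnable retryWaiting) {
        if (waiting.matches(attempted)) {
            // Same credentials as the failed attempt: propagate the failure directly.
            failWaiting.accept("authentication failed");
        } else {
            // Different credentials: this request performs its own authentication attempt.
            retryWaiting.run();
        }
    }
}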