Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit user to single concurrent auth per realm #30794

Merged
merged 5 commits into from
May 24, 2018

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented May 22, 2018

This commit reworks the way our realms perform caching in order to
limit each principal to a single ongoing authentication per realm. In
other words, this means that multiple requests made by the same user
will not trigger more that one authentication attempt at a time if no
entry has been stored in the cache. If an entry is present in our
cache, there is no restriction on the number of concurrent
authentications performed for this user.

This change enables us to limit the load we place on an external system
like an LDAP server and also preserve resources such as CPU on
expensive operations such as BCrypt authentication.

Closes #30355

This commit reworks the way our realms perform caching in order to
limit each principal to a single ongoing authentication per realm. In
other words, this means that multiple requests made by the same user
will not trigger more that one authentication attempt at a time if no
entry has been stored in the cache. If an entry is present in our
cache, there is no restriction on the number of concurrent
authentications performed for this user.

This change enables us to limit the load we place on an external system
like an LDAP server and also preserve resources such as CPU on
expensive operations such as BCrypt authentication.

Closes elastic#30355
@jaymode jaymode added >enhancement v7.0.0 :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.4.0 labels May 22, 2018
@jaymode jaymode requested a review from tvernum May 22, 2018 19:29
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, but I have a suggested change.

listeners.clear();
}

private void notifyListener(Tuple<ActionListener<V>, ExecutorService> tuple) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird for this to take a tuple rather than 2 arguments (since sometimes we're constructing the Tuple just to pass as an argument and then be decomposed again)

private volatile boolean done = false;
private final List<Tuple<ActionListener<V>, ExecutorService>> listeners = new ArrayList<>();

public void addListener(ActionListener<V> listener, ExecutorService executor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs javadoc - if only to point to the explanation in the class javadoc.

cache.invalidate(token.principal(), future);
authenticateWithCache(token, listener);
} else if (authResult.isAuthenticated()) {
if (createdAndStartedFuture) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of createdAndStartedFuture here worries me.
We need to handle the two cases (directly authenticated vs loaded from cache) differently, but this variable is a couple of steps removed from that and I fear that something like a race condition somewhere in the cache could mean that this is true, but the user was not actually authenticated with the credentials from token.

Perhaps we could set an AtomicReference to User in the listener for doAuthenticate and check whether the User being pulled from the cache matches. Then were no longer testing "did we create the future" but "did we actually authenticate this User object".

e.g.

 doAuthenticate(token, ActionListener.wrap(result -> {
                    if (result.isAuthenticated()) {
                        final User user = result.getUser();
                        authenticatedUser.set(user);
       final boolean wasAuthenticatedWithThisToken = tuple.v2() == authenticatedUser.get();
       handleResult(future, wasAuthenticatedWithThisToken, token, tuple, listener);

@jaymode
Copy link
Member Author

jaymode commented May 23, 2018

@tvernum thanks for the suggestion. I implemented the suggestion slightly differently. Let me know what you think.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaymode jaymode merged commit b3a4acd into elastic:master May 24, 2018
@jaymode jaymode deleted the per_user_auth_limit branch May 24, 2018 16:43
jaymode added a commit that referenced this pull request May 24, 2018
This commit reworks the way our realms perform caching in order to
limit each principal to a single ongoing authentication per realm. In
other words, this means that multiple requests made by the same user
will not trigger more that one authentication attempt at a time if no
entry has been stored in the cache. If an entry is present in our
cache, there is no restriction on the number of concurrent
authentications performed for this user.

This change enables us to limit the load we place on an external system
like an LDAP server and also preserve resources such as CPU on
expensive operations such as BCrypt authentication.

Closes #30355
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 24, 2018
* master:
  Update the version checks around ip_range bucket keys, now that the change was backported.
  Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges
  Use geohash cell instead of just a corner in geo_bounding_box (elastic#30698)
  Limit user to single concurrent auth per realm (elastic#30794)
  [Tests] Move templated _rank_eval tests (elastic#30679)
  Security: fix dynamic mapping updates with aliases (elastic#30787)
  Ensure that ip_range aggregations always return bucket keys. (elastic#30701)
  Use remote client in TransportFieldCapsAction (elastic#30838)
  Move Watcher versioning setting to meta field (elastic#30832)
  [Docs] Explain incomplete dates in range queries (elastic#30689)
  Move persistent task registrations to core (elastic#30755)
  Decouple ClusterStateTaskListener & ClusterApplier (elastic#30809)
  Send client headers from TransportClient (elastic#30803)
  Packaging: Ensure upgrade_is_oss flag file is always deleted (elastic#30732)
  Force stable file modes for built packages (elastic#30823)
  [DOCS] Fixes typos in security settings
  Fix GeoShapeQueryBuilder serialization after backport
martijnvg added a commit that referenced this pull request May 25, 2018
* es/master:
  Move score script context from SearchScript to its own class (#30816)
  Fix bad version check writing Repository nodes (#30846)
  [docs] explainer for java packaging tests (#30825)
  Remove Throwable usage from transport modules (#30845)
  REST high-level client: add put ingest pipeline API (#30793)
  Update the version checks around ip_range bucket keys, now that the change was backported.
  Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges
  Use geohash cell instead of just a corner in geo_bounding_box (#30698)
  Limit user to single concurrent auth per realm (#30794)
  [Tests] Move templated _rank_eval tests (#30679)
  Security: fix dynamic mapping updates with aliases (#30787)
  Ensure that ip_range aggregations always return bucket keys. (#30701)
  Use remote client in TransportFieldCapsAction (#30838)
  Move Watcher versioning setting to meta field (#30832)
  [Docs] Explain incomplete dates in range queries (#30689)
  Move persistent task registrations to core (#30755)
  Decouple ClusterStateTaskListener & ClusterApplier (#30809)
  Send client headers from TransportClient (#30803)
  Packaging: Ensure upgrade_is_oss flag file is always deleted (#30732)
  Force stable file modes for built packages (#30823)
martijnvg added a commit that referenced this pull request May 25, 2018
* es/6.x:
  Move score script context from SearchScript to its own class (#30816)
  Fix bad version check writing Repository nodes (#30846)
  Modify state of VerifyRepositoryResponse for bwc (#30762)
  QA: Fix tribe tests when running default zip
  Use remote client in TransportFieldCapsAction (#30838)
  Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges
  Ensure that ip_range aggregations always return bucket keys. (#30701)
  Limit user to single concurrent auth per realm (#30794)
  Security: fix dynamic mapping updates with aliases (#30787)
  [Tests] Move templated _rank_eval tests (#30679)
  Move Watcher versioning setting to meta field (#30832)
  Restore "Add more yaml tests for get alias API " (#30814)
  Send client headers from TransportClient (#30803)
  [Docs] Explain incomplete dates in range queries (#30689)
  Move persistent task registrations to core (#30755)
  Decouple ClusterStateTaskListener & ClusterApplier (#30809)
  Packaging: Ensure upgrade_is_oss flag file is always deleted (#30732)
  Force stable file modes for built packages (#30823)
albertzaharovits added a commit that referenced this pull request May 27, 2018
albertzaharovits added a commit that referenced this pull request May 27, 2018
@albertzaharovits
Copy link
Contributor

I have pushed e888467 and 935c6c7 to fix a minor test bug:

diff --git a/server/src/test/java/org/elasticsearch/common/cache/CacheTests.java b/server/src/test/java/org/elasticsearch/common/cache/CacheTests.java
index 3b183cce40b..a2e47e6c60f 100644
--- a/server/src/test/java/org/elasticsearch/common/cache/CacheTests.java
+++ b/server/src/test/java/org/elasticsearch/common/cache/CacheTests.java
@@ -471,7 +471,7 @@ public class CacheTests extends ESTestCase {
                     keys.add(key);
                 } else {
                     // invalidate with incorrect value
-                    cache.invalidate(key, Integer.toString(key * randomIntBetween(2, 10)));
+                    cache.invalidate(key, Integer.toString(key + randomIntBetween(2, 10)));
                 }
             }
         }
@@ -506,7 +506,7 @@ public class CacheTests extends ESTestCase {
                     invalidated.add(i);
                 } else {
                     // invalidate with incorrect value
-                    cache.invalidate(i, Integer.toString(i * randomIntBetween(2, 10)));
+                    cache.invalidate(i, Integer.toString(i + randomIntBetween(2, 10)));
                 }
             }
         }

if i is 0 the value is correct and invalidation is expected.
Sample error:

./gradlew :server:test -Dtests.seed=824E15A422DDC053 -Dtests.class=org.elasticsearch.common.cache.CacheTests -Dtests.method="testNotificationOnInvalidateWithValue { seed=[824E15A422DDC053:5A13A8665810B577]}" -Dtests.security.manager=true -Dtests.locale=es-CU -Dtests.timezone=Arctic/Longyearbyen
FAILURE 0.02s | CacheTests.testNotificationOnInvalidateWithValue { seed=[824E15A422DDC053:5A13A8665810B577]} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<[0, 5155, 4680, 2665, 3114, 4237, 5134, 1138, 2674, 179, 116, 4215, 5656, 154, 1338, 2236, 2941, 5885, 2974, 4414]> but was:<[5155, 4680, 2665, 3114, 4237, 5134, 1138, 2674, 179, 116, 4215, 5656, 154, 1338, 2236, 2941, 5885, 2974, 4414]>
   >    at __randomizedtesting.SeedInfo.seed([824E15A422DDC053:5A13A8665810B577]:0)
   >    at org.elasticsearch.common.cache.CacheTests.testNotificationOnInvalidateWithValue(CacheTests.java:513)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >    at java.base/java.lang.Thread.run(Thread.java:844)

dnhatn added a commit that referenced this pull request May 28, 2018
* master:
  silence InstallPluginCommandTests, see #30900
  Remove left-over comment
  Fix double semicolon in import statement
  [TEST] Fix minor random bug from #30794
  Include size of snapshot in snapshot metadata #18543, bwc clean up (#30890)
  Enabling testing against an external cluster (#30885)
  Add public key header/footer (#30877)
  SQL: Remove the last remaining server dependencies from jdbc (#30771)
  Include size of snapshot in snapshot metadata (#29602)
  Do not serialize basic license exp in x-pack info (#30848)
  Change BWC version for VerifyRepositoryResponse (#30796)
  [DOCS] Document index name limitations (#30826)
  Harmonize include_defaults tests (#30700)
dnhatn added a commit that referenced this pull request May 28, 2018
* 6.x:
  Fix double semicolon in import statement
  [TEST] Fix minor random bug from #30794
  Enabling testing against an external cluster (#30885)
  SQL: Remove the last remaining server dependencies from jdbc (#30771)
  Add public key header/footer (#30877)
  Include size of snapshot in snapshot metadata (#29602)
  QA: Test template creation during rolling restart (#30850)
  REST high-level client: add put ingest pipeline API (#30793)
  Do not serialize basic license exp in x-pack info (#30848)
  [docs] explainer for java packaging tests (#30825)
  Verify signatures on official plugins (#30800)
  [DOCS] Document index name limitations (#30826)
  [Docs] Add reindex.remote.whitelist example (#30828)
@jaymode
Copy link
Member Author

jaymode commented May 29, 2018

thanks @albertzaharovits

if (tuple != null) {
final UserWithHash userWithHash = tuple.v2();
final boolean performedAuthentication = createdAndStartedFuture.get() && userWithHash != null &&
tuple.v2().user == authenticatedUser.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaymode I think tuple.v2().user == authenticatedUser.get() is redundant (or there's a deeper reasoning that eludes me). Can you please clarify why this check is necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertzaharovits

This was introduced in response to my feedback above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That link doesn't seem to work correctly.
It's the 3rd comment in my review above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tvernum . I understand, it's always better to test for the actual thing, if you can, instead of a surrogate.
But as it makes you feel surer on the whole it makes me feel less of. The whole seems too gnarly: handleResult has both createdAndStartedFuture and performedAuthentication as params.
Then, I think we can do without createdAndStartedFuture.
I will open a nimble refactor non-issue on this.

Copy link
Contributor

@albertzaharovits albertzaharovits Aug 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tvernum I am going to concede on this one, for now 😭 .
I could not find a satisfactory refactoring such that the current behavior is maintained over all branches. Specifically, I was looking to minimize authenticationResult.isAuthenticated() checks, as well as to split the listener into two types, one for the thread going for the network call and the other for the stalled threads. In the end I got a huge branching monster no better than the present one.
😢

jaymode added a commit to jaymode/elasticsearch that referenced this pull request Dec 20, 2018
After elastic#30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
jaymode added a commit that referenced this pull request Jan 8, 2019
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
jaymode added a commit that referenced this pull request Jan 8, 2019
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants