Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Test] Reduce concurrency when testing creation of security index #75293

Merged
merged 4 commits into from
Sep 16, 2021

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Jul 13, 2021

The internal cluster could get overwhelmed by many nodes and large
number of concurrent putUser request. It can sometimes fail with
confusing messages when JVM is under pressure. This PR reduces the
concurrency so it has a better chance to succeed.

Resolves: #73779

The internal cluster could get overwhelmed by many nodes and large
number of concurrent putUser request. It can sometimes fail with
confusing messages when JVM is under pressure. This PR reduces the
concurrency so it has a better chance to succeed.
@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Security/Security Security issues without another label v8.0.0 auto-backport Automatically create backport pull requests when merged v7.15.0 labels Jul 13, 2021
@ywangd ywangd requested a review from tvernum July 13, 2021 11:59
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jul 13, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Comment on lines +30 to +31
final int numThreads = Math.min(50, scaledRandomIntBetween((processors + 1) / 2, 4 * processors)); // up to 50 threads
final int maxNumRequests = 50 / numThreads; // bound to a maximum of 50 requests
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also choose to reduce the concurrency only when the test is inFipsJvm since the build stats show that all failures so far are for FIPS jobs.

Comment on lines 74 to 78
// TODO: We could also just assert future.actionGet does not throw since either created
// or updated is sufficient for the test purpose, i.e. security index is created.
for (ActionFuture<PutUserResponse> future : futures) {
assertTrue(future.actionGet().created());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to add this comment to be able to comment this piece of code on the GitHub page. Since the test just needs to ensure security index is created when there are concurrent operations, either a "created" or "updated" user is sufficient. If we make this change, it has the benefit to not fail the test if the same edge case (#73779) happens. Note this change does not replace the reduction of concurrency because large number of threads can fail at prepration phase which is before the requests can be sent off at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make that change, given what the test is really trying to do.
It's unfortunate that PutUser might sometimes given a misleading response, but it's a natural consequence consequence consequence of not being able to guarantee exactly-once-delivery. Sometime an internal retry will produce a no-op update instead of a create.
But the point is that the index & alias are created and the Put operation succeeds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Sorry this one got pushed down the list of reviews, and I forgot about it.

@ywangd ywangd added auto-backport-and-merge and removed v7.15.0 auto-backport Automatically create backport pull requests when merged labels Sep 16, 2021
@ywangd ywangd merged commit 4afe1e5 into elastic:master Sep 16, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Sep 16, 2021
…astic#75293)

The internal cluster could get overwhelmed by many nodes and large
number of concurrent putUser request. It can sometimes fail with
confusing messages when JVM is under pressure. This PR reduces the
concurrency so it has a better chance to succeed.

The test also no longer relies on the user being "created" instead of "updated".
Since they do not make a difference for the purpose of this test.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.x

elasticsearchmachine pushed a commit that referenced this pull request Sep 16, 2021
…5293) (#77836)

The internal cluster could get overwhelmed by many nodes and large
number of concurrent putUser request. It can sometimes fail with
confusing messages when JVM is under pressure. This PR reduces the
concurrency so it has a better chance to succeed.

The test also no longer relies on the user being "created" instead of "updated".
Since they do not make a difference for the purpose of this test.
wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Sep 18, 2021
* master: (185 commits)
  Implement get and containsKey in terms of the wrapped innerMap (elastic#77965)
  Adjust Lucene version and enable BWC tests (elastic#77933)
  Disable BWC to upgrade to Lucene-8.10-snapshot
  Reenable MlDistributedFailureIT
  [DOCS] Fix typo for `script.painless.regex.enabled` setting value (elastic#77853)
  Upgrade to Lucene-8.10.0-snapshot-bf2fcb53079 (elastic#77801)
  [DOCS] Fix ESS install lead-in (elastic#77887)
  Resolve thirdparty gradle plugin artifacts from mavencentral (elastic#77865)
  Reduce the number of times that `LifecycleExecutionState` is parsed when running a policy. (elastic#77863)
  Utility methods to add and remove backing indices from data streams (elastic#77778)
  Use Objects.equals() instead of == to compare strings (elastic#77840)
  [ML] prefer least allocated model when a new node is added to the cluster (elastic#77756)
  Deprecate ignore_throttled parameter (elastic#77479)
  Improve LifecycleExecutionState parsing. (elastic#77855)
  [DOCS] Removes deprecated word from HLRC title. (elastic#77851)
  Remove legacy geo code from AggregationResultUtils (elastic#77702)
  Adjust SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#77758)
  Laxify SecureSM to allow creation of the JDK's innocuous threads (elastic#77789)
  [Test] Reduce concurrency when testing creation of security index (elastic#75293)
  Refactor metric PipelineAggregation integration test (elastic#77548)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Security/Security Security issues without another label Team:Security Meta label for security team >test Issues or PRs that are addressing/adding tests v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SecurityIndexManagerIntegTests testConcurrentOperationsTryingToCreateSecurityIndexAndAlias failing
6 participants