Avoid deadlocks in cache #30461

Merged 5 commits on May 9, 2018

jasontedor
Member

This commit avoids deadlocks in the cache by removing dangerous places where we try to take the LRU lock while completing a future. Instead, we block for the future to complete, and then execute the handling code under the LRU lock (for example, eviction).

Closes #30428
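
To illustrate the change described above, here is a minimal sketch of the two shapes. It is not the Elasticsearch `Cache` code: the class `LruAfterFutureSketch`, its `lruLock` and `lru` fields, and both method names are hypothetical and exist only for this example. The first method takes the LRU lock inside a future callback, which runs on whichever thread completes the future; the second waits for the future and only then takes the lock on the calling thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified sketch; names are assumptions, not the real Cache class. */
final class LruAfterFutureSketch {
    private final ReentrantLock lruLock = new ReentrantLock();
    private final Deque<String> lru = new ArrayDeque<>(); // guarded by lruLock

    /**
     * Risky shape: the callback runs on the thread that completes the future.
     * If that thread already holds another lock, and a second thread holds
     * lruLock while waiting for that other lock, the two can deadlock.
     */
    void promoteInCallback(CompletableFuture<String> future) {
        future.thenAccept(key -> {
            lruLock.lock();
            try {
                lru.addFirst(key);
            } finally {
                lruLock.unlock();
            }
        });
    }

    /**
     * Safer shape: block until the future completes, then take lruLock on the
     * calling thread, which controls which locks it holds at that point.
     */
    void promoteAfterWaiting(CompletableFuture<String> future) {
        String key = future.join(); // waits; throws an unchecked exception on failure
        lruLock.lock();
        try {
            lru.addFirst(key);
        } finally {
            lruLock.unlock();
        }
    }

    /** Returns the most recently promoted key, or null if the LRU is empty. */
    String head() {
        lruLock.lock();
        try {
            return lru.peekFirst();
        } finally {
            lruLock.unlock();
        }
    }
}
```

The trade-off in the second shape is that the caller blocks, but the lock acquisition order is then decided by the caller alone rather than by whichever thread happens to complete the future.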

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

* master:
  [Docs] Fix typo in cardinality-aggregation.asciidoc (elastic#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (elastic#30365)
  Build: Switch to building javadoc with html5 (elastic#30440)
* elastic/master:
  Mute ML upgrade test (elastic#30458)
  Stop forking javac (elastic#30462)
  Client: Deprecate many argument performRequest (elastic#30315)
  Docs: Use task_id in examples of tasks (elastic#30436)
  Security: Rename IndexLifecycleManager to SecurityIndexManager (elastic#30442)
Contributor

@bleskes bleskes left a comment

I left a question

@@ -632,7 +625,7 @@ public void remove() {
 Entry<K, V> entry = current;
 if (entry != null) {
 CacheSegment<K, V> segment = getCacheSegment(entry.key);
-segment.remove(entry.key);
+segment.remove(entry.key, f -> {});

Why does this follow a different pattern than invalidate? As far as I can tell, if we don't wait for the future to complete, the entry may be re-inserted into the LRU by the future completion logic. I would also like to understand why this isn't a race condition even if you do complete the future (i.e., aren't we susceptible to race conditions between the execution of the handler and the get() returning?), which would cause the LRU to go out of sync (similar issue in put).


@jasontedor and I discussed this on another channel. The reason for the different execution paths in the callbacks has to do with whether we already hold a reference to the relevant entry. I personally prefer not to have two paths here, but not strongly enough to request a change.

> I would also like to understand why this isn't a race condition even if you do complete the future (i.e., aren't we susceptible to race conditions in between the execution of the handler and the get() returning), which will cause the LRU to go out of sync (similar issue in put)

This one is guarded against by the state in the entry. Deleting an entry also changes its state to deleted, so the handler in computeIfAbsent will not re-add it. That said, we found another issue there: delete doesn't mark the entry as deleted if it is in the new state. This will be dealt with in a follow-up.
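
The state guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the class `EntryStateSketch`, the `State` enum values, and the `promote` and `delete` methods are hypothetical names chosen for this example. In this sketch `delete` marks the entry as deleted in every state, including new, which is the behavior the follow-up is expected to provide rather than what the code does at the time of this comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of guarding LRU promotion with per-entry state. */
final class EntryStateSketch {
    enum State { NEW, EXISTING, DELETED }

    static final class Entry {
        final String key;
        State state = State.NEW; // guarded by lruLock

        Entry(String key) {
            this.key = key;
        }
    }

    private final ReentrantLock lruLock = new ReentrantLock();
    private final Deque<Entry> lru = new ArrayDeque<>(); // guarded by lruLock

    /**
     * Handler run after a load completes. It links the entry at the head of
     * the LRU unless a concurrent delete already marked it DELETED.
     */
    boolean promote(Entry entry) {
        lruLock.lock();
        try {
            if (entry.state == State.DELETED) {
                return false; // a delete won the race; do not re-add
            }
            if (entry.state == State.EXISTING) {
                lru.remove(entry); // unlink before moving to the head
            }
            lru.addFirst(entry);
            entry.state = State.EXISTING;
            return true;
        } finally {
            lruLock.unlock();
        }
    }

    /** Assumption: marks the entry DELETED regardless of its current state. */
    void delete(Entry entry) {
        lruLock.lock();
        try {
            lru.remove(entry); // no-op if the entry was never linked
            entry.state = State.DELETED;
        } finally {
            lruLock.unlock();
        }
    }

    int size() {
        lruLock.lock();
        try {
            return lru.size();
        } finally {
            lruLock.unlock();
        }
    }
}
```

Because both the state check and the state change happen under the same lock, a handler that runs after a delete sees the deleted state and leaves the LRU untouched.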

@jasontedor jasontedor merged commit 4defaa4 into elastic:master May 9, 2018
@jasontedor jasontedor deleted the cache-deadlock branch May 9, 2018 15:53
jasontedor added a commit that referenced this pull request May 9, 2018
This commit avoids deadlocks in the cache by removing dangerous places
where we try to take the LRU lock while completing a future. Instead, we
block for the future to complete, and then execute the handling code
under the LRU lock (for example, eviction).
dnhatn added a commit that referenced this pull request May 10, 2018
* master:
  Upgrade to Lucene-7.4-snapshot-6705632810 (#30519)
  add version compatibility from 6.4.0 after backport, see #30319 (#30390)
  Security: Simplify security index listeners (#30466)
  Add proper longitude validation in geo_polygon_query (#30497)
  Remove Discovery.AckListener.onTimeout() (#30514)
  Build: move generated-resources to build (#30366)
  Reindex: Fold "with all deps" project into reindex (#30154)
  Isolate REST client single host tests (#30504)
  Solve Gradle deprecation warnings around shadowJar (#30483)
  SAML: Process only signed data (#30420)
  Remove BWC repository test (#30500)
  Build: Remove xpack specific run task (#30487)
  AwaitsFix IntegTestZipClientYamlTestSuiteIT#indices.split tests
  LLClient: Add setJsonEntity (#30447)
  Expose CommonStatsFlags directly in IndicesStatsRequest. (#30163)
  Silence IndexUpgradeIT test failures. (#30430)
  Bump Gradle heap to 1792m (#30484)
  [docs] add warning for read-write indices in force merge documentation (#28869)
  Avoid deadlocks in cache (#30461)
  Test: remove hardcoded list of unconfigured ciphers (#30367)
  mute SplitIndexIT due to #30416
  Docs: Test examples that recreate lang analyzers  (#29535)
  BulkProcessor to retry based on status code (#29329)
  Add GET Repository High Level REST API (#30362)
  add a comment explaining the need for RetryOnReplicaException on missing mappings
  Add `coordinating_only` node selector (#30313)
  Stop forking groovyc (#30471)
  Avoid setting connection request timeout (#30384)
  Use date format in `date_range` mapping before fallback to default (#29310)
  Watcher: Increase HttpClient parallel sent requests (#30130)

# Conflicts:
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/LocalStateCompositeXPackPlugin.java
dnhatn added a commit that referenced this pull request May 10, 2018
* 6.x:
  Upgrade to Lucene-7.4-snapshot-6705632810 (#30519)
  Remove Discovery.AckListener.onTimeout() (#30514)
  Build: move generated-resources to build (#30366)
  Reindex: Fold "with all deps" project into reindex (#30154)
  Isolate REST client single host tests (#30504)
  Remove BWC repository test (#30500)
  Build: Remove xpack specific run task (#30487)
  AwaitsFix IntegTestZipClientYamlTestSuiteIT#indices.split tests
  LLClient: Add setJsonEntity (#30447)
  [docs] add warning for read-write indices in force merge documentation (#28869)
  Avoid deadlocks in cache (#30461)
  BulkProcessor to retry based on status code (#29329)
  Avoid setting connection request timeout (#30384)
  Test: remove hardcoded list of unconfigured ciphers (#30367)
  Add GET Repository High Level REST API (#30362)
  mute SplitIndexIT due to #30416
  Docs: Test examples that recreate lang analyzers  (#29535)
  add a comment explaining the need for RetryOnReplicaException on missing mappings
  Pass the task to broadcast actions (#29672)
  Stop forking groovyc (#30471)
  Add `coordinating_only` node selector (#30313)
  Fix accidental error in changelog
  Use date format in `date_range` mapping before fallback to default (#29310)
  Watcher: Increase HttpClient parallel sent requests (#30130)
  [Security][Tests] Azeri(Turkish) locale tripps opensaml dependency