Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure to release translog snapshot in primary-replica resync #32045

Merged
merged 6 commits into from
Jul 17, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jul 13, 2018

We create a translog snapshot inside the resync method, and that
snapshot will be closed by the resync listener. However, if the resync
method throws an exception before the resync listener is initialized,
the translog snapshot won't be released.

Closes #32030

We create a translog snapshot inside the resync method, and that
snapshot will be closed by the resync listener. However, if the resync
method throws an exception before the resync listener is initialized,
the translog snapshot won't be released.

Closes elastic#32030
@dnhatn dnhatn added >bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.4.0 v6.3.2 labels Jul 13, 2018
@dnhatn dnhatn requested a review from ywelsch July 13, 2018 16:24
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Translog.Snapshot wrappedSnapshot = new Translog.Snapshot() {

snapshot = new Translog.Snapshot() {
final AtomicBoolean closed = new AtomicBoolean(); // closed once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not protect against double closing in the snapshot it self? this is a common problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this boolean. It's ok to close a translog snapshot multiple times.

@dnhatn
Copy link
Member Author

dnhatn commented Jul 14, 2018

@elasticmachine test this please

@dnhatn
Copy link
Member Author

dnhatn commented Jul 16, 2018

@ywelsch I've updated the PR to make sure that we hold the reference immediately when creating a snapshot. Can you please have a look? Thank you!

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and thanks for fixing.

public void onFailure(Exception e) {
try {
super.onFailure(e);
}finally {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space missing

IOUtils.close(snapshot);
} catch (IOException inner) {
e.addSuppressed(inner);
}
if (resyncListener != null) {
resyncListener.onFailure(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed anymore now (i.e. the resyncListener does not need to exist outside the try-catch block

public void onResponse(PrimaryReplicaSyncer.ResyncTask result) {
try {
super.onResponse(result);
}finally {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space missing

@dnhatn dnhatn requested a review from ywelsch July 17, 2018 03:09
@dnhatn
Copy link
Member Author

dnhatn commented Jul 17, 2018

@ywelsch I moved the listener into the try clause. Can you take another look? Thank you!

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Jul 17, 2018

Thanks @ywelsch and @bleskes

@dnhatn dnhatn merged commit ef81c1d into elastic:master Jul 17, 2018
@dnhatn dnhatn deleted the resync-leak-snapshot branch July 17, 2018 13:41
dnhatn added a commit that referenced this pull request Jul 17, 2018
Previously we create a translog snapshot inside the resync method, 
and that snapshot will be closed by the resync listener. However, if 
the resync method throws an exception before the resync listener 
is initialized, the translog snapshot won't be released.

Closes #32030
dnhatn added a commit that referenced this pull request Jul 19, 2018
* 6.x:
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Revert "Introduce a Hashing Processor (#31087)" (#32179)
  [test] use randomized runner in packaging tests (#32109)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158) (#32169)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Convert Version to Java - clusterformation part1 (#32009)
  Fix Java 11 javadoc compile problem
  Improve docs for search preferences (#32098)
  Configurable password hashing algorithm/cost(#31234) (#32092)
  [DOCS] Update TLS on Docker for 6.3
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Build: Move shadow customizations into common code (#32014)
  Painless: Add PainlessClassBuilder (#32141)
  Fix accidental duplication of bwc test for script behavior
  Handle missing values in painless (#30975) (#31903)
  Build: Make additional test deps of check (#32015)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Adjust translog after versionType removed in 7.0 (#32020)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  [Rollup] Add new capabilities endpoint for concrete rollup indices (#32111)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  [ML] Wait for aliases in multi-node tests (#32086)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Docs: Fix missing example script quote (#32010)
  Add Index UUID to `/_stats` Response (#31871) (#32113)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  [ML][DOCS] Add missing 6.3.0 release notes (#32099)
  Updates the build to gradle 4.9 (#32087)
  Update monitoring template version to 6040099 (#32088)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
dnhatn added a commit that referenced this pull request Jul 20, 2018
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] PrimaryReplicaSyncerTests failures with
5 participants