-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Never leave stale delete tombstones in version map #29619
Conversation
Today the VersionMap does not clean up a stale delete tombstone if it does not require safe access. However, in a very rare situation due to concurrent refreshes, the safe-access flag may be flipped over then an engine accidentally consult that stale delete tombstone. This commit ensures to never leave stale delete tombstones in a version map by always pruning delete tombstones when putting a new index entry regardless of the value of the safe-access flag.
Pinging @elastic/es-distributed |
@DaveCTurner, I have a test simulating the violated case found by the formal model. However, I discussed with @bleskes and decided not to include it because the version-map test may be enough. You can refer this test at 2cff7fb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, but I'm not going to be able to check it in detail before tomorrow now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prod code lgtm. I left a question on the tests.
@@ -312,6 +312,10 @@ void maybePutIndexUnderLock(BytesRef uid, IndexVersionValue version) { | |||
if (maps.isSafeAccessMode()) { | |||
putIndexUnderLock(uid, version); | |||
} else { | |||
// The existing delete tombstone is no longer the latest version (.i.e stale) | |||
// when a newer version is processed regardless of the safe mode of the version map. | |||
// Therefore, it's better to always prune that tombstone to avoid accidental accesses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
// Even though we don't store a record of the indexing operation (and mark as unsafe) we should still remove any previous delete for this uuid. Note that this is expected to not hurt performance because the tombstone is small (if not empty) when usafe is relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comment.
@@ -396,6 +397,50 @@ public void testPruneTombstonesWhileLocked() throws InterruptedException, IOExce | |||
assertEquals(0, map.getAllTombstones().size()); | |||
} | |||
|
|||
public void testNeverLeaveStaleDeleteTombstone() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why we need concurrency here? (i.e., why a single threaded test is not enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single threaded test is good enough ;)
@bleskes I've addressed your comments. Can you please have another look? Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some comments. No need for another review.
} else { | ||
versionMap.maybePutIndexUnderLock(uid, new IndexVersionValue(randomTranslogLocation(), version, 1, 1)); | ||
} | ||
VersionValue storedValue = versionMap.getUnderLock(uid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also check that this return value correlates with what we expect? i.e., if the doc was just deleted it should be deleted, if it was indexed it should never be marked as deleted? (and found if we're in safe mode). Also can you check this property holds also not directly after indexing/deleting? (i.e., after refresh, enforceSafe access etc). Said differently - maybe also randomly do nothing under lock?
@@ -396,6 +397,32 @@ public void testPruneTombstonesWhileLocked() throws InterruptedException, IOExce | |||
assertEquals(0, map.getAllTombstones().size()); | |||
} | |||
|
|||
public void testNeverLeaveStaleDeleteTombstone() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name the test: "testRandomlyDeletingAndIndexingDoc"? it tests much more, which is good.
Thanks @bleskes and @DaveCTurner for reviewing. @DaveCTurner I am merging this to catch the release train but feel free to leave your feedbacks. |
Previously we did not put an indexing to a version map if that map does not require safe access but removed the existing delete tombstone only if assertion enabled. In #29585, we removed the side-effect caused by assertion then this test started failing. This failure can be explained as follows: - Step 1: Index a doc then delete that doc - Step 2: The version map can switch to unsafe mode because of concurrent refreshes (implicitly called by flushes) - Step 3: Index a document - the version map won't add this version value and won't prune the tombstone (previously it did) - Step 4: Delete a document - this will return NOT_FOUND instead of DELETED because of the stale delete tombstone This failure is actually fixed by #29619 in which we never leave stale delete tombstones Closes #29626
Today the VersionMap does not clean up a stale delete tombstone if it does not require safe access. However, in a very rare situation due to concurrent refreshes, the safe-access flag may be flipped over then an engine accidentally consult that stale delete tombstone. This commit ensures to never leave stale delete tombstones in a version map by always pruning delete tombstones when putting a new index entry regardless of the value of the safe-access flag.
Previously we did not put an indexing to a version map if that map does not require safe access but removed the existing delete tombstone only if assertion enabled. In #29585, we removed the side-effect caused by assertion then this test started failing. This failure can be explained as follows: - Step 1: Index a doc then delete that doc - Step 2: The version map can switch to unsafe mode because of concurrent refreshes (implicitly called by flushes) - Step 3: Index a document - the version map won't add this version value and won't prune the tombstone (previously it did) - Step 4: Delete a document - this will return NOT_FOUND instead of DELETED because of the stale delete tombstone This failure is actually fixed by #29619 in which we never leave stale delete tombstones Closes #29626
* es/master: (32 commits) TEST: Unmute testPrimaryRelocationWhileIndexing Remove remaining tribe node references (#29574) Never leave stale delete tombstones in version map (#29619) Do not serialize common stats flags using ordinal (#29600) Remove stale comment from JVM stats (#29625) TEST: Mute testPrimaryRelocationWhileIndexing Remove bulk fallback for write thread pool (#29609) Fix an incorrect reference to 'zero_terms_docs' in match_phrase queries. Update the version compatibility for zero_terms_query in match_phrase. Account translog location to ram usage in version map Remove extra spaces from changelog Add support to match_phrase query for zero_terms_query. (#29598) Fix incorrect references to 'zero_terms_docs' in query parsing error messages. (#29599) Build: Move java home checks to pre-execution phase (#29548) Avoid side-effect in VersionMap when assertion enabled (#29585) [Tests] Remove accidental logger usage Add tests for ranking evaluation with aliases (#29452) Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429) Update plan for the removal of mapping types. (#29586) [Docs] Add rankEval method for Jva HL client ...
* es/6.x: (28 commits) TEST: Unmute testPrimaryRelocationWhileIndexing Never leave stale delete tombstones in version map (#29619) Do not serialize common stats flags using ordinal (#29600) Remove stale comment from JVM stats (#29625) TEST: Mute testPrimaryRelocationWhileIndexing Remove 7.0.0 from 6.x changelog (#29621) Add support to match_phrase query for zero_terms_query. (#29598) Account translog location to ram usage in version map Avoid side-effect in VersionMap when assertion enabled (#29585) Build: Move java home checks to pre-execution phase (#29548) Add tests for ranking evaluation with aliases (#29452) [Test] Fix assertion in SearchDocumentationIT Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429) test: Assert deprecated http.enebled setting warning Update plan for the removal of mapping types. (#29586) [Docs] Add rankEval method for Jva HL client Make ranking evaluation details accessible for client Rename the bulk thread pool to write thread pool (#29593) [Test] Minor changes to rank_eval tests (#29577) test: Assert deprecated http.enebled setting warning ...
Today the VersionMap does not clean up a stale delete tombstone if it
does not require safe access. However, in a very rare situation due to
concurrent refreshes, the safe-access flag may be flipped over then an
engine accidentally consult that stale delete tombstone.
This commit ensures to never leave stale delete tombstones in a version
map by always pruning delete tombstones when putting a new index entry
regardless of the value of the safe-access flag.