Allow _update and upsert to read from the transaction log #29264
Conversation
We historically removed reading from the transaction log to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree that we should not hide these aspects, but the impact on updates is quite significant, especially if the same document is updated before it's written to disk and made searchable. This change adds back the ability to read from the transaction log, but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e. if stored fields or doc values fields are requested.
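As a rough sketch of the dispatch this change introduces (hypothetical names only; VersionValue, Location, and the helper methods below stand in for the real Engine internals and are not the actual API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the behavior described above, not the real Engine code.
class RealtimeGetSketch {
    record Location(long generation, long offset) {}
    record VersionValue(long version, Location translogLocation) {}
    record Result(String source, boolean fromTranslog) {}

    // uncommitted operations that are not yet searchable live here
    private final Map<String, VersionValue> liveVersionMap = new ConcurrentHashMap<>();

    Result get(String id, boolean forUpdate) {
        VersionValue v = liveVersionMap.get(id);
        if (v != null) { // the document was changed but not refreshed yet
            if (forUpdate && v.translogLocation() != null) {
                // _update / upsert path: serve the document straight from the translog
                return new Result(readSourceFromTranslog(v.translogLocation()), true);
            }
            // plain _GET path: refresh so stored fields and doc values are consistent
            refresh("realtime_get");
        }
        return new Result(readSourceFromIndex(id), false);
    }

    private String readSourceFromTranslog(Location location) { return "{}"; }
    private String readSourceFromIndex(String id) { return "{}"; }
    private void refresh(String reason) { /* force a Lucene refresh */ }
}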
Pinging @elastic/es-distributed
Looks good to me overall. I left some minor questions.
}
if (operation.routing() != null && visitor.needsField(FAKE_ROUTING_FIELD) == StoredFieldVisitor.Status.YES) {
    visitor.stringField(FAKE_ROUTING_FIELD, operation.routing().getBytes(StandardCharsets.UTF_8));
}
I know some visitors won't like that _id is not seen, so maybe assert that visitor.needsField("_id") is false?
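A sketch of the suggested assertion (FAKE_ID_FIELD is a hypothetical FieldInfo constant analogous to FAKE_ROUTING_FIELD in the diff above):

// assumed guard, mirroring Lucene's StoredFieldVisitor API
assert visitor.needsField(FAKE_ID_FIELD) != StoredFieldVisitor.Status.YES
    : "_id cannot be served from a translog operation";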
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    throw new UncheckedIOException(e);
we seem to be using EngineException as a wrapper in other places?
        }
    };
}
return null;
should we throw an exception?
LGTM
I've left some minor questions and comments. Thanks @s1monw
/**
 * Returns the translog location for this version value or null. This is optional and might not be tracked all the time.
 */
public Translog.Location getLocation() {
can you add @Nullable?
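The requested change would look roughly like this (a sketch that assumes the value is held in a translogLocation field and that the codebase's @Nullable annotation is used):

/**
 * Returns the translog location for this version value or null. This is optional and might not be tracked all the time.
 */
@Nullable
public Translog.Location getLocation() {
    return translogLocation; // assumed backing field
}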
TranslogReader translogReader = readers.get(i);
if (translogReader.generation == location.generation) {
    reader = translogReader;
    onClose = acquireTranslogGenFromDeletionPolicy(current.generation);
closing looks to be an expensive call, because upon closing, the write lock is acquired twice (once in trimUnreferencedReaders, and once in closeFilesIfNoPendingRetentionLocks). I wonder if we need to optimize those two methods now.
I will look into it
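One possible shape for that optimization, as a sketch only: take the write lock once and delegate to lock-free variants of the two methods (the *Internal names are hypothetical, not existing code):

writeLock.lock();
try {
    // hypothetical variants that assume the caller already holds the write lock
    trimUnreferencedReadersInternal();
    closeFilesIfNoPendingRetentionLocksInternal();
} finally {
    writeLock.unlock();
}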
}
if (current.generation == location.generation) {
    // fsync here to ensure all buffers are written to disk
    current.syncUpTo(location.translogLocation + location.size);
why require this?
Because the buffer that we have in memory needs to be written to disk if we point to it from the record. I will add a comment.
if (versionValue.getLocation() != null) {
    try {
        Translog.Operation operation = translog.readOperation(versionValue.getLocation());
        if (operation != null) {
when do we expect this to be null?
Given that the translog generation is not available anymore, I think it's unlikely, but I see a chance. I can add a comment in the code.
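The suggested comment could look like this (a sketch following the diff above):

Translog.Operation operation = translog.readOperation(versionValue.getLocation());
if (operation == null) {
    // rare: the translog generation holding this location may already have
    // been trimmed away, so fall back to reading from the index
}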
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    throw new EngineException(shardId, "failed to read operation from translog", e);
do we want to fail the engine (as we do when indexing)?
So this is the read side of things; imo I am not sure any exceptions are fatal here?!
@ywelsch I pushed changes. thanks
LGTM
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    maybeFailEngine("realtime_get", e); // lets check if the translog has failed with a tragic event
In other places, we have wrapped the maybeFailEngine call as follows:
try {
    maybeFailEngine("index", e);
} catch (Exception inner) {
    e.addSuppressed(inner);
}
throw ...;
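Spelled out, the referenced pattern looks like this (a sketch; the final throw is elided above, and the exception type here is taken from the earlier diff, so it may differ in the actual code):

try {
    maybeFailEngine("realtime_get", e);
} catch (Exception inner) {
    e.addSuppressed(inner); // keep the original failure as the primary cause
}
throw new EngineException(shardId, "failed to read operation from translog", e);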
that's bogus I think. There is no exception thrown from maybeFailEngine, only errors, which should not be handled. I think we are good here?
I will clean this up in a followup
right, makes sense
LGTM. Thanks for not letting this go.
We historically removed reading from the transaction log to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree that we should not hide these aspects, but the impact on updates is quite significant, especially if the same document is updated before it's written to disk and made searchable. This change adds back the ability to read from the transaction log, but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e. if stored fields or doc values fields are requested. Closes #26802
@s1monw Is there any chance this fix will ever get back-ported to the 5.x release branch? I'm coming from Elasticsearch 1.7, and it's significantly more work to upgrade to ES 6.x than to ES 5.x due to several breaking changes, notably scripting (we currently make heavy use of Groovy). I'm hoping to upgrade to 5.x as a stepping-stone to 6.x, but my use case involves heavy, rapid indexing, and this bug slows down indexing in 5.x badly.
@jonaf this is a major change, and 5.6 is in its last maintenance phase (until 7.0 is released). It only receives small, crucial bug fixes or security fixes. I'm afraid this one can't go there. I understand you want to upgrade from 1.7 (it's impressive how long it worked for you), but you'd have to invest in moving to 6.x if you can't find a way to work around this (like batching updates to make sure the extra refresh in 5.x costs less).
Thanks for your reply, @bleskes. We're already batching (to the maximum extent that it improves indexing speed), so the impact is too dramatic for us to ignore. We'll have to go all the way to 6.x, then.
Hi, I checked the code and it shows that the _GET call always reads from a reader for consistency. How about if we use the _GET API with preference?
The realtime GET API currently has erratic performance in cases where a document that has just been indexed but not refreshed yet is accessed, as the implementation will force an internal refresh in that case. Refreshing can be an expensive operation and will also block the thread that executes the GET operation, preventing other GETs from being processed. In case of frequent access to recently indexed documents, this can lead to a refresh storm and terrible GET performance.

While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted to read from the translog for the realtime GET API or the update API, this was removed in 5.0 (#20102) to avoid inconsistencies between values returned from the translog and those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and upsert to read from the translog again, as it was easier to guarantee consistency for these, and it also brought back more predictable performance characteristics for that API. Calls to the realtime GET API, however, would still always do a refresh if necessary to return consistent results. This means that users calling realtime GET APIs to coordinate updates on the client side (realtime GET + CAS for conditional indexing of the updated doc) would still see very erratic performance.

This PR (together with #48707) resolves the inconsistencies between reading from the translog and the index. In particular, it fixes the inconsistencies that happen when requesting stored fields, which were not available when reading from the translog. In cases where stored fields are requested, this PR will reparse the _source from the translog and derive the stored fields to be returned. With this, it changes the realtime GET API to allow reading from the translog again, avoiding refresh storms and blocking of the GET threadpool, and providing overall much better and more predictable performance for this API.
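A minimal sketch of the stored-field derivation described here, assuming the _source has already been parsed into a map (none of these names are the actual API):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive the requested stored fields by reparsing the
// _source held in the translog operation instead of reading them from Lucene.
class TranslogStoredFields {
    static Map<String, Object> derive(Map<String, Object> parsedSource, Set<String> requestedFields) {
        Map<String, Object> stored = new HashMap<>();
        for (String field : requestedFields) {
            Object value = parsedSource.get(field);
            if (value != null) {
                stored.put(field, value); // the same value the index would have stored
            }
        }
        return stored;
    }
}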