Commit
32421: engineccl: ignore intents beneath start in MVCCIncrementalIterator r=petermattis,nvanbenschoten a=benesch

As determined in cockroachdb#28358, using time-bound iterators is rife with pitfalls. Specifically, the keys returned outside of the time bounds might be wildly inconsistent. An iteration over time bounds [ts3, ts4] might observe an intent at time ts2 or ts5 as still pending when in fact it was resolved, just in an SST that was not considered by the iterator. The only guarantee is that the snapshot of keys within the [ts3, ts4] time bounds is consistent. (Currently this isn't quite true, thanks to another bug, but this is the guarantee that time-bound iterators should be providing.)

MVCCIncrementalIterator mostly handled these pitfalls correctly. It properly ignored all non-metadata keys outside of the timestamp bounds, as well as metadata keys (i.e., intents) above the upper timestamp bound. It was not, however, ignoring intents beneath the lower timestamp bound. Since these intents might be inconsistent (i.e., they might already be resolved), the iterator must ignore them.

The most problematic symptom was that ExportRequests, which use an MVCCIncrementalIterator, could get stuck trying to resolve an already-resolved intent. The ExportRequest would return a WriteIntentError because it observed the intent but not its resolution. Resolving the intent would then be a no-op, because the intent was in fact already resolved, so retrying the ExportRequest would run into the same problem. The situation would eventually unstick when a RocksDB compaction rearranged SSTs such that the ExportRequest observed both the intent and its resolution.

Note that 1eb3b2a disabled the use of time-bound iterators in all other code paths that would have had a similar problem.

Release note: None

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>