
storage: write to AbortSpan less, clean up AbortSpan more #42765

Merged

Conversation


@nvanbenschoten nvanbenschoten commented Nov 26, 2019

This PR introduces two improvements to our handling of the AbortSpan. It also adds the first set of comprehensive testing around AbortSpan entry state transitions, which was sorely missing.

This comes after a few different customer issues that at least at first glance appeared to be AbortSpan leaks. There's still more to do to resolve those, mostly by improving GC, but anything we can do here on the frontend to reduce the number of AbortSpan entries that need to be GCed in the first place helps.

Clean up span on non-poisoning, aborting EndTransaction request

Fixes #29128.

Before this change, an EndTransaction request sent to rollback a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.

There were concerns raised in #29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range during normal intent resolution, so this isn't introducing any new concerns.

Only write AbortSpan entries if intents are removed

This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.

In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.

Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).

We can detect both of these cases simply by checking whether intent resolution actually succeeded. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only ever need to poison the abort span if we actually removed an intent that could confuse a zombie transaction.

@cockroach-teamcity

This change is Reviewable


@tbg tbg left a comment


:lgtm:

The fact that we clear when !Poison seems problematic. This isn't new behavior, but it looks worth fixing (though not necessarily in this PR).

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 3 of 3 files at r3, 15 of 15 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/storage/batcheval/cmd_resolve_intent.go, line 42 at r4 (raw file):

		txnID = t.IntentTxn.ID
	}
	if status == roachpb.ABORTED {

Could add a comment here that we won't even always write the key in this case, but we can't know this from this code. Your call


pkg/storage/batcheval/transaction_test.go, line 190 at r2 (raw file):

		{
			// NOTE: this request doesn't make sense, but we handle it. An abort
			// span shouldn't be present if the transaction is still committable.

In practice the commit would run into the abort span, but this check is outside of the code being tested, right?


pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):

		},
		{
			name:   "resolve intent, txn aborted, no poison, intent missing, abort span present",

Isn't this problematic? If a txn gets pushed first but the corresponding non-poisoning ResolveIntent gets delayed, then the txn gets aborted and poisoning happens, won't the delayed request unpoison accidentally?

The direct blame is broken due to code movement, but I think this is a problem I introduced at some point. Worth filing an issue. The bool should (as always) have been an enum NOOP, CLEAR, POISON.


@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/storage/batcheval/cmd_resolve_intent.go, line 42 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Could add a comment here that we won't even always write the key in this case, but we can't know this from this code. Your call

Done.


pkg/storage/batcheval/transaction_test.go, line 190 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

In practice the commit would run into the abort span, but this check is outside of the code being tested, right?

Well, an abort span should be possible to write unless the txn is already aborted, in which case the EndTxn would necessarily fail.


pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):

If a txn gets pushed first but the corresponding non-poisoning ResolveIntent gets delayed

I don't think there's a case where a PushTxn triggers a ResolveIntent without a Poison flag (even for PUSH_TIMESTAMP), so I think the only case where a ResolveIntent request can have Poison=false is when it was initiated by the transaction coordinator itself or when it was initiated by GC. Maybe I'm missing something.

The fact that we clear when !Poison seems problematic.

Isn't this kind of the point? When !Poison, we're saying that we're operating as the transaction and we're aware of the ABORT, so we're able to clear. When were you thinking we'd set Poison = NOOP?


@andreimatei andreimatei left a comment


LGTM good stuff

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):

	}

	if WriteAbortSpanOnResolve(txn.Status) {

I like txn.Status == ABORTED better. In fact I'd get rid of this function.


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):

	if WriteAbortSpanOnResolve(txn.Status) {
		if err := SetAbortSpan(ctx, evalCtx, batch, ms, txn.TxnMeta, args.Poison); err != nil {

please add a comment that this surprisingly might clear the entry. Or better yet, split the SetAbortSpan() function in two. I think this function is the textbook example of a bool argument anti-pattern.


pkg/storage/batcheval/transaction_test.go, line 27 at r6 (raw file):

// TestSetAbortSpan tests the different ways that request can set, update, and
// delete AbortSpan entries.
func TestSetAbortSpan(t *testing.T) {

excellent test


pkg/storage/batcheval/transaction_test.go, line 43 at r6 (raw file):

	txn := roachpb.MakeTransaction("test", txnKey, 0, hlc.Timestamp{WallTime: 1}, 0)
	txnAbortSpanEntry := roachpb.AbortSpanEntry{

maybe name this newTxnAbortSpanEntry


pkg/storage/batcheval/transaction_test.go, line 50 at r6 (raw file):

	// Used to detect updates to the AbortSpan entry.
	txnPrev := txn.Clone()
	txnPrev.WriteTimestamp.Add(-1, 0)

can you remind the reader if the priority and write timestamp matter for anything or if they're just testing artifacts?


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

			// NOTE: this request doesn't make sense, but we handle it. An abort
			// span shouldn't be present if the transaction is still committable.
			name:   "end txn, commit, no poison, abort span present",

let's crash in this case


pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):

			// NOTE: this request doesn't make sense, but we handle it. An EndTxn
			// should never pass Commit = true and Poison = true.
			name:   "end txn, commit, poison, abort span present",

let's crash


pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):

			name: "resolve intent, txn pending, poison, intent missing, abort span missing",
			run: func(b engine.ReadWriter, rec EvalContext) error {
				return resolveIntent(b, rec, roachpb.PENDING, true /* poison */)

resolving an intent with PENDING doesn't make sense, right? Let's crash.


@tbg tbg left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @tbg)


pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):

I don't think there's a case where a PushTxn triggers a ResolveIntent without a Poison flag (even for PUSH_TIMESTAMP),

Heh, it turns out you're right, but only because I had this same thought in 2017 and made that change:

// We always poison due to limitations of the API: not poisoning equals
// clearing the AbortSpan, and if our pushee transaction first got pushed
// for timestamp (by us), then (by someone else) aborted and poisoned, and
// then we run the below code, we're clearing the AbortSpan illegally.
// Furthermore, even if our pushType is not PUSH_ABORT, we may have ended
// up with the responsibility to abort the intents (for example if we find
// the transaction aborted).
//
// To do better here, we need per-intent information on whether we need to
// poison.
if err := ir.ResolveIntents(ctx, resolveIntents,
ResolveOptions{Wait: false, Poison: true}); err != nil {
return cleanup, roachpb.NewError(err)
}

I was just mildly confused how this all worked (if we push an intent timestamp, certainly the txn should not get a TransactionAbortedError?) but it looks like even before your PR we only mutate the abort span when aborting. So in a roundabout way, we have already implemented the NOOP option. I don't love the messiness of this, but it seems to be working as intended already: if we're not removing intents, Poison is effectively ignored.

@tbg tbg self-requested a review December 12, 2019 08:16

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I like txn.Status == ABORTED better. In fact I'd get rid of this function.

But there's more to this function than just txn.Status == ABORTED.


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please add a comment that this surprisingly might clear the entry. Or better yet, split the SetAbortSpan() function in two. I think this function is the textbook example of a bool argument anti-pattern.

How about renaming it to UpdateAbortSpan? I'd rather not pull the condition out to all callers and force them to make decisions that don't really concern them.


pkg/storage/batcheval/transaction_test.go, line 43 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

maybe name this newTxnAbortSpanEntry

Done.


pkg/storage/batcheval/transaction_test.go, line 50 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you remind the reader if the priority and write timestamp matter for anything or if they're just testing artifacts?

Good idea, done.


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's crash in this case

If we're all about the KV layer providing an external API (which it sounds like we are), then I don't think we should crash on user input. This case doesn't do anything wrong; we just wouldn't even get to batch evaluation if there's an abort span present.


pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's crash

Again, I don't think we should crash, but in this case, I could see some extra input validation and rejecting the request. Done.


pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):

resolving an intent with PENDING doesn't make sense, right?

No, it does. That's what happens during a PUSH_TIMESTAMP push.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/abortSpanResolve branch 2 times, most recently from 77d4805 to fbc5c1f Compare December 28, 2019 03:49

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're all about the KV layer providing an external API (which it sounds like we are) then I don't think we should crash on user input. This case doesn't do anything wrong, we just wouldn't even get to batch evaluation if there's an abort span present.

Exactly - we wouldn't get to batch evaluation if there's an abort span. Even if the KV layer were to be treated as an "external" API (where I'd litigate the meaning of "external"), the KV layer isn't the request evaluation functions. Committing a txn with an abort span entry set would be a bug, and a bad one (wouldn't it?). So what are we testing here? The test says // Not aborted, don't touch abort span. What does "not aborted" mean? The txn is aborted. Whether or not the EndTransaction updates the abort span in this case seems like a frivolous concern.
Do you like this test? If I understand it correctly, I think we should get rid of it and just leave a note here about this case not being possible. And if it's easy enough, I do think it'd make sense to refuse the request (or crash), I guess at the evaluateBatch() level. But that higher level function is beyond the concern of this test.

@knz

knz commented Dec 28, 2019

I haven't looked into this PR yet but reflecting on the last two comments I thought I'd share my views:

  1. if an API needs to reject some invalid combinations of inputs and it was not designed to return error, then panic() is the next best thing before log.Fatal or os.Exit.

  2. in my book it's perfectly a-OK to implement unit tests that verify that an API rejects invalid input combinations. In this case, the test could ascertain that a panic is indeed thrown.

  3. we've started to grow a habit of throwing error objects using panic(), instead of strings, and this will pay dividends down the line. If done here, it also enables the test to verify the contents of the exception being thrown.


@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

What does "not aborted" mean?

Would you like it better if it said "not aborting"?

I think we should get rid of it and just leave a note here about this case not being possible

We could get rid of the test case, but I think there's value in enumerating all combinations of inputs here and just explaining the test cases that should never be hit in practice. It makes it easier to ensure that we've exhaustively tested all valid cases and at least considered the validity of invalid ones. What's the cost of keeping these two cases?

And if it's easy enough, I do think I'd make sense to refuse the request (or crash)

We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.


@andreimatei andreimatei left a comment


LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

But there's more to this function that just txn.Status == ABORTED.

ack


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How about renaming it to UpdateAbortSpan? I'd rather not pull the condition out to all callers and force them to make decisions that don't really concern them.

UpdateAbortSpan sounds good.


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

What's the cost of keeping these two cases?

The biggest cost is paid when this test fails one day and I start wracking my brain about what it is testing.
If you want to enumerate, enumerate it as a comment. The way I see it, the only test to write (if any) is a test that the request is rejected - at whatever layer.
But I'll shut up now.

We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.

Right, I wasn't suggesting we check anything in EndTxn. I was saying we should do it in evaluateBatch() - as I now see that we already do.


pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Again, I don't think we should crash, but in this case, I could see some extra input validation and rejecting the request. Done.

Thanks. I think the comment is no longer necessary on this test and the one above, now that we expect an error. Or change it to It is an error to....


pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

resolving an intent with PENDING doesn't make sense, right?

No, it does. That's what happens during a PUSH_TIMESTAMP push.

So what "doesn't make sense" here then? Can you add more words to the comment?

Up until this point, we had no testing around this. For instance, not a single
test would catch it if we removed the deletion path from SetAbortSpan.
…ction request

Fixes cockroachdb#29128.

Release note (bug fix): AbortSpan records are now cleaned up more
aggressively when doing so is known to be safe.

Release note: None
Return an error if not.

Release note: None

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

UpdateAbortSpan sounds good.

Done.


pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

What's the cost of keeping these two cases?

The biggest cost is paid when this test fails one day and I start wracking my brain about what it is testing.
If you want to enumerate, enumerate it as a comment. The way I see it, the only test to write (if any) is a test that the request is rejected - at whatever layer.
But I'll shut up now.

We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.

Right, I wasn't suggesting we check anything in EndTxn. I was saying we should do it in evaluateBatch() - as I now see that we already do.

Ok, I removed the test cases.


pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Thanks. I think the comment is no longer necessary on this test and the one above, now that we expect an error. Or change it to It is an error to....

Done.


pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

So what "doesn't make sense" here then? Can you add more words to the comment?

I removed those comments. They actually do make sense because of the complexity @tbg brought up above.


@andreimatei andreimatei left a comment


🚢 it

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)

@nvanbenschoten

bors r+

craig bot pushed a commit that referenced this pull request Jan 2, 2020
42765: storage: write to AbortSpan less, clean up AbortSpan more r=nvanbenschoten a=nvanbenschoten


43383: storage: stop modifying requests and returning responses from TxnWaitQueue r=nvanbenschoten a=nvanbenschoten

This PR contains three pieces of cleanup that pave the way for pulling the TxnWaitQueue underneath the upcoming `pkg/storage/concurrency` package. It removes all cases where the queue mandates an update to the PushTxnRequest that entered it. It also removes all cases where the queue could field requests directly and return results for them. This restricts the TxnWaitQueue's interface and allows it to be pushed into the concurrency manager abstraction without bloating this new abstraction.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig

craig bot commented Jan 2, 2020

Build succeeded

And happy new year from bors! 🎉

@craig craig bot merged commit c7bbc4f into cockroachdb:master Jan 2, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/abortSpanResolve branch January 2, 2020 21:27
@tbg

tbg commented Feb 28, 2020

In light of recent events, how nervous would we be about backporting this?

@nvanbenschoten

I feel ok about backporting this to v19.2. Almost all of this was testing, and the few non-testing changes that went in here seem safe.

Here's the PR: #45553. It was mostly clean, and the parts that weren't were because of renames and other non-behavioral changes.

tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2020
If there is "a lot" of data in Sys{Bytes,Count}, then we are likely
experiencing a large abort span. The abort span is not supposed to
become that large, but it does happen and causes stability fallout,
usually due to a combination of shortcomings:

1. there's no trigger for GC based on abort span size alone (before
   this commit)
2. transaction aborts tended to create unnecessary abort span entries,
   fixed (and 19.2-backported) in:
   cockroachdb#42765
3. aborting transactions in a busy loop:
   cockroachdb#38088
   (and we suspect this also happens in user apps occasionally)
4. large snapshots would never complete due to the queue time limits
   (addressed in cockroachdb#44952).

In an ideal world, we would factor in the abort span into this method
directly, but until then the condition guarding this block will do.
At worst, there is some other reason for SysBytes becoming that large
while also incurring a large SysCount, but I'm not sure how this
would happen. The only other things in this span are the versioned
range descriptors (which follow regular GC and it's only ever adding
a SysCount of one), or transaction records (which expire like abort
span records).

Release note (bug fix): Range garbage collection will now trigger
based on a large abort span, adding defense-in-depth against ranges
growing large (and eventually unstable).
tbg added a commit to tbg/cockroach that referenced this pull request Mar 4, 2020
tbg added a commit to tbg/cockroach that referenced this pull request Mar 5, 2020
craig bot pushed a commit that referenced this pull request Mar 5, 2020
45573: storage: trigger GC based on SysCount/SysBytes r=ajwerner a=tbg

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 5, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 30, 2020
Fixes cockroachdb#43707.
Fixes cockroachdb#48046.
Fixes cockroachdb#48189.

Part of the change made by cockroachdb#42765 was to clear AbortSpan entries on
non-poisoning, aborting EndTxn requests. Specifically, this change was
made in 1328787. The change forgot to update the corresponding span
declaration logic to reflect the fact that we were now writing to the
AbortSpan in cases where we previously weren't.

This was triggering an assertion in race builds that tried to catch this
kind of undeclared span access. The assertion failure was very rare
because it required the following conditions to all be met:
1. running a test with the race detector enabled
2. a txn (A) must have been aborted by another txn (B)
3. txn B must have cleared an intent on txn A's transaction record range
4. txn A must have noticed and issued a non-poisoning EndTxn(ABORT)

We should backport this when we get a chance (once v20.1.0 has
stabilized), but I don't expect that this could actually cause any
issues. The AbortSpan update was strictly a matter of performance and we
should never be racing with another request that is trying to read the
same AbortSpan entry.
craig bot pushed a commit that referenced this pull request Apr 30, 2020
48190: sql: inject tenant ID in sqlServerArgs, pass through ExecutorConfig r=nvanbenschoten a=nvanbenschoten

Fixes #47903.
Informs #48123.

Also known as "the grand plumbing", this change replaces a few instances of `TODOSQLCodec` in `pkg/sql/sqlbase/index_encoding.go` and watches the house of cards fall apart. It then glues the world back together, this time using a properly injected tenant-bound SQLCodec to encode and decode all SQL table keys.

A tenant ID field is added to `sqlServerArgs`. This is used to construct a tenant-bound `keys.SQLCodec` during server creation. This codec morally lives on the `sql.ExecutorConfig`. In practice, it is also copied onto `tree.EvalContext` and `execinfra.ServerConfig` to help carry it around. SQL code is adapted to use this codec whenever it needs to encode or decode keys.

If all tests pass after this refactor, there is a good chance it got things right. This is because any use of an uninitialized SQLCodec will panic immediately when the codec is first used. This was helpful in ensuring that it was properly plumbed everywhere.

48245: kv: declare write access to AbortSpan on all aborting EndTxn reqs r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request May 4, 2020
Successfully merging this pull request may close these issues.

storage: clear abort span on non-poisoning, aborting EndTransaction request