Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

closedts: shorten target_duration from 30s to 5s #39643

Merged

Conversation

ajwerner
Copy link
Contributor

This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 8.5 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 5s is intended to shake out issues as we head into the stability
period.

This value has been verified to work on the schemachange/mixed/tpcc roachtest
introduced in #39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes #37083.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner
Copy link
Contributor Author

cc @awoods187

@awoods187
Copy link
Contributor

Awesome!

@tbg
Copy link
Member

tbg commented Aug 14, 2019

Are we concerned about transactions taking more than 5s retrying endlessly? This was a problem previously, though only with 48s+ transaction. Users' batch jobs may go haywire. Arguably if the closed timestamp prevents a txn once, it will do so again (in the absence of contention at least).

@tbg
Copy link
Member

tbg commented Aug 14, 2019

Also, please run a few manual roachprod-stressrace invocations. This sort of change is probably going to give us some fallout, and since master already is a bit flaky we should weed that out before.

@ajwerner
Copy link
Contributor Author

Are we concerned about transactions taking more than 5s retrying endlessly? This was a problem previously, though only with 48s+ transaction. Users' batch jobs may go haywire. Arguably if the closed timestamp prevents a txn once, it will do so again (in the absence of contention at least).

Yes, IIUC that's true for transactions which observe the clock. For transactions which don't, shouldn't they get pushed and then refresh, right?

Also, please run a few manual roachprod-stressrace invocations. This sort of change is probably going to give us some fallout, and since master already is a bit flaky we should weed that out before.

TestNodeLivenessSetDecommissioning fails on master and also this branch almost instantly:

I190814 13:16:32.434790 1641 storage/store_snapshot.go:786  [replicate,s1,r6/1:/Table/{SystemCon…-11}] sending LEARNER snapshot 0243daa0 at applied index 16                                                                                                                   
I190814 13:16:32.435061 1641 storage/store_snapshot.go:829  [replicate,s1,r6/1:/Table/{SystemCon…-11}] streamed snapshot to (n2,s2):2: kv pairs: 44, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s                                                                            
I190814 13:16:32.435720 2327 storage/replica_raftstorage.go:823  [s2,r6/2:{-}] applying LEARNER snapshot at index 16 (id=0243daa0, encoded size=8206, 1 rocksdb batches, 0 log entries)                                                                                        
I190814 13:16:32.436232 2327 storage/replica_raftstorage.go:829  [s2,r6/2:/Table/{SystemCon…-11}] applied LEARNER snapshot in 0ms [clear=0ms batch=0ms entries=0ms commit=0ms]                                                                                                 
I190814 13:16:32.437799 1641 storage/replica_command.go:1198  [replicate,s1,r6/1:/Table/{SystemCon…-11}] change replicas (ADD_REPLICA (n2,s2):2): existing descriptor r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=1]  

I'll wait until the stress passes on critical packages before attempting to merge this.

@ajwerner
Copy link
Contributor Author

The test failure seems to not be directly related to learners as something similar happens when learners are disabled. I'm bisecting before typing up an issue.

I190814 13:38:29.275572 4251 storage/node_liveness.go:474  [liveness-hb] heartbeat failed on epoch increment; retrying
I190814 13:38:29.275719 2646 storage/store_snapshot.go:786  [replicate,s1,r6/1:/Table/{SystemCon…-11}] sending PREEMPTIVE snapshot 4e31ebe4 at applied index 13
I190814 13:38:29.275901 2646 storage/store_snapshot.go:829  [replicate,s1,r6/1:/Table/{SystemCon…-11}] streamed snapshot to (n2,s2):?: kv pairs: 42, log entries: 0, rate-limit: 8.0 MiB/sec, 0.01s
I190814 13:38:29.276288 4014 storage/replica_raftstorage.go:823  [s2,r6/?:{-}] applying PREEMPTIVE snapshot at index 13 (id=4e31ebe4, encoded size=8023, 1 rocksdb batches, 0 log entries)
I190814 13:38:29.277049 4014 storage/replica_raftstorage.go:829  [s2,r6/?:/Table/{SystemCon…-11}] applied PREEMPTIVE snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=1ms]
I190814 13:38:29.277333 2646 storage/replica_command.go:1198  [replicate,s1,r6/1:/Table/{SystemCon…-11}] change replicas (ADD_REPLICA (n2,s2):2): existing descriptor r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, next=2, gen=0]
I190814 13:38:29.278707 2646 storage/replica_raft.go:291  [replicate,s1,r6/1:/Table/{SystemCon…-11},txn=4a2f147a] proposing ADD_REPLICA[(n2,s2):2]: after=[(n1,s1):1 (n2,s2):2] next=3
I190814 13:38:29.282106 2646 storage/store_snapshot.go:786  [replicate,s1,r6/1:/Table/{SystemCon…-11}] sending PREEMPTIVE snapshot 675e4e73 at applied index 16
I190814 13:38:29.296762 3979 storage/replica_raftstorage.go:823  [s3,r6/?:{-}] applying PREEMPTIVE snapshot at index 16 (id=675e4e73, encoded size=8204, 1 rocksdb batches, 0 log entries)
I190814 13:38:29.297540 3979 storage/replica_raftstorage.go:829  [s3,r6/?:/Table/{SystemCon…-11}] applied PREEMPTIVE snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=1ms]
I190814 13:38:29.297944 2646 storage/store_snapshot.go:829  [replicate,s1,r6/1:/Table/{SystemCon…-11}] streamed snapshot to (n3,s3):?: kv pairs: 44, log entries: 0, rate-limit: 8.0 MiB/sec, 0.02s
I190814 13:38:29.298187 2646 storage/replica_command.go:1198  [replicate,s1,r6/1:/Table/{SystemCon…-11}] change replicas (ADD_REPLICA (n3,s3):3): existing descriptor r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, (n2,s2):2, next=3, gen=1]
F190814 13:38:29.303312 4123 storage/replica_raft.go:668  [s2,r6/2:{-}] setting non-zero HardState.Commit on uninitialized replica [n2,s2,r6/2:{-}]. HS={Term:6 Vote:0 Commit:16 XXX_unrecognized:[]}
goroutine 4123 [running]:

@tbg
Copy link
Member

tbg commented Aug 14, 2019

Yes, IIUC that's true for transactions which observe the clock. For transactions which don't, shouldn't they get pushed and then refresh, right?

Yes, though if the txn takes more than 5s, the refresh may fail. Definitely txns which don't read are safe.

The test failure seems to not be directly related to learners as something similar happens when learners are disabled. I'm bisecting before typing up an issue.

Uhoh, that doesn't look good. Curious what you'll find.

@tbg
Copy link
Member

tbg commented Aug 14, 2019

(My hope is that this is #39604)

@ajwerner
Copy link
Contributor Author

(My hope is that this is #39604)

I don't think so. This reproduces as far back as a8d548f which is months old. The bisecting continues.

@ajwerner
Copy link
Contributor Author

I've been stressing things and saw:

    --- FAIL: TestRangeTransferLeaseExpirationBased/DrainTransfer (0.17s)
        client_replica_test.go:568: change replicas of r1 failed: descriptor changed: [expected] r1:/M{in-ax} [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=1] != [actual] r1:/M{in-ax} [(n1,s1):1, next=3, gen=2]

Which I don't think is related.

Stressing it a while longer and then I'm going to bors this. Please don't hesitate to revert it.

ajwerner added a commit to ajwerner/cockroach that referenced this pull request Aug 23, 2019
While stressing storage for cockroachdb#39643 I encountered
TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic
failing under stress. It also complained about the parent testing.T
having been failed when then using a child so fixed that too though
in a perhaps messy way.

```
--- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic (2.29s)
    --- PASS: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/initial_run (0.14s)
    client_test.go:1280: [NotLeaseHolderError] r1: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=4 start=1566532897.322355813,1 exp=1566532898.378802707,0 pro=1566532897.478824123,0
    --- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/after_restart (1.02s)
        testing.go:820: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
```

Release note: None
@ajwerner
Copy link
Contributor Author

--- FAIL: TestStoreRangeMergeDuringShutdown (0.15s)
    client_merge_test.go:3209: expected "failed to send RPC: sending to all 1 replicas failed; last error: (err: node unavailable; try another peer) <nil>" error, but got not lease holder

I also stressed master just to see what happens and observed some other flakes:

    --- FAIL: TestRangeTransferLeaseExpirationBased/DrainTransferWithExtension (0.17s)
        client_replica_test.go:568: change replicas of r1 failed: descriptor changed: [expected] r1:/M{in-ax} [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=1] != [actual] r1:/M{in-ax} [(n1,s1):1, next=3, gen=2]

and

--- FAIL: TestInitRaftGroupOnRequest (0.58s)
    client_raft_test.go:3903: expected NotLeaseHolderError; got <nil>

and

--- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic (2.29s)
    --- PASS: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/initial_run (0.14s)
    client_test.go:1280: [NotLeaseHolderError] r1: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=4 start=1566532897.322355813,1 exp=1566532898.378802707,0 pro=1566532897.478824123,0
    --- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/after_restart (1.02s)
        testing.go:820: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test

(#39838)

Stressing storage and fixing the various rare failures would be good. In the meantime it's not clear that this made anything worse. If it becomes apparent that's not true, please do revert it.

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 23, 2019

Build failed

@ajwerner ajwerner force-pushed the ajwerner/shorten-closed-ts-target-interval branch from 9c29e06 to a3d077c Compare August 23, 2019 05:34
@ajwerner
Copy link
Contributor Author

Fixed a constant in the follower read test.

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 23, 2019

Build failed

This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 3.7 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 5s is intended to shake out issues.

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in cockroachdb#39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes cockroachdb#37083.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/shorten-closed-ts-target-interval branch from a3d077c to 3e97e57 Compare August 23, 2019 05:48
@ajwerner
Copy link
Contributor Author

Generated the settings html file.

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 23, 2019

Build failed

@ajwerner
Copy link
Contributor Author

Flaked on #39610

bors r+

craig bot pushed a commit that referenced this pull request Aug 23, 2019
39643: closedts: shorten target_duration from 30s to 5s r=ajwerner a=ajwerner

This PR dramatically shortens the closed timestamp target interval. With this
setting the experimental_follower_read_timestamp() will now be 8.5 seconds in
the past as opposed to the current 48 seconds in the past. This relatively
aggressive setting of 5s is intended to shake out issues as we head into the stability
period. 

This value has been verified to work on the `schemachange/mixed/tpcc` roachtest
introduced in #39096 and for vanilla TPC-C runs around where we have
established a baseline in the past using tpccbench. Specifically several runs
at 2300 warehouses on 3x c5d.4xlarge nodes which is right at the passing
boundary without the change were run only a very small difference in
efficiency or tail latency was observed.

It seems reasonable to attempt to live with this value on master for a while
and see what happens.

Fixes #37083.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Aug 23, 2019

Build succeeded

@craig craig bot merged commit 3e97e57 into cockroachdb:master Aug 23, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 3, 2019
While stressing storage for cockroachdb#39643 I encountered
TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic
failing under stress. It also complained about the parent testing.T
having been failed when then using a child so fixed that too though
in a perhaps messy way.

```
--- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic (2.29s)
    --- PASS: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/initial_run (0.14s)
    client_test.go:1280: [NotLeaseHolderError] r1: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=4 start=1566532897.322355813,1 exp=1566532898.378802707,0 pro=1566532897.478824123,0
    --- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/after_restart (1.02s)
        testing.go:820: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
```

Release note: None
craig bot pushed a commit that referenced this pull request Sep 3, 2019
39832: changefeedccl: allow base64-encoded client certificate r=rolandcrosby a=rolandcrosby

Adds `client_cert` and `client_key` options to the `kafka://`
changefeed URI scheme. Works like the existing `ca_cert` option:
the user base64-encodes the contents of a PEM certificate and
private key, and passes those base64 values as parameters in the
Kafka URI.

Fixes #39817.

Release note (enterprise change): Client certificates are now
supported for Kafka changefeed authentication.

39838: storage: fix stress flake r=ajwerner a=ajwerner

While stressing storage for #39643 I encountered
TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic
failing under stress. It also complained about the parent testing.T
having been failed when then using a child so fixed that too though
in a perhaps messy way.

```
--- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic (2.29s)
    --- PASS: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/initial_run (0.14s)
    client_test.go:1280: [NotLeaseHolderError] r1: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=4 start=1566532897.322355813,1 exp=1566532898.378802707,0 pro=1566532897.478824123,0
    --- FAIL: TestDefaultConnectionDisruptionDoesNotInterfereWithSystemTraffic/after_restart (1.02s)
        testing.go:820: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
```

Release note: None

Co-authored-by: Roland Crosby <roland@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 5, 2019
This reverts commit 3e97e57.

The last couple of weeks of experience show that tests become flakey with
the target_duration set to 5s. This isn't overly surprising given that updates
to table descriptors observe their own timestamp and thus can never be
refreshed.

See https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/structured.go#L1496-L1509

References cockroachdb#39643.

Release note: None
craig bot pushed a commit that referenced this pull request Sep 5, 2019
40527: Revert "closedts: shorten target_duration from 30s to 5s" r=andy-kimball a=ajwerner

This reverts commit 3e97e57.

The last couple of weeks of experience show that tests become flakey with
the target_duration set to 5s. This isn't overly surprising given that updates
to table descriptors observe their own timestamp and thus can never be
refreshed.

See https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/structured.go#L1496-L1509

References #39643.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

closedts: shorten default closed timestamp interval
5 participants