Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: cdc/cloud-sink-gcs/assume-role failed #96330

Closed
cockroach-teamcity opened this issue Feb 1, 2023 · 8 comments
Closed

roachtest: cdc/cloud-sink-gcs/assume-role failed #96330

cockroach-teamcity opened this issue Feb 1, 2023 · 8 comments
Assignees
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-cdc
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Feb 1, 2023

roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ daf17897df96e2fbf9475fa4b4ebe235d16c9b62:

test artifacts and logs in: /artifacts/cdc/cloud-sink-gcs/assume-role/run_1
(test_impl.go:292).Fatalf: max latency was more than allowed: 1m16.39589613s vs 1m0s

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-24069

Epic CRDB-11732

@cockroach-teamcity cockroach-teamcity added branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 1, 2023
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Feb 1, 2023
@blathers-crl blathers-crl bot added the T-cdc label Feb 1, 2023
@miretskiy miretskiy removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Feb 6, 2023
@jayshrivastava
Copy link
Contributor

jayshrivastava commented Feb 6, 2023

I suspect this is happening. There are 3 states when starting the changefeed:

  1. Initial scan from the beginning of time to the statement time
  2. Catchup scan from the statement time to the present time
  3. Normal operation from the present time onwards

The latency verifier only waits for the initial scan to finish and starts measuring latencies from the catchup scan onward. If the initial scan takes long, the catchup scan may have a lot of rows to cover since we are running a workload on the table. The catchup scan emits NO CHECKPOINTS. The latency verifier observes the latency based on the highwater from the initial scan. This observed latency increases as the catchup scan continues. We should update the latency verifier to wait for the catchup scan to finish. edit: we can also increase the concurrency of the catchup scan so it goes faster.

@jayshrivastava
Copy link
Contributor

jayshrivastava commented Feb 6, 2023

The latencies for a passing run and failed run are very different.

Failing:

changefeed: 07:19:28 cdc.go:1695: initial scan completed: latency 1.076099793s
07:19:29 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 2.1s; max steady latency so far 2.1s
07:19:39 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.089s; max steady latency so far 12.089s
07:19:50 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 22.337s; max steady latency so far 22.337s
07:20:00 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.33s; max steady latency so far 23.361s
07:20:10 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.58s; max steady latency so far 33.58s
07:20:20 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 43.83s; max steady latency so far 43.83s
07:20:31 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 54.077s; max steady latency so far 54.077s
07:20:41 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m4.326s; max steady latency so far 1m4.326s
07:20:51 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m1.026s; max steady latency so far 1m4.326s
07:21:01 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m11.272s; max steady latency so far 1m11.272s
07:21:12 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 16.463s; max steady latency so far 1m16.395s
07:21:22 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.708s; max steady latency so far 1m16.395s
07:21:32 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.801s; max steady latency so far 1m16.395s
07:21:42 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 24.264s; max steady latency so far 1m16.395s
07:21:53 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.441s; max steady latency so far 1m16.395s
07:22:03 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.688s; max steady latency so far 1m16.395s
07:22:13 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.612s; max steady latency so far 1m16.395s
07:22:23 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 20.769s; max steady latency so far 1m16.395s
07:22:34 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.013s; max steady latency so far 1m16.395s

Passing run (pubsub): https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/8586987:id/cdc/pubsub-sink/assume-role/run_1/artifacts.zip!/test.log

changefeed: 09:11:54 cdc.go:1695: initial scan completed: latency 11m25.701876413s
09:11:55 cdc.go:1710: test status: watching changefeed: end-to-end latency 11m26.726s not yet below target steady latency 1m0s
09:12:05 cdc.go:1710: test status: watching changefeed: end-to-end latency 3m23.579s not yet below target steady latency 1m0s
09:12:15 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.042s; max steady latency so far 12.042s
09:12:25 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.266s; max steady latency so far 20.238s
09:12:36 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.294s; max steady latency so far 20.463s
09:12:46 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.05s; max steady latency so far 20.492s
09:12:56 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.143s; max steady latency so far 20.492s
09:13:06 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.57s; max steady latency so far 20.492s
09:13:17 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.794s; max steady latency so far 20.492s
09:13:27 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.817s; max steady latency so far 20.492s
09:13:37 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 13.042s; max steady latency so far 20.492s
09:13:47 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 13.264s; max steady latency so far 20.492s
09:13:58 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 13.684s; max steady latency so far 20.492s
09:14:08 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 13.909s; max steady latency so far 20.492s
09:14:18 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.128s; max steady latency so far 20.492s
09:14:28 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.152s; max steady latency so far 20.492s
09:14:39 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.377s; max steady latency so far 20.492s
09:14:49 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.603s; max steady latency so far 20.492s
09:14:59 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.828s; max steady latency so far 20.492s
09:15:09 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 14.856s; max steady latency so far 20.492s
09:15:20 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 15.281s; max steady latency so far 20.492s
09:15:30 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 15.501s; max steady latency so far 20.492s
09:15:40 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 15.734s; max steady latency so far 20.492s
09:15:50 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 15.951s; max steady latency so far 20.492s
09:16:01 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 15.976s; max steady latency so far 20.492s
09:16:11 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 16.403s; max steady latency so far 20.492s
09:16:21 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 16.431s; max steady latency so far 20.492s
09:16:31 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 16.663s; max steady latency so far 20.492s
09:16:42 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 17.091s; max steady latency so far 20.492s
09:16:52 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 17.314s; max steady latency so far 20.492s
09:17:02 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 17.537s; max steady latency so far 20.492s
09:17:12 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 17.754s; max steady latency so far 20.492s

@jayshrivastava
Copy link
Contributor

Here's a passing run for cloudstorage

07:25:36 cdc.go:281: test status: watching changefeed
changefeed: 07:25:37 cdc.go:1695: initial scan completed: latency 1.074800666s
07:25:38 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 2.1s; max steady latency so far 2.1s
07:25:49 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.138s; max steady latency so far 12.138s
07:25:59 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 22.393s; max steady latency so far 22.393s
07:26:09 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.647s; max steady latency so far 32.647s
07:26:19 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.027s; max steady latency so far 35.722s
07:26:30 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 29.295s; max steady latency so far 35.722s
07:26:40 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 16.49s; max steady latency so far 35.722s
07:26:50 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.739s; max steady latency so far 35.722s
07:27:01 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.031s; max steady latency so far 35.722s
07:27:11 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.264s; max steady latency so far 35.722s
07:27:21 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 24.095s; max steady latency so far 35.722s
07:27:31 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 34.348s; max steady latency so far 35.722s
07:27:42 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.168s; max steady latency so far 35.722s
07:27:52 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.251s; max steady latency so far 35.722s
07:28:02 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.275s; max steady latency so far 35.722s
07:28:12 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 30.201s; max steady latency so far 35.722s
07:28:23 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 28.976s; max steady latency so far 35.722s
07:28:33 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 28.016s; max steady latency so far 35.722s
07:28:43 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.928s; max steady latency so far 35.722s
07:28:53 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.919s; max steady latency so far 35.722s
07:29:04 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.221s; max steady latency so far 35.722s
07:29:14 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 24.255s; max steady latency so far 35.722s
07:29:24 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.423s; max steady latency so far 35.722s
07:29:34 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.674s; max steady latency so far 35.722s
07:29:45 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.424s; max steady latency so far 35.722s

@cockroach-teamcity
Copy link
Member Author

roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ 31eb3ce6ceb18123fd2e7eac464b25e203d7de0f:

test artifacts and logs in: /artifacts/cdc/cloud-sink-gcs/assume-role/run_1
(test_impl.go:292).Fatalf: max latency was more than allowed: 1m14.451686108s vs 1m0s

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@jayshrivastava
Copy link
Contributor

In the latest failure, we fail after the catchup scan portion.

changefeed: 07:25:17 cdc.go:1695: initial scan completed: latency 2.106103722s
07:25:18 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 3.132s; max steady latency so far 3.132s
07:25:29 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 12.477s; max steady latency so far 12.477s
07:25:39 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 22.745s; max steady latency so far 22.745s
07:25:49 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 21.855s; max steady latency so far 22.745s
07:25:59 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.115s; max steady latency so far 32.115s
07:26:10 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 30.399s; max steady latency so far 34.168s
07:26:20 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 29.669s; max steady latency so far 34.168s
07:26:30 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 28.471s; max steady latency so far 34.803s
07:26:41 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 27.164s; max steady latency so far 35.66s
07:26:51 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.4s; max steady latency so far 35.66s
07:27:01 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.694s; max steady latency so far 35.66s
07:27:11 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.967s; max steady latency so far 35.66s
07:27:22 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.198s; max steady latency so far 35.66s
07:27:32 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.237s; max steady latency so far 35.66s
07:27:42 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.157s; max steady latency so far 35.66s
07:27:52 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 30.615s; max steady latency so far 35.66s
07:28:03 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 30.307s; max steady latency so far 35.66s
07:28:13 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 28.739s; max steady latency so far 35.66s
07:28:23 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 27.838s; max steady latency so far 35.66s
07:28:34 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 27.108s; max steady latency so far 35.66s
07:28:44 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.367s; max steady latency so far 35.66s
07:28:54 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.388s; max steady latency so far 35.66s
07:29:04 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 24.559s; max steady latency so far 35.66s
07:29:15 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.924s; max steady latency so far 35.66s
07:29:25 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 34.197s; max steady latency so far 35.66s
07:29:35 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 32.703s; max steady latency so far 35.66s
07:29:45 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.975s; max steady latency so far 35.66s
07:29:56 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 31.17s; max steady latency so far 35.66s
07:30:06 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 30.504s; max steady latency so far 35.66s
07:30:16 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 29.762s; max steady latency so far 35.66s
07:30:27 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 18.034s; max steady latency so far 35.66s
07:30:37 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 28.312s; max steady latency so far 35.66s
07:30:47 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 27.58s; max steady latency so far 35.66s
07:30:57 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 26.838s; max steady latency so far 35.66s
07:31:08 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 25.563s; max steady latency so far 35.66s
07:31:18 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 24.359s; max steady latency so far 35.66s
07:31:28 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 23.632s; max steady latency so far 35.66s
07:31:39 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.908s; max steady latency so far 35.66s
07:31:49 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 44.186s; max steady latency so far 44.186s
07:31:59 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 54.463s; max steady latency so far 54.463s
07:32:09 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m4.736s; max steady latency so far 1m4.736s
07:32:20 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m3.971s; max steady latency so far 1m11.928s
07:32:30 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m14.251s; max steady latency so far 1m14.251s
07:32:40 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 1m13.423s; max steady latency so far 1m14.251s
07:32:50 cdc.go:1723: test status: watching changefeed: end-to-end steady latency 33.522s; max steady latency so far 1m14.451s

jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue Feb 21, 2023
Previously, we would observe failures due to the latency jumping to ~1m15s.
- https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/8757437
- https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/8534805

The most likely explanation for this is network blips. Previously, the maximum acceptable latency was 1 minute. This change
bumps it to 90 seconds to reduce how often we see flakes.

Fixes: cockroachdb#96330
Epic: none

Release note: None
@cockroach-teamcity
Copy link
Member Author

roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ c9b71bafa2857ebdc5ffea629fb9546937f10ba9:

test artifacts and logs in: /artifacts/cdc/cloud-sink-gcs/assume-role/run_1
(test_impl.go:292).Fatalf: max latency was more than allowed: 1m16.507124349s vs 1m0s

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ 86e68df82e1b4e80ca134fe1b8f47dc930294e25:

test artifacts and logs in: /artifacts/cdc/cloud-sink-gcs/assume-role/run_1
(cdc.go:1797).assertValid: max latency was more than allowed: 1m1.722809472s vs 1m0s

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@jayshrivastava
Copy link
Contributor

Closing this as a flake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-cdc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants