roachtest: cdc/cloud-sink-gcs/assume-role failed #96330
I suspect this is what's happening. There are three states when starting the changefeed: the initial scan, the catchup scan, and the steady state where the highwater advances.

The latency verifier only waits for the initial scan to finish and starts measuring latencies from the catchup scan onward. If the initial scan takes a long time, the catchup scan may have a lot of rows to cover, since we are running a workload against the table. The catchup scan emits NO CHECKPOINTS, so the latency verifier observes latency based on the highwater from the initial scan, and that observed latency keeps increasing while the catchup scan continues. We should update the latency verifier to wait for the catchup scan to finish (a rough sketch of the idea is below).

edit: we could also increase the concurrency of the catchup scan so it goes faster.
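To make that concrete, here is a minimal sketch of a verifier that only enforces the latency target once the catchup scan has finished. This is not the actual roachtest `latencyVerifier`; the helpers `pollHighwater`, `catchupDone`, `maxLatency`, and `steadySamples` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// verifyLatency polls the changefeed highwater once per second. While the
// initial/catchup scan is still running the highwater lags by design, so the
// observed latency is only logged. Once the catchup scan is done, the latency
// must stay under maxLatency for steadySamples consecutive polls.
func verifyLatency(
	pollHighwater func() time.Time,
	catchupDone func() bool,
	maxLatency time.Duration,
	steadySamples int,
) error {
	healthy := 0
	for healthy < steadySamples {
		time.Sleep(time.Second)
		latency := time.Since(pollHighwater())
		if !catchupDone() {
			// Don't enforce the target yet; the highwater is expected to lag here.
			fmt.Printf("catchup scan in progress, observed latency %s\n", latency.Round(time.Second))
			continue
		}
		if latency > maxLatency {
			return fmt.Errorf("steady-state latency %s exceeded max %s", latency, maxLatency)
		}
		healthy++
	}
	return nil
}

func main() {
	// Fake highwater that stays 10s behind wall clock, with the catchup scan
	// already finished, just to exercise the verifier.
	err := verifyLatency(
		func() time.Time { return time.Now().Add(-10 * time.Second) },
		func() bool { return true },
		90*time.Second,
		3,
	)
	if err != nil {
		fmt.Println("verification failed:", err)
	}
}
```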
The latencies for a passing run and a failing run are very different. Failing:
Passing run (pubsub): https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/8586987:id/cdc/pubsub-sink/assume-role/run_1/artifacts.zip!/test.log
Here's a passing run for cloudstorage
roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ 31eb3ce6ceb18123fd2e7eac464b25e203d7de0f:
Parameters:
In the latest failure, we fail after the catchup scan portion.
Previously, we would observe failures due to the latency jumping to ~1m15s:
- https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/8757437
- https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/8534805

The most likely explanation for this is network blips. Previously, the maximum acceptable latency was 1 minute. This change bumps it to 90 seconds to reduce how often we see flakes.

Fixes: cockroachdb#96330
Epic: none
Release note: None
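The shape of that change is just a larger latency threshold. Below is a minimal sketch, assuming a test-options struct loosely modeled on the roachtest cdc harness; `cdcTestOpts` and `targetSteadyLatency` are assumed names, not the real identifiers.

```go
package main

import (
	"fmt"
	"time"
)

// cdcTestOpts is a hypothetical stand-in for the roachtest cdc test options;
// the real struct and field names in cockroachdb/cockroach may differ.
type cdcTestOpts struct {
	targetSteadyLatency time.Duration
}

func main() {
	// Before the fix the test tolerated at most 1 minute of end-to-end
	// latency; bumping to 90 seconds absorbs brief network blips.
	opts := cdcTestOpts{targetSteadyLatency: 90 * time.Second} // was: time.Minute
	fmt.Println("max acceptable steady-state latency:", opts.targetSteadyLatency)
}
```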
roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ c9b71bafa2857ebdc5ffea629fb9546937f10ba9:
Parameters:
roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ 86e68df82e1b4e80ca134fe1b8f47dc930294e25:
Parameters:
Closing this as a flake.
roachtest.cdc/cloud-sink-gcs/assume-role failed with artifacts on release-22.2 @ daf17897df96e2fbf9475fa4b4ebe235d16c9b62:
Parameters:
- ROACHTEST_cloud=gce
- ROACHTEST_cpu=16
- ROACHTEST_encrypted=false
- ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Jira issue: CRDB-24069
Epic CRDB-11732