-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestSQLStatsDataDriven is flaky #89861
Comments
Skipping the test since its flaky. Informs: cockroachdb#89861 Release note: None
flaked again https://teamcity.cockroachdb.com/viewLog.html?buildId=7149837&tab=buildResultsDiv&buildTypeId=Cockroach_UnitTests_Test - skip is here while we investigate - #90744. |
Skipping the test since its flaky. Informs: #89861 Release note: None
98337: sql: enable TestSQLStatsDataDriven r=j82w a=j82w Adding retry logic to the execute command, and enabling the test. All previous builds with failures are no longer available. No failures on testing locally with stress. Epic: none Closes: #89861 Release note: none Co-authored-by: j82w <jwilley@cockroachlabs.com>
I see this was just unskipped - but it just failed again for me: Looks like several others had the same failure: https://teamcity.cockroachdb.com/test/6015532732023785721?currentProjectId=Cockroach_Ci_TestsGcpLinuxX8664BigVm&expandTestHistoryChartSection=true |
This commit skips TestSQLStatsDataDriven. This test is flaky. Part of cockroachdb#89861. Epic: none Release note: None
…99123 98712: storage: expose separate WriteBatch interface r=sumeerbhola a=jbowens Adapt NewUnindexedBatch to no longer take a writeOnly parameter. Instead, a new NewWriteBatch method is exposed that returns a WriteBatch type that does not provide Reader facilities. Future work may remove UnindexedBatch altogether, updating callers to explicitly maintain separate Readers and WriteBatches. Epic: None Release note: None 98807: spanconfig: add TestBlockedKVSubscriberDisablesQueues r=irfansharif a=irfansharif Regression test for #98422 which disables {split,replicate,mvccGC} queues until subscribed to span configs. We're only slightly modifying the existing merge-queue specific TestBlockedKVSubscriberDisablesMerges. Release note: None 98823: sql: address a couple of old TODOs r=yuzefovich a=yuzefovich This commit addresses a couple of old TODOs in the `connExecutor.dispatchToExecutionEngine` method. First, it simply removes the now-stale TODO about needing to `Step()` the txn before executing the query. The TODO mentioned things about the old way FK checks were performed, but we now always run checks in a deferred fashion, and perform the `Step`s correctly there for cascades and checks, thus, the TODO can be removed. (The TODO itself explicitly mentions that it can be removed once we do checks and cascades in the deferred fashion.) Second, it addresses a TODO about how some plan information is saved. Up until 961e66f the cleanup of `planNode` tree and `flow` infra was intertwined, so in 6ae4881 in order to "sample" the plan with correct info (some of which is only available _after_ the execution is done) we delegated that sampling to `planTop.close`. However, with `planNode` tree and `flow` cleanups refactored and de-coupled, we no longer have to close the plan early on. As a result, we now close the plan in a defer right after we create it (now it's the only place we do so), and this commit makes it so that we record the plan info explicitly right after returning from the execution engine, thus, addressing the second TODO. Epic: None Release note: None 98896: tracing: fix WithEventListeners option with verbose parent r=yuzefovich a=yuzefovich Previously, when using `WithEventListeners` span option and not explicitly specifying `WithRecording` option, we could incorrectly use the structured recording. In particular, this was the case when the parent has verbose recording, and this is now fixed. At the moment, this span option is used only by the backup and restore processors, so the impact would be that their verbose recording wouldn't be included into the session recording (perhaps we don't actually expose that anyway). Epic: None Release note: None 99079: sql: fix hint for unknown udfs in views and funcs r=rharding6373 a=rharding6373 Fixes a hint for some instances of an "unknown function" error that was not properly applied previously. Informs: #99002 Epic: None Release note: None 99095: ui: fix inflight request replacement, cleanup database details api r=THardy98 a=THardy98 Changes to allow in-flight request replacement in this PR: #98331 caused breaking changes to `KeyedCachedDataReducer`s. Despite the added `allowReplacementRequests` flag being `false`, in-flight requests would be replaced (specifically, when fetching for initial state). This would prevent all requests but the last one from being stored in the reducer state and leave the remaining requests in a permanent `inFlight` status. Consequently, our pages would break as the permanent `inFlight` status on these requests would prevent us from ever fetching data, putting these pages in a permanent loading state. This PR adds checks to ensure `allowReplacementRequests` is enabled when we receive a response, before deciding whether to omit it from the reducer state. It's important to note that this is still not an ideal solution for `KeyedCachedDataReducer`s, should we ever want to replace inflight requests for one. The checks to replace an in-flight request still occur at the reducer-level, whereas in a `KeyedCachedDataReducer` we'd like them to occur at the state-level (i.e. instead of replacement checks for all requests across the reducer, we should check for requests that impact a specific key's state). This way we scope replacements to each key in the reducer, allowing us to fetch data for each key independently (which is the intended functionality). Enabling this for a `KeyedCachedDataReducer` without making this change I believe will cause the breaking behaviour above. This PR also includes some cleanup to the database details API, namely: - camel casing response fields - encapsulating the database span stats response into its own object (instead of across multiple objects), this helps scope any errors deriving from the query to the single stats object **BEFORE** https://www.loom.com/share/57c9f278376a4551977398316d6ed7b6 **AFTER** https://www.loom.com/share/b2d4e074afca4b42a4f711171de3fd72 Release note (bug fix): Fixed replacement of in-flight requests for `KeyedCachedDataReducer`s to prevent permanent loading on requests stuck on an `inFlight` status. 99097: upgrades: fix backfilling of special roles in user ID migrations r=rafiss a=andyyang890 upgrades: fix system.privileges user ID migration This patch adds a missing case for the public role in the backfilling portion of the system.privileges user ID migration. Release note: None --- upgrades: fix system.database_role_settings user ID migration This patch adds a missing case for the empty role in the backfilling portion of the system.database_role_settings user ID migration. Release note: None --- Part of #87079 99113: server: use no-op cost controller for shared-process tenants r=knz a=stevendanna In the long run, all rate limiting of tenants should be controlled by a tenant capability. However, at the moment, we do not have the infrastructure to read tenant capabilities from the tenant process. Since, the main user of shared-process tenants is the new, experimental unified-architecture were we do not anticipate needing the cost controller for most users, this PR injects a no-op cost controller for shared-process tenants. Epic: CRDB-23559 Release note: None 99120: tracing: allow collection of stack history for Structured spans as well r=dt a=adityamaru Previously, we mandated that the span must be Verbose to be able to collect and emit a Structured event containing stack history. This change permits Structured spans to also collect stack history. Release note: None Epic: none 99123: sql: skip flaky TestSQLStatsDataDriven r=ericharmeling a=ericharmeling This commit skips TestSQLStatsDataDriven. This test is flaky. Part of #89861. Epic: none Release note: None Co-authored-by: Jackson Owens <jackson@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com> Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com> Co-authored-by: Andy Yang <yang@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: adityamaru <adityamaru@gmail.com> Co-authored-by: Eric Harmeling <eric.harmeling@cockroachlabs.com>
Duplicate: #89861 |
It looks like TestSQLStatsDataDriven occasionally flakes. 2 out of the last 1000 runs.
Failed
=== RUN TestSQLStatsDataDriven
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/83b49f90eb84fbc5d893648fc9fd7294/logTestSQLStatsDataDriven2913649257
test_log_scope.go:79: use -show-logs to present logs inline
=== CONT TestSQLStatsDataDriven
datadriven_test.go:223: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/83b49f90eb84fbc5d893648fc9fd7294/logTestSQLStatsDataDriven2913649257
--- FAIL: TestSQLStatsDataDriven (8.34s)
Its hard to tell whats going on here. But its flaked again recently. I see this at the end of the log:
W221012 19:26:22.304027 2704 kv/kvserver/closedts/sidetransport/receiver.go:139 â‹® [n2] 264 closed timestamps side-transport connection dropped from node: 1
W221012 19:26:22.304076 1929 jobs/registry.go:824 â‹® [n2] 265 canceling all adopted jobs due to stopper quiescing
I221012 19:26:22.305595 2435 sql/stats/automatic_stats.go:509 â‹® [n3] 266 quiescing auto stats refresher
W221012 19:26:22.305700 2441 jobs/registry.go:824 â‹® [n3] 267 canceling all adopted jobs due to stopper quiescing
W221012 19:26:22.307135 2423 sql/sqlliveness/slinstance/slinstance.go:250 â‹® [n3] 268 exiting heartbeat loop
W221012 19:26:22.307801 1049 jobs/registry.go:824 â‹® [n1] 269 canceling all adopted jobs due to stopper quiescing
W221012 19:26:22.307981 1031 sql/sqlliveness/slinstance/slinstance.go:250 â‹® [n1] 270 exiting heartbeat loop
I221012 19:26:22.308208 1226 jobs/registry.go:1217 â‹® [n1] 271 AUTO SPAN CONFIG RECONCILIATION job 804560481633009665: stepping through state succeeded with error:
W221012 19:26:22.308299 1911 sql/sqlliveness/slinstance/slinstance.go:250 â‹® [n2] 272 exiting heartbeat loop
W221012 19:26:22.308306 1226 kv/txn.go:705 ⋮ [n1] 273 failure aborting transaction: ‹node unavailable; try another peer›; abort caused by: context canceled
W221012 19:26:22.308609 2819 kv/kvserver/closedts/sidetransport/receiver.go:139 â‹® [n3] 274 closed timestamps side-transport connection dropped from node: 1
https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_BazelEssentialCi/6911488?showRootCauses=false&expandBuildChangesSection=true&expandBuildProblemsSection=true&expandBuildDeploymentsSection=true&expandBuildTestsSection=true
Jira issue: CRDB-20469
The text was updated successfully, but these errors were encountered: