roachtest: gossip/chaos/nodes=9 failed #124828
Comments
roachtest.gossip/chaos/nodes=9 failed with artifacts on master @ 802554ac59763c31aee034651efcd28f7bdcdd5d:
Parameters:
Help
See: roachtest README. See: How To Investigate (internal). Grafana is not yet available for Azure clusters.
roachtest.gossip/chaos/nodes=9 failed with artifacts on master @ c88a1cedb064f554f38bb5b310e83ac3259ee02e:
In all three cases, we see a sudden long pause between gossip probes and then the test times out.
I don't see much that could explain a pause between a |
I've spent another 15 minutes looking at this without any success, so I'll time-box the investigation there. This doesn't look like a release blocker; it looks like a test infra flake. I'll add a bit more logging, then move this to the backlog.
Informs cockroachdb#124828. Release note: None
125724: sql: reduce miscellaneous allocations r=yuzefovich a=yuzefovich

See each commit for details. Combined improvement:
```
name                   old time/op    new time/op    delta
LastWriteWinsInsert-8  1.42ms ±16%    1.44ms ±13%    ~       (p=0.820 n=20+20)

name                   old alloc/op   new alloc/op   delta
LastWriteWinsInsert-8  267kB ± 2%     265kB ± 2%     ~       (p=0.057 n=19+20)

name                   old allocs/op  new allocs/op  delta
LastWriteWinsInsert-8  1.95k ± 0%     1.91k ± 0%     -2.38%  (p=0.000 n=17+18)
```
Epic: CRDB-39063.

125794: roachtest: improve logging in gossip/chaos/nodes=9 r=nvanbenschoten a=nvanbenschoten

Informs #124828.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Informs cockroachdb#124828. Informs cockroachdb#126077. Release note: None
125865: cli: add storage engine metrics to debug zip r=itsbilal a=anish-shanbhag

Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node.

Fixes: #79518
Epic: none
Release note: none

126087: roachtest: improve logging in gossip/chaos/nodes=9 further r=nvanbenschoten a=nvanbenschoten

Informs #124828.
Informs #126077.

Release note: None

Co-authored-by: Anish Shanbhag <anish.shanbhag@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
commit ce132d4252906a72c5bd7ec5c2f1e285a67f5ec4
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Mon Jul 1 11:58:38 2024 -0400

    cleanup lots

commit cc051e28307da824d5fe099e585941dcdcf6967e
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Mon Jul 1 11:17:57 2024 -0400

    cleanup

commit 5a6e601d22ccc5a2250861d98ce32b0379ed792e
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Mon Jul 1 11:15:04 2024 -0400

    switch to older kgo version to not have to bump scary deps

commit 57bba2f1db711d69af7dcdf851780c172d3f767e
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Mon Jul 1 09:49:53 2024 -0400

    meta

commit a95fab1588b89b70b6251374d89e0d2029efba9a
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 12:35:20 2024 -0400

    fix pubsub close

commit d7fbd5c7077676ef115b4526a2a3fa6e262f36d6
Author: miles <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 16:02:49 2024 +0000

    hax

commit a45362ea8b613b4fffc3e0b2c93201bd10aa6a2f
Author: miles <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 14:47:39 2024 +0000

    timeout

commit 0c78122d66fb9fdeca083e04dc29c86bbe4ba714
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 10:32:40 2024 -0400

    update msg

commit e46627c78cadc7e4cd267170c384a96cdeb23d51
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 10:26:39 2024 -0400

    increase timeout

commit ae99141cfdc62576d638cdda4651324ae92bf2d0
Author: miles <miles.frankel@cockroachlabs.com>
Date: Fri Jun 28 14:09:45 2024 +0000

    rm par dbg

commit 44f7e7e60033fb23e7c4e33252004593784db48c
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 16:56:27 2024 -0400

    wippppp

commit 2b6d87bbbba1556caa6a93abfe0864285dec376d
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 15:07:52 2024 -0400

    wu

commit 4834373fa2bdf77ae7acd14ebe2926c8bdb23fef
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 13:30:04 2024 -0400

    fix tests

commit 100279bfb174a4894ea7be386d2b13ee55374abc
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 12:09:37 2024 -0400

    wup

commit 2224f7c935e780dc663fba47a76f9413b6233334
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 11:56:30 2024 -0400

    wip

commit ac5539cd3fe329d6b3a630a81f0553d2ca379053
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 11:17:44 2024 -0400

    wip

commit 3c3dadab6476dfc708f1829007fb72c1fd4b6193
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Thu Jun 27 11:01:33 2024 -0400

    wip

commit c6ba99319fc94c011a52cdcf52078c7b7905136e
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Wed Jun 26 17:08:18 2024 -0400

    wip

commit 9dfa995abf0100b78d16e2bf4e7d9286cd551a26
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Wed Jun 26 15:31:40 2024 -0400

    wpi

commit e65c4fbb8e091f1bdd8e76f87c7b519e8694f0ed
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Wed Jun 26 14:51:16 2024 -0400

    try

commit 58c639651e52d3721b3a25cf9e37e64d390f7a62
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Wed Jun 26 14:43:26 2024 -0400

    wip

commit fb6d341f1fe172e76aec28b470005f98fb3425da
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Wed Jun 26 13:55:07 2024 -0400

    test setup

commit 1d3c0cd070c5ab047ccaf987403cb1c5cc295e70
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 15:38:51 2024 -0400

    cl

commit 464c3e718f4d83a7af9a06f35159c42eb8098dc5
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 15:16:35 2024 -0400

    cl

commit adb3f485c0b7381affbb4b38950e44bcdd4a08fa
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:39:58 2024 -0400

    cl

commit 829cd2c2bbdf655cde7b88f533716ba3e78fff72
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:36:25 2024 -0400

    cl

commit 1fe41f499dcfe558c262fb4617837834bef69a9b
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:35:45 2024 -0400

    cl

commit 579a243f3037ec5e9c5e520a9784a682aac2c04c
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:33:48 2024 -0400

    cl

commit 7f4e4d437b79fab889f0fecbeb165bf3caa937b5
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:30:42 2024 -0400

    cl

commit adc33b0fe49b27e7c5cba4f2743d6480e5265d4b
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:30:04 2024 -0400

    cl

commit 900153adda79ae47adba0ad9d1bb041ecc01be0a
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:27:51 2024 -0400

    clean

commit 16bcad8f83b4caf9122a573400e355e17a08eef3
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:26:20 2024 -0400

    cl

commit b85df130cb1dc0732f16825e3eb24119701c488e
Merge: 54185ab5b36 4db235a8452
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:25:48 2024 -0400

    Merge branch 'master' of github.com:cockroachdb/cockroach into kafka-batching-sink-cleaned

commit 54185ab5b365c330ef14fe8254859eed881446fe
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:21:16 2024 -0400

    hack tests

commit 657c12a78bf63e485c1e8efdb8474db26ae4afdc
Author: Miles Frankel <miles.frankel@cockroachlabs.com>
Date: Tue Jun 25 14:19:41 2024 -0400

    remove weirder roachtest stuff

commit 20c3d9a4f40b19556eb0d0d82bbe134017cf0b04
Author: Bhaskarjyoti Bora <bhaskar.bora@cockroachlabs.com>
Date: Tue Jun 18 17:08:35 2024 +0530

    roachtest/test-selector: omit test failure due to preempted VMs

    This change is possible due to a recent addition of a column capturing the failure details in snowflake. In the test selection criteria, we consider all tests that have failed. This includes the tests that are failed due to infra as well. But, these tests can be considered success as the actual test has not failed. So, this change ignores the tests that have failed due to preempted VMs (one type of infra flake).
    Also, the query was considering the test results only from the master branch. This change reads the branch from the env as TC_BUILD_BRANCH.

    Informs: #119630
    Epic: none

commit 27bbaf15a7a59432789262f1b4acc42c25ad0a23
Author: Michael Butler <butler@cockroachlabs.com>
Date: Fri Jun 21 11:34:28 2024 -0400

    storage: add OriginID to MVCCValueHeader

    This patch adds a new OriginID field to the MVCCValueHeader which identifies the original cluster that wrote the key in a Logical Data Replication (LDR) setup. A value of 0 identifies a local write, a value of 1 identifies a remote write, and values of 2 and above may eventually be used to identify a specific cluster.

    In a future patch, the OriginID field will be used in a similar fashion to the OmitInRangefeeds MVCCValueHeader field -- downstream rangefeeds may be configured to filter logical operations depending on the origin ID. This approach will address the LDR data looping problem. For more information, read the linked epic:

    Epic: https://cockroachlabs.atlassian.net/browse/CRDB-39140
    Release note: none

commit bded5f5aaf01166a3110c8823fa1c5e5ac2232ea
Author: David Taylor <tinystatemachine@gmail.com>
Date: Mon Jun 24 13:57:07 2024 +0000

    crosscluster/producer: always repartition spec if able

    If we have 5 or 6 spans, we still want to partition them up across more than one proc to get more parallelism since those spans could be large.

    Release note: none.
    Epic: none.

commit 6689ff4ca8ddeb2ac5e2c023dba1f5e8dbf91d50
Author: Rail Aliiev <rail@iqchoice.com>
Date: Mon Jun 24 22:38:09 2024 -0400

    remove unused file

    Epic: none
    Release note: None

commit 983a700e9b3793e4344ffcf7ae0ad6c3d0d17862
Author: CRL Release bot <teamcity@cockroachlabs.com>
Date: Tue Jun 25 00:18:43 2024 +0000

    master: Update pkg/testutils/release/cockroach_releases.yaml

    Update pkg/testutils/release/cockroach_releases.yaml with recent values.
Epic: None Release note: None Release justification: test-only updates commit afe0795c9ad05da797dd6fd4596b08d14403725e Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Fri Jun 14 16:54:16 2024 -0700 sql: audit all processors to avoid redundant copies of eval context This commit audits all processors to ensure that we make copies of the eval context only when necessary. Up until recently every processor mutated the context by modifying the `context.Context` field, but now that field has been removed, so only a handful of processors mutate the context (namely, whenever a processor explicitly mutates the context or when it uses the rendering / filtering capabilities of the ExprHelper). This commit makes it explicit whenever a copy is needed and uses the flow's eval context when no mutation occurs. In order to clarify different usages it removes `ProcessorBaseNoHelper.EvalCtx` field and moves it into each processor if it needs it. Rendering capabilities are used implicitly by many processors via `ProcOutputHelper`, and if a processor itself doesn't need a mutable eval context, we can create a copy for the helper without exposing it to the processor itself. Release note: None commit a110cc3d08991d16c81fc9b94d89b2ad9f3cef5d Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Fri Jun 21 13:03:53 2024 -0700 sql: remove unused ProcOutputHelpers This commit removes embedded `execinfra.ProcOutputHelper`s from some processors where they weren't actually used (because either the post-processing spec is empty or the processor only produces metadata). This should effectively be a no-op change. 
Release note: None commit 08336021df15a27078db375284902e739a8ec878 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Fri Jun 21 12:10:04 2024 -0700 sql: remove some usages of ProcessorBaseNoHelper.EvalCtx This commit replaces some usages of `ProcessorBaseNoHelper.EvalCtx` field with `FlowCtx.EvalCtx` whenever it's clear that it's safe (meaning that those usages do not modify the eval context). It additionally replaces usages of `EvalCtx.Settings` with `FlowCtx.Cfg.Settings`. Overall, the goal is to reduce the usage of the eval context outside of actual expression evaluation, and this commit should effectively be a no-op. Release note: None commit 9c170aed3238547787d0b0235c2be2599be67c50 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Fri Jun 21 14:14:29 2024 -0700 ccl: remove redundant flowCtx field from processors All processors that use ProcessorBaseNoHelper already have a reference to the flow context, so storing it explicitly in the processor is redundant. Release note: None commit a8baf3e12ae338a1a1ca2153b486ae84feba3710 Author: Rafi Shamim <rafi@cockroachlabs.com> Date: Fri Jun 21 18:52:29 2024 -0400 sql,schemachanger: pause backfill job when disk space runs out Previously, this error would cause the job to fail and begin reverting. If the cause of the error was insufficient disk space, then attempting to revert is just as likely to fail. Now we pause the job instead. 
Release note: None commit 12c90525d825174ea2171fc3eed1f01d6c4365b2 Author: Michael Butler <butler@cockroachlabs.com> Date: Mon Jun 24 14:11:26 2024 -0400 roachtest: wait for job to pause before resuming Fixes #125921 Release note: none commit ac4e00ed1138bfebdfa413f9ea4ecd6c99167ca0 Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Fri Jun 21 17:05:30 2024 -0400 sql/rowexec: avoid extra allocation and copy in joinerBase for paired joiners Prior to this commit `joinerBase` would always reallocate and copy a slice of output types if the join had a continuation column (which is the case for paired joiners). This extra allocation has been eliminated by initially allocating enough space in the output types for the continuation column. Release note: None commit f8d55f418a3c80b0d694140884428dfceaa71cb4 Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Fri Jun 21 17:01:20 2024 -0400 sql/rowexec: batch allocations in joinerBase Three allocations in `joinerBase.init` have been batched into a single allocation. Release note: None commit d32f380dca834850b656d97fe28355b1a8b62d3d Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Fri Jun 21 15:15:47 2024 -0400 sql/sem/tree: add type-specific allocation sizes for DatumAlloc `DatumAlloc` now allows for users to provide type-specific allocation sizes. This is useful when the number of allocations of specific types can be known before bulk allocating datums with `DatumAlloc`. Type-specific allocation sizes are currently only used by `VecToDatumConverter`. This can significantly reduce the number of allocations made by `Materializer` when operating on a small batch of wide rows that has many duplicate column types. For example, consider a single-row batch with 100 string columns. Prior to this change, the allocation size would be set to 1 (because the batch size is 1) and `DatumAlloc` would allocate 100 individual string slices. After this change, `DatumAlloc` allocates a single string slice with a length of 100. 
Release note: None commit 250b798c4ba1ae09a6e7bc5d9bb1f2a40dec6d02 Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Fri Jun 21 15:01:19 2024 -0400 sql/sem/tree: refactor DatumAlloc This commit renames the `AllocSize` field of the `DatumAlloc` struct to `DefaultAllocSize` in preparation of a future commit that will introduce allocation sizes for individual datum types. Release note: None commit e5c7d037c77eb2729d0c0b4bf04f3daeb0b73f42 Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Fri Jun 21 15:22:38 2024 -0400 sql/colexec: add more cases to BenchmarkMaterializer Release note: None commit 8b38d8452683b1b08d083e5a9751e13e7db82aaa Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Fri Jun 21 11:14:52 2024 -0700 sql: address a couple of nits around recent COPY changes Add a TODO to investigate a "magic number" of 1/3 in the upper bound of the COPY fraction of max command size as well as rename the recently-added cluster setting to `sql.copy.fraction_of_max_command_size`. Release note: None commit 011588015de758fdb4ca9e20ef63356a4692e710 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Mon Jun 24 12:16:59 2024 -0700 stmtdiagnostics: fix the test in an edge case We previously reset the stmt timeout after executing the query that checks that the request is completed, so in an edge case that query could've timed out producing a test flake. Release note: None commit b71b9e9bc79a04bba194e0c65fe54a987b771353 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Sun Jun 23 18:21:42 2024 -0700 sql: presize stmt buf in internal executor We always execute either 2 or 4 commands in the internal executor, so we can precisely reserve the necessary capacity, avoiding some allocations. 
    Release note: None

commit 8366074e72d73d01655f00f60a203447aef5a406
Author: Rebecca Taft <becca@cockroachlabs.com>
Date: Thu Jun 20 15:52:15 2024 -0500

    opt: fix stats estimates when histograms disabled

    If the database detects excessive memory utilization during stats collection, it may disable histogram collection. Prior to this commit, the optimizer was using the resulting empty histograms for statistics estimation, resulting in bad plans. This commit fixes the stats cache and the statistics builder in the optimizer so they do not use histograms if they are empty (or the row count equals the null count).

    Fixes #125963

    Release note (bug fix): Fixed the statistics estimation code in the optimizer so it does not use the empty histograms produced if histogram collection has been disabled during stats collection due to excessive memory utilization. Now the optimizer will rely on distinct counts instead of the empty histograms and should produce better plans as a result. This bug has existed since CockroachDB v22.1.

commit e10b797f962c7da70bacaf5e7bc261fc4b22dd8c
Author: Marcus Gartner <marcus@cockroachlabs.com>
Date: Mon Jun 24 14:20:01 2024 -0400

    sql: remove unnecessary `convertNodeOrdinalsToInts` function

    This function converted a slice of `exec.NodeColumnOrdinal`s to a slice of `int`s. There was no reason to make this conversion. Removing it reduces allocations.

    Release note: None

commit 12fc034a0473414e08e0fa70b90723303532c499
Author: Marcus Gartner <marcus@cockroachlabs.com>
Date: Mon Jun 17 15:04:54 2024 -0400

    opt: improve stats for filters with placeholder equalities

    Stats estimates for filters of the form `col = $1` have been improved. While histograms cannot be filtered without knowing the placeholder value, we do know that `col` is limited to one distinct value. This improves the estimate over the simple 1/3 "unknown selectivity" that was applied for these filters before this commit.
    This change required a modification to `TestCopyAndReplace` because it made it such that a merge-join was no longer selected when optimizing the query with placeholders. Now the test checks that `CopyAndReplace` works correctly when copying a fully optimized memo with a parameterized lookup-join. I've confirmed that the test still catches the bug fixed in the original PR that added the test, #35020.

    Release note: None

commit c127e477cb1767e84f9474a54bf41f921400513a
Author: Marcus Gartner <marcus@cockroachlabs.com>
Date: Mon Jun 17 14:48:56 2024 -0400

    opt: add stats tests for generic query plans

    This commit adds some stats tests with queries containing placeholders.

    Release note: None

commit 4dd0674ec309b2ff4e7d51085752352f338c64c8
Author: Andrew Baptist <baptist@cockroachlabs.com>
Date: Mon Jun 24 13:24:42 2024 -0400

    kvserver: release transport factory

    This change correctly releases the Transport created by the transport factory on proxy requests.

    Epic: None
    Release note: none

commit 6177fdeb6e9856eff76ec8ff9988b665acc86e60
Author: Arul Ajmani <arulajmani@gmail.com>
Date: Mon Jun 24 10:27:08 2024 -0400

    kvserver: skip TestGossipHandlesReplacedNode under deadlock

    We've seen this test flake under deadlock for some time now. Skip it.

    Closes https://github.com/cockroachdb/cockroach/issues/125215

    Release note: None

commit 0693e620f8e19dd78de22a41a0e54a66dbe41b01
Author: Andrew Kimball <andyk@cockroachlabs.com>
Date: Mon Jun 17 16:40:31 2024 -0700

    tenantcost: incorporate write batch rate into estimated CPU

    The new estimated CPU cost model requires that the cost of writes change based on QPS - the combined rate of write batches / second across all SQL pods for a given tenant. This PR wires that up, with changes to the server to calculate the write QPS and changes to the client to use that value to modify estimated CPU.

    In order to calculate the rate across all SQL pods, the server adds two additional "rate" fields to the tenant_usage table.
    These fields track the rate of writes for the most recent 10-second intervals, summed up across the SQL pods for each tenant.

    Also, remove reads of the "instance_shares" column, as it's not actually used anywhere. In a future version, we can drop the column.

    Epic: CC-28471
    Release note: None

commit 33ffb9dcfffc263b2de40ae4568ef9020fa9dde2
Author: Ricky Stewart <ricky@cockroachlabs.com>
Date: Thu Jun 6 16:52:01 2024 -0500

    dev: add support for `--remote` mode to `dev doctor`

    This can be opted into with `dev doctor --remote`. You can switch back to local mode with `--remote=no`. If neither option is supplied, then we infer the appropriate mode from the `.bazelrc.user` contents.

    Epic: CRDB-34137
    Release note: None

commit 0907b0bdaa15f52d83c88e293a17cce69edae3b1
Author: Rafi Shamim <rafi@cockroachlabs.com>
Date: Mon Jun 24 12:38:35 2024 -0400

    sql: avoid slow lock verification in TestSchemaChangeAfterCreateInTxn

    The addition of test-only verification pushed this test over the timeout sometimes, such that running it under the deadlock detector would cause spurious failures. We avoid this by making the test smaller under deadlock, like we do for race builds.

    Release note: None

commit 9dd25df7a04c867488b538ac8d17743ac911d1c1
Author: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Date: Mon Jun 24 12:10:51 2024 -0400

    kvserver: deflake `WALBytesWritten` metric

    There is a race condition in Pebble metrics where sometimes the WAL is rotated prior to updating the BytesIn metric to account for the previous WAL. The metrics collection call happens async so it can sometimes cause this metric to decrease for a scrape window.

    Fixes: #125736.
Release note: None commit d3859c87461b2a168d5eb61d70cbed07e28872c4 Author: David Taylor <tinystatemachine@gmail.com> Date: Sun Jun 23 18:07:37 2024 +0000 jobs/jobsprofiler: limit retained dsp diagrams Release note (ops change): Fixed a bug where collection of debug information for very long-running jobs could use excessive space in the job_info system table and/or cause some interactions with the jobs system to become slow. Epic: none. commit d39af87e2d2a24603066d1f451fe4054afb96583 Author: David Taylor <tinystatemachine@gmail.com> Date: Sun Jun 23 15:20:56 2024 +0000 jobs: add limit support to infostorage.DeleteRange Release note: none. Epic: none. commit fe5a29eaf3ca145d6a5225ed88457c2f53e11c56 Author: David Taylor <tinystatemachine@gmail.com> Date: Sun Jun 23 15:11:06 2024 +0000 jobs: add Count() job info accessor Release note: none. Epic: none. commit 1106f4bb032a1ce54047da424da2c374b5ba4ad1 Author: David Taylor <tinystatemachine@gmail.com> Date: Sun Jun 23 14:52:27 2024 +0000 jobs: explain trailing ~ Release note: none. Epic: none. commit 272c17396d0869c84ada52c90fee0372acc7276f Author: DarrylWong <darryl@cockroachlabs.com> Date: Wed Jun 12 14:51:16 2024 -0400 roachtest: grafana annotations read creds from cloud storage As of #124099 we now store the service account creds in cloud storage. We already use this to access prometheus when generating dynamic configs. This change does the same for Grafana annotations by extracting the common logic into a helper. This will allow users to have access to Grafana annotations out of the box locally, and limit the amount of benign but potentially confusing warnings about invalid credentials. 
    Epic: none
    Fixes: none
    Release note: none

commit 17ddb769170e53f2c7ef457ceda1ba1c558552a8
Author: Yahor Yuzefovich <yahor@cockroachlabs.com>
Date: Sun Jun 23 19:09:41 2024 -0700

    logical: remove some overhead with explicit txns

    Previously, when using the explicit txns to perform INSERT and DELETE ingestion queries we would use "vanilla" `Txn` method. This method creates a new internal session data on each invocation of the internal executor methods. This commit avoids doing that by supplying the Txn method with the copy of the session data from the session that started the replication stream - that copy will be adjusted to apply some internal executor overrides but otherwise will be unchanged. In other words, even though this commit is not exactly a no-op, it is effectively one.

    ```
    name                                             old time/op    new time/op    delta
    LastWriteWinsInsert/explicitTxn/noPrevValue-8    812µs ± 2%     758µs ± 2%     -6.61%  (p=0.000 n=19+20)
    LastWriteWinsInsert/explicitTxn/withPrevValue-8  1.19ms ± 1%    1.14ms ± 1%    -3.91%  (p=0.000 n=20+20)
    LastWriteWinsInsert/implicitTxn/noPrevValue-8    747µs ± 1%     752µs ± 2%     +0.76%  (p=0.018 n=19+20)
    LastWriteWinsInsert/implicitTxn/withPrevValue-8  1.14ms ± 2%    1.14ms ± 1%    +0.52%  (p=0.015 n=20+19)

    name                                             old alloc/op   new alloc/op   delta
    LastWriteWinsInsert/explicitTxn/noPrevValue-8    177kB ± 2%     174kB ± 2%     -1.45%  (p=0.000 n=17+16)
    LastWriteWinsInsert/explicitTxn/withPrevValue-8  243kB ± 0%     240kB ± 0%     -1.02%  (p=0.000 n=16+16)
    LastWriteWinsInsert/implicitTxn/noPrevValue-8    166kB ± 9%     165kB ± 3%     ~       (p=0.985 n=16+16)
    LastWriteWinsInsert/implicitTxn/withPrevValue-8  231kB ± 0%     231kB ± 0%     ~       (p=0.897 n=16+16)

    name                                             old allocs/op  new allocs/op  delta
    LastWriteWinsInsert/explicitTxn/noPrevValue-8    1.07k ± 2%     0.99k ± 2%     -7.39%  (p=0.000 n=17+16)
    LastWriteWinsInsert/explicitTxn/withPrevValue-8  1.63k ± 0%     1.55k ± 0%     -4.83%  (p=0.000 n=17+17)
    LastWriteWinsInsert/implicitTxn/noPrevValue-8    974 ±11%       967 ± 1%       ~       (p=0.771 n=16+16)
    LastWriteWinsInsert/implicitTxn/withPrevValue-8  1.53k ± 0%     1.53k ± 0%     ~       (p=0.843 n=17+17)
    ```

    Release note: None

commit fc84374ceb5a9991c44b4a0e271e202c3232c298
Author: Yahor Yuzefovich <yahor@cockroachlabs.com>
Date: Tue Jun 18 14:26:35 2024 -0700

    logical: use implicit txns for each INSERT and DELETE

    This commit introduces another strategy for processing rows. Previously, we would handle a batch of rows within an explicit txn, and this commit makes it so that each row is handled in a separate implicit txn. This strategy is now used by default and can be disabled via newly-added `logical_replication.consumer.use_implicit_txns.enabled` cluster setting. Usage of the implicit txns allows us to hit 1PC down in the KV layer in some cases.

    Release note: None

commit 60881bb0dd414ce26dd44865629ef557c6b08dcb
Author: Ricky Stewart <ricky@cockroachlabs.com>
Date: Mon Jun 24 11:09:34 2024 -0500

    jobsprofiler: bump size of test under `race`

    Closes #126093.

    Epic: none
    Release note: None

commit cd341e12b8b2dc3a4814c6ebbf24ec81d645526b
Author: Ricky Stewart <ricky@cockroachlabs.com>
Date: Mon Jun 24 11:05:57 2024 -0500

    rpc: grant test an additional core under `race`

    Closes #126092.

    Epic: none
    Release note: None

commit ecd82d90fa940625c4bbff506459ce6b2961a777
Author: Yahor Yuzefovich <yahor@cockroachlabs.com>
Date: Fri Jun 21 21:26:06 2024 -0700

    sql: fix stmt stats for internal executors with outer txns

    This commit fixes an oversight where we previously incorrectly used `fromOuterTxn` boolean when instantiating the stats container. This field was originally introduced to indicate that some schema objects shouldn't be reset when resetting the conn executor, which is the case when we have an outer txn AND we injected the descs collection, etc. For the stats container we only care about whether there is an outer txn or not. This commit splits out the boolean into two to clarify things.
Now, whenever we're `underOuterTxn` we'll do the right thing for the stats container, and the old boolean `fromOuterTxn` has been renamed to `skipResettingSchemaObjects` to avoid possible confusion. I think the same problem was present with the usage of `fromOuterTxn` when ensuring that we step back the txn in the internal executor, and this is now fixed. Additionally, it adds a few TODOs about code spots we might need to change to address an issue with txn stats of internal executors. Release note: None commit c67a81e07b684a4754475f1d56a1c418d90422fe Author: Ricky Stewart <ricky@cockroachlabs.com> Date: Mon Jun 24 11:03:23 2024 -0500 liveness: grant test an additional core under `race` Closes #124124. Epic: none Release note: None commit c1b32bcfc017d162ae62c93dfd5789960e0a299a Author: Andrew Baptist <baptist@cockroachlabs.com> Date: Mon May 13 15:52:30 2024 -0400 kvclient: separate out proxy sending Previously in DistSender, a proxy request was handled using the same logic within sendPartialBatch and sendToReplicas. There were short circuits to handle the different cases of retries, but in cases with changing range boundaries, it could return the wrong error. This change intercepts proxy request at the start of DistSender.Send and sends to the transport bypassing the rest of the DistSender retry logic. This simplifies the code for proxy requests. Epic: none Fixes: #123965 Informs: #123146 Release note: None commit 7fa0c3d8ef7a170e43b3e86dd7eb386bbd0110d8 Author: Andrew Baptist <baptist@cockroachlabs.com> Date: Mon Jun 17 15:13:17 2024 -0400 kvclient: verify a proxy request is not split If a proxy request is against a range which is subsequently split, the request should still follow what the original client requested rather than the updated cache. This prevents complex errors from split requests as they can't be processed correctly. 
Informs: #123965 Release note: None commit cddb9da604c3675c21a677c7a031e0ee055ce92f Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Tue Jun 18 15:05:43 2024 -0400 opt: add generic query plan tests with partial indexes Tests for generic query plans with partial indexes have been added. No code changes are needed in order for a query with a filter like `col = $1` to utilize an index on `(col) WHERE col IS NOT NULL`. Release note: None commit 673fefa4c6ec559ed36c291b22c6b0bd13f5e494 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Sun Jun 23 22:36:30 2024 -0400 roachtest: improve logging in gossip/chaos/nodes=9 further Informs #124828. Informs #126077. Release note: None commit e1f4ac82f1c6416f9460354a4bcf26866f06975c Author: Anish Shanbhag <anish.shanbhag@cockroachlabs.com> Date: Fri Jun 21 13:42:33 2024 -0400 cli: add storage engine metrics to debug zip Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node. Fixes: #79518 Epic: none Release note: none commit 93b87be68b478f61d79049d2c6edf96d849301bf Author: Uzair Ahmad <uzair.ahmad@cockroachlabs.com> Date: Thu Jun 20 17:44:49 2024 -0400 sql: only compute histogram sample size once for all requested stats Previously, we computed the number of samples for histogram construction repeatedly for each stat request. This is unnecessary as this number doesn't change and pollutes logging with the computed sample size. We now compute and log the number of samples only once, if necessary. 
See also: #123972 Release note: None commit e830f879d27c218ef986851427e8f3861d7f771c Author: David Taylor <tinystatemachine@gmail.com> Date: Sun Jun 23 19:31:17 2024 +0000 changefeedccl: disable physical plan debug persistence Mitigation for #126083. Release note (ops change): Some debugging-only information about physcial plans is no longer collected in the system.job_info table for changefeeds due to it having the potential to grow very large. Epic: none. commit 11ddc6cc0573d76adbdaa20b3f8b119225da7a4f Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Thu Jun 20 10:30:49 2024 +0300 ui: apply antd's crlTheme in Db Console This change intended to apply Antd's theme from CC Console to components imported from Cluster UI and Antd components used directly in Db Console. Note that ContextProvider depends on its own context and it is important to import instance of `ContextProvider` from the same package as other Antd components that should inherit theme. Currently, both Db Console and Cluster UI have their own copy of `antd` package so we need to apply theme twice with every ContextProvider. Release note: None commit 568258fde26727634f7622c75a9710b595ba6ad3 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon Jun 17 15:00:43 2024 +0300 ui: fix Storybooks for Db Console This patch changes Storybooks configuration to rely on `main.js` file which is a default place for configs now. Also, it moves default global styles imports to decorators file. Release note: None commit 64874ec80a094a02f13a6320ee747fb9db786a56 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon May 13 18:37:15 2024 +0300 ui: upgrade antd to v5 in db-console - update `antd` version to match the version as in CC Console. - `rc-drawer` is added as a peer dependency to make Typescript properly infer types. - remove all `antd` style imports from the code. `Antd@5` uses css in js and doesn't require style imports. 
Release note: None commit b6c1fdc5a6b9cf1f1308145687ea34b78a57b001 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Fri May 24 12:01:15 2024 +0300 ui: fix JobsProfileView component styles This patch doesn't affect how `JobsProfileView` component looks, instead it fixes some broken styles after `antd` package has been upgraded. Also, `Space` components are used for better layout structuring instead of `<Row>` and `<Col>` components. Release note: None commit 76515bd1d531338a00acb97c7d23503610ca5ceb Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon May 20 16:03:21 2024 +0300 ui: re-work Modal and Diagnostics modal ActivateStatementDiagnosticsModal component is based on pure Antd components and has been dramatically affected after upgrade of Antd version to 5.x. The main issue that new components are based on flexbox styles and it caused many changes in layout and how components structured. So this change introduced following changes: - avoid using `<div>` elements for structuring layout, instead use <Space> and <Grid> components. - Remove unused styles and rely on antdTheme instead as much as possible. There is some styles defined which cannot be defined with theme. Release note: None commit 5b98f977c27520f0af5068c15502aba46eae0281 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon May 13 18:27:48 2024 +0300 ui: fix Search component styling after antd version upgrade This change adjusts styling of <Search /> component after antd version upgrade that affected styles. - cleaned up some style that override `antd` styles as they don't make any sense; - removed `<Form />` component as a wrapper that was responsible to handle event when user presses Enter button. Now it is possible to handle within `<Input />` component. 
Release note: None commit a2a6feb4c26e1ba840665124e043c0da19beee31 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon May 13 16:08:06 2024 +0300 ui: fix styles of DateRangeMenu component - default button type with borders was overridden by global styles so former were removed; - time and date selectors size is set to 'large' to keep the same size of component as before; Release note: None commit 1bca81e1d49781e7528d185f83c637151bec9052 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Mon May 20 16:02:39 2024 +0300 ui: add antd theme to cluster-ui package This change adds a copy of antd theme from CC Console to rely on the same styles. It will help to adjust components to the same styles as in CC Console during extraction components. Release note: None commit 6b9df57c064603ba8459e92301b682ef2dd46ed9 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Sat Apr 27 00:28:32 2024 +0300 ui: resolve conflicts in components that depend on Antd package in cluster-ui This change resolves issues that couldn't be fixed automatically during `antd` version upgrade and require changes to resolve breaking changes. Following components were mostly affected: - `<Icon/>` - `<DatePicker/>` - `<Select/>` In addition, proper types were provided due to resolve compilation errors. commit 2ed76134fa686c06e4f75266878a67c9ee7c9c39 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Thu Apr 25 20:24:17 2024 +0300 ui: automatic fixes for antd migration 4.x -> 5.x in cluster-ui Those changes produced by command: `pnpm --package=@ant-design/codemod-v5 dlx antd5-codemod src` which applies automatic fixes during Antd version upgrade to 5v. Release note: None commit e743d84f9c54a1b03e9b7288da5bcdafa69c5da0 Author: Andrii Vorobiov <and.vorobiov@gmail.com> Date: Thu Apr 25 19:54:19 2024 +0300 ui: upgrade antd to 5.x version in cluster-ui - upgrade and version to 5.6.1 in package.json. It is the same version of antd that's used in CC Console. 
- remove babel configs related to antd (deprecated) - remove overridden antd global styles - all of these changes are done according to migration instruction https://ant.design/docs/react/migration-v5 Release note: None commit adbc9a3745ae5d0bea65324d70a4ec1d798373ff Author: Andrew Kimball <andyk@cockroachlabs.com> Date: Thu Jun 20 16:04:49 2024 -0700 upgrades: add migration that creates system tables in all tenants A change in 24.1 creates all system tables in new app tenants, even if the system tables are only used by the system tenant. However, app tenants created before 24.1 are missing these system tables. This commit adds a migration that adds the missing system tables to existing app tenants. Epic: none Release note: All system tables are now visible in app tenants that were created before v24.1. commit 3f282e2d9aaa424905f3f0181f9f00f9455ee92a Author: Steven Danna <danna@cockroachlabs.com> Date: Wed Jun 5 14:02:04 2024 +0100 roachtest: small ldr roachtest This is a small roachtest for LDR. It would be nice to share more infrastructure with the cluster to cluster tests, but I am hoping that happens a bit more organically as we develop the corpus of LDR tests. Epic: none Release note: None commit a302178779f9814f632a0a6e1e4e61ddaa432655 Author: Steven Danna <danna@cockroachlabs.com> Date: Wed Jun 19 14:21:38 2024 +0100 streamingccl/logical: quote column and constraint names This escapes column and constraint names so that if they contain special characters the generated query is still correct. Epic: none Release note: none commit 4e03048e7d8146cdfb997c909070af6467c32897 Author: Vidit Bhat <vidit.bhat@cockroachlabs.com> Date: Mon Jun 10 18:41:30 2024 +0530 drt: operations framework artifacts aren't persisted Previously, we were incorrectly claiming that operation logs were being persisted. This was inadequate because it made debugging difficult. This patch ensures that we don't log the misleading statement. 
We have decided not to persist the logs since they are already being printed to stderr/stdout. Epic: none Fixes: #125307 Release note: None commit 590d13f5bd3d354a849dd4792905c8d46a1c2ebe Author: Annie Pompa <annie@cockroachlabs.com> Date: Wed May 29 14:56:02 2024 -0400 schemachanger: support DB zone config This patch adds the functionality to configure a zone configuration on a database. Informs: #117574 Release note: None commit 32fd179b21b0300662f8905a2f9b37a75b39c208 Author: Annie Pompa <annie@cockroachlabs.com> Date: Wed May 29 14:44:08 2024 -0400 schemachanger: add zone config interfaces/functions in builder This patch adds the necessary interfaces/functions for zone config verification and retrieval. Informs: #117574 Release note: None commit 35d96a3a5bf7efd02f71e339fc13c6cdfaf18112 Author: Xiang Gu <xiang@cockroachlabs.com> Date: Mon Jan 8 16:58:18 2024 -0500 scpb: Introduce DatabaseZoneConfig element This commit introduces the DatabaseZoneConfig element that will become relevant for supporting `ALTER DATABASE ... CONFIGURE ZONE` in DSC. Release note: None commit 655635939a499e2e846c43647e4c43bd858644a9 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Mon Jun 17 19:20:39 2024 -0700 logical: add optimistic INSERT strategy This commit extends the LWW row processor to apply the "optimistic" INSERT strategy (gated on a newly-added `logical_replication.consumer.try_optimistic_insert.enabled` cluster setting which defaults to `true`). The idea is that if we see that previous value is nil in the KVWithDiff, then it means that either we are processing an initial scan or there was no conflict on the source cluster, so we're assuming that there is no conflicting row on the destination too, so we can proceed with `INSERT` statement without `ON CONFLICT` clause, which internally translates into the blind write. If there is no conflict, great! 
If there is a conflict, then we have to fall back to the "pessimistic" `INSERT ON CONFLICT` statement, which has read-then-write KV behavior. Note that this optimistic strategy of evaluation of `INSERT ON CONFLICT` statements might be built directly into the DB at some point by SQL Queries (possibly soonish), but for now we choose to handle it at the "application" level since we have a tight timeline. Additionally, this commit adds a metric for the number of times we encountered a conflict and had to fall back to the pessimistic path. Release note: None commit 1f3643c2c126175e3628fb20e71480b660160b84 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Fri Jun 21 17:21:13 2024 -0400 kv: deflake TestNodeLivenessRetryAmbiguousResultError Fixes #125947. This commit fixes a flake in TestNodeLivenessRetryAmbiguousResultError, where the short-circuit path for a manual liveness heartbeat could break an assertion in the test. To avoid this flakiness, pause the heartbeat loop before we grab the liveness record to ensure that we don't race with the heartbeat loop in a way that allows our manual heartbeat to short-circuit and fail to hit an injected error. Release note: None commit 7bf718eb7cbb30394d75eebad71d590520c78854 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Fri Jun 21 16:58:55 2024 -0400 kv: use new atomic types in TestNodeLivenessRetryAmbiguousResultError Cleans up the test. Epic: None Release note: None commit 9c0336a238df793171a9326bfa8462ee37c2b917 Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Mon Jun 17 14:01:29 2024 -0400 opt: reduce exploration for joins created by ConvertSelectWithPlaceholdersToJoin Join reordering and merge join exploration is now disabled for joins created by the `ConvertSelectWithPlaceholdersToJoin` rule. The rule's main goal is to assist in generating lookup joins. This changes reduces unnecessary exploration. 
Release note: None commit 385f4db9d598fca722c22eadd2697dca3c303e7e Author: Marcus Gartner <marcus@cockroachlabs.com> Date: Mon Jun 17 13:34:33 2024 -0400 opt: add ConvertSelectWithPlaceholdersToJoin exploration rule The `ConvertSelectWithPlaceholdersToJoin` exploration rule has been added which enables optimizations to apply to query plans where placeholder values are not known. See the rule's comments for more details. The rule does not currently apply to any query plans because it only matches Select expressions with filters that have placeholders, and we currently replace all placeholders with constant values before exploration. It will be used in the future when we enable exploration of generic query plans. Release note: None commit bf0cf0ade763cd865c6414366b22bccabdb074a4 Author: DarrylWong <darryl@cockroachlabs.com> Date: Fri Jun 14 11:15:06 2024 -0400 roachprod: add default auth-mode env var This change adds a ROACHPROD_DEFAULT_AUTH_MODE env var which sets the default auth mode used by roachprod CLI and roachprod pgurl expander if not specified. This env var is unset at the start of every roachtest run. Release note: none Fixes: none Epic: none commit baa9437ef8a27d1531170790c9417d0fbc12826d Author: Rafi Shamim <rafi@cockroachlabs.com> Date: Fri Jun 21 13:34:04 2024 -0400 compose: don't use DebuggableTempDir helper The helper is not needed since this test is never executed remotely. Also, the usage of it in older branches broke the test. Release note: None commit c658c9124d864e98a77bb80d1485b7f42897a4c9 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Fri Jun 21 12:40:50 2024 -0400 kv: include ReplicaNeedsSnapshotStatus in allocator log Informs #125893. This commit includes ReplicaNeedsSnapshotStatus in the allocator log in excludeReplicasInNeedOfSnapshots. This will help us understand why a replica is not considered for lease transfer. 
It also adds information about the candidate replica ID and replica type by deferring to ReplicaDescriptor.String instead of re-implementing it. Release note: None commit 362db3757bef52460d7690733fc752ef1628f305 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Mon Jun 17 11:24:56 2024 -0700 cli: fix statement-bundle recreate for cluster settings In the recently merged change we included `SET CLUSTER SETTING` statements into `env.sql` file for all cluster settings that have non-default values. However, this broke `debug sb recreate` command because SET CLUSTER SETTING stmts aren't allowed to be in the multi-stmt implicit txn. This commit fixes that oversight by splitting out the contents of `env.sql` file so that each SET CLUSTER SETTING would get its own implicit txn. Release note: None commit 3190dd7a438512965af678d7c25b32ca01548163 Author: Ricky Stewart <ricky@cockroachlabs.com> Date: Fri Jun 21 11:20:27 2024 -0500 builtins: grant test more memory under `race` Closes #125899 Epic: none Release note: None commit 26e783de779ce3382e0235539629dd8357a8ecb2 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Tue Jun 18 16:43:27 2024 -0700 sql: address a couple of nits around conn executor constructor This commit removes redundant call to `initPlanner` in the `postSetupFn` (we already unconditionally call `initPlanner` right after `postSetupFn` in the constructor) as well as avoids allocations whenever `postSetupFn` becomes a no-op. Release note: None commit 3437e5a53b0986e838a9fead34e71bd188fbbc8b Author: Radu Berinde <radu@cockroachlabs.com> Date: Fri Jun 14 13:10:53 2024 -0700 scripts: add script to check Pebble deps Sometimes we backport Pebble changes to `pebble/crl-release-` branches but forget to bump the dep in crdb. This script checks for this and automatically files issues. The intention is to run it nightly in a GitHub Action. 
Epic: none Release note: None commit df2e8d72d6feaafa6c5361869ba58b0a6074dff3 Author: Yahor Yuzefovich <yahor@cockroachlabs.com> Date: Thu Jun 13 14:21:50 2024 -0700 util/duration: fix up recent change for special casing MulFloat This commit extends recently merged change to improve `MulFloat` to also apply to `DivFloat` and adds a couple of unit tests. Release note: None commit 42e1713dfc61cf6269facfb7f4ac05f468ece5bb Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Thu Jun 20 17:00:42 2024 -0400 sql/colexecerror: implement Unwrap on StorageError Fixes #125922. In #125922, we saw flakiness in `TestAdminDecommissionedOperations` due to unreliable grpc error code propagation. This commit fixes the issue by implementing `Unwrap` on `colexecerror.StorageError` so that the grpc error code is propagated correctly. This was broken before because `grpc/status.FromError` uses `errors.As` from the standard library, which only respects `Unwrap()` and does not respect `Cause()`. As a result, grpc codes of any error wrapped in a `colexecerror.StorageError` were being lost. The test was flaky and not entirely broken because it would sometimes hit a code path that wrapped the grpc error in a `StorageError` and sometimes hit a code path that did not. Release note: None commit 2c52056b20e48680174e1ffdd08908637b87bb4b Author: Matt Spilchen <matt.spilchen@cockroachlabs.com> Date: Fri Jun 21 10:01:51 2024 -0300 sql/schemachanger: block data type alter of col used in computed cols Previously, altering the data type of a column used in a generated column was allowed, causing downstream issues when querying the generated column. To be consistent with postgres, we will now block any attempt to alter the data type of a column used in a computed column. The update to event_test.go was needed because it was doing an ALTER that is now blocked. The test was changed to not reference a column in the computed expression that we intend to change the data type on. 
Epic: CRDB-25314 Closes: #72457 Release note (bug fix): Attempts to alter the data type of a column used in a computed column expression are now blocked. Signed-off-by: Matt Spilchen <matt.spilchen@cockroachlabs.com> commit a950845f399d0655543c78adbda8feebbb85068c Author: Anish Shanbhag <anish.shanbhag@cockroachlabs.com> Date: Fri Jun 14 14:51:14 2024 -0400 ui: show per-store metrics in storage dashboard Many metrics on the storage dashboard are aggregated across all stores, which can be misleading. This change breaks down many storage metrics on a per-store basis to display if certain stores are causing issues. The store-level metrics are only shown when filtering to a single node on the metrics dashboard in order to increase usability when there are thousands of stores in a single cluster. Also fixes incorrect aggregation of certain metrics across stores; read amplification is now aggregated using averaging, and latency metrics are aggregated using maximum. Epic: none Fixes: #125696, #90734 Release note (ui change): Updated Storage dashboard graphs to show most metrics on a per-store basis when viewing a single node's metrics. This provides increased visibility into issues caused by specific stores on each node. commit ca763a58cd8431d8d81feb679ce910768555c7dd Author: Renato Costa <renato@cockroachlabs.com> Date: Thu Jun 20 14:04:52 2024 -0400 roachtest: don't verify cluster restores on unsupported binaries This commit updates the `backup-restore/mixed-versions` test such that we don't attempt to run cluster restores on binaries that have reached EOL. These issues, if any, won't be fixed and would just add noise. Fixes: #125916 Release note: None commit c7e5667cec0659297ba691ab038f5150aeb5799d Author: Renato Costa <renato@cockroachlabs.com> Date: Thu Jun 20 14:04:07 2024 -0400 mixedversion: bump `OldestSupportedVersion` to 23.1 Now that 22.2 has reached EOL, we only exercise functionality if the cluster is running at least 23.1 binaries. 
Epic: none Release note: None commit 17bb6b12fc14e518d9cea0c72c0193f734327bfe Author: Renato Costa <renato@cockroachlabs.com> Date: Thu Jun 20 14:03:26 2024 -0400 mixedversion: automatically set build versions for every test This allows us to change the `OldestSupportedVersion` constant without changing the output of any test. Epic: none Release note: None commit 7d384d515358527a448b449e9e334cf008147fa5 Author: Akshay Joshi <akshay@cockroachlabs.com> Date: Thu Jun 20 15:10:51 2024 +0530 cli: fix non determinism of TestTenantZip Previously, TestTenantZip was generating non deterministic output due to cpu profile files retrieval logs.To address this, we are removing cpu profile file retrieval logs file during comparison with golden data. Epic: none Fixes: #125036 Release note: None commit d4ea52d80e73e5715254d8e43f73977645f25e7e Author: Herko Lategan <herko@cockroachlabs.com> Date: Thu Jun 20 17:37:51 2024 +0100 roachtest: fix node kill drain variant Previously, if the drain variant of the node kill operation ran it would fail by not being able to find the binary since the remote run command required a `./` prefix to the binary. But even if it did find the binary the constructed `pgurl` fails, because it passes the incorrect certificate required by drain. This code has been updated to rather use the readily available `certs` directory, or `insecure` if the cluster is not secure. And instead passes a `host:port` versus a postgres URL. Epic: None Release Note: None commit 81887058992436bae4c919f1b8586a32ce395abe Author: Arjun Mahishi <arjun.mahishi@gmail.com> Date: Thu Jun 13 16:58:10 2024 +0530 pkg/cli: tsdump - fix custom SQL port behaviour When CRDB is running with a custom gRPC and SQL port, users have the option to provide the gRPC address using the `--host` flag. An option for providing the custom SQL address is the `--url`. But, tsdump doesn't seem to be using the address provided as part of `--url`. This commit fixes that. 
When the `--host` flag is provided, we can connect to the gRPC server and get the Node Details. This will give us the SQL address. So, there is no need for the user to specify the SQL port separately at all. This is how it's handled in the `debug zip` command. Since `debug zip` and `tsdump` are often used together, it's important to have a consistent behaviour across both of them. Part of: CRDB-39382 Fixes: #125315 Release note (bug fix): Fix the bug in tsdump where the command fails when a custom SQL port is used and the `--format=raw` flag is provided commit 05820a8e05a4d17e06cc7c682d0c18a571dfb619 Author: Radu Berinde <radu@cockroachlabs.com> Date: Thu Jun 20 14:15:04 2024 -0700 go.mod: bump Pebble to 1b880b3441f2 Changes: * [`1b880b34`](https://github.com/cockroachdb/pebble/commit/1b880b34) provider: consult readahead config when initializing handle * [`71d59d67`](https://github.com/cockroachdb/pebble/commit/71d59d67) objstorageprovider: reuse buf from pooled remoteHandle * [`41441bc6`](https://github.com/cockroachdb/pebble/commit/41441bc6) keyspan: remove error from FragmentIterator.Close * [`b17532f1`](https://github.com/cockroachdb/pebble/commit/b17532f1) sstable: remove unused closeHook in fragmentBlockIter Release note: none. Epic: none. commit b1428cc1b3a00f0f62e2518b99cfaa5308c5658d Author: Michael Erickson <michae2@cockroachlabs.com> Date: Thu Jun 20 12:12:44 2024 -0700 memo: add missing cases to memo.(*statisticsBuilder).colStat Add all operators for which we call memo.(*logicalPropsBuilder).buildBasicProps to the colStatUnknown case of colStat. Fixes: #96993 Release note (bug fix): Fix a bug in which some DDL and administrative statements used within a CTE would fail with an "unrecognized relational expression type" internal error. 
commit 218dfefe48112ab8b23c1eafb77ecd9ce8437893 Author: Arul Ajmani <arulajmani@gmail.com> Date: Wed Jun 19 12:22:24 2024 -0400 kvfollowerreadsccl: deflake TestFollowerReadsWithStaleDescriptor This test wants to tightly control where leases are placed to make assertions about follower reads. Most of this is achieved by running the cluster in replication mode manual, which disables the lease queue. However, we're still susceptible to load based rebalancing transferring leases unexpectedly, which could cause the test to fail. Disable the store rebalancer. While here, we also do the same for TestSecondaryTenantFollowerReadsRouting which could also run into this. Closes #125898 Release note: None commit e989daa290a5beb917ba47fb95a8607bc01d200c Author: Rafi Shamim <rafi@cockroachlabs.com> Date: Thu Jun 20 12:53:21 2024 -0400 randgen: avoid overflow in computed column expressions The computed column definition could lead to overflow errors since it adds a number that is randomly chosen and could be very large. Using absolute value instead avoids this. Release note: None commit a1ccb39a4c8714c9b0aa73b54cd62a6dc67bfb56 Author: Rafi Shamim <rafi@cockroachlabs.com> Date: Thu Jun 20 15:44:54 2024 -0400 workload/schemachange: avoid selecting non-public enum members This prevents the workload from trying to reference an enum member that is not yet public. Release note: None commit 1ea0579c7be7b412ad1505dd9cf91faf289cb2de Author: Faizan Qazi <faizan@cockroachlabs.com> Date: Wed Jun 19 23:12:24 2024 +0000 SQL: CREATE TABLE can fail on transaction retries Previously, when a CREATE TABLE definition included an index expression, we modified the original abstract syntax tree (AST) to have the index reference a new virtual column for the expression. However, if a transaction retry occurred, this AST modification was not handled correctly because the new virtual column was not public. 
To resolve this, this patch ensures that the index element nodes are copied before any changes are made. Fixes: #125619 Release note (bug fix): CREATE TABLE with index expressions could hit undefined column errors on transaction retries commit 835d3b8f0e45fdd3f2cc2e566939a35eff6818e5 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Fri Jun 14 17:34:12 2024 -0400 kv: move leaseStatus to leases package Informs #123498. This commit moves the leaseStatus function to the leases package. This allows for easier unit testing and helps to further consolidate leasing logic in a single location. Code movement only, not behavior changes. Release note: None commit 5cdbb5ead66eb0f6c5ba9931e0e5149501a3fd08 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Fri Jun 14 17:36:14 2024 -0400 kv: rename leases.Input to leases.BuildInput Simple rename. Epic: None Release note: None commit edc2918ca82fde99c3177e2531e7acecb5c8555e Author: David Hartunian <davidh@cockroachlabs.com> Date: Tue Jun 18 15:33:41 2024 -0400 log: use httptest for http sink tests Previously, we had a custom test server using a raw tcp connection. This server would cause occasional flakes and has been replaced with a standard test HTTP server to more reliably mimic an HTTP sink The hung server test is also amended to use a waitGroup and also check if we log too quickly since that would mean we didn't hang at all. Resolves: #125753 Resolves: #125725 Resolves: #125540 Resolves: #125466 Resolves: #125389 Resolves: #125358 Resolves: #125005 Resolves: #124973 Resolves: #124642 Resolves: #124596 Release note: None commit 5179df92698233ce6fcdcd1fbfb6d230e9401eec Author: Xin Hao Zhang <xzhang@cockroachlabs.com> Date: Thu Jun 20 11:10:11 2024 -0400 sqlstats: revert turning off sql activity tables This commit reverts PR #123381 which turned off the sql activity update job by default. We'll punt disabling this flush job to the next major release. 
The justification for this is we changed some cluster settings for sql stats for 24.1.2 that will cap the number of rows per aggregation interval. We'll wait for existing clusters to stabilize with this cap before disabling the cache entirely. Epic: none Part of: #123081 Release note: None commit f555b11a654bddd0b5101a5ff72b0a5a86d846a3 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Thu Jun 20 12:41:09 2024 -0400 kv: deflake TestWedgedReplicaDetection Closes #125826. This commit deflakes TestWedgedReplicaDetection by waiting to ensure that the leader replica has three entries in its lastUpdateTimes map before wedging a follower replica. This should already be the case by the time we run the SucceedsSoon because the leader was able to replicate a log entry to its two followers, but we wait to be sure and to avoid flakiness. The logging from the previous commit revealed that it is possible that the WaitForValues call returned as soon as one of the followers appended and applied a log entry, but before its response was delivered to the leader. Release note: None commit 5c50fedd27101ebf7c5065bcf43ef025edd20419 Author: Nathan VanBenschoten <nvanbenschoten@gmail.com> Date: Thu Jun 20 12:09:29 2024 -0400 kv: improve logging in TestWedgedReplicaDetection Informs #125826. This commit makes it easier to debug the flake in TestWedgedReplicaDetection by adding more logging. 
Release note: None commit e9debdfbce2c426e790f777d1e59bca38d74d198 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Tue Jun 25 13:47:07 2024 -0400 clean commit 9d8b6ea2988c00d690c102d8a1ab78de98d7b39e Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Tue Jun 25 13:45:57 2024 -0400 undo batching sink changes commit b7d358a778151a09a0bca124db7066e92160d1d3 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 24 13:10:03 2024 -0400 topic creation commit 0a228a4832996f532aa74764490525bfd20c9c12 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 24 12:46:05 2024 -0400 enable new kafka sink in all roachtests manually commit b5a90f56638c5a12f9aed674891c4665e1ffff1b Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 24 10:27:56 2024 -0400 cl commit ee467027334ea90d3406d92e2c1991518a331f45 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 16:34:14 2024 -0400 fix commit fbb9fddfe54293c7dab36038a744913e664ecdcb Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 16:24:26 2024 -0400 add kgop to part consistency test commit bb923ea242de3c1fc9959ec6173b04d9b92477ec Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:49:08 2024 -0400 pls commit 809ad61b1f8bdde3bbd7e7e0a3ec607f50dbf0be Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:47:43 2024 -0400 :( commit 741230adfd61f9d97bc92bdada3e3115c1897cbc Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:42:06 2024 -0400 wat commit ac9199f49d2060f640d477269dde4aa56ca3297d Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:33:31 2024 -0400 wat commit 90491f26003610c0f6feb0b2f679e35b70ad01eb Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:32:27 2024 -0400 wat commit f2f538e44060b9df7f242fd9ac8c814833aaf1d4 Merge: fc12ad8c1d0 8a43fac15a4 Author: Miles Frankel 
<miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:25:34 2024 -0400 Merge branch 'master' of github.com:asg0451/cockroach into kafka-batching-sink commit fc12ad8c1d0a5b6038ccddb1e70bf88b527a139a Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 15:13:11 2024 -0400 wat commit 66b92fd674d61130834552b326891f417e9b365a Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 14:43:40 2024 -0400 testwip commit 944c79c5eaee5c59f1ea06a0c85178716575ebca Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 12:56:11 2024 -0400 fixup commit e7fae043daf9f7a160153bbccab059b964edaeb2 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Fri Jun 21 12:46:55 2024 -0400 test commit b115aa6265cb9ad7f688e274577581044cf4f586 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Tue Jun 18 10:01:26 2024 -0400 fix tests commit 9649a46ee80ab2598b28ad580646ca3aed90c467 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 17 17:04:13 2024 -0400 wip commit d5ec36afe62f5d0c1e505cf801c02a670649801f Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 17 16:20:48 2024 -0400 cleran commit dfed215eccfdc7d73739cfd9c255a624cad8e4e0 Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 17 15:27:30 2024 -0400 c commit fd87c7e808ec9e7a83871d9d6a97cd72e596e0aa Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 17 15:27:16 2024 -0400 topic namer tests commit 7534ef6fef229bad9ce59d528c63b10ed9ad1e3e Author: Miles Frankel <miles.frankel@cockroachlabs.com> Date: Mon Jun 17 15:08:11 2024 -0400 testwip commit aa0f83f48f4c92483b36f913842bfae82bade889 Author: Miles Frankel …
This failed again with the new logging (#126545 (comment)), but there's still an unexplained pause here.
We have marked this test failure issue as stale because it has been |
roachtest.gossip/chaos/nodes=9 failed with artifacts on master @ d5f825ce04362a75781b75e19cf429f2ed5ed61b:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
This test on roachdash | Improve this report!
Jira issue: CRDB-39090