ccl/partitionccl: TestRepartitioning failed under stress #43707
(ccl/partitionccl).TestRepartitioning failed on master@ae3bf3e4f9a32f491f09846750ad88bef2be1c5d: Fatal error:
Stack:
Log preceding fatal error
ReproParameters:
powered by pkg/cmd/internal/issues
cc @andreimatei looks like you recently upgraded this error to a Fatal?
Yeah, let me see. I've seen this crash somewhere else too, so maybe I should revert something.
Before this patch, the refresher interceptor was erroneously asserting its tracking of the refreshed timestamp is in sync with the TxnCoordSender. It may, in fact, not be in sync in edge cases where a refresh succeeded but the TxnCoordSender doesn't hear about that success. Touches cockroachdb#38156 Touches cockroachdb#41941 Touches cockroachdb#43707 Release note: None
44407: storage: improve the migration away from txn.DeprecatedOrigTimestamp r=andreimatei a=andreimatei

19.2 doesn't generally set txn.ReadTimestamp. Instead, it sets txn.DeprecatedOrigTimestamp. Before this patch, all code dealing with txn.ReadTimestamp had to deal with the possibility of it not being set. This is fragile; I recently forgot to deal with it in a patch. This patch sets txn.ReadTimestamp to txn.DeprecatedOrigTimestamp when it wasn't set, thereby relieving most other code of that worry. This comes at the cost of an extra txn clone for requests coming from 19.2 nodes.

Release note: None

44428: storage: fix handling of refreshed timestamp r=andreimatei a=andreimatei

Before this patch, the refresher interceptor was erroneously asserting that its tracking of the refreshed timestamp is in sync with the TxnCoordSender. It may, in fact, not be in sync in edge cases where a refresh succeeded but the TxnCoordSender doesn't hear about that success.

Touches #38156
Touches #41941
Touches #43707

Release note: None

44503: roachpb: fix txn.Update() commutativity r=andreimatei a=andreimatei

Updates to the WriteTooOld field were not commutative. This patch fixes that by clarifying that the transaction with the higher ReadTimestamp gets to dictate the WriteTooOld value. I'm not sure what consequences this used to have, besides allowing for the confusing case where the server would receive a request with the WriteTooOld flag set, but with ReadTimestamp==WriteTimestamp. A future commit introduces a sanity assertion that all the requests with the WTO flag have a bumped WriteTimestamp.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
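For readers following along, here is a minimal sketch of what a commutative WriteTooOld merge could look like. It is not the roachpb code: `Txn`, `Update`, and `Merge` are hypothetical, pared-down stand-ins, timestamps are plain integers, and the tie-breaking rule (OR the two flags when the ReadTimestamps are equal) is an assumption the message above does not spell out.

```go
package main

// Txn is a hypothetical, simplified stand-in for a transaction record.
// Timestamps are plain integers here; the real type is richer.
type Txn struct {
	ReadTimestamp  int64
	WriteTimestamp int64
	WriteTooOld    bool
}

// Update folds o into t. The side with the higher ReadTimestamp
// dictates WriteTooOld, as described in the message above.
func (t *Txn) Update(o Txn) {
	switch {
	case t.ReadTimestamp < o.ReadTimestamp:
		// o is newer: take its read timestamp and its flag.
		t.ReadTimestamp = o.ReadTimestamp
		t.WriteTooOld = o.WriteTooOld
	case t.ReadTimestamp == o.ReadTimestamp:
		// Assumption: on a tie, either side may set the flag.
		t.WriteTooOld = t.WriteTooOld || o.WriteTooOld
	}
	// When t is newer, its flag is kept unchanged.
	if t.WriteTimestamp < o.WriteTimestamp {
		t.WriteTimestamp = o.WriteTimestamp
	}
}

// Merge returns the result of updating a copy of a with b.
func Merge(a, b Txn) Txn {
	a.Update(b)
	return a
}
```

With these rules, `Merge(a, b)` and `Merge(b, a)` produce the same value for any pair, which is the property the patch is after.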
(ccl/partitionccl).TestRepartitioning failed on master@120d53fc17f1f027c6075a2050f2991e369c5293:
MoreParameters:
See this test on roachdash
(ccl/partitionccl).TestRepartitioning failed on master@c473f40078994551cebcbe00fdbf1fa388957658:
MoreParameters:
See this test on roachdash
(ccl/partitionccl).TestRepartitioning failed on master@e1e26af4e20a4e7be3b4b157e6d7c72fbf44c10d:
MoreParameters:
See this test on roachdash
(ccl/partitionccl).TestRepartitioning failed on master@3b612692db93aa7c87493705e1fad85c9c664f6c: Fatal error:
Stack:
Log preceding fatal error
MoreParameters:
See this test on roachdash
48190: sql: inject tenant ID in sqlServerArgs, pass through ExecutorConfig r=nvanbenschoten a=nvanbenschoten

Fixes #47903. Informs #48123.

Also known as "the grand plumbing", this change replaces a few instances of `TODOSQLCodec` in `pkg/sql/sqlbase/index_encoding.go` and watches the house of cards fall apart. It then glues the world back together, this time using a properly injected tenant-bound SQLCodec to encode and decode all SQL table keys.

A tenant ID field is added to `sqlServerArgs`. This is used to construct a tenant-bound `keys.SQLCodec` during server creation. This codec morally lives on the `sql.ExecutorConfig`. In practice, it is also copied onto `tree.EvalContext` and `execinfra.ServerConfig` to help carry it around. SQL code is adapted to use this codec whenever it needs to encode or decode keys.

If all tests pass after this refactor, there is a good chance it got things right. This is because any use of an uninitialized SQLCodec will panic immediately when the codec is first used. This was helpful in ensuring that it was properly plumbed everywhere.

48245: kv: declare write access to AbortSpan on all aborting EndTxn reqs r=nvanbenschoten a=nvanbenschoten

Fixes #43707. Fixes #48046. Fixes #48189.

Part of the change made by #42765 was to clear AbortSpan entries on non-poisoning, aborting EndTxn requests. Specifically, this change was made in 1328787. The change forgot to update the corresponding span declaration logic to reflect the fact that we were now writing to the AbortSpan in cases where we previously weren't. This was triggering an assertion in race builds that tried to catch this kind of undeclared span access.

The assertion failure was very rare because it required the following conditions to all be met:
1. running a test with the race detector enabled
2. a txn (A) must have been aborted by another txn (B)
3. txn B must have cleared an intent on txn A's transaction record range
4. txn A must have noticed and issued a non-poisoning EndTxn(ABORT)

We should backport this when we get a chance (once v20.1.0 has stabilized), but I don't expect that this could actually cause any issues. The AbortSpan update was strictly a matter of performance and we should never be racing with another request that is trying to read the same AbortSpan entry.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
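To make the failure mode concrete, here is a rough sketch of how a declared-span check can trip when evaluation starts writing a key that declaration never listed. Everything in it is hypothetical and heavily simplified: `EndTxn`, `SpanSet`, the key constants, and the `declareOld`/`declareFixed`/`evaluate`/`checkDeclared` helpers are illustrative names, not the actual kv code, and the old and fixed conditions reflect only my reading of the description above.

```go
package main

import "fmt"

// EndTxn is a hypothetical, pared-down request.
type EndTxn struct {
	Commit bool
	Poison bool
}

// Hypothetical key names standing in for real key spans.
const (
	txnRecordKey = "txn-record"
	abortSpanKey = "abort-span"
)

// SpanSet is the set of keys a request declared it would write.
type SpanSet map[string]bool

// declareOld models the pre-fix logic: the AbortSpan is declared
// only when the request aborts and poisons.
func declareOld(et EndTxn) SpanSet {
	s := SpanSet{txnRecordKey: true}
	if !et.Commit && et.Poison {
		s[abortSpanKey] = true
	}
	return s
}

// declareFixed models the fix: declare it on every aborting request.
func declareFixed(et EndTxn) SpanSet {
	s := SpanSet{txnRecordKey: true}
	if !et.Commit {
		s[abortSpanKey] = true
	}
	return s
}

// evaluate returns the keys the request writes. Aborting requests
// touch the AbortSpan: set when poisoning, cleared otherwise.
func evaluate(et EndTxn) []string {
	keys := []string{txnRecordKey}
	if !et.Commit {
		keys = append(keys, abortSpanKey)
	}
	return keys
}

// checkDeclared plays the role of the race-build assertion: every
// written key must have been declared up front.
func checkDeclared(declared SpanSet, written []string) error {
	for _, k := range written {
		if !declared[k] {
			return fmt.Errorf("undeclared write to %q", k)
		}
	}
	return nil
}
```

For a non-poisoning abort (`EndTxn{Commit: false, Poison: false}`), checking `evaluate` against `declareOld` returns an error, while checking it against `declareFixed` returns nil.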
SHA: https://github.com/cockroachdb/cockroach/commits/5b73cd419b514452492226c0ea1b19639356ebb8
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1671818&tab=buildLog