-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
teamcity: failed test: TestConcurrentRaftSnapshots #39661
Labels
Milestone
Comments
cockroach-teamcity
added
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
labels
Aug 14, 2019
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Aug 16, 2019
Fixes cockroachdb#39652. Fixes cockroachdb#39661. Fixes cockroachdb#35144. This commit fixes the referenced issues by eliminating the practice of updating the transaction coordinator's proto directly from its heartbeat loop. This was problematic because of the race described in https://github.com/cockroachdb/cockroach/blob/dc3686f79b3750500efaff7092c81a3e5ce6d02c/pkg/kv/txn_interceptor_heartbeater.go#L357-L364. The heartbeat loop doesn't know if it's racing with an EndTransaction request and it could incorrectly interpret a missing transaction record if it is. The safest thing to do is to limit the path in which it informs the TxnCoordSender of updates. This limits the responsibility of the heartbeat loop. Its job is now only to: 1. update the transaction record to maintain liveness 2. eagerly clean up a transaction if it is found to be aborted 3. inform the transaction coordinator about an aborted transaction record IF the transaction coordinator is continuing to send requests through the interceptor. Notably, the heartbeat loop no longer blindly updates the transaction coordinator's transaction proto. There wasn't a strong reason for it to be able to do so, especially now that we no longer push transactions or ratchet their priority frequently. Moreover, even if those were still frequent occurrences, updating the proto from the heartbeat loop prevented usual restart handling from being used. For instance, doing so might prevent us from refreshing the transaction. All in all, allowing this didn't seem worth the complexity. This commit also includes some cleanup. For instance, it removes a confusing dependency where the txnHeartbeater called back into the TxnCoordSender. It also addresses a longstanding TODO to actually unit test the txnHeartbeater. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Aug 21, 2019
Fixes cockroachdb#39652. Fixes cockroachdb#39661. Fixes cockroachdb#35144. This commit fixes the referenced issues by eliminating the practice of updating the transaction coordinator's proto directly from its heartbeat loop. This was problematic because of the race described in https://github.com/cockroachdb/cockroach/blob/dc3686f79b3750500efaff7092c81a3e5ce6d02c/pkg/kv/txn_interceptor_heartbeater.go#L357-L364. The heartbeat loop doesn't know if it's racing with an EndTransaction request and it could incorrectly interpret a missing transaction record if it is. The safest thing to do is to limit the path in which it informs the TxnCoordSender of updates. This limits the responsibility of the heartbeat loop. Its job is now only to: 1. update the transaction record to maintain liveness 2. eagerly clean up a transaction if it is found to be aborted 3. inform the transaction coordinator about an aborted transaction record IF the transaction coordinator is continuing to send requests through the interceptor. Notably, the heartbeat loop no longer blindly updates the transaction coordinator's transaction proto. There wasn't a strong reason for it to be able to do so, especially now that we no longer push transactions or ratchet their priority frequently. Moreover, even if those were still frequent occurrences, updating the proto from the heartbeat loop prevented usual restart handling from being used. For instance, doing so might prevent us from refreshing the transaction. All in all, allowing this didn't seem worth the complexity. This commit also includes some cleanup. For instance, it removes a confusing dependency where the txnHeartbeater called back into the TxnCoordSender. It also addresses a longstanding TODO to actually unit test the txnHeartbeater. Release note: None
craig bot
pushed a commit
that referenced
this issue
Aug 21, 2019
39699: kv: don't update transaction proto directly from heartbeat loop r=nvanbenschoten a=nvanbenschoten Fixes #39652. Fixes #39661. Fixes #35144. This commit fixes the referenced issues by eliminating the practice of updating the transaction coordinator's proto directly from its heartbeat loop. This was problematic because of the race described in https://github.com/cockroachdb/cockroach/blob/dc3686f79b3750500efaff7092c81a3e5ce6d02c/pkg/kv/txn_interceptor_heartbeater.go#L357-L364 The heartbeat loop doesn't know if it's racing with an EndTransaction request and it could incorrectly interpret a missing transaction record if it is. The safest thing to do is to limit the path in which it informs the TxnCoordSender of updates. This limits the responsibility of the heartbeat loop. Its job is now only to: 1. update the transaction record to maintain liveness 2. eagerly clean up a transaction if it is found to be aborted 3. inform the transaction coordinator about an aborted transaction record IF the transaction coordinator is continuing to send requests through the interceptor. Notably, the heartbeat loop no longer blindly updates the transaction coordinator's transaction proto. There wasn't a strong reason for it to be able to do so, especially now that we no longer push transactions or ratchet their priority frequently. Moreover, even if those were still frequent occurrences, updating the proto from the heartbeat loop prevented usual restart handling from being used. For instance, doing so might prevent us from refreshing the transaction. All in all, allowing this didn't seem worth the complexity. This commit also includes some cleanup. For instance, it removes a confusing dependency where the txnHeartbeater called back into the TxnCoordSender. It also addresses a longstanding TODO to actually unit test the txnHeartbeater. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
The following tests appear to have failed on master (testrace): TestConcurrentRaftSnapshots
You may want to check for open issues.
#1438837:
Please assign, take a look and update the issue accordingly.
The text was updated successfully, but these errors were encountered: