jepsen: bank/majority-ring failed with wrong-total
#34321
wrong-total
This is a real failure:
Possibly has the same cause as the TPC-C errors @nvanbenschoten is looking at (#34228?). This is a bank test with 5 accounts (and network partitions). At some point just before operation 12746, one of the transfers created $3 out of thin air, and all subsequent reads report this inconsistency. The nearest transfers of $3 were at indexes 12723 and 12725, just before a nemesis activation. The client ultimately got a timeout error on these connections, but not until 12898 (after the inconsistent reads, and long after the network partition healed at 12733).
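For readers unfamiliar with the bank test, the invariant being checked is that transfers only move money between accounts, so every read of all balances must sum to the same total. A minimal sketch of such a check in Go follows. This is not the Jepsen checker itself; the `Read` type, the `firstWrongTotal` helper, and the balances in `main` are made up for illustration (the issue does not state the real balances or expected total).

```go
package main

import "fmt"

// Read is one observed snapshot of all account balances.
// Index is the position of the read in the test history.
// (Hypothetical type; the real test history has its own format.)
type Read struct {
	Index    int
	Balances []int
}

// firstWrongTotal returns the first read whose balances do not sum to want,
// together with the observed total. ok is false when every read is consistent.
func firstWrongTotal(reads []Read, want int) (bad Read, got int, ok bool) {
	for _, r := range reads {
		sum := 0
		for _, b := range r.Balances {
			sum += b
		}
		if sum != want {
			return r, sum, true
		}
	}
	return Read{}, 0, false
}

func main() {
	// Illustrative data only: 5 accounts with an assumed expected total of 50.
	reads := []Read{
		{Index: 12700, Balances: []int{10, 10, 10, 10, 10}},
		{Index: 12746, Balances: []int{13, 7, 10, 13, 10}}, // sums to 53: $3 too many
	}
	if bad, got, ok := firstWrongTotal(reads, 50); ok {
		fmt.Printf("wrong-total at index %d: got %d, want %d\n", bad.Index, got, 50)
	}
}
```

Because a wrong total persists once extra money exists, the first inconsistent read only bounds where the anomaly happened; the offending transfer has to be found by looking at the operations just before it, as described above.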
Fixes cockroachdb#34025. Fixes cockroachdb#33624. Fixes cockroachdb#33335. Fixes cockroachdb#33151. Fixes cockroachdb#33149. Fixes cockroachdb#34159. Fixes cockroachdb#34293. Fixes cockroachdb#32813. Fixes cockroachdb#30886. Fixes cockroachdb#34228. Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrated that without this fix, the serializable violation speculated about in the issue was possible.

Release note (bug fix): Fix bug where lease transfers passed through Snapshots could forget to update in-memory state on the new leaseholder, allowing write-skew between read-modify-write operations.
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025. Fixes #33624. Fixes #33335. Fixes #33151. Fixes #33149. Fixes #34159. Fixes #34293. Fixes #32813. Fixes #30886. Fixes #34228. Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
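The mechanism described above can be illustrated with a toy model. The Go sketch below is not CockroachDB's code: the `Lease` and `Replica` types, the single-number timestamp cache, and the `reactToLease` switch are all hypothetical simplifications, and the choice to bump the cache to the lease start time is an assumption made for the example. Only the name `leasePostApply` is borrowed from the description. It shows how a replica that learns of its lease through a snapshot, without running the lease side-effects, can accept a write at a timestamp below a read that the previous leaseholder already served.

```go
package main

import "fmt"

// Lease is a simplified range lease (hypothetical fields).
type Lease struct {
	Holder   int   // replica ID of the leaseholder
	Sequence int   // changes whenever the lease changes hands
	Start    int64 // timestamp at which the lease begins
}

// Replica models only the state relevant to this bug.
type Replica struct {
	ID         int
	Lease      Lease
	TSCacheLow int64 // timestamp cache low-water mark: writes must be above it
}

// leasePostApply stands in for the in-memory side-effects of a lease change.
// Here it only ratchets the timestamp cache up to the lease start (assumed).
func (r *Replica) leasePostApply(l Lease) {
	if l.Start > r.TSCacheLow {
		r.TSCacheLow = l.Start
	}
}

// applySnapshot installs the lease carried by a snapshot. With reactToLease
// set to false it models the behavior before the fix: the lease is installed
// but no side-effects run.
func (r *Replica) applySnapshot(snapLease Lease, reactToLease bool) {
	prev := r.Lease
	r.Lease = snapLease
	if reactToLease && snapLease.Sequence != prev.Sequence && snapLease.Holder == r.ID {
		r.leasePostApply(snapLease)
	}
}

// canWrite reports whether this replica would accept a write at ts.
func (r *Replica) canWrite(ts int64) bool {
	return r.Lease.Holder == r.ID && ts > r.TSCacheLow
}

func main() {
	old := Lease{Holder: 1, Sequence: 1, Start: 10}
	// Replica 1 served a read at ts=100 before the lease moved to replica 2.
	snap := Lease{Holder: 2, Sequence: 2, Start: 120}

	buggy := &Replica{ID: 2, Lease: old}
	buggy.applySnapshot(snap, false)
	fmt.Println("without fix: write@90 allowed =", buggy.canWrite(90))

	fixed := &Replica{ID: 2, Lease: old}
	fixed.applySnapshot(snap, true)
	fmt.Println("with fix: write@90 allowed =", fixed.canWrite(90))
}
```

In this model the unfixed replica accepts the write at timestamp 90 even though a read at 100 was already served elsewhere, which is the kind of timestamp cache regression that permits write-skew. The fixed replica rejects it because its low-water mark was raised when the lease change was detected during snapshot application.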
SHA: https://github.com/cockroachdb/cockroach/commits/e10fb557b11b5ff1b8609aa963da23c37a1143c8
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1113854&tab=buildLog