-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: jepsen-batch3/multi-register/start-stop-2 failed [hopefully #36431] #38126
Comments
SHA: https://github.com/cockroachdb/cockroach/commits/753f245ae441a4ea5e00bdedd99145c19d5e320b Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1339408&tab=buildLog
|
Hopefully this is hopefully #36431 |
@nvanbenschoten this also failed on 2.1, but that's expected, right? |
Yes, that's expected. We didn't introduce this issue recently. |
SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/4878cb3e960dc26f0946e527e1836f27b3304c97 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1373334&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/4878cb3e960dc26f0946e527e1836f27b3304c97 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1373334&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/9322e07476de447799c5d3011eb2874930ee2993 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1375546&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/da56c792e968574b8f1d9ef3fdb45d56a530221a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1415578&tab=buildLog
|
The previous failure was a duplicate of #39218, which is fixed by cockroachdb/jepsen#23. |
…tervals This commit fixes the most common failure case in all of the following Jepsen failures. I'm not closing them here though, because they also should be failing due to cockroachdb#36431. Would fix cockroachdb#37394. Would fix cockroachdb#37545. Would fix cockroachdb#37930. Would fix cockroachdb#37932. Would fix cockroachdb#37956. Would fix cockroachdb#38126. Would fix cockroachdb#38763. Before this fix, we were not considering intents in a scan's uncertainty interval to be uncertain. This had the potential to cause stale reads because an unresolved intent doesn't indicate that its transaction hasn’t been committed and isn’t a causal ancestor of the scan. This was causing the `jepsen/multi-register` tests to fail, which I had previously incorrectly attributed entirely to cockroachdb#36431. This commit fixes this by returning `WriteIntentError`s for intents when they are above the read timestamp of a scan but below the max timestamp of a scan. This could have also been fixed by returning `ReadWithinUncertaintyIntervalError`s in this situation. Both would eventually have the same effect, but it seems preferable to kick off concurrency control immediately in this situation and only fall back to uncertainty handling for committed values. If the intent ends up being aborted, this could allow the read to avoid moving its timestamp. This commit will need to be backported all the way back to v2.0. Release note (bug fix): Consider intents in a read's uncertainty interval to be uncertain just as if they were committed values. This removes the potential for stale reads when a causally dependent transaction runs into the not-yet resolved intents from a causal ancestor.
40600: storage/engine: return WriteIntentError for intents in uncertainty intervals r=petermattis a=nvanbenschoten This commit fixes the most common failure case in all of the following Jepsen failures. I'm not closing them here though, because they also should be failing due to #36431. Would fix #37394. Would fix #37545. Would fix #37930. Would fix #37932. Would fix #37956. Would fix #38126. Would fix #38763. Before this fix, we were not considering intents in a scan's uncertainty interval to be uncertain. This had the potential to cause stale reads because an unresolved intent doesn't indicate that its transaction hasn’t been committed and isn’t a causal ancestor of the scan. This was causing the `jepsen/multi-register` tests to fail, which I had previously incorrectly attributed entirely to #36431. This commit fixes this by returning `WriteIntentError`s for intents when they are above the read timestamp of a scan but below the max timestamp of a scan. This could have also been fixed by returning `ReadWithinUncertaintyIntervalError`s in this situation. Both would eventually have the same effect, but it seems preferable to kick off concurrency control immediately in this situation and only fall back to uncertainty handling for committed values. If the intent ends up being aborted, this could allow the read to avoid moving its timestamp. This commit will need to be backported all the way back to v2.0. Release note (bug fix): Consider intents in a read's uncertainty interval to be uncertain just as if they were committed values. This removes the potential for stale reads when a causally dependent transaction runs into the not-yet resolved intents from a causal ancestor. 40603: make: pass TESTFLAGS to roachprod-stress, not GOFLAGS r=petermattis a=nvanbenschoten Passing the testflags through the GOFLAGS env var was causing the following error: ``` stringer -output=pkg/sql/opt/rule_name_string.go -type=RuleName pkg/sql/opt/rule_name.go pkg/sql/opt/rule_name.og.go stringer: go [list -f {{context.GOARCH}} {{context.Compiler}} -tags= -- unsafe]: exit status 1: go: parsing $GOFLAGS: non-flag "storage.test" Makefile:1496: recipe for target 'pkg/sql/opt/rule_name_string.go' failed make: *** [pkg/sql/opt/rule_name_string.go] Error 1 make: *** Waiting for unfinished jobs.... ``` Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
…tervals This commit fixes the most common failure case in all of the following Jepsen failures. I'm not closing them here though, because they also should be failing due to cockroachdb#36431. Would fix cockroachdb#37394. Would fix cockroachdb#37545. Would fix cockroachdb#37930. Would fix cockroachdb#37932. Would fix cockroachdb#37956. Would fix cockroachdb#38126. Would fix cockroachdb#38763. Before this fix, we were not considering intents in a scan's uncertainty interval to be uncertain. This had the potential to cause stale reads because an unresolved intent doesn't indicate that its transaction hasn’t been committed and isn’t a causal ancestor of the scan. This was causing the `jepsen/multi-register` tests to fail, which I had previously incorrectly attributed entirely to cockroachdb#36431. This commit fixes this by returning `WriteIntentError`s for intents when they are above the read timestamp of a scan but below the max timestamp of a scan. This could have also been fixed by returning `ReadWithinUncertaintyIntervalError`s in this situation. Both would eventually have the same effect, but it seems preferable to kick off concurrency control immediately in this situation and only fall back to uncertainty handling for committed values. If the intent ends up being aborted, this could allow the read to avoid moving its timestamp. This commit will need to be backported all the way back to v2.0. Release note (bug fix): Consider intents in a read's uncertainty interval to be uncertain just as if they were committed values. This removes the potential for stale reads when a causally dependent transaction runs into the not-yet resolved intents from a causal ancestor.
…tervals This commit fixes the most common failure case in all of the following Jepsen failures. I'm not closing them here though, because they also should be failing due to cockroachdb#36431. Would fix cockroachdb#37394. Would fix cockroachdb#37545. Would fix cockroachdb#37930. Would fix cockroachdb#37932. Would fix cockroachdb#37956. Would fix cockroachdb#38126. Would fix cockroachdb#38763. Before this fix, we were not considering intents in a scan's uncertainty interval to be uncertain. This had the potential to cause stale reads because an unresolved intent doesn't indicate that its transaction hasn’t been committed and isn’t a causal ancestor of the scan. This was causing the `jepsen/multi-register` tests to fail, which I had previously incorrectly attributed entirely to cockroachdb#36431. This commit fixes this by returning `WriteIntentError`s for intents when they are above the read timestamp of a scan but below the max timestamp of a scan. This could have also been fixed by returning `ReadWithinUncertaintyIntervalError`s in this situation. Both would eventually have the same effect, but it seems preferable to kick off concurrency control immediately in this situation and only fall back to uncertainty handling for committed values. If the intent ends up being aborted, this could allow the read to avoid moving its timestamp. This commit will need to be backported all the way back to v2.0. Release note (bug fix): Consider intents in a read's uncertainty interval to be uncertain just as if they were committed values. This removes the potential for stale reads when a causally dependent transaction runs into the not-yet resolved intents from a causal ancestor.
SHA: https://github.com/cockroachdb/cockroach/commits/3b63648f905715e6c0b055fe9acac9c5b8206196
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1333520&tab=buildLog
The text was updated successfully, but these errors were encountered: