roachtest: scaledata/filesystem_simulator/nodes=3 failed #36981
Comments
Looks like a transaction got stuck for 6 minutes doing ... something. |
The debug zip is not super useful because it's 2.1, unfortunately. |
@nvanbenschoten what do you think we should do here?

The txn started at 11:44:18. From the log file dates we can see that the nodes were (stopped and) restarted as follows:

n1 at 11:44:50, 11:46:51, 11:48:52

The 6-minute txn couldn't have run against n1, or it would have been forced to retry at 11:46:51 or before. So it was likely stuck on n2 and got unstuck ("bad connection") when n2 was taken down. Needless to say, there's nothing in the logs on n2.

So we have some statement that sometimes gets stuck forever. In fact, we have precisely two such statements that started at around the same time (right?). Is there some deadlock? Who knows. What can we do to find out?

cc @andreimatei |
The symptoms here look similar to #32204 (see #32204 (comment)). Interestingly, the stuck request began at In #32204 we saw a number of requests get stuck for over 2 minutes. I'm going to drop the timeout on these retries to 2 minutes and see if I can more easily reproduce what we're seeing here. |
SHA: https://github.com/cockroachdb/cockroach/commits/84dc682eca4b11e6abaf390fc8883f32afe81fb4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1283539&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/ba5c092a726134b73e789c2047f7ec151be7c1a1 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1288263&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/d01a95b1ee71dfb36eed374619a8ed30de057ed2 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312970&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/e6366f3ac39652a763f38948fccf4b2dab363034 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1347608&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/1ad0ecc8cbddf82c9fedb5a5c5e533e72a657ff7 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399000&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/7111a67b2ea3a19c2f312f8d214b8823f431cac0 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/3b9a95bd7eb2cfa6d544fe7217852a85ec3b76f4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1422703&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/98d6832e9f9edb7e554aaa90d9d4296bb00af16e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1433695&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/40f8f0eb00f4b3bf5bac11fb5ae132e33a492713 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1452154&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/474795f33383e562e500ad71a774ff7ba92ae3c8 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1485061&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/6b14c0aa3ed1b4ba6d5f937e9352c5383afe1c37 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1502387&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/77f26d185efb436aaac88243de19a27caa5da9b6 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1509340&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
|
I think this is the real error here: 2019/09/26 16:01:00 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF |
BTW the logs have this
and also log spam that at the very least needs to be throttled:
|
Regarding the "unexpected EOF" error: this test runs under chaos, so that error is expected to occur (duh), and we also see that a node got killed just before the error occurred. Maybe the driver used in this test isn't properly handling this error when it occurs during a ROLLBACK? |
@asubiotto ping on this logging stuff - we should probably make these messages look less scary if they're really expected. |
SHA: https://github.com/cockroachdb/cockroach/commits/4dcfd7d8899d90dd1816153bf59b5df647c9bbd3 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1516560&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/1c99165c39c3714f1ce9986bff75ce517f977630 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1522970&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/b519bf7f81795d360fcf458b5d5e031b3fc43e9e Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1531436&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/41bac2fd99a91d92cb3ff6426789f7e64dd6b14a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1534498&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/a40cefbb5bd5e9d34f5db930334a385d0934d2d0 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1538863&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/a57647381a4714b48f6ec6dec0bf766eaa6746dd Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1561660&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/a4d88c2c5ab6131878d2b4552446d94fd93b1553 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1563612&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/262e6f2499e34eb4373d0450fa9f6a820a609b2c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1565222&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/4c52530a33367c58e434111324fcda8c3d73582a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1568299&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/62801ce77d9055c00b0e30010f5998ea2cd86686 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1569533&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/8b9f54761adc58eb9aecbf9b26f1a7987d8a01e5 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1573251&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/ff13356a005e446f3f11bd37cc9772f568d8f41f Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1578961&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/e26c2f26ca95796f84e7396b832b80f5d53605ae Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1589114&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/549d5bb865eaf9f5233cf8a068034637587b4373 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616574&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/39bf64d28ee5a0ab79081b8b0b29230749ea0fff Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616610&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/a60c4680e82b390ab058634a58626f56c80b27ab Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616556&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/ed717cbaf741e3a32c76db25b16a59dc2a8221d7 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1624103&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/9ad9eb5fb8806e4b74546910ca8bda66786d4288 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1626352&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/e4fa0b8b8e674d19c7957f03ca3a2d1f716f1f1d Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1629591&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
SHA: https://github.com/cockroachdb/cockroach/commits/60412eb85271ecd1539971fcc6ea3bf11f1ca7a6 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1629609&tab=artifacts#/scaledata/filesystem_simulator/nodes=3
|
(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@734e357412dadafcd6084b8eab8e251e44e86b4a:
details
Artifacts: /scaledata/filesystem_simulator/nodes=3
powered by pkg/cmd/internal/issues |
(roachtest).scaledata/filesystem_simulator/nodes=3 failed on release-19.1@4fa6d707fb3b52693638b1438ba7dc46227684f5:
details
Artifacts: /scaledata/filesystem_simulator/nodes=3
powered by pkg/cmd/internal/issues |
(roachtest).scaledata/filesystem_simulator/nodes=3 failed on release-19.2@3cbd05602d4aeaebbccea18d66ad0fdf8db482a5:
details
Artifacts: /scaledata/filesystem_simulator/nodes=3
powered by pkg/cmd/internal/issues |
(roachtest).scaledata/filesystem_simulator/nodes=3 failed on provisional_201912132008_v20.1.0-alpha20191216@e9e2a80361a25fd9f9b179f84be4c5c3d7e7d8cb:
Repro
Artifacts: /scaledata/filesystem_simulator/nodes=3
powered by pkg/cmd/internal/issues |
(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@beb69e089b6eadc0fde6c92eb533b08d248938fe:
Repro
Artifacts: /scaledata/filesystem_simulator/nodes=3
powered by pkg/cmd/internal/issues |
Fixes cockroachdb/cockroach#36981.
Fixes cockroachdb/cockroach#39618.
Fixes cockroachdb/cockroach#40552.
Fixes cockroachdb/cockroach#41735.

cockroachdb/cockroach#41451 switched two forms of errors that can be thrown during chaos events over to a new error code class: 58, internal system errors. This commit updates `pqConnectionError` to consider this error code class as retry-worthy.
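A rough sketch of a class-based check along these lines is shown below. It is not the body of `pqConnectionError`: the function names are hypothetical, the codes are handled as plain strings rather than the driver's error type, and treating class 08 (connection exception) as already retryable is an assumption made for illustration.

```go
package main

import "fmt"

// sqlStateClass returns the two-character class prefix of a
// five-character SQLSTATE code, or "" if the code is malformed.
func sqlStateClass(code string) string {
	if len(code) != 5 {
		return ""
	}
	return code[:2]
}

// isRetryableCode is a hypothetical helper. Class 58 is the class named
// in the commit message; class 08 is included as an assumption.
func isRetryableCode(code string) bool {
	switch sqlStateClass(code) {
	case "08", "58":
		return true
	default:
		return false
	}
}

func main() {
	for _, code := range []string{"08006", "58030", "42601"} {
		fmt.Printf("%s retryable=%t\n", code, isRetryableCode(code))
	}
}
```

Checking the class prefix rather than individual codes means any further codes added under class 58 would be picked up without another change to the test.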
SHA: https://github.com/cockroachdb/cockroach/commits/df200cbf3f407dbf349aa601ff9036b4dff88e83
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1252822&tab=buildLog