panic: tocommit(691243) is out of range [lastIndex(691183)] causes CrashLoopBackOff #30260

Closed
cgebe opened this issue Sep 16, 2018 · 6 comments
Labels: A-kv-replication (Relating to Raft, consensus, and coordination) · C-investigation (Further steps needed to qualify; C-label will change) · O-community (Originated from the community) · S-1-stability (Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting)

Comments

cgebe commented Sep 16, 2018

Is this a bug report or a feature request?

Bug report

  • Which version of CockroachDB are you using?

v2.1.0-beta.20180910

  • What did you do?

Followed #28487 and #28487 (comment): set kv.raft_log.synchronize = false.
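(For reference, this would presumably have been applied with the cluster-setting statement SET CLUSTER SETTING kv.raft_log.synchronize = false.)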

  • What did you expect to see?

The node that panicked to come back up again.

  • What did you see instead?

Pod crashes over and over again.

panic: tocommit(691243) is out of range [lastIndex(691183)]. Was the raft log corrupted, truncated, or lost? [recovered]
	panic: tocommit(691243) is out of range [lastIndex(691183)]. Was the raft log corrupted, truncated, or lost?

goroutine 276 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc420568480, 0x2f791c0, 0xc42592aea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:184 +0x11f
panic(0x25e9200, 0xc4266a2030)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc4258841c0, 0x2ad85a3, 0x5d, 0xc429300060, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:106 +0xfe
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc4293020e0, 0xa8c2b)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/log.go:201 +0x15c
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc4200e1200, 0x8, 0x4, 0x2, 0x541e, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:1265 +0x54
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.stepFollower(0xc4200e1200, 0x8, 0x4, 0x2, 0x541e, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:1210 +0x44c
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc4200e1200, 0x8, 0x4, 0x2, 0x541e, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:921 +0x1308
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*RawNode).Step(0xc4208fb980, 0x8, 0x4, 0x2, 0x541e, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/rawnode.go:195 +0xdd
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).stepRaftGroup.func1(0xc4208fb980, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3927 +0x434
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked.func1(0xc42b594b88, 0x1, 0xc4208fb980, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:572 +0x30
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc42086f180, 0x2b82200, 0xc42b594b88, 0xc42b594b70, 0x7909f8)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:573 +0xc2
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroup(0xc42086f180, 0x0, 0xc42b594b88, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:598 +0x8c
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).stepRaftGroup(0xc42086f180, 0xc427cb4000, 0x26cb800, 0x30)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3887 +0x55
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequestWithReplica(0xc420848580, 0x2f791c0, 0xc4292d41b0, 0xc42086f180, 0xc427cb4000, 0xc4292d41b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3536 +0xeb
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x2f791c0, 0xc4292d41b0, 0xc42086f180, 0x2f791c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3870 +0x97
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc420848580, 0x2f791c0, 0xc4292d41b0, 0xc427cb4000, 0xc42b594ed0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3499 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc420848580, 0x2f791c0, 0xc42592aea0, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3864 +0x229
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc4201e3680, 0x2f791c0, 0xc42592aea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x2f791c0, 0xc42592aea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc429221e20, 0xc420568480, 0xc429221e10)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad
  • What was the impact?

Node staying dead.

I guess the panic is legitimate but a smooth recovery would be great.
Is there another way to restore the node without wiping its data dir?

Edit: While decommissioning the node, there are still plenty of replicas assigned to the dead node, and decommissioning takes ages:

  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5266 |        true        |    false     
(1 row)
..................
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5265 |        true        |    false     
(1 row)
...............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5264 |        true        |    false     
(1 row)
..............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5205 |        true        |    false     
(1 row)
................
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5190 |        true        |    false     
(1 row)
..............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5191 |        true        |    false     
(1 row)
..
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5190 |        true        |    false     
(1 row)
.
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5191 |        true        |    false     
(1 row)
..
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5151 |        true        |    false     
(1 row)
...............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5139 |        true        |    false     
(1 row)
..............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5141 |        true        |    false     
(1 row)
.
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5140 |        true        |    false     
(1 row)

  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5125 |        true        |    false     
(1 row)
...............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5124 |        true        |    false     
(1 row)
..............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5125 |        true        |    false     
(1 row)
.
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5124 |        true        |    false     
(1 row)
.........
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5125 |        true        |    false     
(1 row)
...
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5126 |        true        |    false     
(1 row)
.
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5125 |        true        |    false     
(1 row)
.....
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5124 |        true        |    false     
(1 row)
.
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5123 |        true        |    false     
(1 row)
.............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5122 |        true        |    false     
(1 row)
.............
  id | is_live | replicas | is_decommissioning | is_draining  
+----+---------+----------+--------------------+-------------+
   4 |  false  |     5121 |        true        |    false     
(1 row)
.................
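(The progress output above is presumably from something along the lines of cockroach node decommission 4, which periodically reprints the node status table while replicas move off the node.)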
jordanlewis added the C-investigation, O-community and A-kv-replication labels Sep 16, 2018
petermattis assigned tbg and unassigned petermattis Sep 17, 2018
petermattis (Collaborator) commented:

@tschottdorf Could this be a different manifestation of #28918? v2.1.0-beta.20180910 did not contain the fix (#29579).

@cgebe The other possibility is that kv.raft_log.synchronize has caused a problem. That setting is not intended for production usage. The documentation associated with it should probably have a bigger warning, as the likelihood of permanent data loss is extremely high. In particular, if a node crashes for any reason, then it is likely to continue to crash with a panic like the one above. I cannot stress strongly enough that kv.raft_log.synchronize should not be used in production clusters or any cluster in which you don't want to periodically wipe the cluster and start from scratch.
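(For reference, the synchronous default can presumably be restored with SET CLUSTER SETTING kv.raft_log.synchronize = true, though that does not recover data already lost on a node that crashed while the setting was off.)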

I guess the panic is legitimate but a smooth recovery would be great.
Is there another way to restore the node without wiping its data dir?

Not currently. It is probably possible to do something better here, but it is also possible that the attempt at something better makes the situation a lot worse. Trying to recover from unexpected errors is fraught. If we were to automatically remove some of the bad data, we might inadvertently be removing the last copy of the data.

cgebe (Author) commented Sep 17, 2018

@petermattis I wiped the node and added it back in a fresh state. Since the first incident, I have not had the same problem.

The requirements of my use case, especially time series data storage, may not align fully with CockroachDB's intended usage. I have already taken pressure off insertion by placing an in-memory store in between and periodically inserting into CockroachDB, separated by logical database. As I said, I simply cannot turn kv.raft_log.synchronize on, because doing so leads to the high latencies mentioned in #28487 and makes my system not work at all (insertions are not carried out, transaction errors, etc.). At the moment, this setting is the only way I see.
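Roughly, the buffering approach looks like this (a minimal, hypothetical sketch assuming a Postgres-wire driver and an illustrative metrics table, not the actual code):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"strings"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
)

// sample is a hypothetical time series point buffered in memory.
type sample struct {
	ts    time.Time
	name  string
	value float64
}

// flush writes a buffered batch as a single multi-row INSERT, so the database
// sees one statement per flush interval instead of one per point. Table and
// column names are illustrative.
func flush(db *sql.DB, batch []sample) error {
	if len(batch) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(batch))
	args := make([]interface{}, 0, 3*len(batch))
	for i, s := range batch {
		placeholders = append(placeholders,
			fmt.Sprintf("($%d, $%d, $%d)", 3*i+1, 3*i+2, 3*i+3))
		args = append(args, s.ts, s.name, s.value)
	}
	_, err := db.Exec(
		"INSERT INTO metrics (ts, name, value) VALUES "+strings.Join(placeholders, ", "),
		args...)
	return err
}

func main() {
	// Connection string is illustrative; the database name stands in for the
	// "separated by logical database" part mentioned above.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/timeseries?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	batch := []sample{{ts: time.Now(), name: "cpu", value: 0.42}}
	if err := flush(db, batch); err != nil {
		log.Fatal(err)
	}
}

The point of the buffer is simply to turn many small writes into fewer, larger transactions.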

Since my knowledge of the underlying processes is limited, I will continue reporting my experience, and I certainly look forward to optimizations. I really like CockroachDB's functionality and interfaces (SQL with views) and appreciate your activity. Switching to a specialized time series database would be a step backward in usability (especially in horizontal scaling), and is therefore unlikely.

petermattis (Collaborator) commented:

You're correct that time series data storage does not align well with CockroachDB's intended usage. You may be better served by a dedicated time series database.

Running with kv.raft_log.synchronize = false is not something we recommend or can support, and we have no plans to improve it.

I haven't looked at #28487, but there may be something else going on there that we should investigate.

tbg (Member) commented Sep 18, 2018

@tschottdorf Could this be a different manifestation of #28918?

@petermattis I don't think so, since the gap here is ~60 entries, though perhaps this is the failure mode I wasn't quite able to explain in that issue (where I saw a gap of 3). Right off the bat I'd say that this is more likely when a node didn't sync properly.

tbg added the S-1-stability label Sep 26, 2018
tbg added this to the 2.1 milestone Sep 27, 2018
bdarnell (Contributor) commented:

kv.raft_log.synchronize is almost certainly the issue here - it's known to cause this failure when a node shuts down uncleanly (and once it happens, it'll keep happening until that node is wiped).

The raft implementation is arguably being too strict here. If a single node has its log regress like this, it's safe to just catch it back up with new log entries. The problem is that if you had this happen to several nodes at once, data could be lost (silently, since there wouldn't be a more up-to-date node to tell them about the data they're missing). At this point all bets are off because various invariants could have been violated.
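For reference, the check that trips here is the commit-index invariant in the vendored etcd/raft package (raftLog.commitTo in vendor/github.com/coreos/etcd/raft/log.go, per the stack trace above). A paraphrased, self-contained sketch of that invariant, not the literal vendored code:

package raftsketch

// commitToSketch paraphrases the invariant behind the panic: a leader
// heartbeat asks a follower to advance its commit index to tocommit, but the
// follower's local raft log only reaches lastIndex. If the log is shorter
// than an index the leader already considers committed, entries that should
// have been durable are missing, so raft panics rather than silently
// continuing. (In CockroachDB the panic surfaces via (*raftLogger).Panicf,
// as in the stack trace above.)
func commitToSketch(committed, lastIndex, tocommit uint64) uint64 {
	if committed < tocommit {
		if lastIndex < tocommit {
			panic("tocommit is out of range [lastIndex]: raft log corrupted, truncated, or lost")
		}
		return tocommit // the commit index never decreases
	}
	return committed
}

With kv.raft_log.synchronize = false, entries the leader already counted as committed can disappear from a follower's log on an unclean shutdown, so the follower's lastIndex (691183 here) trails the leader's commit index (691243) and the check fires on the next heartbeat.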

tbg (Member) commented Oct 11, 2018

Closing since this is likely caused by not syncing to disk.

tbg closed this as completed Oct 11, 2018