
fix(raft/log): truncate file and reset offset correctly #830

Merged 1 commit into etcd-io:master on Jun 5, 2014

Conversation

yichengq (Contributor) commented Jun 5, 2014

@xiangli-cmu

xiang90 (Contributor) commented Jun 5, 2014

lgtm

@philips This fixes the problem that probably causes #829.

yichengq added a commit that referenced this pull request Jun 5, 2014
fix(raft/log): truncate file and reset offset correctly
yichengq merged commit 757bb3a into etcd-io:master on Jun 5, 2014
philips (Contributor) commented Jun 6, 2014

oh, dang. good fix.

ongardie commented

Sorry I'm a bit late on this one and completely off-topic, but I noticed the notification for this PR and got worried: it would be dangerous for a server with a corrupt disk to truncate entries off the end of its log and then continue participating in the cluster.

What's this code trying to do? What are the errors in decoding that result in the file being truncated? A few more comments could help.

yichengq (Contributor, Author) commented

@ongardie
This code mainly fixes an implementation bug: even after the file is truncated, its I/O offset doesn't change, so new log entries are appended at the old position. This could be a big problem the next time the log is loaded, because the stale data may confuse the protobuf parsing.
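
To make the problem concrete, here is a minimal Go sketch of the pattern the fix needs; the function and parameter names are hypothetical and this is not the actual etcd code:

```go
package raftlog

import (
	"io"
	"os"
)

// truncateAt cuts the on-disk log back to the last well-formed entry and,
// crucially, resets the file's I/O offset so that the next append starts
// at the new end of the file rather than the old position.
func truncateAt(f *os.File, lastGoodOffset int64) error {
	if err := f.Truncate(lastGoodOffset); err != nil {
		return err
	}
	// Without this Seek, writes would continue from the pre-truncation
	// offset and leave a gap of stale bytes in the file.
	_, err := f.Seek(lastGoodOffset, io.SeekStart)
	return err
}
```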

The error comes from a protobuf parse failure. It is highly likely that the machine was rebooted while syncing the latest entries to disk, so only part of them was recorded.
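
For illustration, a hedged sketch of how such a partial write might be detected while loading the log, assuming a simple length-prefixed record layout (an assumption made for this example; etcd's actual on-disk encoding may differ):

```go
package raftlog

import (
	"encoding/binary"
	"errors"
	"io"
	"os"
)

// recoverOffset returns the offset just past the last record that could be
// read in full. A reboot in the middle of a sync typically leaves a short
// final record, which shows up here as an (unexpected) EOF.
func recoverOffset(f *os.File) (int64, error) {
	var good int64
	for {
		var size uint64
		if err := binary.Read(f, binary.LittleEndian, &size); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				return good, nil // partial length prefix: cut the log here
			}
			return good, err
		}
		// Real code would sanity-check size before allocating.
		buf := make([]byte, size)
		if _, err := io.ReadFull(f, buf); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				return good, nil // partial record body: cut the log here
			}
			return good, err
		}
		// A real implementation would also protobuf-unmarshal buf and stop
		// at the first record that fails to decode.
		off, err := f.Seek(0, io.SeekCurrent)
		if err != nil {
			return good, err
		}
		good = off
	}
}
```

The returned offset would then feed a truncate-and-seek step like the one sketched above.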

In most cases this should not be a problem for the cluster, because the other members can recover the log.

But it would be terrible if some entries were really lost across the whole cluster. We could print a warning on truncation, but I don't know whether there is a better solution. One way I can think of is to add a flag that indicates missing log entries when rejoining the cluster, to ensure safety.

ongardie commented

Yeah, it's really hard to tell whether the disk was corrupted before or after acking the write. I'd definitely add a warning here as a minimum.

yichengq deleted the 98 branch on December 7, 2014 at 17:20