Etcd cluster startup failed #10234

Closed
swenzi opened this issue Nov 5, 2018 · 8 comments

@swenzi

swenzi commented Nov 5, 2018

Hi,
My etcd cluster consists of three machines. When I started the etcd cluster, two etcds reported errors. The details are as follows:

  1. http://pdqtitad6.bkt.clouddn.com/etcd-issue-1.png
  2. http://pdqtitad6.bkt.clouddn.com/etcd-issue-2.png
  3. http://pdqtitad6.bkt.clouddn.com/etcd-issue-3.png

Please help me figure out why.
Thanks!

@hexfusion
Contributor

hexfusion commented Nov 5, 2018

> When I started the etcd cluster, two etcds reported errors. The details are as follows:

@swenzi, can you tell us a little more about this cluster's data? I am curious whether you have upgraded etcd from a previous version. Specifically, was etcd at any time < 3.2.10? I assume this was an existing cluster that was previously working? Basically, we need more details about why we are seeing this freelist panic to understand whether it is related to #8813.

[Image: etcd-issue-1]

Adding the image here for easier reference, but in the future, plain-text logs are much easier for us to review.

/cc @jpbetz

@jpbetz
Contributor

jpbetz commented Nov 5, 2018

+1 to @hexfusion's analysis; this does look a lot like the #8813 corruption that was fixed in 3.2.10+. I'd start by examining that as the cause. Was a <3.2.10 version of etcd in use recently?

@swenzi
Author

swenzi commented Nov 6, 2018

Yes, there was a cluster in the virtual machine before. Then my host suddenly lost power. When I restarted the host and started the virtual machine, the above error occurred. My etcd version is 3.2.18. Thank you!

Detail:
http://pdqtitad6.bkt.clouddn.com/etcd-issue-4.png

@hexfusion
Contributor

> can you tell us a little more information on this cluster's data? I am curious if you have updated etcd from a previous version. Specifically was etcd at anytime < 3.2.10?

Hi @swenzi can you answer my question above? We understand that you are running 3.2.18 now, thanks!

@swenzi
Author

swenzi commented Nov 8, 2018

I didn't upgrade etcd from a previous version; I have used etcd:3.2.18 the whole time. There wasn't an existing cluster that was previously working.

@swenzi
Author

swenzi commented Nov 8, 2018

My cluster is working now because I reinstalled the etcd cluster.

@hexfusion
Contributor

@swenzi, thank you for the details. Would you be able to share the data-dir for the corrupted nodes so that we can review it? This may help us better understand the underlying problem.

@swenzi
Author

swenzi commented Nov 16, 2018

Sorry, the cluster was reinstalled and the data was not retained. If this problem arises again, I will share the data with you. Thank you!
