
[PD] delete store node failed #1297

Closed
Xavier1994 opened this issue Oct 25, 2018 · 9 comments
Assignees
Labels
type/bug The issue is confirmed as a bug.

Comments

@Xavier1994

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

I deleted a TiKV node from pd-ctl (commands sketched below), but after the delete the store's state is Offline. I expected it to be Tombstone, and my regions are still on this store.

[screenshots: pd-ctl store output, 2018-10-25]

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of PD are you using (pd-server -V)?
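For reference, the offline flow described above looks roughly like this in pd-ctl (a sketch; the store ID 5 and the PD address are assumptions based on the screenshots):

```sh
# Ask PD to take store 5 offline; PD should then move its regions away
# and eventually mark the store Tombstone.
pd-ctl -u http://127.0.0.1:2379 -d store delete 5

# Poll the store's state; the expected transition is Up -> Offline -> Tombstone.
pd-ctl -u http://127.0.0.1:2379 -d store 5
```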

@Xavier1994
Author

I waited for a long time (> 1h) and it is still in this state. I also restarted PD and TiKV, and the state is still the same.

@disksing
Contributor

Hi @Xavier1994, we need to move all replicas on store 5 to other stores before setting it to Tombstone. Do you have enough Up tikv-servers with > 80% available space? Could you check the output of the "store" command? You may also check whether PD has any logs formatted as "store %v may not turn into Tombstone, there are no extra up node has enough space to accommodate the extra replica".
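Concretely, the two checks suggested above could look like this (a sketch; the PD address and log path are assumptions for your deployment):

```sh
# Dump all stores with their states and available/total space.
pd-ctl -u http://127.0.0.1:2379 -d store

# Search the PD log for the replica-checker warning quoted above
# (the log path is an assumption; adjust to your deployment).
grep "may not turn into Tombstone" /path/to/pd.log
```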

@Xavier1994
Author

@disksing OK, but is it reasonable that region 6905 has no leader? I have 7 extra TiKV nodes. Because region 6905 has no leader, my requests to that region fail.
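For context, the region's leader can be checked directly in pd-ctl (a sketch; the PD address is an assumption):

```sh
# Print region 6905's peers; the "leader" field should name one of them.
pd-ctl -u http://127.0.0.1:2379 -d region 6905
```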

@Xavier1994
Author

and the "region check down-peer" command's output is also confusing
[screenshot: "region check down-peer" output, 2018-10-25]

It seems there are many down peers, but pd-ctl can't display them.
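The command in question, spelled out (a sketch; the PD address is an assumption):

```sh
# List regions that currently have down peers.
pd-ctl -u http://127.0.0.1:2379 -d region check down-peer
```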

@Xavier1994
Author

And I think my store capacity is sufficient:
[screenshot: store capacity output, 2018-10-25]

@disksing
Contributor

This does not look normal.
For region 6905, it seems something is wrong with leader election, so no heartbeat is sent to PD. You can check the logs related to this region by running grep "region 6905" against the corresponding TiKV log files (see the sketch below).
For the stuck offline process, you may check the Grafana panels "Scheduler/ReplicaChecker" and "Operator/Schedule Operator Create" on the PD page to see if anything is wrong.
For the strange output of "region check down-peer", @nolouch do you have any clue?
In addition, if possible, you could give us access to the cluster (grafana/ssh) so we can help. You can contact me privately via email at menglong AT pingcap.com.
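The grep suggested above, spelled out (a sketch; the TiKV log path is an assumption, and it must be run on every TiKV node hosting a peer of the region):

```sh
# Look for election/heartbeat activity for region 6905 in the TiKV log.
grep "region 6905" /path/to/tikv.log
```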

@nolouch
Contributor

nolouch commented Oct 26, 2018

And @Xavier1994, could you show me the result of "config show all" in pd-ctl?
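For reference (a sketch; the PD address is an assumption):

```sh
# Dump the complete PD configuration, including all schedule limits.
pd-ctl -u http://127.0.0.1:2379 -d config show all
```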

@Xavier1994
Author

@nolouch I have destroyed my cluster because the machine is needed for something else. I only changed some config to speed up the rebalance: max-snapshot: 16, max-pending-peer: 64, region-schedule-limit: 64, leader-schedule-limit: 16. Everything else uses the default config.
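Those settings, written out as pd-ctl commands (a sketch; it assumes the abbreviated names refer to PD's max-snapshot-count and max-pending-peer-count options, and the PD address is an assumption):

```sh
# Raise the scheduling limits to speed up rebalancing.
pd-ctl -u http://127.0.0.1:2379 -d config set max-snapshot-count 16
pd-ctl -u http://127.0.0.1:2379 -d config set max-pending-peer-count 64
pd-ctl -u http://127.0.0.1:2379 -d config set region-schedule-limit 64
pd-ctl -u http://127.0.0.1:2379 -d config set leader-schedule-limit 16
```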

@nolouch
Contributor

nolouch commented Oct 29, 2018

@Xavier1994 Thanks for your feedback. If this problem reappears, please contact us immediately.

@rleungx rleungx added the type/bug The issue is confirmed as a bug. label Dec 12, 2018
@rleungx rleungx closed this as completed Dec 12, 2018