
[PD] delete store node failed #1297

Closed
Xavier1994 opened this issue Oct 25, 2018 · 9 comments
Assignees
Labels
type/bug The issue is confirmed as a bug.

Comments

@Xavier1994

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

I deleted a TiKV node from pd-ctl (commands sketched below), but after the delete the store's state is Offline. I expected it to be Tombstone, and my regions are still on this store.

[screenshots: pd-ctl store output, 2018-10-25]

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of PD are you using (pd-server -V)?
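For reference, the offline flow described above looks roughly like this in pd-ctl (a sketch; the store ID 5 and the PD address are assumptions based on the screenshots):

```sh
# Ask PD to take store 5 offline; PD should then move its regions away
# and eventually mark the store Tombstone.
pd-ctl -u http://127.0.0.1:2379 -d store delete 5

# Poll the store's state; the expected transition is Up -> Offline -> Tombstone.
pd-ctl -u http://127.0.0.1:2379 -d store 5
```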

@Xavier1994
Author

I waited for a long time (> 1h) and it is still in this state. I also restarted PD and TiKV, and the state is still the same.

@disksing
Contributor

Hi @Xavier1994, we need to move all replicas on store 5 to other stores before setting it to Tombstone. Do you have enough Up tikv-servers with > 80% available space? Could you check the output of the "store" command? You may also check whether PD has any logs formatted as "store %v may not turn into Tombstone, there are no extra up node has enough space to accommodate the extra replica".
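Concretely, the two checks suggested above could look like this (a sketch; the PD address and log path are assumptions for your deployment):

```sh
# Dump all stores with their states and available/total space.
pd-ctl -u http://127.0.0.1:2379 -d store

# Search the PD log for the replica-checker warning quoted above
# (the log path is an assumption; adjust to your deployment).
grep "may not turn into Tombstone" /path/to/pd.log
```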

@Xavier1994
Author

@disksing OK, but is it reasonable that region 6905 has no leader? I have 7 extra TiKV nodes. Because region 6905 has no leader, my requests to that region fail.
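For context, the region's leader can be checked directly in pd-ctl (a sketch; the PD address is an assumption):

```sh
# Print region 6905's peers; the "leader" field should name one of them.
pd-ctl -u http://127.0.0.1:2379 -d region 6905
```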

@Xavier1994
Author

and the "region check down-peer" command's output is also confusing
[screenshot: "region check down-peer" output, 2018-10-25]

It seems there are many down peers, but pd-ctl can't display them.
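The command in question, spelled out (a sketch; the PD address is an assumption):

```sh
# List regions that currently have down peers.
pd-ctl -u http://127.0.0.1:2379 -d region check down-peer
```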

@Xavier1994
Author

And I think my store capacity is sufficient:
[screenshot: store capacity output, 2018-10-25]

@disksing
Contributor

This does not look normal.
For region 6905, it seems something is wrong with leader election, so no heartbeat is sent to PD. You can check the logs related to this region by running grep "region 6905" against the corresponding TiKV log files (see the sketch below).
For the stuck offline process, you may check the Grafana panels "Scheduler/ReplicaChecker" and "Operator/Schedule Operator Create" on the PD page to see if anything is wrong.
For the strange output of "region check down-peer", @nolouch do you have any clue?
In addition, if possible, you could give us access to the cluster (grafana/ssh) so we can help. You can contact me privately via email at menglong AT pingcap.com.
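The grep suggested above, spelled out (a sketch; the TiKV log path is an assumption, and it must be run on every TiKV node hosting a peer of the region):

```sh
# Look for election/heartbeat activity for region 6905 in the TiKV log.
grep "region 6905" /path/to/tikv.log
```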

@nolouch
Contributor

nolouch commented Oct 26, 2018

And @Xavier1994, could you show me the result of "config show all" in pd-ctl?
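For reference (a sketch; the PD address is an assumption):

```sh
# Dump the complete PD configuration, including all schedule limits.
pd-ctl -u http://127.0.0.1:2379 -d config show all
```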

@Xavier1994
Author

@nolouch I have destroyed my cluster because the machine is needed for something else. I only changed some config to speed up the rebalance: max-snapshot: 16, max-pending-peer: 64, region-schedule-limit: 64, leader-schedule-limit: 16. Everything else uses the default config.
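Those settings, written out as pd-ctl commands (a sketch; it assumes the abbreviated names refer to PD's max-snapshot-count and max-pending-peer-count options, and the PD address is an assumption):

```sh
# Raise the scheduling limits to speed up rebalancing.
pd-ctl -u http://127.0.0.1:2379 -d config set max-snapshot-count 16
pd-ctl -u http://127.0.0.1:2379 -d config set max-pending-peer-count 64
pd-ctl -u http://127.0.0.1:2379 -d config set region-schedule-limit 64
pd-ctl -u http://127.0.0.1:2379 -d config set leader-schedule-limit 16
```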

@nolouch
Contributor

nolouch commented Oct 29, 2018

@Xavier1994 Thanks for your feedback. If this problem reappears, please contact us immediately.

@rleungx rleungx added the type/bug The issue is confirmed as a bug. label Dec 12, 2018
@rleungx rleungx closed this as completed Dec 12, 2018