Fix: destroy monitor if no more components in host #857
Conversation
Codecov Report
@@             Coverage Diff             @@
##           master     #857       +/-   ##
===========================================
+ Coverage   10.76%   53.47%   +42.71%
===========================================
  Files         130      263      +133
  Lines        9900    19047     +9147
===========================================
+ Hits         1066    10186     +9120
+ Misses       8595     7286     -1309
- Partials      239     1575     +1336
Flags with carried forward coverage won't be shown.
@AstroProfundis PTAL
Thanks for your contribution, 9547, it's really awesome!
Force-pushed from d4f69cd to caef688
@lucklove PTAL
Force-pushed from caef688 to ae9c6d1
@lucklove PTAL again
Rest LGTM
Force-pushed from ae9c6d1 to 91e75c1
@lucklove Please help me with the test failure; I'm curious about the reason for it (exit 255), since it runs successfully on my local dev env.
It's because of this:
But I'm not sure why the scale-in action did not work this time, investigating.
It's because there is a TiKV node that is still in the offline process:
wait_instance_num_reach $name $total_sub_one $native_ssh
tiup-cluster $client --yes scale-in $name -N $ipprefix.102:20160
echo "after scale in (without prune) tikv, the monitors should not be destroyed"
wait_instance_num_reach $name $total_sub_one $native_ssh
after scale in (without prune) tikv, the monitors should not be destroyed

wait_instance_num_reach will call prune automatically, and this is not what you expected.
But why did it pass in my local dev env?
I'll dig into it later tonight, and thanks for your help 😀
It passed in your local dev env; I think it may be related to timing:
- Only when a TiKV node's state is tombstone will the prune command clean it up.
- After scaling in a TiKV node, it takes some time (seconds to minutes) for it to transfer to the tombstone state.

So I guess that on your local host, when `wait_instance_num_reach` executed, the TiKV node was still in the `pending offline` state, so the test passed; but in CI the node had already reached `tombstone`, so it was cleaned up.
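A minimal sketch of handling this timing dependency, assuming a hypothetical `getStoreState` helper (the real test script queries the cluster through tiup/PD; everything here is illustrative, not the actual test code): poll until the scaled-in TiKV store reports Tombstone before relying on prune to clean it up.

```go
package main

import (
	"fmt"
	"time"
)

// getStoreState stands in for querying PD about a TiKV store's state
// ("Up", "Pending Offline", or "Tombstone"). It is stubbed for illustration.
func getStoreState(addr string) (string, error) {
	return "Tombstone", nil
}

// waitTombstone polls until the store at addr reports Tombstone or the timeout
// expires, so that a subsequent prune is guaranteed to clean the store up.
func waitTombstone(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		state, err := getStoreState(addr)
		if err != nil {
			return err
		}
		if state == "Tombstone" {
			return nil // safe to run prune now
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("store %s did not reach Tombstone within %s", addr, timeout)
}

func main() {
	// The address mirrors the node scaled in by the test; purely illustrative.
	if err := waitTombstone("172.19.0.102:20160", 2*time.Minute); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("store is Tombstone, safe to prune")
}
```

Waiting explicitly like this would make the cleanup deterministic in CI instead of depending on how quickly the store leaves the pending offline state.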
- Seems in GitHub CI it's hard to predict when TiKV will finish the transfer process and reach the tombstone state; only after that can prune evict the TiKV.
- I have reserved only one TiDB node on the 102 host for testing the issue "When you scale-in PD, node_exporter of the PD node is still present" #842
Force-pushed from c3995e1 to 4428fa3
…m_reach when prune is not needed
Force-pushed from 4428fa3 to c742444
It's hard to predict when TiKV will be pruned in GitHub CI
Force-pushed from 4c1b429 to 58e43da
LGTM
What problem does this PR solve?
Fix #842
What is changed and how it works?
After a component is scaled in, check whether its host has any remaining components; if not, stop and destroy the monitoring components on that host.
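A hedged sketch of the idea described above, not the actual tiup implementation (the `Instance` type and the helper names are assumptions made for illustration): after a scale-in, any monitored host that no longer carries a component gets its monitoring agents stopped and destroyed.

```go
package main

import "fmt"

// Instance is a simplified stand-in for a deployed component instance.
type Instance struct {
	Host      string
	Component string
}

// hostsStillOccupied collects the hosts that still run at least one component
// after the scale-in operation.
func hostsStillOccupied(remaining []Instance) map[string]bool {
	occupied := make(map[string]bool)
	for _, inst := range remaining {
		occupied[inst.Host] = true
	}
	return occupied
}

// destroyMonitorsOnEmptyHosts stops and destroys the monitoring components on
// every monitored host that no longer carries any component.
func destroyMonitorsOnEmptyHosts(remaining []Instance, monitoredHosts []string) {
	occupied := hostsStillOccupied(remaining)
	for _, host := range monitoredHosts {
		if !occupied[host] {
			// The real change would stop node_exporter / blackbox_exporter and
			// clean up their directories; printing stands in for that here.
			fmt.Printf("destroying monitors on %s\n", host)
		}
	}
}

func main() {
	remaining := []Instance{{Host: "172.19.0.101", Component: "tidb"}}
	destroyMonitorsOnEmptyHosts(remaining, []string{"172.19.0.101", "172.19.0.102"})
	// Prints: destroying monitors on 172.19.0.102
}
```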
Check List
Tests
Code changes
Side effects
Related changes
Release notes: