PD panics when list resource-group with some resource group defined. #7206
Comments
I am facing this issue in one of our TiDB 7.1 clusters. Looking at the stack trace, could it be related to Prometheus scraping metrics?
This is the snippet causing the issue; it is in pd/pkg/mcs/resourcemanager/server/resource_group.go, lines 64 to 78 in 8950c3a.
Yes, but we don't know how some fields change to NaN.
cc @glorv
Yes, sure, here is the list of resource groups that we used.
@hongshaoyang After the PD panic, does PD panic again when the resource groups are listed again?
@CabinfeverB Yes, PD panics again repeatedly. The TiDB cluster is deployed on Kubernetes, and the PD pods keep going into CrashLoopBackOff. The logs show the same stack trace. This implies that some hidden process is listing the resource groups repeatedly. It is not a human running the listing: the PD pods crashed outside of office hours, when there were no changes to resource groups or their configurations.
@hongshaoyang
|
Here is the {"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"initialized":false}}}
{"r_u":{"settings":{"fill_rate":2147483647,"burst_limit":-1},"state":{"tokens":29860685960413220,"last_update":"2023-12-27T08:19:16.269363735Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:17.269332808Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:05.143659794Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":-28216.64129750421,"last_update":"2023-12-27T08:19:16.4862813Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:17.420912119Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":1163.6089377586882,"last_update":"2023-12-27T08:19:15.252112524Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:10.78038052Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":11678.85839950952,"last_update":"2023-12-27T08:19:16.269380275Z","initialized":true}}}
{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:09.270797923Z","initialized":true}}}
|
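To make the dump above easier to scan, here is a small sketch that parses one line at a time and labels its token state. The `groupDump` struct and `classify` function are hypothetical helpers that mirror only the fields visible in the dump. The labels are descriptive and are not a diagnosis of the bug. Note that JSON cannot carry NaN, so a NaN value would not appear in a dump like this in the first place.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupDump mirrors the fields visible in the dumped lines (hypothetical type).
type groupDump struct {
	RU struct {
		Settings struct {
			FillRate   uint64 `json:"fill_rate"`
			BurstLimit int64  `json:"burst_limit"`
		} `json:"settings"`
		State struct {
			Tokens      float64 `json:"tokens"`
			Initialized bool    `json:"initialized"`
		} `json:"state"`
	} `json:"r_u"`
}

// classify parses one dumped line and returns a short label for its state.
func classify(line string) (string, error) {
	var g groupDump
	if err := json.Unmarshal([]byte(line), &g); err != nil {
		return "", err
	}
	switch {
	case !g.RU.State.Initialized:
		return "uninitialized", nil
	case g.RU.State.Tokens < 0:
		return "negative tokens", nil
	default:
		return "ok", nil
	}
}

func main() {
	// Sample lines copied from the dump in this thread.
	lines := []string{
		`{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"initialized":false}}}`,
		`{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":-28216.64129750421,"last_update":"2023-12-27T08:19:16.4862813Z","initialized":true}}}`,
		`{"r_u":{"settings":{"fill_rate":14000,"burst_limit":14000},"state":{"tokens":14000,"last_update":"2023-12-27T08:19:17.269332808Z","initialized":true}}}`,
	}
	for _, l := range lines {
		label, err := classify(l)
		fmt.Println(label, err)
	}
}
```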
It panics every 5-8 days; I am not sure why it occurs so infrequently. The only solution is to drop all resource groups.
close #7206 resource_mananger: deep clone resource group Signed-off-by: nolouch <nolouch@gmail.com> Co-authored-by: tongjian <1045931706@qq.com>
close tikv#7206 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
close #7206 resource_mananger: deep clone resource group Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: nolouch <nolouch@gmail.com> Co-authored-by: ShuNing <nolouch@gmail.com> Co-authored-by: nolouch <nolouch@gmail.com>
Fixed. We could not reproduce the NaN problem, but we switched to a new way of copying the data (a deep clone of the resource group), so this issue should be fixed.
/found customer
Bug Report
What did you do?
Create some resource groups and try to list them.
What did you expect to see?
No panic and get all resource groups
What did you see instead?
PD panic
Deleting all resource groups stops the panics.
What version of PD are you using (pd-server -V)?
v7.1.0