planner: sync-stats-load is easy to timeout if TiKV is busy and then cause wrong plans #50332
Labels: affects-8.1 (8.1.x LTS), component/statistics, sig/planner, type/enhancement
Enhancement
In the case below, TiKV is busy (CPU usage above 90%), and there are 100+ sync-load-timeout errors within 50 minutes, which cause some wrong plans:
![image](https://private-user-images.githubusercontent.com/7499936/295870997-2554a5e0-8eea-462c-90a4-f9bdf86e1396.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MTUzNjQsIm5iZiI6MTczOTcxNTA2NCwicGF0aCI6Ii83NDk5OTM2LzI5NTg3MDk5Ny0yNTU0YTVlMC04ZWVhLTQ2MmMtOTBhNC1mOWJkZjg2ZTEzOTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTZUMTQxMTA0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2Q3ZTljYWRmYTljYzNhMzUyYTIxMDE0NzZlNGU4NzRlM2U4MDA3OTg1MzdlYzdmODUzZjc3ODZmZDZhZjg0ZSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.xdfgOvfvgg7bxzXyiz5M1m1KV3fGyD5-pGgEdObvSts)
![image](https://private-user-images.githubusercontent.com/7499936/295870043-a9a08863-c28f-4898-8cb0-ea88467715aa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MTUzNjQsIm5iZiI6MTczOTcxNTA2NCwicGF0aCI6Ii83NDk5OTM2LzI5NTg3MDA0My1hOWEwODg2My1jMjhmLTQ4OTgtOGNiMC1lYTg4NDY3NzE1YWEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTZUMTQxMTA0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzkxZTZhYWYzZTNhNWJkMTAzYzIyZWNjOGYyZjEwYjBmZjk0NDkzMDQxYzBhMGI4MDYyNmFkY2Y5MTU0NTU2YiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.zYD_lbtEk5ioJU59aG6C0awyNZYUFJ1XUVWv0UfxKxo)
![image](https://private-user-images.githubusercontent.com/7499936/295874254-d8df7612-271b-4eb8-b4e9-51c5e24fe5c1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MTUzNjQsIm5iZiI6MTczOTcxNTA2NCwicGF0aCI6Ii83NDk5OTM2LzI5NTg3NDI1NC1kOGRmNzYxMi0yNzFiLTRlYjgtYjRlOS01MWM1ZTI0ZmU1YzEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTZUMTQxMTA0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWJhNjA1MWY1Y2NkY2ZkMGQxZmQ0MmQ4Zjk3MzFmZTRjMzJhOGNhNzQ0MGJlNzRmM2RiZmFlMzMwNjE1YWY3NCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.J_MjMOA3fYGPTjvEupCoZ5eqTvb2eiM9V6JWLmYVWEA)
From @winoros: a temporary way to mitigate this issue is to raise the priority of sync-stats-load requests from NORMAL to HIGH:
![image](https://private-user-images.githubusercontent.com/7499936/295874683-62b5bdba-200f-46ae-8941-f88de088bb90.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MTUzNjQsIm5iZiI6MTczOTcxNTA2NCwicGF0aCI6Ii83NDk5OTM2LzI5NTg3NDY4My02MmI1YmRiYS0yMDBmLTQ2YWUtODk0MS1mODhkZTA4OGJiOTAucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTZUMTQxMTA0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Njc1Njg4MTA1MDQzMmQ3YzFiZmZjOTgzOWIxOWE0NWE5Y2UxOThkZDc3YTMyNzkzNDQ0NDVjNTU4ODZlYTFjOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.6HjCmopY1eQ-ebUV_kACn3Jz1INRe5sfp4C9BJeVmZ4)
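The intuition behind the priority bump can be sketched with a simple priority queue. A busy server holds a backlog of NORMAL requests, and a stats-load request either waits behind all of them (NORMAL) or is served ahead of them (HIGH). This is a deliberately simplified model built on Go's `container/heap`; TiKV's real read-pool scheduling is more involved, and the types and names below (`Priority`, `reqQueue`, `servedPosition`) are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Priority mirrors the NORMAL/HIGH request priorities mentioned in the issue.
type Priority int

const (
	Normal Priority = iota
	High
)

type request struct {
	name string
	pri  Priority
	seq  int // arrival order; keeps FIFO within one priority level
}

// reqQueue is a heap ordered by priority (highest first), then arrival order.
type reqQueue []request

func (q reqQueue) Len() int { return len(q) }
func (q reqQueue) Less(i, j int) bool {
	if q[i].pri != q[j].pri {
		return q[i].pri > q[j].pri
	}
	return q[i].seq < q[j].seq
}
func (q reqQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *reqQueue) Push(x any)   { *q = append(*q, x.(request)) }
func (q *reqQueue) Pop() any {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

// servedPosition enqueues `backlog` NORMAL requests, then one stats-load
// request at priority pri, and returns the 1-based position at which the
// stats-load request is served.
func servedPosition(backlog int, pri Priority) int {
	q := &reqQueue{}
	for i := 0; i < backlog; i++ {
		heap.Push(q, request{name: fmt.Sprintf("scan-%d", i), pri: Normal, seq: i})
	}
	heap.Push(q, request{name: "stats-load", pri: pri, seq: backlog})
	for pos := 1; q.Len() > 0; pos++ {
		if heap.Pop(q).(request).name == "stats-load" {
			return pos
		}
	}
	return -1
}

func main() {
	fmt.Println(servedPosition(50, Normal)) // 51: waits behind the whole backlog
	fmt.Println(servedPosition(50, High))   // 1: jumps the queue
}
```

Under this model, raising the priority shortens the stats-load wait from "the whole backlog" to roughly "one request", which makes it far more likely to finish within the sync-load budget. It does not make TiKV less busy, which is why it is only a mitigation.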