br: failed to backup to s3 due to the limit of request rate. #30087
Comments
It seems like the rate limiting needs to happen in TiKV, since TiKV is the component actually interacting with S3. Or maybe you think the limiting should happen in BR, since BR is telling each of the TiKV nodes when and how many regions to back up?
Yes, BR builds tasks (one table or index corresponds to one task) and controls how each task interacts with TiKV. Each task corresponds to one or more regions in TiKV, depending on the table or index size.
But in our experience, it is hard to believe a normal backup can reach 3500 PUT requests per second.
For the long term, we need a flow controller that keeps the request rate from exceeding the cloud provider's limit, and after that we may abandon such workarounds.
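As a rough illustration of what such a flow controller could look like, here is a minimal Go sketch using golang.org/x/time/rate: a shared token bucket that every S3 PUT waits on before being issued. The 3000/s limit and the uploadFile helper are assumptions for illustration, not actual BR or TiKV code.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// uploadFile is a hypothetical stand-in for the call that PUTs one
// backup file to S3; it is not a real BR or TiKV function.
func uploadFile(ctx context.Context, name string) error {
	fmt.Println("uploading", name)
	return nil
}

func main() {
	// Allow at most 3000 PUT requests per second with a small burst,
	// staying under the 3500/s-per-prefix quota that S3 documents.
	limiter := rate.NewLimiter(rate.Limit(3000), 100)

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	files := []string{"backup/1.sst", "backup/2.sst", "backup/3.sst"}
	for _, f := range files {
		// Wait blocks until the token bucket allows another request,
		// so the aggregate PUT rate never exceeds the configured limit.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("rate limiter:", err)
			return
		}
		if err := uploadFile(ctx, f); err != nil {
			fmt.Println("upload failed:", err)
			return
		}
	}
}
```

If the limiter is shared per S3 prefix (or per bucket), every TiKV node backing up through it stays within the quota without BR having to coordinate the schedule.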
This causes backup failures for us :( It is not in production (yet), but we don't have a workaround.
After the tikv/tikv#11666 merge, we do have a workaround by setting … In our experience, with 100+ TiKV nodes, set … So I leave this issue open here.
You can try the workaround first; if it works, I'll mark this issue as major instead of critical.
Thanks, we will try this and get back to you soon!
Bad news: this option worked at the start, but after an hour or so a lot of retrying started, so we are having to set the backup … To give you an idea of the impact:
Because TiDB doesn't have PITR, we have to rely on full backups. I consider it reasonable to require daily backups to complete in 3-4 hours, which limits our maximum TiDB cluster size to 4TB (a serious limitation: it's not much larger than MySQL). Part of the reason we are hitting this is that the backup is not following S3 best practices (additional link). There is a quota of 3,500 PUT/COPY/POST/DELETE requests per second per prefix, but no limit on the number of prefixes. So if the writes can be distributed across prefixes, we can scale to clusters larger than 4TB with no problem.
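A sketch of the prefix-distribution idea above: derive a shard prefix from a hash of each backup file key so PUT traffic spreads across many prefixes, each with its own per-second quota. The shard count and key layout here are illustrative assumptions, not the structure BR actually writes.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// shardedKey prepends one of shardCount prefixes to an object key so that
// PUT traffic is spread across multiple S3 prefixes, each with its own
// 3,500 requests-per-second quota. The layout is illustrative only.
func shardedKey(shardCount uint32, key string) string {
	shard := crc32.ChecksumIEEE([]byte(key)) % shardCount
	return fmt.Sprintf("shard-%02d/%s", shard, key)
}

func main() {
	files := []string{"backup/t_100/1.sst", "backup/t_100/2.sst", "backup/t_101/1.sst"}
	for _, f := range files {
		// Objects land under different prefixes, e.g. shard-07/backup/t_100/1.sst,
		// so no single prefix has to absorb the whole PUT rate.
		fmt.Println(shardedKey(16, f))
	}
}
```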
Thanks for the feedback. I realized the same issue after we discussed it internally. We can change the backup structure; the detailed solution is as below, and we are working on it.
@3pointer is there a PR associated with a final fix for this issue?
The PR is in TiKV: tikv/tikv#12958
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Didn't reproduce it for now; just an image of the minimal reproduce steps.
2. What did you expect to see? (Required)
BR or TiKV should slow down the backup requests; the backup should still succeed after other services stopped.
3. What did you see instead (Required)
TiKV received a 503 Slow Down error, and BR did not handle this error properly, which interrupted the backup.
4. What is your TiDB version? (Required)
all
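For context on what handling this error properly could look like (question 3 above): a common pattern is to retry a throttled PUT with exponential backoff and jitter instead of aborting the whole backup. The Go sketch below uses hypothetical putObject and errSlowDown placeholders; it is not BR's actual retry logic.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errSlowDown stands in for the 503 Slow Down error S3 returns when the
// request rate for a prefix is exceeded; it is a placeholder, not the
// real SDK error type.
var errSlowDown = errors.New("SlowDown: please reduce your request rate")

// putObject is a hypothetical upload call used only for illustration; it
// fails with errSlowDown about a third of the time.
func putObject(ctx context.Context, key string) error {
	if rand.Intn(3) == 0 {
		return errSlowDown
	}
	return nil
}

// putWithBackoff retries a throttled upload with exponential backoff and
// jitter instead of failing the whole backup on the first 503 Slow Down.
func putWithBackoff(ctx context.Context, key string, maxRetries int) error {
	backoff := 100 * time.Millisecond
	for attempt := 0; ; attempt++ {
		err := putObject(ctx, key)
		if err == nil || !errors.Is(err, errSlowDown) || attempt >= maxRetries {
			return err
		}
		// Sleep for the current backoff plus jitter, then double it,
		// capping the wait so a long throttle doesn't stall forever.
		sleep := backoff + time.Duration(rand.Int63n(int64(backoff)))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return ctx.Err()
		}
		if backoff < 10*time.Second {
			backoff *= 2
		}
	}
}

func main() {
	if err := putWithBackoff(context.Background(), "backup/1.sst", 8); err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	fmt.Println("upload succeeded")
}
```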