sql: move schema change lease to schema change job. #34211
Worse than the thing getting gossiped, the range can grow too large, at which point the cluster is completely stuck.
FWIW, this is the descriptor with the flipping lease. I thought I linked it from #34211, but it looks like I forgot:
@dt I believe this issue is a biggie. This came up again recently; we've gotten multiple reports for 19.1 about writes to system config ranges being backpressured because the range is too large, at which point the cluster is pretty hosed.
37605: storage: don't backpressure writes to the system config span r=andreimatei a=nvanbenschoten

We're seeing in multiple issues (e.g. #35842, #37337, #37530, etc.) that write backpressure is kicking in on the system config span range itself. This is a serious issue because it means that write backpressure can't be disabled using the `kv.range.backpressure_range_size_multiplier` cluster setting.

The reason for the system config span growth is still somewhat unclear and is being tracked in #34211 (comment). It's likely that it has to do with a runaway schema change that continuously grabs and releases a schema change lease. This is an issue and should be fixed, but letting it cause such devastation to a cluster is a problem. To address this, this commit disables write backpressure on the system config span.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
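The fix in the commit above amounts to a span check in front of the size-based backpressure decision. The Go sketch below illustrates that idea under loud assumptions: the key bounds, the `span` type, and `shouldBackpressure` are all hypothetical stand-ins, not CockroachDB's actual code. Only the idea of a size multiplier (mirroring `kv.range.backpressure_range_size_multiplier`, with 0 meaning disabled) comes from the discussion.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical bounds standing in for the system config span; the real
// keys are defined elsewhere in CockroachDB and are not reproduced here.
var (
	systemConfigStart = []byte("/config/a")
	systemConfigEnd   = []byte("/config/z")
)

// span is a half-open key interval [Start, End).
type span struct {
	Start, End []byte
}

// overlaps reports whether two half-open spans share any key.
func (s span) overlaps(o span) bool {
	return bytes.Compare(s.Start, o.End) < 0 && bytes.Compare(o.Start, s.End) < 0
}

// shouldBackpressure (hypothetical) reports whether writes to a range
// should be delayed. multiplier plays the role of the
// kv.range.backpressure_range_size_multiplier setting; 0 disables it.
func shouldBackpressure(rng span, rangeBytes, maxRangeBytes int64, multiplier float64) bool {
	if multiplier == 0 {
		return false
	}
	// Exempt the system config span: per the commit message, backpressure
	// there means it can no longer be disabled via the cluster setting.
	if rng.overlaps(span{systemConfigStart, systemConfigEnd}) {
		return false
	}
	return float64(rangeBytes) > multiplier*float64(maxRangeBytes)
}

func main() {
	sysRange := span{[]byte("/config/b"), []byte("/config/c")}
	userRange := span{[]byte("/table/53/a"), []byte("/table/53/z")}
	// Illustrative sizes: a 1 GiB range against a 64 MiB max and multiplier 2.
	fmt.Println(shouldBackpressure(userRange, 1<<30, 64<<20, 2)) // true
	fmt.Println(shouldBackpressure(sysRange, 1<<30, 64<<20, 2))  // false
}
```

The exemption trades safety for liveness: an oversized system config range is still a problem, but it no longer blocks the writes needed to recover from it.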
@andreimatei do you have any idea which schema change might be at fault here? They're supposed to renew their leases every 2.5 minutes, so with a 25h GC TTL I'd expect no more than 600 MVCC revisions of a given descriptor, but I heard someone say we're seeing thousands per minute, so something seems off.
I have to dig through the logs, which I'll hopefully do later today (I'll send you a link to them). The last time we saw this, it was about a schema change retrying endlessly.
The latest incarnation of this has #38088 as the cause for the schema change retries. |
Closing since schema change leases (i.e.
The schema change lease is stored in the table descriptor. As a result, every time the lease is renewed (every 2-3 minutes), the descriptor is rewritten and the system config span gets gossiped. Renewing the lease also modifies the descriptor without incrementing its Version. The lease should be moved to the schema change job.