-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: don't hold latches while rate limiting ExportRequests #66338
kv: don't hold latches while rate limiting ExportRequests #66338
Conversation
2d32435
to
f2b537b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/util/limit/limiter.go, line 28 at r1 (raw file):
} // Reservation is a allocation from a limiter which should be released once the
"an allocation"
pkg/util/limit/limiter.go, line 30 at r1 (raw file):
// Reservation is a allocation from a limiter which should be released once the // limited task has been completed. type Reservation interface {
Is this new interface necessary? Can't clients work on quota pool allocs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):
// Limit the number of concurrent Export requests, as these often scan and // entire Range at a time and place significant read load on a Store. return s.limiters.ConcurrentExportRequests.Begin(ctx)
Do you think its worth emitting a trace event or a log message (like above) of some sort?
This commit moves ExportRequest rate limiting up above evaluation and outside of latching. This ensures that if an ExportRequest is rate-limited, it does not inadvertently block others while waiting. Interestingly, we were already careful about this for `AddSSTableRequests` because it is easier for them to block others while holding latches. In the case of read-only requests like `ExportRequest`, blocking others is less common. However, it is still possible. Notably, MVCC write requests with lower timestamps than the read will still block. Additionally, non-MVCC requests like range splits will block. In one customer investigation, we found that an export request was holding latches and blocking a non-MVCC request (a range split) which was transitively blocking all write traffic to the range and all read traffic to the RHS of the range. We believe that the stall lasted for so long (seconds to minutes) because the ExportRequest was throttled while holding its latches. I did notice that some of this code was just touched by cockroachdb#66092, so any potential backport here may be a little involved. Release note (bug fix): Backups no longer risk the possibility of blocking conflicting writes while being rate limited by the kv.bulk_io_write.concurrent_export_requests concurrency limit.
f2b537b
to
cbc0810
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Do you think its worth emitting a trace event or a log message (like above) of some sort?
Yes, done.
pkg/util/limit/limiter.go, line 28 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
"an allocation"
Done.
pkg/util/limit/limiter.go, line 30 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is this new interface necessary? Can't clients work on quota pool allocs?
We don't need it, but I found it strange that the limit
package would be returning a *quotapool.IntAlloc
. Hiding this behind an interface is free and seemed like a good idea.
bors r+ |
Build succeeded: |
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.
Interestingly, we were already careful about this for
AddSSTableRequests
because it is easier for them to block others while holding latches. In
the case of read-only requests like
ExportRequest
, blocking others isless common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.
I did notice that some of this code was just touched by #66092, so any
potential backport here may be a little involved.
Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
/cc. @cockroachdb/kv