
kv: don't hold latches while rate limiting ExportRequests #66338

Merged

Conversation

@nvanbenschoten (Member)

This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for AddSSTableRequests
because it is easier for them to block others while holding latches. In
the case of read-only requests like ExportRequest, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.
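
To make the ordering concrete, here is a minimal, self-contained Go sketch of the fix. The `limiter` type and `handleExport` helper are hypothetical stand-ins built on a toy channel-based semaphore, not CockroachDB's actual code:

```go
package main

import (
	"context"
	"fmt"
)

// limiter is a toy counting semaphore standing in for the store's
// concurrent-export limiter (hypothetical; not the real type).
type limiter chan struct{}

// begin blocks until a slot is free or the context is canceled.
func (l limiter) begin(ctx context.Context) error {
	select {
	case l <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees the slot acquired by begin.
func (l limiter) release() { <-l }

// handleExport sketches the fixed ordering: wait on the rate limiter
// *before* acquiring latches, so a throttled export never holds latches
// that would block conflicting writes or non-MVCC requests like splits.
func handleExport(ctx context.Context, lim limiter, acquireLatches func() (release func())) error {
	if err := lim.begin(ctx); err != nil { // may block; no latches held yet
		return err
	}
	defer lim.release()

	release := acquireLatches() // declare conflicts only after being admitted
	defer release()

	// ... evaluate the ExportRequest ...
	fmt.Println("export evaluated")
	return nil
}

func main() {
	lim := make(limiter, 2) // at most two concurrent exports
	_ = handleExport(context.Background(), lim, func() (release func()) {
		// Stand-in for latch acquisition; returns its release function.
		return func() {}
	})
}
```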

I did notice that some of this code was just touched by #66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.

/cc @cockroachdb/kv

@cockroach-teamcity (Member)

This change is Reviewable

@andreimatei (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/util/limit/limiter.go, line 28 at r1 (raw file):

}

// Reservation is a allocation from a limiter which should be released once the

"an allocation"


pkg/util/limit/limiter.go, line 30 at r1 (raw file):

// Reservation is a allocation from a limiter which should be released once the
// limited task has been completed.
type Reservation interface {

Is this new interface necessary? Can't clients work on quota pool allocs?

@aayushshah15 (Contributor) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):

		// Limit the number of concurrent Export requests, as these often scan an
		// entire Range at a time and place significant read load on a Store.
		return s.limiters.ConcurrentExportRequests.Begin(ctx)

Do you think it's worth emitting a trace event or a log message (like above) of some sort?

@nvanbenschoten (Member, Author) left a comment

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Do you think it's worth emitting a trace event or a log message (like above) of some sort?

Yes, done.
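
For context, a hedged sketch of what logging a slow, blocking limiter acquisition could look like; the `beginWithLogging` helper, package name, and one-second threshold are illustrative assumptions, not the exact code that landed in this PR:

```go
package limitutil

import (
	"context"
	"log"
	"time"
)

// beginWithLogging wraps a blocking limiter acquisition and logs when the
// wait was long enough to be interesting, so throttled requests show up
// in the logs instead of stalling silently.
func beginWithLogging(ctx context.Context, begin func(context.Context) error) error {
	const slowThreshold = time.Second
	start := time.Now()
	if err := begin(ctx); err != nil {
		return err
	}
	if waited := time.Since(start); waited > slowThreshold {
		log.Printf("waited %s to acquire concurrent export request quota", waited)
	}
	return nil
}
```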


pkg/util/limit/limiter.go, line 28 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

"an allocation"

Done.


pkg/util/limit/limiter.go, line 30 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is this new interface necessary? Can't clients work on quota pool allocs?

We don't need it, but I found it strange that the limit package would be returning a *quotapool.IntAlloc. Hiding this behind an interface is free and seemed like a good idea.
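
For illustration, the interface-hiding being discussed looks approximately like the following (paraphrased; treat the exact shape as an assumption and see pkg/util/limit/limiter.go for the real code):

```go
package limit

// Reservation is an allocation from a limiter which should be released once
// the limited task has been completed. It hides the concrete
// *quotapool.IntAlloc from clients of the limit package.
type Reservation interface {
	Release()
}
```

A caller can then hold the reservation across evaluation and release it afterward, e.g. `res, err := s.limiters.ConcurrentExportRequests.Begin(ctx)` followed by `defer res.Release()`, without importing quotapool directly.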

@nvanbenschoten (Member, Author)

bors r+

@craig (bot) commented Jun 11, 2021

Build succeeded:
