
kvflowcontroller: implement kvflowcontrol.Controller #95905

Merged
merged 1 commit into cockroachdb:master from 230123.kvflowcontroller on Feb 8, 2023

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Jan 26, 2023

Part of #95563.

kvflowcontroller.Controller is a concrete implementation of the kvflowcontrol.Controller interface. It provides flow control for replication traffic in KV and is typically held at the node level. Internally it maintains flow token buckets for {regular,elastic} work traveling along each kvflowcontrol.Stream. When work tries to Admit(), the controller blocks until there are flow tokens available for the specified stream/work class.

For each stream we maintain two flow token buckets, one for each work class (regular and elastic). To provide coarse grained prioritization, we want the following properties:

  • When elastic tokens are deducted/returned, they only affect the elastic bucket. It cannot cause blocking for regular work (which would be a form of priority inversion).
  • When regular tokens get deducted, they deduct from both the regular and elastic buckets. We do want regular work to starve out elastic work. If sufficient regular work has not yet been logically admitted, and other regular work is waiting for it, we don't want elastic work to be able to go through.

To get a sense of what this ends up looking like, consider the following datadriven snippet:

history
----
                   regular |  elastic
                    +16MiB |  +8.0MiB
======================================
 -1.0MiB regular    +15MiB |  +7.0MiB
 -7.0MiB regular   +8.0MiB |      +0B (elastic blocked)
 -2.0MiB regular   +6.0MiB |      +0B (elastic blocked)
 -2.0MiB regular   +4.0MiB |      +0B (elastic blocked)
 +2.0MiB regular   +6.0MiB |      +0B (elastic blocked)
 -2.0MiB regular   +4.0MiB |      +0B (elastic blocked)
 +4.0MiB regular   +8.0MiB |      +0B (elastic blocked)
 +2.0MiB regular    +10MiB |  +2.0MiB
 -2.0MiB elastic    +10MiB |      +0B (elastic blocked)
 -2.0MiB regular   +8.0MiB |  -2.0MiB (elastic blocked)
 +2.0MiB regular    +10MiB |      +0B (elastic blocked)
 +2.0MiB elastic    +10MiB |  +2.0MiB
 +6.0MiB regular    +16MiB |  +8.0MiB
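
To make the bucket coupling concrete, here is a small illustrative Go sketch of the deduction rule described in the two properties above. The names and types are simplified stand-ins, not the actual kvflowcontroller code, and the sketch omits the ceiling/limit enforcement and synchronization of the real implementation:

package main

import "fmt"

type workClass int

const (
	regular workClass = iota
	elastic
)

// tokens is a byte count of flow tokens; a non-positive value means new
// work of that class must wait.
type tokens int64

type bucket struct {
	tokens map[workClass]tokens
}

// adjust applies a token delta: elastic deltas only touch the elastic
// bucket, regular deltas touch both (regular work can starve out elastic
// work, but never the other way around).
func (b *bucket) adjust(class workClass, delta tokens) {
	switch class {
	case elastic:
		b.tokens[elastic] += delta
	case regular:
		b.tokens[regular] += delta
		b.tokens[elastic] += delta
	}
}

// admissible returns whether work of the given class may proceed.
func (b *bucket) admissible(class workClass) bool {
	return b.tokens[class] > 0
}

func main() {
	const mib = 1 << 20
	b := &bucket{tokens: map[workClass]tokens{regular: 16 * mib, elastic: 8 * mib}}
	b.adjust(regular, -8*mib)                                 // regular deduction drains both buckets
	fmt.Println(b.admissible(regular), b.admissible(elastic)) // true false
	b.adjust(elastic, -2*mib)                                 // elastic deduction only affects elastic
	fmt.Println(b.admissible(regular))                        // still true
}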

The kvflowcontrol.Controller implementation is furnished with the following metrics:

- kvadmission.flow_controller.{regular,elastic}_tokens_{available,deducted,returned}
- kvadmission.flow_controller.{regular,elastic}_requests_{waiting,admitted,errored}
- kvadmission.flow_controller.{regular,elastic}_wait_duration
- kvadmission.flow_controller.{regular,elastic}_{,blocked_}stream_count

This commit also introduces a deterministic simulator to understand the kind of traffic shaping this flow controller does. The simulator is able to spit out each of the metrics above. To do all this, we added a utility asciitsdb library that comes 'batteries-included' -- it integrates with our metrics registry, is able to scrape relevant metrics, and spit out ascii plots for metric values, all of which we use in tests added in this package. It looks like the following:

# Note again the mirroring between token returns which immediately
# allows admission, followed by token deductions.
plot start=15s end=30s
kvadmission.flow_controller.regular_tokens_{deducted,returned} unit=MiB rate=true
----
 2.1 ┤       ╭╮
 2.0 ┤      ╭╮│         ╭╮          ╭╮╮
 1.9 ┤      │╰╮         ││╮         │││
 1.7 ┤     ╭╯╯│        ╭╯╰╮        ╭╯╰╮
 1.6 ┤     ││ │╮       │╯ │        │╯ │
 1.4 ┤     │╯ ││      ╭╯  │       ╭╯  │╮
 1.3 ┤    ╭╯  ╰╮      │   ╰╮      │╯  ││
 1.1 ┤    │╯   │╮    ╭╯    │     ╭╯   ╰╮
 1.0 ┤   ╭╯    ││    │╯    │     │╯    │╮
 0.9 ┤   │╯    ╰╮   ╭╯     │╮   ╭╯     ││   ╭
 0.7 ┤  ╭╯      │   │      ╰╮  ╭╯      ││  ╭╯
 0.6 ┤ ╭╯╯      │╮ ╭╯       │  │╯      ╰╮  │╯
 0.4 ┤ │╯       ││╭╯        │ ╭╯        │╮╭╯
 0.3 ┤╭╯        ╰─╯│        ╰─╯         │╭╯
 0.1 ┼╯╯         │╭╯         ╰╯         ╰╯╯
-0.0 ┼╯          ╰╯
      rate(regular_tokens_{deducted,returned}) (MiB)

# So we're still admitting work choppily, and observe corresponding
# deductions in the waiting request rate. But given the open-loop
# thread above, the # of waiting request is still growing
# unboundedly.
plot start=15s end=30s
kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
kvadmission.flow_controller.regular_requests_waiting unit=reqs/s
----
----
 10.7 ┤       ╭╮
  9.9 ┼╮      ││  ╭╮                  ╭╮  ╭╮
  9.2 ┤╰╮    ╭╯│  │╰╮    ╭─╮  ╭╮     ╭╯│  ││
  8.4 ┤ │    │ ╰╮ │ │   ╭╯ │ ╭╯╰╮   ╭╯ │  │╰╮
  7.7 ┤ │   ╭╯  │ │ ╰╮  │  │ │  │   │  ╰╮╭╯ │
  6.9 ┤ ╰╮  │   │╭╯  │ ╭╯  ╰╮│  ╰╮ ╭╯   ││  ╰╮
  6.1 ┤  ╰╮╭╯   ││   ╰╮│    ╭╯   │ │    ││   ╰
  5.4 ┤   ││    ╰│    │╯    │    ╰╮╯    ╭╯
  4.6 ┤   ╰╮    ╭╯    ╰╮    │    ╭╰╮    │╮
  3.9 ┤   ╭╰╮   ││   ╭╯│    │╮   │ │    ││   ╭
  3.1 ┤  ╭╯ ╰╮  │╰╮  │ ╰╮  ╭╯│  ╭╯ ╰╮  ╭╯│  ╭╯
  2.3 ┤ ╭╯   │  │ │ ╭╯  ╰╮ │ │ ╭╯   ╰╮ │ ╰╮╭╯
  1.6 ┤ │    ╰╮╭╯ │ │    │ │ ╰╮│     │ │  ││
  0.8 ┤╭╯     ││  │╭╯    ╰─╯  ╰╯     ╰╮│  ╰╯
  0.1 ┼╯      ││  ╰╯                  ╰╯
 -0.7 ┤       ╰╯
       rate(regular_requests_{admitted,waiting}) (reqs/s)

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch from 045a704 to 0415275 Compare January 26, 2023 02:30
@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch 3 times, most recently from 0095365 to 70fee38 Compare January 30, 2023 17:42
@irfansharif irfansharif marked this pull request as ready for review January 30, 2023 17:44
@irfansharif irfansharif requested review from a team as code owners January 30, 2023 17:44
@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch 4 times, most recently from af2585f to 19b9e7e Compare January 30, 2023 22:03
Collaborator

@sumeerbhola sumeerbhola left a comment


                   regular |  elastic
                    +16MiB |  +8.0MiB
======================================
 -1.0MiB regular    +15MiB |  +7.0MiB
 -7.0MiB regular   +8.0MiB |      +0B (elastic blocked)
 -2.0MiB regular   +6.0MiB |      +0B (elastic blocked)

btw, this +0B instead of being negative is confusing (I think we discussed this before).
Some more about this below

Generally looks good. I haven't yet read the metrics stuff and the tests

Reviewed 10 of 22 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 50 at r1 (raw file):

	mu struct {
		syncutil.Mutex
		limit tokensPerWorkClass

A code comment would be useful. Presumably this limit map tracks the two cluster settings declared above.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 69 at r1 (raw file):

	// TODO(irfansharif): React to these cluster settings changing. Changing the
	// limits is easy enough, but we also want to resize existing flow control
	// buckets. Should we just reset everything across the board, relying on

I am guessing the resizing is of the form where if the limit is changing by N (N can be negative), we simply add N to the bucket.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 199 at r1 (raw file):

// returning and waiting for flow tokens.
type bucket struct {
	tokens          tokensPerWorkClass

The synchronization story is not clear. tokens is protected by a Mutex, but callers are waiting on waitForTokensCh after releasing the mutex. So multiple waiters can start waiting because tokens is negative but only one will be signaled when the tokens become positive?
Is there some grant chaining logic this is trying to implement?
Also waitForTokensCh needs a definition of what it conceptually represents.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 231 at r1 (raw file):

	// TODO(irfansharif): Stop enforcing the ceiling under test builds. Under
	// production builds should we also perhaps enforce some floor? Say -16 MiB

Why do we need to enforce the ceiling or a floor?
I would imagine that under test builds we would check (a) that we are not going above the configured ceiling, since that means we have double returned tokens, (b) after the test has quiesced, we could check that we are at the configured ceiling, otherwise we have not returned some tokens.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 244 at r1 (raw file):

		b.tokens[class] += delta
		if delta > 0 {
			b.tokens[class] = min(b.tokens[class], limit[class]) // enforce ceiling

If this is indicative of a bug, we should log an error and increment some error metric (so we can use centralized monitoring or something to quickly check whether we have a distributed accounting bug).


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 249 at r1 (raw file):

		b.tokens[class] += delta
		if delta > 0 {
			b.tokens[class] = min(b.tokens[class], limit[class]) // enforce ceiling

ditto


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 251 at r1 (raw file):

			b.tokens[class] = min(b.tokens[class], limit[class]) // enforce ceiling

			if b.tokens[regular] > limit[elastic] {

I don't understand the logic in this if-block or the next else-block.
I was expecting simply

	case regular:
		b.tokens[class] += delta
		if b.tokens[class] > limit[class] {
		  log.Errorf(...)
		  b.tokens[class] = limit[class]
		}
		b.tokens[elastic] += delta
		// Same error checking as above
		...

That is, regular tokens always cause the same adjustment to elastic.
So (16MB, 8MB) to start with, and say -16MB of regular adjustment happens, then we have (0, -8MB). And when 8MB of regular adjustment happens we are at (8MB, 0).
Am I missing something?


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 304 at r1 (raw file):

	return c.mu.limit
}

(reminder to self) stopped here in this file

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 2 of 22 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 30 at r1 (raw file):

	// ID receiving replication traffic, which is a high cardinality measure
	// (storeIDs could ratchet up arbitrarily). The similar argument technically
	// applies for per-tenant metrics, but there we'd rather eat the cost.

tenant x store cardinality can indeed be very high.
But we should consider:

  • metrics per store (aggregated across tenant). Things like waitDuration, requestsWaiting, requestsAdmitted, ...
  • a debug page handler on the node that will give us the token counts and queue lengths of all buckets. For CC-like environments, where we can look at an ongoing issue, such "inspectz" pages can be very useful (kv,*:state inspection pages for a cluster node #66772)

pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 57 at r1 (raw file):

	requestsWaiting = metric.Metadata{
		Name:        "kvadmission.flow_controller.%s_requests_waiting",
		Help:        "Number of %s requests waiting for flow tokens",

nit: we are using both "requests" and "work" to mean the same thing. Worth picking one.


pkg/util/asciitsdb/asciitsdb.go line 35 at r1 (raw file):

		syncutil.Mutex
		scraped bool
		points  map[string][]float64

Can you add a comment here about the key and value? Why is there a []float64 as value instead of a single float64?


pkg/util/asciitsdb/asciitsdb.go line 271 at r1 (raw file):

}

func rate(input []float64, r int) []float64 {

I don't understand what is going on here. What do input and r represent?

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 5 of 22 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_simulation_test.go line 56 at r1 (raw file):

//     t='end' , issuing the specified 'rate' of requests of the
//     given work 'class', over the given 'stream', where the flow tokens are
//     {deducted,returned} with the given bandwidth. If flow tokens are being

what is the rate? Is that how the granularity of the adjustment is decided? That is, if adjust=+100bytes/d and rate=5/s, then each request/return is going to do 100/5=20 bytes?

@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch 2 times, most recently from 6c8d2b2 to 0bbcf4a Compare February 5, 2023 05:11
Contributor Author

@irfansharif irfansharif left a comment


PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 50 at r1 (raw file):

Previously, sumeerbhola wrote…

A code comment would be useful. Presumably this limit map tracks the two cluster settings declared above.

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 69 at r1 (raw file):

Previously, sumeerbhola wrote…

I am guessing the resizing is of the form where if the limit is changing by N (N can be negative), we simply add N to the bucket.

Done, PTAL.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 199 at r1 (raw file):

So multiple waiters can start waiting because tokens is negative but only one will signaled when the tokens become positive?

When the first waiter gets signaled, and it finds available tokens, it signals the next waiter on its way out. If any waiter that was signaled didn't find available tokens, it goes back to waiting, signaled only when tokens are returned.

Is there some grant chaining logic this is trying to implement?

The grant chaining is happening when admitted requests, on their way out, signal waiters (if any).

Added the two following comment blocks:

// - Tokens are protected by Controller.mu;
// - Waiting requests do so by waiting on channel signalCh, without
//   holding mutexes. Requests first check for available tokens, waiting if
//   unavailable.
//   - Whenever tokens are returned, signalCh is signaled, waking up a single
//     waiter.
//     - If the request finds no available tokens, it starts waiting again.
//   - Whenever a request gets admitted, it signals the next waiter if any.
// TODO(irfansharif): Right now we continue forwarding admission
// grants to request while the available tokens > 0, which can lead
// to over-admission since deduction is deferred (see
// testdata/simulation/over_admission). One mitigation could be
// terminating grant forwarding if the 'tentatively deducted tokens'
// exceeds some amount (say, 16 MiB). When tokens are actually
// deducted, we'll reduce from this 'tentatively deducted' count.
// We can re-signal() on every actual token deduction where
// available tokens is still > 0.
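
For illustration only, here is a rough, self-contained sketch of the wait/signal scheme those comments describe. signalCh and the grant-chaining behavior come straight from the comments above; the rest (newBucket, returnTokens, the int64 token count, deferred token deduction) is invented for the sketch and is not the actual kvflowcontroller code:

package main

import (
	"context"
	"sync"
)

type bucket struct {
	mu       sync.Mutex    // stand-in; the real code is protected by the node-level Controller.mu
	tokens   int64         // available flow tokens; can go negative
	signalCh chan struct{} // capacity-1 channel, "topped up" on token returns
}

func newBucket(limit int64) *bucket {
	return &bucket{tokens: limit, signalCh: make(chan struct{}, 1)}
}

// signal is a non-blocking send that ensures the channel holds an entry,
// waking at most one waiter.
func (b *bucket) signal() {
	select {
	case b.signalCh <- struct{}{}:
	default:
	}
}

// admit blocks until tokens are available. On its way out it signals the
// next waiter (the "grant chaining" mentioned above); if a woken waiter
// finds no tokens, it simply goes back to waiting.
func (b *bucket) admit(ctx context.Context) error {
	for {
		b.mu.Lock()
		available := b.tokens > 0
		b.mu.Unlock()
		if available {
			b.signal() // forward the grant to the next waiter, if any
			return nil
		}
		select {
		case <-b.signalCh: // wait for tokens to be returned
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// returnTokens hands tokens back and wakes a waiter.
func (b *bucket) returnTokens(n int64) {
	b.mu.Lock()
	b.tokens += n
	b.mu.Unlock()
	b.signal()
}

func main() {
	b := newBucket(0)                 // start with no available tokens
	go func() { b.returnTokens(1) }() // a token return unblocks the waiter
	_ = b.admit(context.Background())
}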

pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 231 at r1 (raw file):

Previously, sumeerbhola wrote…

Why do we need to enforce the ceiling or a floor?
I would imagine that under test builds we would check (a) that we are not going above the configured ceiling, since that means we have double returned tokens, (b) after the test has quiesced, we could check that we are at the configured ceiling, otherwise we have not returned some tokens.

Removed the TODO and added a log.Fatal under test builds if we have an accounting bug. And yes, those would be the kinds of tests to write once we've addressed the various interactions outlined in doc.go.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 244 at r1 (raw file):

Previously, sumeerbhola wrote…

If this is indicative of a bug, we should log an error and increment some error metric (so we can use centralized monitoring or something to quickly check whether we have a distributed accounting bug).

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 249 at r1 (raw file):

Previously, sumeerbhola wrote…

ditto

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 251 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't understand the logic in this if-block or the next else-block.
I was expecting simply

	case regular:
		b.tokens[class] += delta
		if b.tokens[class] > limit[class] {
		  log.Errorf(...)
		  b.tokens[class] = limit[class]
		}
		b.tokens[elastic] += delta
		// Same error checking as above
		...

That is, regular tokens always cause the same adjustment to elastic.
So (16MB, 8MB) to start with, and say -16MB of regular adjustment happens, then we have (0, -8MB). And when 8MB of regular adjustment happens we are at (8MB, 0).
Am I missing something?

The blocking properties are the same, see the first git diff for the test below, with your suggested change.

[attached image: image.png]

I only tried the more complicated scheme to try and avoid the -ve elastic tokens that are now possible in the steady state. But that's probably not a good reason (and also, tokens are still allowed to go into the negative). Anyway, changed. Here's how the metrics end up looking now (I liked that we could say "regular work can deplete elastic tokens" by pointing to it being at zero).

[attached image: image copy 1.png]

[attached image: image copy 2.png]


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 30 at r1 (raw file):

Previously, sumeerbhola wrote…

tenant x store cardinality can indeed be very high.
But we should consider:

  • metrics per store (aggregated across tenant). Things like waitDuration, requestsWaiting, requestsAdmitted, ...
  • a debug page handler on the node that will give us the token counts and queue lengths of all buckets. For CC-like environments, where we can look at an ongoing issue, such "inspectz" pages can be very useful (kv,*:state inspection pages for a cluster node #66772)

Added TODOs. TIL about inspectz pages, those seem very handy and a great idea.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 57 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: we are using both "requests" and "work" to mean the same thing. Worth picking one.d

Done (requests).


pkg/util/asciitsdb/asciitsdb.go line 35 at r1 (raw file):

Previously, sumeerbhola wrote…

Can you add a comment here about the key and value? Why is there a []float64 as value instead of a single float64?

Done, it's a mapping from metric name => scraped data points (which is why the value is a list). On every Scrape() we append a single data point for each registered metric.
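
To make that concrete, here is a toy, non-authoritative sketch of the registration + scrape loop. The registered map and the read closures below are invented for illustration; the real asciitsdb library hooks into the metrics registry instead:

package main

import (
	"fmt"
	"sync"
)

// TSDB is a toy stand-in for the asciitsdb type quoted above.
type TSDB struct {
	mu struct {
		sync.Mutex
		// points maps each registered metric name to the ordered list of
		// data points scraped so far; every Scrape() appends exactly one
		// point per registered metric.
		points map[string][]float64
	}
	// registered maps a metric name to a closure that reads its current
	// value (invented here; the real library scrapes the metrics registry).
	registered map[string]func() float64
}

func New() *TSDB {
	t := &TSDB{registered: map[string]func() float64{}}
	t.mu.points = map[string][]float64{}
	return t
}

// Register tracks a metric so subsequent scrapes record it.
func (t *TSDB) Register(name string, read func() float64) {
	t.registered[name] = read
}

// Scrape appends one data point for every registered metric.
func (t *TSDB) Scrape() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for name, read := range t.registered {
		t.mu.points[name] = append(t.mu.points[name], read())
	}
}

func main() {
	var admitted float64
	t := New()
	t.Register("requests_admitted", func() float64 { return admitted })
	for i := 0; i < 3; i++ {
		admitted += 10
		t.Scrape()
	}
	fmt.Println(t.mu.points["requests_admitted"]) // [10 20 30]
}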


pkg/util/asciitsdb/asciitsdb.go line 271 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't understand what is going on here. What do input and r represent?

Added a comment:

// rate takes a list of individual data points (typically read from a cumulative
// counter) and returns a derivative, computed simply by taking each
// corresponding input point and subtracting one r points ago. If r points are
// collected per-{time unit}, this ends up commuting the rate of growth
// per-{time unit}.
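
For illustration, a minimal version of what such a helper could look like (not the actual asciitsdb code; zero-padding the first r points is an assumption of the sketch):

// rate returns, for each point, the difference with the point r indices
// earlier -- a simple discrete derivative over the scrape interval.
// For example, rate([]float64{0, 10, 25, 45}, 1) == []float64{0, 10, 15, 20}.
func rate(input []float64, r int) []float64 {
	output := make([]float64, len(input))
	for i := range input {
		if i < r {
			output[i] = 0 // not enough history yet
			continue
		}
		output[i] = input[i] - input[i-r]
	}
	return output
}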

pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_simulation_test.go line 56 at r1 (raw file):

Previously, sumeerbhola wrote…

what is the rate? Is that the how the granularity of the adjustment is decided? That is, if adjust=+100bytes/d and rate=5/s, then each request/return is going to do 100/5=20 bytes?

Yes.

@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch from 0bbcf4a to fb01590 Compare February 6, 2023 14:14
Collaborator

@sumeerbhola sumeerbhola left a comment


looks good. Have a few more testdata files left to read.

Reviewed 2 of 22 files at r1, 10 of 14 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


-- commits line 99 at r3:
isn't this waiting metric a gauge? If yes, why the reqs/s units?


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 61 at r3 (raw file):

		// We maintain flow token buckets for {regular,elastic} work along each
		// stream. This is lazily instantiated.
		buckets map[kvflowcontrol.Stream]bucket

What is the GC story for buckets? Ok to add a TODO.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 141 at r3 (raw file):

			// testdata/simulation/over_admission). One mitigation could be
			// terminating grant forwarding if the 'tentatively deducted tokens'
			// exceeds some amount (say, 16 MiB). When tokens are actually

The problem with "tentatively deducted" is the pathological case where all these tentatively deducted tokens are due to requests that are waiting on latches and locks. And the next request that wants to be admitted is not contending with those latches/locks but gets stuck here.
I think we had discussed something like counting the number of requests that went through this bucket and are past latch/lock acquisition but not yet evaluated, and block here if that count is greater than GOMAXPROCS (or some small multiple of it).


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 261 at r3 (raw file):

func (b *bucket) signal() {
	select {
	case b.signalCh <- struct{}{}:

This is a non-blocking write to the channel that ensures it is topped up to 1 entry, yes?
Please add a code comment.

Also, we need a comment on why the non-atomic read of tokens followed by waiting on signalCh is correct with a concurrent request adding tokens. The concurrent request that adds tokens will top up the channel so that the first waiter who read tokens < 0 will find the channel signaled. The invariant we are ensuring is that whenever tokens > 0, the channel will have an entry, so at least one request that observed tokens < 0 will get unblocked, which will unblock others.


pkg/util/asciitsdb/asciitsdb.go line 275 at r3 (raw file):

// counter) and returns a derivative, computed simply by taking each
// corresponding input point and subtracting one r points ago. If r points are
// collected per-{time unit}, this ends up commuting the rate of growth

nit: computing


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/admitting_waiting_choppy line 106 at r3 (raw file):

# off of zero.
plot start=15s end=30s
kvadmission.flow_controller.regular_tokens_available            unit=MiB

Loving these visual tests!

btw, why is this value ever positive given the waiting requests and deduction-delay=0?


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/admitting_waiting_choppy line 156 at r3 (raw file):

plot start=15s end=30s
kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
kvadmission.flow_controller.regular_requests_waiting unit=reqs/s

same question about waiting being a gauge.

Collaborator

@sumeerbhola sumeerbhola left a comment


done reading

Reviewed 1 of 22 files at r1, 2 of 14 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/overview line 27 at r3 (raw file):

# flow tokens until t=20s, we deplete them at t=8s. Also note that despite this
# being regular traffic, we deduct elastic flow tokens for coarse intra-tenant
# prioritization -- those tokens get deducted at t=4s. Both are deducted at

did "those tokens get deducted at t=4s" mean to say that at t=4s the elastic streams are down to 0 tokens?

@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch from fb01590 to 16609bd Compare February 8, 2023 04:56
Contributor Author

@irfansharif irfansharif left a comment


TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


-- commits line 99 at r3:

Previously, sumeerbhola wrote…

isn't this waiting metric a gauge? If yes, why the reqs/s units?

Fixed.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 61 at r3 (raw file):

Previously, sumeerbhola wrote…

What is the GC story for buckets? Ok to add a TODO.

Added the following TODO:

// TODO(irfansharif): Sort out the GC story for these buckets. When
// streams get closed permanently (tenants get deleted, nodes removed)
// or when completely inactive (no tokens deducted/returned over 30+
// minutes), clear these out.

pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 141 at r3 (raw file):

Previously, sumeerbhola wrote…

The problem with "tentatively deducted" is the pathological case where all these tentatively deducted tokens are due to requests that are waiting on latches and locks. And the next request that wants to be admitted is not contending with those latches/locks but gets stuck here.
I think we had discussed something like counting the number of requests that went through this bucket and are past latch/lock acquisition but not yet evaluated, and block here if that count is greater than GOMAXPROCS (or some small multiple of it).

That sounds easy enough to do. Expanded the comment.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 261 at r3 (raw file):

Previously, sumeerbhola wrote…

This is a non-blocking write to the channel that ensures it is topped up to 1 entry, yes?
Please add a code comment.

Also, we need a comment on why the non-atomic read of tokens followed by waiting on signalCh is correct with a concurrent request adding tokens. The concurrent request that adds tokens will top up the channel so that the first waiter who read tokens < 0 will find the channel signaled. The invariant we are ensuring is that whenever tokens > 0, the channel will have an entry, so at least one request that observed tokens < 0 will get unblocked, which will unblock others.

Done.


pkg/util/asciitsdb/asciitsdb.go line 275 at r3 (raw file):

Previously, sumeerbhola wrote…

nit: computing

Done (these are the kinds of typos you'd expect sharing an office with urbanists like @itsbilal).


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/admitting_waiting_choppy line 106 at r3 (raw file):

Previously, sumeerbhola wrote…

Loving these visual tests!

btw, why is this value ever positive given the waiting requests and deduction-delay=0?

These are artifacts of the simulation code itself. We're processing at most one waiting request per tick, adjusting tokens at the end of one tick, scraping metrics, processing waiters at the next tick, re-scraping metrics. So some (small) oscillations around 0MiB are expected.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/admitting_waiting_choppy line 156 at r3 (raw file):

Previously, sumeerbhola wrote…

same question about waiting being a gauge.

Same answer as above, just artifacts of the simulation code itself and how rapidly we process waiting requests (just 1 per simulation tick).


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/overview line 27 at r3 (raw file):

Previously, sumeerbhola wrote…

did "those tokens get deducted at t=4s" mean to say that at t=4s the elastic streams are down to 0 tokens?

Yes, sorry, meant to say "those tokens get depleted at t=4s" which is what the graph shows.

Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)


-- commits line 98 at r4:
This is the same gauge, yes? And the rate is interesting because the wait queue is increasing?


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/testdata/simulation/admitting_waiting_choppy line 106 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

These are artifacts of the simulation code itself. We're processing at most one waiting request per tick, adjusting tokens at the end of one tick, scraping metrics, processing waiters at the next tick, re-scraping metrics. So some (small) oscillations around 0MiB is expected.

Ack

Part of cockroachdb#95563.

kvflowcontroller.Controller is a concrete implementation of the
kvflowcontrol.Controller interface. It provides flow control for
replication traffic in KV and is typically held at the node-level.
Internally it maintains flow token buckets for {regular,elastic} work
traveling along each kvflowcontrol.Stream. When work tries to Admit(),
the controller blocks until there are flow tokens available for the
specified stream/work class.

For each stream we maintain two flow token buckets, one for each work
class (regular and elastic). To provide coarse grained prioritization,
we want the following properties:
- When elastic tokens are deducted/returned, they only affect the
  elastic bucket. It cannot cause blocking for regular work (which would
  be a form of priority inversion).
- When regular tokens get deducted, they deduct from both the regular
  and elastic buckets. We do want regular work to starve out elastic
  work. If sufficient regular work has not yet been logically admitted,
  and other regular work is waiting for it, we don't want elastic work
  to be able to go through.

To get a sense of what this ends up looking like, consider the following
datadriven snippet:

    history
    ----
                       regular |  elastic
                        +16MiB |  +8.0MiB
    ======================================
     -1.0MiB regular    +15MiB |  +7.0MiB
     -7.0MiB regular   +8.0MiB |      +0B (elastic blocked)
     -2.0MiB regular   +6.0MiB |  -2.0MiB (elastic blocked)
     -2.0MiB regular   +4.0MiB |  -4.0MiB (elastic blocked)
     +2.0MiB regular   +6.0MiB |  -2.0MiB (elastic blocked)
     -2.0MiB regular   +4.0MiB |  -4.0MiB (elastic blocked)
     +4.0MiB regular   +8.0MiB |      +0B (elastic blocked)
     +2.0MiB regular    +10MiB |  +2.0MiB
     -2.0MiB elastic    +10MiB |      +0B (elastic blocked)
     -2.0MiB regular   +8.0MiB |  -2.0MiB (elastic blocked)
     +2.0MiB regular    +10MiB |      +0B (elastic blocked)
     +2.0MiB elastic    +10MiB |  +2.0MiB
     +6.0MiB regular    +16MiB |  +8.0MiB

The kvflowcontrol.Controller implementation is furnished with the
following metrics:

- kvadmission.flow_controller.{regular,elastic}_tokens_available
- kvadmission.flow_controller.{regular,elastic}_tokens_deducted
- kvadmission.flow_controller.{regular,elastic}_tokens_returned
- kvadmission.flow_controller.{regular,elastic}_tokens_unaccounted
- kvadmission.flow_controller.{regular,elastic}_requests_waiting
- kvadmission.flow_controller.{regular,elastic}_requests_admitted
- kvadmission.flow_controller.{regular,elastic}_requests_errored
- kvadmission.flow_controller.{regular,elastic}_wait_duration
- kvadmission.flow_controller.{regular,elastic}_stream_count
- kvadmission.flow_controller.{regular,elastic}_blocked_stream_count

This commit also introduces a deterministic simulator to understand
the kind of traffic shaping this flow controller does. The simulator is
able to spit out each of the metrics above. To do all this, we added a
utility asciitsdb library that comes 'batteries-included' -- it
integrates with our metrics registry, is able to scrape relevant
metrics, and spit out ascii plots for metric values, all of which we use
in tests added in this package. It looks like the following:

    # Note again the mirroring between token returns which immediately
    # allows admission, followed by token deductions.
    plot start=15s end=30s
    kvadmission.flow_controller.regular_tokens_{deducted,returned} unit=MiB rate=true
    ----
     2.1 ┤       ╭╮
     2.0 ┤      ╭╮│         ╭╮          ╭╮╮
     1.9 ┤      │╰╮         ││╮         │││
     1.7 ┤     ╭╯╯│        ╭╯╰╮        ╭╯╰╮
     1.6 ┤     ││ │╮       │╯ │        │╯ │
     1.4 ┤     │╯ ││      ╭╯  │       ╭╯  │╮
     1.3 ┤    ╭╯  ╰╮      │   ╰╮      │╯  ││
     1.1 ┤    │╯   │╮    ╭╯    │     ╭╯   ╰╮
     1.0 ┤   ╭╯    ││    │╯    │     │╯    │╮
     0.9 ┤   │╯    ╰╮   ╭╯     │╮   ╭╯     ││   ╭
     0.7 ┤  ╭╯      │   │      ╰╮  ╭╯      ││  ╭╯
     0.6 ┤ ╭╯╯      │╮ ╭╯       │  │╯      ╰╮  │╯
     0.4 ┤ │╯       ││╭╯        │ ╭╯        │╮╭╯
     0.3 ┤╭╯        ╰─╯│        ╰─╯         │╭╯
     0.1 ┼╯╯         │╭╯         ╰╯         ╰╯╯
    -0.0 ┼╯          ╰╯
          rate(regular_tokens_{deducted,returned}) (MiB)

    # So we're still admitting work choppily, and observe corresponding
    # deductions in the waiting request rate. But given the open-loop
    # thread above, the # of waiting request is still growing
    # unboundedly.
    plot start=15s end=30s
    kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
    kvadmission.flow_controller.regular_requests_waiting unit=reqs
    ----
    ----
     10.7 ┤       ╭╮
      9.9 ┼╮      ││  ╭╮                  ╭╮  ╭╮
      9.2 ┤╰╮    ╭╯│  │╰╮    ╭─╮  ╭╮     ╭╯│  ││
      8.4 ┤ │    │ ╰╮ │ │   ╭╯ │ ╭╯╰╮   ╭╯ │  │╰╮
      7.7 ┤ │   ╭╯  │ │ ╰╮  │  │ │  │   │  ╰╮╭╯ │
      6.9 ┤ ╰╮  │   │╭╯  │ ╭╯  ╰╮│  ╰╮ ╭╯   ││  ╰╮
      6.1 ┤  ╰╮╭╯   ││   ╰╮│    ╭╯   │ │    ││   ╰
      5.4 ┤   ││    ╰│    │╯    │    ╰╮╯    ╭╯
      4.6 ┤   ╰╮    ╭╯    ╰╮    │    ╭╰╮    │╮
      3.9 ┤   ╭╰╮   ││   ╭╯│    │╮   │ │    ││   ╭
      3.1 ┤  ╭╯ ╰╮  │╰╮  │ ╰╮  ╭╯│  ╭╯ ╰╮  ╭╯│  ╭╯
      2.3 ┤ ╭╯   │  │ │ ╭╯  ╰╮ │ │ ╭╯   ╰╮ │ ╰╮╭╯
      1.6 ┤ │    ╰╮╭╯ │ │    │ │ ╰╮│     │ │  ││
      0.8 ┤╭╯     ││  │╭╯    ╰─╯  ╰╯     ╰╮│  ╰╯
      0.1 ┼╯      ││  ╰╯                  ╰╯
     -0.7 ┤       ╰╯
           rate(regular_requests_{admitted,waiting}) (reqs/s)

      118 ┤                                      ╭
      115 ┤                                   ╭──╯
      112 ┤                                  ╭╯
      108 ┤                                 ╭╯
      105 ┤                           ╭─────╯
      102 ┤                         ╭─╯
       99 ┤                       ╭─╯
       96 ┤                     ╭─╯
       92 ┤                    ╭╯
       89 ┤              ╭─────╯
       86 ┤            ╭─╯
       83 ┤          ╭─╯
       80 ┤         ╭╯
       76 ┤        ╭╯
       73 ┤ ╭──────╯
       70 ┼─╯
               regular_requests_waiting (reqs)
     ----
     ----

Release note: None
@irfansharif irfansharif force-pushed the 230123.kvflowcontroller branch from 16609bd to c16ca44 Compare February 8, 2023 21:14
Contributor Author

@irfansharif irfansharif left a comment


bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


-- commits line 98 at r4:

Previously, sumeerbhola wrote…

This is the same gauge, yes? And the rate is interesting because the wait queue is increasing?

Yes, the same gauge. I also thought it was interesting because whenever the rate of wait queue growth is dampened, it's because of an increase in admission rate during token availability.

@craig
Contributor

craig bot commented Feb 8, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Feb 8, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Feb 8, 2023

Build succeeded:

@craig craig bot merged commit 0ab83da into cockroachdb:master Feb 8, 2023
@irfansharif irfansharif deleted the 230123.kvflowcontroller branch February 8, 2023 23:31
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Feb 9, 2023
Started tripping in non-essential CI after cockroachdb#95905 with 'Nanoseconds is a
pure function but its return value is ignored (SA4017)'.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 9, 2023
96769: sql/schemachanger: remove oprules from declarative schema changer r=fqazi a=fqazi

This PR will do the following:

1. Enforce assertions to block not implemented operations outside of special drop scenarios
2. Eliminate oprules for both dropping descriptors and columns, so that operations are still generated as no-ops.

96826: go.mod: bump Pebble to 65fa048bf403 r=raduberinde a=itsbilal

```
65fa048b *: Add shared.Storage to Options.
697aa63a objstorage: introduce shared object catalog
1bb7e13c pebble: extract manifest rotation logic into record.RotationHelper
7d1e4ba7 db: change flushable ingest key to encode just the FileNum
e35d9f5c *: Add objstorage.shared.Storage interface for storing sstables
c54158bf vfs: move loggingFS to vfs
4eba71e5 replay: explcitly flush when replaying flushes
e935ca05 replay: add option to apply subset of a workload
f2e58dc4 deps: update github.com/cespare/xxhash to v2.2.0
7491312f deps: update github.com/klauspost/compress to v1.15.15
a0da8a96 cmd/pebble: allow setting custom Pebble Options through a CLI flag
be9f8bdd replay: fix TotalWAmp calculation
65aa6d29 cmd/pebble: remove temporary run directories before exit
805b5f0d replay: fix and refactor compaction tracking
2f980560 cmd/pebble: add -count to replay command
71b17f4f objstorage: delete obsolete objects through provider
```

Epic: None.
Release note: None.

96835: kvflowcontroller: fix TestLint/TestStaticCheck r=irfansharif a=irfansharif

Started tripping in non-essential CI after #95905 with 'Nanoseconds is a pure function but its return value is ignored (SA4017)'.

Release note: None

Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Feb 11, 2023
Part of cockroachdb#95563.

kvflowcontrol.Handle is used to interface with replication flow control;
it's typically backed by a node-level kvflowcontrol.Controller. Handles
are held on replicas initiating replication traffic, i.e. are both the
leaseholder and raft leader, and manage multiple streams underneath
(typically one per active member of the raft group).

When replicating log entries, these replicas choose the log position
(term+index) the data is to end up at, and use this handle to track the
token deductions on a per log position basis. When informed of admitted
log entries on the receiving end of the stream, we free up tokens by
specifying the highest log position up to which we've admitted
(below-raft admission, for a given priority, takes log position into
account -- see kvflowcontrolpb.AdmittedRaftLogEntries for more details).

We also extend the testing framework introduced in cockroachdb#95905 to also
support writing tests for kvflowcontrol.Handle -- it's now pulled into
its own kvflowsimulator package.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Feb 16, 2023
Part of cockroachdb#95563.

kvflowcontrol.Handle is used to interface with replication flow control;
it's typically backed by a node-level kvflowcontrol.Controller. Handles
are held on replicas initiating replication traffic, i.e. are both the
leaseholder and raft leader, and manage multiple streams underneath
(typically one per active member of the raft group).

When replicating log entries, these replicas choose the log position
(term+index) the data is to end up at, and use this handle to track the
token deductions on a per log position basis. When informed of admitted
log entries on the receiving end of the stream, we free up tokens by
specifying the highest log position up to which we've admitted
(below-raft admission, for a given priority, takes log position into
account -- see kvflowcontrolpb.AdmittedRaftLogEntries for more details).

We also extend the testing framework introduced in cockroachdb#95905 to also
support writing tests for kvflowcontrol.Handle -- it's now pulled into
its own kvflowsimulator package. We're able to write tests that look
like the following:

    # Set up a triply connected handle (to s1, s2, s3) and start issuing
    # writes at 1MiB/s. For two of the streams, return tokens at exactly
    # the rate its being deducted (1MiB/s). For the third stream (s3),
    # we return flow tokens at only 0.5MiB/s.
    timeline
    t=0s         handle=h op=connect    stream=t1/s1   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s2   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s3   log-position=1/0
    t=[0s,50s)   handle=h class=regular adjust=-1MiB/s   rate=10/s
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s1
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s2
    t=[0.2s,50s) handle=h class=regular adjust=+0.5MiB/s rate=10/s stream=t1/s3
    ----

    # Observe:
    # - Total available tokens flatlines at 32MiB since flow tokens for
    #   s3 eventually depletes and later bounces off of 0MiB. We
    #   initially have 3*16MiB = 48MiB worth of flow tokens, and end
    #   up at 48MiB-16MiB = 32MiB.
    # - Initially the rate of token deductions (3*1MiB/s = 3MiB/s) is
    #   higher than the token returns (1MiB/s+1MiB/s+0.5MiB/s =
    #   2.5MiB/s), but after we start shaping it to the slowest
    #   stream, they end up matching at (0.5MiB/s*3 = 1.5MiB/s).
    # - The blocked stream count bounces between 0 and 1 as the s3
    #   stream gets blocked/unblocked as tokens are
    #   deducted/returned. The demand for tokens (1MiB/s) is higher
    #   than the replenishment rate (0.5MiB/s).
    # - The overall admission rate is reduced from 30 reqs/s to 25
    #   reqs/s, mapping to token deduction rate of 3MiB/s to 2.5MiB/s
    #   (1MiB/s divvied across 10 reqs). The difference between 30
    #   reqs/s and 25 reqs/s is found in the +5 reqs/s accumulating in
    #   the wait queue.
    plot
    kvadmission.flow_controller.regular_tokens_available            unit=MiB
    kvadmission.flow_controller.regular_tokens_{deducted,returned}  unit=MiB/s rate=true
    kvadmission.flow_controller.regular_blocked_stream_count        unit=streams
    kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
    ----
    ----
     47.7 ┼╮
     46.6 ┤╰─╮
     45.6 ┤  ╰─╮
     44.5 ┤    ╰╮
     43.5 ┤     ╰─╮
     42.4 ┤       ╰╮
     41.4 ┤        ╰─╮
     40.3 ┤          ╰─╮
     39.3 ┤            ╰╮
     38.2 ┤             ╰─╮
     37.2 ┤               ╰─╮
     36.1 ┤                 ╰╮
     35.1 ┤                  ╰─╮
     34.0 ┤                    ╰─╮
     33.0 ┤                      ╰╮
     31.9 ┤                       ╰───────────────
                regular_tokens_available (MiB)

     3.0 ┤╭───────────────────────╮
     2.8 ┤│                       │
     2.6 ┤╭────────────────────────╮
     2.4 ┤│                       ╰│
     2.2 ┤│                        │
     2.0 ┤│                        │
     1.8 ┤│                        │
     1.6 ┤│                        ╰─────────────
     1.4 ┤│
     1.2 ┤│
     1.0 ┤│
     0.8 ┤│
     0.6 ┤│
     0.4 ┤│
     0.2 ┤│
     0.0 ┼╯
          rate(regular_tokens_{deducted,returned}) (MiB/s)

     1.0 ┤                                 ╭╮   ╭
     0.9 ┤                            ╭╮   ││   │
     0.9 ┤                            ││   ││   │
     0.8 ┤                            ││   ││   │
     0.7 ┤                            ││   ││   │
     0.7 ┤                         ╭╮ ││╭╮ ││   │
     0.6 ┤                         ││ ││││ ││╭─╮│
     0.5 ┤                         │╰╮│││╰╮│││ ││
     0.5 ┤                         │ ││││ ││││ ││
     0.4 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ╰╯││ ││││ ││
     0.2 ┤                         │   ││ ╰╯╰╯ ╰╯
     0.1 ┤                        ╭╯   ╰╯
     0.1 ┤                        │
     0.0 ┼────────────────────────╯
           regular_blocked_stream_count (streams)

     30.0 ┤╭───────────────────────╮
     28.0 ┤│                       ╰╮
     26.0 ┤│                        ╰─────────────
     24.0 ┤│
     22.0 ┤│
     20.0 ┤│
     18.0 ┤│
     16.0 ┤│
     14.0 ┤│
     12.0 ┤│
     10.0 ┤│
      8.0 ┤│
      6.0 ┤│                        ╭─────────────
      4.0 ┤│                        │
      2.0 ┤│                       ╭╯
      0.0 ┼────────────────────────╯
           rate(regular_requests_{admitted,waiting}) (reqs/s)
    ----
    ----

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Feb 17, 2023
Part of cockroachdb#95563.

kvflowcontrol.Handle is used to interface with replication flow control;
it's typically backed by a node-level kvflowcontrol.Controller. Handles
are held on replicas initiating replication traffic, i.e. are both the
leaseholder and raft leader, and manage multiple streams underneath
(typically one per active member of the raft group).

When replicating log entries, these replicas choose the log position
(term+index) the data is to end up at, and use this handle to track the
token deductions on a per log position basis. When informed of admitted
log entries on the receiving end of the stream, we free up tokens by
specifying the highest log position up to which we've admitted
(below-raft admission, for a given priority, takes log position into
account -- see kvflowcontrolpb.AdmittedRaftLogEntries for more details).

We also extend the testing framework introduced in cockroachdb#95905 to also
support writing tests for kvflowcontrol.Handle -- it's now pulled into
its own kvflowsimulator package. We're able to write tests that look
like the following:

    # Set up a triply connected handle (to s1, s2, s3) and start issuing
    # writes at 1MiB/s. For two of the streams, return tokens at exactly
    # the rate its being deducted (1MiB/s). For the third stream (s3),
    # we return flow tokens at only 0.5MiB/s.
    timeline
    t=0s         handle=h op=connect    stream=t1/s1   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s2   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s3   log-position=1/0
    t=[0s,50s)   handle=h class=regular adjust=-1MiB/s   rate=10/s
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s1
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s2
    t=[0.2s,50s) handle=h class=regular adjust=+0.5MiB/s rate=10/s stream=t1/s3
    ----

    # Observe:
    # - Total available tokens flatlines at 32MiB since flow tokens for
    #   s3 eventually depletes and later bounces off of 0MiB. We
    #   initially have 3*16MiB = 48MiB worth of flow tokens, and end
    #   up at 48MiB-16MiB = 32MiB.
    # - Initially the rate of token deductions (3*1MiB/s = 3MiB/s) is
    #   higher than the token returns (1MiB/s+1MiB/s+0.5MiB/s =
    #   2.5MiB/s), but after we start shaping it to the slowest
    #   stream, they end up matching at (0.5MiB/s*3 = 1.5MiB/s).
    # - The blocked stream count bounces between 0 and 1 as the s3
    #   stream gets blocked/unblocked as tokens are
    #   deducted/returned. The demand for tokens (1MiB/s) is higher
    #   than the replenishment rate (0.5MiB/s).
    # - The overall admission rate is reduced from 30 reqs/s to 25
    #   reqs/s, mapping to token deduction rate of 3MiB/s to 2.5MiB/s
    #   (1MiB/s divvied across 10 reqs). The difference between 30
    #   reqs/s and 25 reqs/s is found in the +5 reqs/s accumulating in
    #   the wait queue.
    plot
    kvadmission.flow_controller.regular_tokens_available            unit=MiB
    kvadmission.flow_controller.regular_tokens_{deducted,returned}  unit=MiB/s rate=true
    kvadmission.flow_controller.regular_blocked_stream_count        unit=streams
    kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
    ----
    ----
     47.7 ┼╮
     46.6 ┤╰─╮
     45.6 ┤  ╰─╮
     44.5 ┤    ╰╮
     43.5 ┤     ╰─╮
     42.4 ┤       ╰╮
     41.4 ┤        ╰─╮
     40.3 ┤          ╰─╮
     39.3 ┤            ╰╮
     38.2 ┤             ╰─╮
     37.2 ┤               ╰─╮
     36.1 ┤                 ╰╮
     35.1 ┤                  ╰─╮
     34.0 ┤                    ╰─╮
     33.0 ┤                      ╰╮
     31.9 ┤                       ╰───────────────
                regular_tokens_available (MiB)

     3.0 ┤╭───────────────────────╮
     2.8 ┤│                       │
     2.6 ┤╭────────────────────────╮
     2.4 ┤│                       ╰│
     2.2 ┤│                        │
     2.0 ┤│                        │
     1.8 ┤│                        │
     1.6 ┤│                        ╰─────────────
     1.4 ┤│
     1.2 ┤│
     1.0 ┤│
     0.8 ┤│
     0.6 ┤│
     0.4 ┤│
     0.2 ┤│
     0.0 ┼╯
          rate(regular_tokens_{deducted,returned}) (MiB/s)

     1.0 ┤                                 ╭╮   ╭
     0.9 ┤                            ╭╮   ││   │
     0.9 ┤                            ││   ││   │
     0.8 ┤                            ││   ││   │
     0.7 ┤                            ││   ││   │
     0.7 ┤                         ╭╮ ││╭╮ ││   │
     0.6 ┤                         ││ ││││ ││╭─╮│
     0.5 ┤                         │╰╮│││╰╮│││ ││
     0.5 ┤                         │ ││││ ││││ ││
     0.4 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ╰╯││ ││││ ││
     0.2 ┤                         │   ││ ╰╯╰╯ ╰╯
     0.1 ┤                        ╭╯   ╰╯
     0.1 ┤                        │
     0.0 ┼────────────────────────╯
           regular_blocked_stream_count (streams)

     30.0 ┤╭───────────────────────╮
     28.0 ┤│                       ╰╮
     26.0 ┤│                        ╰─────────────
     24.0 ┤│
     22.0 ┤│
     20.0 ┤│
     18.0 ┤│
     16.0 ┤│
     14.0 ┤│
     12.0 ┤│
     10.0 ┤│
      8.0 ┤│
      6.0 ┤│                        ╭─────────────
      4.0 ┤│                        │
      2.0 ┤│                       ╭╯
      0.0 ┼────────────────────────╯
           rate(regular_requests_{admitted,waiting}) (reqs/s)
    ----
    ----

Release note: None
craig bot pushed a commit that referenced this pull request Feb 17, 2023
96642: kvflowhandle: implement kvflowcontrol.Handle r=irfansharif a=irfansharif

Part of #95563.

kvflowcontrol.Handle is used to interface with replication flow control;
it's typically backed by a node-level kvflowcontrol.Controller. Handles
are held on replicas initiating replication traffic, i.e. are both the
leaseholder and raft leader, and manage multiple streams underneath
(typically one per active member of the raft group).

When replicating log entries, these replicas choose the log position
(term+index) the data is to end up at, and use this handle to track the
token deductions on a per log position basis. When informed of admitted
log entries on the receiving end of the stream, we free up tokens by
specifying the highest log position up to which we've admitted
(below-raft admission, for a given priority, takes log position into
account -- see kvflowcontrolpb.AdmittedRaftLogEntries for more details).

We also extend the testing framework introduced in #95905 to also
support writing tests for kvflowcontrol.Handle -- it's now pulled into
its own kvflowsimulator package. We're able to write tests that look
like the following:

    # Set up a triply connected handle (to s1, s2, s3) and start issuing
    # writes at 1MiB/s. For two of the streams, return tokens at exactly
    # the rate its being deducted (1MiB/s). For the third stream (s3),
    # we return flow tokens at only 0.5MiB/s.
    timeline
    t=0s         handle=h op=connect    stream=t1/s1   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s2   log-position=1/0
    t=0s         handle=h op=connect    stream=t1/s3   log-position=1/0
    t=[0s,50s)   handle=h class=regular adjust=-1MiB/s   rate=10/s
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s1
    t=[0.2s,50s) handle=h class=regular adjust=+1MiB/s   rate=10/s stream=t1/s2
    t=[0.2s,50s) handle=h class=regular adjust=+0.5MiB/s rate=10/s stream=t1/s3
    ----

    # Observe:
    # - Total available tokens flatlines at 32MiB since flow tokens for
    #   s3 eventually depletes and later bounces off of 0MiB. We
    #   initially have 3*16MiB = 48MiB worth of flow tokens, and end
    #   up at 48MiB-16MiB = 32MiB.
    # - Initially the rate of token deductions (3*1MiB/s = 3MiB/s) is
    #   higher than the token returns (1MiB/s+1MiB/s+0.5MiB/s =
    #   2.5MiB/s), but after we start shaping it to the slowest
    #   stream, they end up matching at (0.5MiB/s*3 = 1.5MiB/s).
    # - The blocked stream count bounces between 0 and 1 as the s3
    #   stream gets blocked/unblocked as tokens are
    #   deducted/returned. The demand for tokens (1MiB/s) is higher
    #   than the replenishment rate (0.5MiB/s).
    # - The overall admission rate is reduced from 30 reqs/s to 25
    #   reqs/s, mapping to token deduction rate of 3MiB/s to 2.5MiB/s
    #   (1MiB/s divvied across 10 reqs). The difference between 30
    #   reqs/s and 25 reqs/s is found in the +5 reqs/s accumulating in
    #   the wait queue.
    plot
    kvadmission.flow_controller.regular_tokens_available            unit=MiB
    kvadmission.flow_controller.regular_tokens_{deducted,returned}  unit=MiB/s rate=true
    kvadmission.flow_controller.regular_blocked_stream_count        unit=streams
    kvadmission.flow_controller.regular_requests_{admitted,waiting} unit=reqs/s rate=true
    ----
    ----
     47.7 ┼╮
     46.6 ┤╰─╮
     45.6 ┤  ╰─╮
     44.5 ┤    ╰╮
     43.5 ┤     ╰─╮
     42.4 ┤       ╰╮
     41.4 ┤        ╰─╮
     40.3 ┤          ╰─╮
     39.3 ┤            ╰╮
     38.2 ┤             ╰─╮
     37.2 ┤               ╰─╮
     36.1 ┤                 ╰╮
     35.1 ┤                  ╰─╮
     34.0 ┤                    ╰─╮
     33.0 ┤                      ╰╮
     31.9 ┤                       ╰───────────────
                regular_tokens_available (MiB)

     3.0 ┤╭───────────────────────╮
     2.8 ┤│                       │
     2.6 ┤╭────────────────────────╮
     2.4 ┤│                       ╰│
     2.2 ┤│                        │
     2.0 ┤│                        │
     1.8 ┤│                        │
     1.6 ┤│                        ╰─────────────
     1.4 ┤│
     1.2 ┤│
     1.0 ┤│
     0.8 ┤│
     0.6 ┤│
     0.4 ┤│
     0.2 ┤│
     0.0 ┼╯
          rate(regular_tokens_{deducted,returned}) (MiB/s)

     1.0 ┤                                 ╭╮   ╭
     0.9 ┤                            ╭╮   ││   │
     0.9 ┤                            ││   ││   │
     0.8 ┤                            ││   ││   │
     0.7 ┤                            ││   ││   │
     0.7 ┤                         ╭╮ ││╭╮ ││   │
     0.6 ┤                         ││ ││││ ││╭─╮│
     0.5 ┤                         │╰╮│││╰╮│││ ││
     0.5 ┤                         │ ││││ ││││ ││
     0.4 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ││││ ││││ ││
     0.3 ┤                         │ ╰╯││ ││││ ││
     0.2 ┤                         │   ││ ╰╯╰╯ ╰╯
     0.1 ┤                        ╭╯   ╰╯
     0.1 ┤                        │
     0.0 ┼────────────────────────╯
           regular_blocked_stream_count (streams)

     30.0 ┤╭───────────────────────╮
     28.0 ┤│                       ╰╮
     26.0 ┤│                        ╰─────────────
     24.0 ┤│
     22.0 ┤│
     20.0 ┤│
     18.0 ┤│
     16.0 ┤│
     14.0 ┤│
     12.0 ┤│
     10.0 ┤│
      8.0 ┤│
      6.0 ┤│                        ╭─────────────
      4.0 ┤│                        │
      2.0 ┤│                       ╭╯
      0.0 ┼────────────────────────╯
           rate(regular_requests_{admitted,waiting}) (reqs/s)
    ----
    ----

Release note: None

97256: ui: tenant-gate transaction insights, change selection order of workload insights r=ericharmeling a=ericharmeling

Related to #96353, #92936.

This commit hides the dropdown to switch workload insights from statement insights to transaction insights on tenants. When #96353 is resolved, we can re-enable transaction insights for tenant clusters.

The commit also switches the order of workload insights, to show statement insights first (see #92936).

`@gemma-shay` We should add a note on the [Insights docs](https://www.cockroachlabs.com/docs/stable/ui-insights-page.html) and/or the serverless docs that Transaction Insights aren't available on serverless.

Release note: None

Epic: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Eric Harmeling <eric.harmeling@cockroachlabs.com>