Add support for concurrent dispatch and configurable prefetch #418

benmoss · 2021-09-15T18:12:24Z

Changes

🎁 Add ability to configure the dispatcher's prefetch
🎁 Dispatcher sends events concurrently

/kind enhancement

Fixes #329

Release Note

:page_facing_up: A trigger will now send events to its subscriber in parallel, with the number of in-flight messages defaulting to 10 and configurable via the annotation `rabbitmq.eventing.knative.dev/prefetchCount`.

/assign @embano1 @gabo1208

Users can now set the annotation `rabbitmq.eventing.knative.dev/prefetchCount` to control the concurrency of the dispatcher and the number of in-flight events allowed. As a consequence, event order is no longer guaranteed when the prefetch count is greater than 1. If not specified, the prefetch count defaults to 10. This value is totally arbitrary, and I'm open to opinions on a more sensible default if anyone has any 😄.
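The annotation lookup described above can be sketched in Go. This is a minimal illustration, not the code from this PR: the helper name `prefetchFromAnnotations` is hypothetical, and rejecting values below 1 is an assumption (in AMQP a prefetch count of 0 means "no limit", which would remove the concurrency bound entirely).

```go
package main

import (
	"fmt"
	"strconv"
)

// prefetchAnnotation is the annotation key named in this PR.
const prefetchAnnotation = "rabbitmq.eventing.knative.dev/prefetchCount"

// defaultPrefetch mirrors the default of 10 described above.
const defaultPrefetch = 10

// prefetchFromAnnotations (hypothetical helper) returns the prefetch count for
// a Trigger, falling back to defaultPrefetch when the annotation is absent.
// Rejecting values < 1 is an assumption made for this sketch.
func prefetchFromAnnotations(annotations map[string]string) (int, error) {
	raw, ok := annotations[prefetchAnnotation] // reading a nil map is safe
	if !ok {
		return defaultPrefetch, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid %s %q: %w", prefetchAnnotation, raw, err)
	}
	if n < 1 {
		return 0, fmt.Errorf("%s must be >= 1, got %d", prefetchAnnotation, n)
	}
	return n, nil
}

func main() {
	for _, ann := range []map[string]string{
		nil,                          // no annotation: default of 10
		{prefetchAnnotation: "1"},    // one message in flight: serial dispatch
		{prefetchAnnotation: "100"},  // higher concurrency, no ordering guarantee
		{prefetchAnnotation: "abc"},  // rejected
	} {
		n, err := prefetchFromAnnotations(ann)
		fmt.Println(n, err)
	}
}
```

A prefetch count of 1 keeps dispatch serial; anything above 1 trades ordering for throughput, as the description notes.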

knative-prow-robot · 2021-09-15T18:12:26Z

@benmoss: GitHub didn't allow me to assign the following users: embano1.

Note that only knative-sandbox members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


knative-prow-robot · 2021-09-15T18:12:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benmoss


Needs approval from an approver in each of these files:

~~OWNERS~~ [benmoss]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

benmoss · 2021-09-16T13:46:23Z

/hold

codecov · 2021-09-17T14:28:27Z

Codecov Report

Merging #418 (753c41c) into main (c9535fa) will decrease coverage by 0.00%.
The diff coverage is 87.96%.

```diff
@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
- Coverage   75.54%   75.54%   -0.01%     
==========================================
  Files          44       44              
  Lines        2781     2785       +4     
==========================================
+ Hits         2101     2104       +3     
- Misses        548      549       +1     
  Partials      132      132
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| test/e2e/recorder.go | `0.00% <0.00%> (ø)` | |
| pkg/dispatcher/dispatcher.go | `79.76% <78.00%> (-2.17%)` | ⬇️ |
| pkg/reconciler/trigger/resources/dispatcher.go | `94.31% <98.24%> (+0.27%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c9535fa...753c41c.

embano1

Questions:

Regarding this comment: https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L144 -> IMHO the SDK won't retry at all if set to 0.
https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L166 < this is also available as env, so use this instead?
https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L168 < use retries defined in env?
Should we define a custom consumer name (ie dispatcher) to simplify troubleshooting (especially when adding auto-scaling later)? https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L68
Do you also want to tackle Be smarter about retries #409 in this PR?

embano1 · 2021-09-19T18:50:50Z

cmd/dispatcher/main.go

```diff
 	Retry         int           `envconfig:"RETRY" required:"false"`
 	BackoffPolicy string        `envconfig:"BACKOFF_POLICY" required:"false"`
 	BackoffDelay  time.Duration `envconfig:"BACKOFF_DELAY" required:"false"`
 }

 const (
 	defaultBackoffDelay = 50 * time.Millisecond
-	defaultPrefetch     = 1
 	defaultPrefetchSize = 0
```


I think it's safe to drop this since we force no-ack

"The prefetch-size is ignored if the no-ack option is set" https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-size

Where do you see that we force no-ack? https://www.rabbitmq.com/amqp-0-9-1-reference.html#domain.no-ack

That would imply we don't use acknowledgements, which is not true.

This is how I'm reading the Go amqp client (streadway/amqp) and the RabbitMQ docs:

When autoAck (also known as noAck) is true, the server will acknowledge deliveries to this consumer prior to writing the delivery to the network. When autoAck is true, the consumer should not call Delivery.Ack.

https://github.com/knative-sandbox/eventing-rabbitmq/blob/c944a0c5a3bea768ce200c6812d44072b4bb7001/vendor/github.com/streadway/amqp/channel.go#L1017

The prefetch-size is ignored if the no-ack option is set.

https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-size

We set autoAck=false: https://github.com/knative-sandbox/eventing-rabbitmq/blob/c944a0c5a3bea768ce200c6812d44072b4bb7001/pkg/dispatcher/dispatcher.go#L70

Oh, I think this is bad documentation on rabbit's part. I think they mean

The prefetch-size is ignored if the no-ack option is set to true.

Since we set it to false, we are not using autoAck/noAck, and therefore prefetch size/prefetch count is still relevant.

@gerhard can you back me up on this interpretation?
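To make the autoAck question concrete, here is a toy in-memory model (not the RabbitMQ or streadway/amqp API) of how a prefetch count bounds unacknowledged deliveries when the consumer acks manually (autoAck=false), and why the window stops mattering once deliveries count as acknowledged on send. The `broker` type and its methods are hypothetical.

```go
package main

import "fmt"

// broker is a toy in-memory model of prefetch, NOT the RabbitMQ or
// streadway/amqp API. It tracks how many deliveries are unacknowledged.
type broker struct {
	queue    []int // messages waiting in the queue
	prefetch int   // basic.qos prefetch count; 0 means "no limit" in AMQP
	autoAck  bool  // autoAck/noAck: deliveries count as acked when sent
	unacked  int   // delivered but not yet acknowledged
}

// deliver pushes as many queued messages as the prefetch window allows.
func (b *broker) deliver() []int {
	var out []int
	for len(b.queue) > 0 && (b.autoAck || b.prefetch == 0 || b.unacked < b.prefetch) {
		out = append(out, b.queue[0])
		b.queue = b.queue[1:]
		if !b.autoAck {
			b.unacked++ // outstanding until the consumer acks it
		}
	}
	return out
}

// ack acknowledges one outstanding delivery, freeing a slot in the window.
func (b *broker) ack() {
	if b.unacked > 0 {
		b.unacked--
	}
}

func main() {
	b := &broker{queue: []int{1, 2, 3, 4, 5}, prefetch: 2}
	fmt.Println(b.deliver()) // [1 2]: window is full
	b.ack()
	fmt.Println(b.deliver()) // [3]: one slot was freed
}
```

With `autoAck: true` the model delivers the whole queue at once, which matches the documented behavior that the prefetch limit is ignored in no-ack mode.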

pkg/dispatcher/dispatcher.go

gerhard

A prefetch of 10 is a decent default, and better than 1 if consumer message ordering is not needed. This portion of the RabbitMQ docs explains it well: Consumer Acknowledgement Modes, Prefetch and Throughput. While a value of 100 or more can result in higher throughput, this tends to be workload-specific (e.g. message payload, number of consumers, etc.), and it can cause the broker to use more memory than necessary: messages that have been delivered and are waiting for ACKs are kept in memory not just on the client side, but also on the broker side. So yes, 10 is a decent prefetch default.

gerhard · 2021-09-21T11:51:20Z

cmd/dispatcher/main.go

```diff
@@ -41,14 +41,15 @@ type envConfig struct {
 	// Should failed deliveries be requeued in the RabbitMQ?
 	Requeue bool `envconfig:"REQUEUE" default:"false" required:"true"`

+	// Number of concurrent messages in flight
+	PrefetchCount int           `envconfig:"PREFETCH_COUNT" default:"10" required:"false"`
 	Retry         int           `envconfig:"RETRY" required:"false"`
 	BackoffPolicy string        `envconfig:"BACKOFF_POLICY" required:"false"`
 	BackoffDelay  time.Duration `envconfig:"BACKOFF_DELAY" required:"false"`
 }

 const (
 	defaultBackoffDelay = 50 * time.Millisecond
```


I would set this as a BackoffDelay default.

Not sure what you mean. The prefetch count in milliseconds should be the backoff delay?

cmd/dispatcher/main.go

benmoss · 2021-09-21T19:55:39Z

Questions:

https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L166
< this is also available as env, so use this instead?

It looks like this used to be this way, and then @vaikas changed it in https://github.com/knative-sandbox/eventing-rabbitmq/pull/276/files#diff-7493504960c3b7e5e2f7c2d0a09116c08af37ebcd3798dced0c4478781443c3aR114-R121

Not sure I understand the comment here: "cloudevents is the max total tries".

https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L168
< use retries defined in env?

Same

Should we define a custom consumer name (ie dispatcher) to simplify troubleshooting (especially when adding auto-scaling later)? https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L68

This gets autogenerated right now:
https://github.com/knative-sandbox/eventing-rabbitmq/blob/concurrency/vendor/github.com/streadway/amqp/consumers.go#L19-L21

On my deployment this comes out to `ctag-/ko-app/dispatcher-1`, which seems okay to me.
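For reference, streadway/amqp's `Channel.Consume` takes a `consumer` string argument and generates a unique tag when it is empty, which is why the tag above is autogenerated. The sketch below mimics the observed tag shape and shows the alternative of passing an explicit name such as `dispatcher`. Both helper names are hypothetical, and the format is inferred from the output above rather than copied from the library source.

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
)

// consumerSeq is a per-process counter, so each generated tag is unique.
var consumerSeq uint64

// autoConsumerTag builds a tag shaped like the one observed on the deployment:
// "ctag-" + command name + "-" + sequence number. Inferred from that output;
// not copied from streadway/amqp.
func autoConsumerTag(commandName string, seq uint64) string {
	return fmt.Sprintf("ctag-%s-%d", commandName, seq)
}

// consumerTag (hypothetical helper) returns custom when it is set, mirroring
// the suggestion to name the consumer explicitly, and otherwise falls back to
// an autogenerated tag based on os.Args[0].
func consumerTag(custom string) string {
	if custom != "" {
		return custom
	}
	return autoConsumerTag(os.Args[0], atomic.AddUint64(&consumerSeq, 1))
}

func main() {
	fmt.Println(autoConsumerTag("/ko-app/dispatcher", 1)) // ctag-/ko-app/dispatcher-1
	fmt.Println(consumerTag("dispatcher"))                // dispatcher
}
```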

Do you also want to tackle Be smarter about retries #409 in this PR?

Let's keep this a separate issue.

benmoss · 2021-09-21T19:58:31Z

/hold cancel

pkg/dispatcher/dispatcher.go

embano1

Not sure I understand the comment here

Sorry for the confusion, I left my comments inline. Looks good overall to me!

benmoss · 2021-09-24T16:59:57Z

/assign @salaboy


salaboy · 2021-09-29T07:33:58Z

@benmoss after reviewing this PR with my morning coffee at hand, I can see that it is doing what we discussed yesterday.
This PR, together with a documentation page showing how to use the annotation and the topology created on the RabbitMQ side, will be a great addition.

salaboy · 2021-09-29T08:00:42Z

/lgtm

gvmw · 2021-09-30T11:15:02Z

💯

JamesMcMahon · 2021-11-11T15:34:49Z

Is there documentation around how to use this feature?

For instance, is this correct?

```yaml
apiVersion: sources.knative.dev/v1alpha1
kind: RabbitmqSource
metadata:
  name: rabbitmq-source
  annotations:
    rabbitmq.eventing.knative.dev/prefetchCount: "10000"
# ...
# additional config below
```

embano1 · 2021-11-11T16:23:16Z

@JamesMcMahon IMHO the annotation is only respected on a Trigger resource.

knative-prow-robot assigned gabo1208 Sep 15, 2021

knative-prow-robot added the kind/enhancement label Sep 15, 2021

knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 15, 2021

google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Sep 15, 2021

knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2021

benmoss force-pushed the concurrency branch from 40e71a2 to c70eab3 Compare September 17, 2021 14:22

embano1 reviewed Sep 19, 2021


embano1 mentioned this pull request Sep 19, 2021

Document a Trigger's Behavior for FIFO, High Throughput modes #424

Closed

benmoss force-pushed the concurrency branch from c70eab3 to 6976202 Compare September 20, 2021 17:38

gerhard reviewed Sep 21, 2021


knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2021

embano1 reviewed Sep 23, 2021


Three review threads on pkg/dispatcher/dispatcher.go (outdated, resolved)

embano1 reviewed Sep 23, 2021


knative-prow-robot assigned salaboy Sep 24, 2021

gabo1208 reviewed Sep 27, 2021


pkg/reconciler/trigger/resources/dispatcher_test.go (outdated, resolved)

Ben Moss added 7 commits September 28, 2021 08:55

Reduce duplication in dispatcher test

8da53db

Add dispatcher prefetch annotation

710985a

Extract brokertrigger test package

003c692

This lets us define different topologies and is a little more explicit in the test about what's being set up.

Dispatcher allows configuring prefetch, dispatches concurrently

2541dec

Set a default of 10 and dispatch in parallel. Had to change the unit tests to not care about ordering now that messages are being handled asynchronously.
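Because deliveries with a prefetch count above 1 are handled concurrently, test assertions have to compare received events as a multiset rather than a sequence. A minimal, standard-library-only sketch of such a comparison follows; `sameEvents` is a hypothetical helper, and the PR's tests may use a different approach (for example go-cmp with `cmpopts.SortSlices`).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sameEvents reports whether got and want contain the same event IDs with the
// same multiplicities, ignoring order. Inputs are copied, not mutated.
func sameEvents(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

func main() {
	want := []string{"e1", "e2", "e3"}
	var (
		mu  sync.Mutex
		got []string
		wg  sync.WaitGroup
	)
	// Simulate concurrent handling: arrival order is nondeterministic.
	for _, id := range want {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			mu.Lock()
			got = append(got, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	fmt.Println(sameEvents(got, want)) // true regardless of arrival order
}
```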

Promote dispatcher fields to public

f4b42cd

Easier to just construct with named fields rather than a positional-argument constructor

Use a worker pool for the dispatcher

a919722

This guarantees we don't exceed the concurrency factor set by the prefetch count. This also allows us to wait for the workers to shut down gracefully if we are cancelled.

Inline prefetch size

021f11e

Use the same retry behavior when retrying to the broker

753c41c

benmoss force-pushed the concurrency branch from 79e1c9e to 753c41c Compare September 28, 2021 12:58

knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 29, 2021

knative-prow-robot merged commit 195ca11 into knative-extensions:main Sep 29, 2021

embano1 mentioned this pull request Sep 30, 2021

Parallelism for CloudEvent processing in broker vmware-samples/vcenter-event-broker-appliance#631

Closed

benmoss deleted the concurrency branch November 11, 2021 21:37

JamesMcMahon mentioned this pull request Nov 19, 2021

Event source should support concurrent dispatch #508

Closed
