Add support for concurrent dispatch and configurable prefetch #418

Merged
merged 8 commits into from
Sep 29, 2021

Conversation

benmoss
Contributor

@benmoss benmoss commented Sep 15, 2021

Changes

  • 🎁 Add ability to configure the dispatcher's prefetch
  • 🎁 Dispatcher sends events concurrently

/kind enhancement

Fixes #329

Release Note

:page_facing_up: A trigger will now send events to its subscriber in parallel, with the number of in-flight messages defaulting to 10 and configurable via the annotation `rabbitmq.eventing.knative.dev/prefetchCount`.

/assign @embano1 @gabo1208

Users can now set the annotation `rabbitmq.eventing.knative.dev/prefetchCount` to control the concurrency of the dispatcher and the number of in-flight events allowed. As a consequence, event order is no longer guaranteed with prefetch counts > 1. The default prefetch, if not specified, is 10. This is totally arbitrary and I'm open to opinions on a more sensible default if anyone has any 😄.
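
For readers who want to see the annotation handling spelled out, here is a minimal sketch of how a prefetch value could be read from a trigger's annotations. The annotation key and the default of 10 come from the description above; the helper name `prefetchFromAnnotations` and the rule that values below 1 are rejected are assumptions made for illustration, not the reconciler's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Annotation key and default value as stated in the PR description.
	prefetchAnnotation = "rabbitmq.eventing.knative.dev/prefetchCount"
	defaultPrefetch    = 10
)

// prefetchFromAnnotations is a hypothetical helper. It returns the prefetch
// count found in the annotations, or the default when the key is absent.
func prefetchFromAnnotations(annotations map[string]string) (int, error) {
	raw, ok := annotations[prefetchAnnotation]
	if !ok {
		return defaultPrefetch, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("annotation %s: %q is not an integer: %w", prefetchAnnotation, raw, err)
	}
	// Assumption: values below 1 are rejected in this sketch. The real
	// reconciler may accept a different range.
	if n < 1 {
		return 0, fmt.Errorf("annotation %s: must be at least 1, got %d", prefetchAnnotation, n)
	}
	return n, nil
}

func main() {
	// A prefetch of 1 keeps one event in flight at a time, which preserves ordering.
	n, err := prefetchFromAnnotations(map[string]string{prefetchAnnotation: "1"})
	fmt.Println(n, err)
}
```

Setting the value to 1 is the way to keep ordered delivery; any larger value trades ordering for throughput.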

@knative-prow-robot

@benmoss: GitHub didn't allow me to assign the following users: embano1.

Note that only knative-sandbox members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benmoss


@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 15, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Sep 15, 2021
@benmoss
Contributor Author

benmoss commented Sep 16, 2021

/hold

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2021
@codecov

codecov bot commented Sep 17, 2021

Codecov Report

Merging #418 (753c41c) into main (c9535fa) will decrease coverage by 0.00%.
The diff coverage is 87.96%.


@@            Coverage Diff             @@
##             main     #418      +/-   ##
==========================================
- Coverage   75.54%   75.54%   -0.01%     
==========================================
  Files          44       44              
  Lines        2781     2785       +4     
==========================================
+ Hits         2101     2104       +3     
- Misses        548      549       +1     
  Partials      132      132              
| Impacted Files | Coverage Δ |
|---|---|
| test/e2e/recorder.go | 0.00% <0.00%> (ø) |
| pkg/dispatcher/dispatcher.go | 79.76% <78.00%> (-2.17%) ⬇️ |
| pkg/reconciler/trigger/resources/dispatcher.go | 94.31% <98.24%> (+0.27%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@embano1 embano1 left a comment

	Retry         int           `envconfig:"RETRY" required:"false"`
	BackoffPolicy string        `envconfig:"BACKOFF_POLICY" required:"false"`
	BackoffDelay  time.Duration `envconfig:"BACKOFF_DELAY" required:"false"`
}

const (
	defaultBackoffDelay = 50 * time.Millisecond
	defaultPrefetch     = 1
	defaultPrefetchSize = 0

Contributor

I think it's safe to drop this since we force no-ack

"The prefetch-size is ignored if the no-ack option is set" https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-size

Contributor Author

Where do you see that we force no-ack? https://www.rabbitmq.com/amqp-0-9-1-reference.html#domain.no-ack

That would imply we don't use acknowledgements, which is not true

Contributor

This is how I'm reading go-amqp and the Rabbit docs:

> When autoAck (also known as noAck) is true, the server will acknowledge
> deliveries to this consumer prior to writing the delivery to the network. When
> autoAck is true, the consumer should not call Delivery.Ack.

https://github.com/knative-sandbox/eventing-rabbitmq/blob/c944a0c5a3bea768ce200c6812d44072b4bb7001/vendor/github.com/streadway/amqp/channel.go#L1017

> The prefetch-size is ignored if the no-ack option is set.

https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-size

We set autoAck=false: https://github.com/knative-sandbox/eventing-rabbitmq/blob/c944a0c5a3bea768ce200c6812d44072b4bb7001/pkg/dispatcher/dispatcher.go#L70

Contributor Author

Oh, I think this is bad documentation on rabbit's part. I think they mean

> The prefetch-size is ignored if the no-ack option is set to true.

Since we set it to false, we are not using autoAck/noAck, and therefore prefetch size/prefetch count is still relevant
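
To make the point about manual acknowledgements concrete, here is a rough sketch of setting a count-based prefetch. The `Qos(prefetchCount, prefetchSize int, global bool) error` signature mirrors `(*amqp.Channel).Qos` in streadway/amqp; the `qosChannel` interface, the `fakeChannel` stand-in, and the `configurePrefetch` helper are hypothetical names used so the example runs without a broker, and the exact arguments the dispatcher passes are an assumption.

```go
package main

import "fmt"

// qosChannel is a hypothetical interface listing only the method this sketch
// needs. A real *amqp.Channel from streadway/amqp has a Qos method of this shape.
type qosChannel interface {
	Qos(prefetchCount, prefetchSize int, global bool) error
}

// fakeChannel records the arguments it receives so the sketch runs offline.
type fakeChannel struct {
	count, size int
	global      bool
}

func (f *fakeChannel) Qos(prefetchCount, prefetchSize int, global bool) error {
	f.count, f.size, f.global = prefetchCount, prefetchSize, global
	return nil
}

// configurePrefetch limits unacknowledged deliveries by message count only.
// A prefetchSize of 0 means no byte-size limit is requested. The limit only
// matters when the consumer uses manual acks (autoAck=false).
func configurePrefetch(ch qosChannel, prefetchCount int) error {
	if prefetchCount < 1 {
		return fmt.Errorf("prefetch count must be at least 1, got %d", prefetchCount)
	}
	return ch.Qos(prefetchCount, 0, false)
}

func main() {
	ch := &fakeChannel{}
	if err := configurePrefetch(ch, 10); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("count=%d size=%d global=%t\n", ch.count, ch.size, ch.global)
}
```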

Contributor Author

@gerhard can you back me up on this interpretation?

@gerhard gerhard left a comment

A prefetch of 10 is a decent default, and better than 1 if consumer message ordering is not needed. This portion of the RabbitMQ docs explains it well: Consumer Acknowledgement Modes, Prefetch and Throughput. While a value of 100 or more can result in higher throughput, this tends to be workload specific (e.g. message payload, number of consumers, etc.) and it can result in the broker using more memory than necessary. Messages that have been delivered and are waiting for ACKs will be kept in memory not just on the client side, but also on the broker side. So yes, 10 is a decent prefetch default.

@@ -41,14 +41,15 @@ type envConfig struct {
	// Should failed deliveries be requeued in the RabbitMQ?
	Requeue bool `envconfig:"REQUEUE" default:"false" required:"true"`

	// Number of concurrent messages in flight
	PrefetchCount int           `envconfig:"PREFETCH_COUNT" default:"10" required:"false"`
	Retry         int           `envconfig:"RETRY" required:"false"`
	BackoffPolicy string        `envconfig:"BACKOFF_POLICY" required:"false"`
	BackoffDelay  time.Duration `envconfig:"BACKOFF_DELAY" required:"false"`
}

const (
	defaultBackoffDelay = 50 * time.Millisecond

I would set this as a BackoffDelay default.

Contributor Author

Not sure what you mean. The prefetch count in milliseconds should be the backoff delay?

@benmoss
Contributor Author

benmoss commented Sep 21, 2021

Questions:

  1. https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L166
    < this is also available as env, so use this instead?

    It looks like this used to be this way, and then @vaikas changed it in https://github.com/knative-sandbox/eventing-rabbitmq/pull/276/files#diff-7493504960c3b7e5e2f7c2d0a09116c08af37ebcd3798dced0c4478781443c3aR114-R121

    Not sure I understand the comment here, "cloudevents is the max total tries".

  2. https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L168
    < use retries defined in env?

    Same.

  3. Should we define a custom consumer name (i.e. dispatcher) to simplify troubleshooting (especially when adding auto-scaling later)? https://github.com/knative-sandbox/eventing-rabbitmq/blob/c70eab331eeb664526a2b54c6e630f33c7938789/pkg/dispatcher/dispatcher.go#L68

    This gets autogenerated right now:
    https://github.com/knative-sandbox/eventing-rabbitmq/blob/concurrency/vendor/github.com/streadway/amqp/consumers.go#L19-L21

    On my deployment this comes out to ctag-/ko-app/dispatcher-1, which seems okay to me.

  4. Do you also want to tackle "Be smarter about retries" (#409) in this PR?

    Let's keep this a separate issue.

@benmoss
Contributor Author

benmoss commented Sep 21, 2021

/hold cancel

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2021
Contributor

@embano1 embano1 left a comment

> Not sure I understand the comment here

Sorry for the confusion, I left my comments inline. Looks good overall to me!

@benmoss
Contributor Author

benmoss commented Sep 24, 2021

/assign @salaboy

Ben Moss added 7 commits September 28, 2021 08:55
  • This lets us define different topologies and is a little more explicit
    in the test about what's being set up
  • Set a default of 10 and dispatch in parallel.

    Had to change the unit tests to not care about ordering now that
    messages are being handled asynchronously.
  • Easier to just construct with named fields rather than a
    positional-argument constructor
  • This guarantees we don't exceed the concurrency factor set by the
    prefetch count. This also allows us to wait for the workers to shut down
    gracefully if we are cancelled.
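
The last commit message describes a fixed pool of workers sized by the prefetch count. Below is a minimal sketch of that pattern using only the standard library; it is not the dispatcher's actual code. The names `dispatch` and `countHandled`, the use of plain strings as messages, and the in-memory channel standing in for the RabbitMQ delivery channel are all assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// dispatch starts exactly `workers` goroutines that drain msgs, so at most
// `workers` calls to handle run at the same time. It returns once msgs is
// closed or ctx is cancelled and every worker has finished its current call.
func dispatch(ctx context.Context, msgs <-chan string, workers int, handle func(string)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // cancelled: stop picking up new messages
				case m, ok := <-msgs:
					if !ok {
						return // channel closed: nothing left to do
					}
					handle(m)
				}
			}
		}()
	}
	wg.Wait() // graceful shutdown: wait for in-flight handlers
}

// countHandled is a hypothetical demo helper. It pushes n messages through
// dispatch and reports how many were handled and the peak number in flight.
func countHandled(n, workers int) (handled, peak int) {
	msgs := make(chan string, n)
	for i := 0; i < n; i++ {
		msgs <- fmt.Sprintf("event-%d", i)
	}
	close(msgs)

	var mu sync.Mutex
	inFlight := 0
	dispatch(context.Background(), msgs, workers, func(string) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()

		mu.Lock()
		inFlight--
		handled++
		mu.Unlock()
	})
	return handled, peak
}

func main() {
	h, p := countHandled(5, 3)
	// The peak never exceeds the worker count; completion order is not deterministic.
	fmt.Println("handled:", h, "peak in flight:", p)
}
```

Because several workers pull from the same channel, the order in which events finish is not deterministic, which is why the unit tests had to stop asserting on ordering.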
@salaboy

salaboy commented Sep 29, 2021

@benmoss after reviewing this PR with my morning coffee at hand, I can see that it is doing what we discussed yesterday.
This PR, together with a page in the documentation showing how to use the label and the topology created on the RabbitMQ side, will be a great addition.

@salaboy

salaboy commented Sep 29, 2021

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 29, 2021
@knative-prow-robot knative-prow-robot merged commit 195ca11 into knative-extensions:main Sep 29, 2021
@gvmw
Contributor

gvmw commented Sep 30, 2021

💯

@JamesMcMahon

Is there documentation around how to use this feature?

For instance, is this correct?

apiVersion: sources.knative.dev/v1alpha1
kind: RabbitmqSource
metadata:
  name: rabbitmq-source
  annotations:
    rabbitmq.eventing.knative.dev/prefetchCount: "10000"
# ...
# additional config below

@embano1
Contributor

embano1 commented Nov 11, 2021

@JamesMcMahon IMHO the annotation is only respected on a trigger resource

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. kind/enhancement lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[feature request] allow set prefetch on dispatcher
8 participants