WIP: go executor go! #1356
Conversation
did this improve performance in your testing?
It did. Got a chance to try it out?
not yet, will do (unless someone beats me to it, in which case just ack&merge)
```go
		}
	}()

	return nil
```
Does this mean that we won't retry messages that fail to execute, because we returned a nil error?
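For concreteness, a minimal sketch of the pattern in question (type and function names here are invented placeholders, not the repo's real identifiers). The handler has the shape of Watermill's `message.NoPublishHandlerFunc`: returning nil acks the message, so once the evaluation moves into its own goroutine, a failure there can only be logged, never nacked or retried.

```go
package executor

import (
	"encoding/json"
	"log"

	"github.com/ThreeDotsLabs/watermill/message"
)

// EntityEvent is a placeholder for the real event payload type.
type EntityEvent struct {
	EntityID string `json:"entity_id"`
}

// handleEvent only parses the message, then hands the heavy work to a
// goroutine. Returning nil acks the message immediately.
func handleEvent(msg *message.Message) error {
	var evt EntityEvent
	if err := json.Unmarshal(msg.Payload, &evt); err != nil {
		return err // nack: parse errors can still be surfaced to Watermill
	}
	go func() {
		// Any error here happens after the ack: it is logged, not retried.
		if err := evaluate(evt); err != nil {
			log.Printf("evaluation of entity %s failed: %v", evt.EntityID, err)
		}
	}()
	return nil // ack: the handler is free for the next message
}

// evaluate stands in for the actual profile evaluation.
func evaluate(evt EntityEvent) error { return nil }
```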
I think this (calling HandlerFunc concurrently multiple times) is represented in the Watermill Pub/Sub implementations as "Consumer Groups" (after the Kafka feature of the same name).
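A small runnable illustration of the distinction (topic name and payload are made up): the GoChannel Pub/Sub has no consumer-group support, so every subscriber to a topic receives its own copy of each message, rather than the group sharing the work.

```go
package main

import (
	"context"
	"fmt"

	"github.com/ThreeDotsLabs/watermill"
	"github.com/ThreeDotsLabs/watermill/message"
	"github.com/ThreeDotsLabs/watermill/pubsub/gochannel"
)

func main() {
	pubSub := gochannel.NewGoChannel(gochannel.Config{}, watermill.NewStdLogger(false, false))
	ctx := context.Background()

	// Two subscribers on the same topic: with no consumer groups, BOTH
	// receive every published message.
	subA, _ := pubSub.Subscribe(ctx, "entity.events")
	subB, _ := pubSub.Subscribe(ctx, "entity.events")

	done := make(chan string, 2)
	for name, ch := range map[string]<-chan *message.Message{"A": subA, "B": subB} {
		go func(name string, ch <-chan *message.Message) {
			msg := <-ch
			msg.Ack()
			done <- fmt.Sprintf("subscriber %s got %q", name, msg.Payload)
		}(name, ch)
	}

	if err := pubSub.Publish("entity.events", message.NewMessage(watermill.NewUUID(), []byte("hello"))); err != nil {
		panic(err)
	}

	fmt.Println(<-done) // subscriber A got "hello" (order may vary)
	fmt.Println(<-done) // subscriber B got "hello"
}
```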
Right, a consumer group delivers a message to one member of the group. The behavior without groups is to deliver to all subscribers. The GoChannel implementation we're using does not support groups, so all subscribers would receive the message. This is not an issue right now since we only have one subscriber, but it does mean that we have to wait for the subscriber to be available in order to process the next event.
Instead, I took the approach of doing only the message processing (unmarshaling) in the handler and taking the heavy operation outside of it, which also seemed to me like the right thing to do. In an HTTP server I would have added a status endpoint to represent the state of the long-running operation. Here, we already had such a thing; we were just taking over the message handler.
By moving the execution to a different goroutine, we've also removed Watermill's ability to recognize that a message delivery has failed and retry it. Given our current infrastructure, this is probably the right choice, but I wanted to call out that the error we were delivering earlier is now being logged but not retried.
cc: @jhrozek about the retrying part
I doubt that the retry logic even works correctly, to be honest. We had a bug not that long ago (fixed in 3406936) where we'd retry all messages. Now we only retry those that return a special error code, but even then I think we'd just loop forever and not give up after the configurable 3 tries: after the retry middleware's 3 attempts, the message is just nacked, and then you go to https://github.com/ThreeDotsLabs/watermill/blob/d20a6051456863fa147b9f9034b66500a6f0ffdf/pubsub/gochannel/pubsub.go#L377 - so you'd just send the message again. I didn't see where/how we'd tell Watermill to give up on nacked messages.
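For reference, a minimal sketch of how that retry middleware gets wired up (the topic, handler name, and interval are invented placeholders). The key point from the comment above: once MaxRetries is exhausted, the error propagates to the router, the message is nacked, and the GoChannel subscriber redelivers it, so the effective behavior is an endless redelivery loop rather than a bounded 3 tries.

```go
package main

import (
	"context"
	"time"

	"github.com/ThreeDotsLabs/watermill"
	"github.com/ThreeDotsLabs/watermill/message"
	"github.com/ThreeDotsLabs/watermill/message/router/middleware"
	"github.com/ThreeDotsLabs/watermill/pubsub/gochannel"
)

func main() {
	logger := watermill.NewStdLogger(false, false)
	router, err := message.NewRouter(message.RouterConfig{}, logger)
	if err != nil {
		panic(err)
	}

	// Retry the handler up to 3 times with backoff. Once the retries are
	// exhausted, the error is returned to the router, the message is
	// nacked, and the GoChannel subscriber sends it again: hence the
	// redelivery loop described above.
	router.AddMiddleware(middleware.Retry{
		MaxRetries:      3,
		InitialInterval: 100 * time.Millisecond,
		Logger:          logger,
	}.Middleware)

	pubSub := gochannel.NewGoChannel(gochannel.Config{}, logger)

	// "entity.events" and the handler body are stand-ins for the real ones.
	router.AddNoPublisherHandler("executor", "entity.events", pubSub, func(msg *message.Message) error {
		return nil // returning an error here would trigger the retry middleware
	})

	if err := router.Run(context.Background()); err != nil {
		panic(err)
	}
}
```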
@JAORMX - let me give it a try too before merging
Force-pushed from 79e74c8 to 3724cbe
Had to rebase.
@rdimitrov and I checked this out and we're having issues with parallelism when needing to alert on a single entity at the same time. It seems to me that #1292 is needed first before enabling this.
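For illustration only (this is not necessarily what #1292 implements): one common way to serialize work per entity while keeping cross-entity parallelism is a keyed mutex. A hypothetical sketch:

```go
package executor

import "sync"

// entityLocks serializes evaluations per entity while still allowing
// different entities to run in parallel. Hypothetical sketch; the real
// design may differ.
type entityLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newEntityLocks() *entityLocks {
	return &entityLocks{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex for a given entity, creating it on first use.
func (e *entityLocks) lockFor(entityID string) *sync.Mutex {
	e.mu.Lock()
	defer e.mu.Unlock()
	l, ok := e.locks[entityID]
	if !ok {
		l = &sync.Mutex{}
		e.locks[entityID] = l
	}
	return l
}
```

The evaluation goroutine would then take `lockFor(evt.EntityID).Lock()` before evaluating and unlock afterwards. Note that in this naive form the map grows without bound, one mutex per entity ever seen.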
Force-pushed from 3724cbe to 5c113b4
This limits the executor event handling code to just parsing the message; it subsequently spawns a goroutine to do the actual evaluation. The intent is to not block the message handler when we're receiving events, so we'd have faster executions.
Add mechanism for executor to indicate when it's done.
Add graceful termination to executor: this makes sure that the executor cancels any profile runs based on a per-execution timeout or on the server itself shutting down.
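A hedged sketch of what that termination logic plausibly looks like (names, field layout, and timeout plumbing are illustrative, not the actual implementation): each evaluation runs under a context derived from the server's, bounded by a per-execution timeout, and a WaitGroup gives shutdown a way to wait for in-flight runs.

```go
package executor

import (
	"context"
	"log"
	"sync"
	"time"
)

// Executor sketches the shutdown-aware loop described in the commit
// messages above. All names here are illustrative.
type Executor struct {
	wg          sync.WaitGroup
	execTimeout time.Duration // per-execution timeout
}

// Run evaluates one event in a goroutine whose context is cancelled either
// by the per-execution timeout or by serverCtx (i.e. server shutdown).
func (e *Executor) Run(serverCtx context.Context, entityID string) {
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		ctx, cancel := context.WithTimeout(serverCtx, e.execTimeout)
		defer cancel()
		if err := evaluate(ctx, entityID); err != nil {
			log.Printf("evaluation of entity %s failed: %v", entityID, err)
		}
	}()
}

// Wait is the "indicate when it's done" mechanism: shutdown cancels
// serverCtx and then calls Wait so in-flight runs can finish or time out.
func (e *Executor) Wait() { e.wg.Wait() }

// evaluate stands in for the real profile evaluation; it must honor ctx.
func evaluate(ctx context.Context, entityID string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(10 * time.Millisecond): // simulated work
		return nil
	}
}
```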
Force-pushed from 5c113b4 to 1e4b3cf
Superseded by #1654.