Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: go executor go! #1356

Closed
wants to merge 1 commit into from
Closed

WIP: go executor go! #1356

wants to merge 1 commit into from

Conversation

JAORMX
Copy link
Contributor

@JAORMX JAORMX commented Oct 31, 2023

This limits the executor event handling code to just parse the message. It
subsequently spawns a goroutine to do the actual evaluation.

The intent is to not block the message handler when we're receiving events
so we'd have faster executions.

@JAORMX JAORMX requested review from jhrozek and rdimitrov October 31, 2023 14:24
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this improve performance in your testing?

internal/engine/executor.go Outdated Show resolved Hide resolved
internal/engine/executor.go Show resolved Hide resolved
@JAORMX
Copy link
Contributor Author

JAORMX commented Oct 31, 2023

did this improve performance in your testing?

It did. Got a chance to try it out?

@jhrozek
Copy link
Contributor

jhrozek commented Oct 31, 2023

did this improve performance in your testing?

It did. Got a chance to try it out?

not yet, will do (unless someone beats me to it, in which case just ack&merge)

}
}()

return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we won't retry messages that fail to execute, because we returned a nil error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this (calling HandlerFunc concurrently multiple times) is represented in the Watermill Pub/Sub implementations as "Consumer Groups" (after the Kafka feature of the same name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, a consumer group delivers a message to one member of the group. The behavior without groups is to deliver to all subscribers. the GoChannel implementation we're using does not support groups, and so, all subscribers would receive the message. This is not an issue right now since we only have one subscriber. But it does mean that we have to wait for for the subscriber to be available in order to process the next event.

Instead, I took the approach of only doing message processing (unmarshaling) in the handler, and taking the heavy operation outside of it. Which also seemed to me like the right thing to do. In an HTTP server I would have added a status endpoint to represent the state of the long running operation. Here, we already had such a thing, we were just taking over the message handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By moving the execution to a different goroutine, we've also removed Watermill's ability to recognize that a message delivery has failed and retry it. Given our current infrastructure, this is probably the right choice, but I wanted to call out that the error we were delivering earlier is now being logged but not retried.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @jhrozek about the retrying part

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that the retry logic even works correctly to be honest. We had a bug not that long ago (fixed in 3406936) where we'd retry all messages. Now we'd only retry those that reply a special error code but even then would just loop forever I think and not give up after the configurable 3 tries because after the 3 tries the retry middleware does, the message is just nacked and then you go to: https://github.com/ThreeDotsLabs/watermill/blob/d20a6051456863fa147b9f9034b66500a6f0ffdf/pubsub/gochannel/pubsub.go#L377 - so you'd just send the message again. I didn't see where/how we'd tell watermill to just give up nacked messages.

internal/engine/executor.go Show resolved Hide resolved
evankanderson
evankanderson previously approved these changes Nov 2, 2023
}
}()

return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By moving the execution to a different goroutine, we've also removed Watermill's ability to recognize that a message delivery has failed and retry it. Given our current infrastructure, this is probably the right choice, but I wanted to call out that the error we were delivering earlier is now being logged but not retried.

Copy link
Member

@rdimitrov rdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JAORMX - let me give it a try too before merging

@JAORMX
Copy link
Contributor Author

JAORMX commented Nov 2, 2023

Had to rebase.

@JAORMX JAORMX changed the title go executor go! WIP: go executor go! Nov 2, 2023
@JAORMX
Copy link
Contributor Author

JAORMX commented Nov 2, 2023

@rdimitrov and I checked this out and we're having issues with parallelism when needing to alert on a single entity at the same time. It seems to me that #1292 is needed first before enabling this.

This limits the executor event handling code to just parse the message. It
subsequently spawns a goroutine to do the actual evaluation.

The intent is to not block the message handler when we're receiving events
so we'd have faster executions.

Add mechanism for executor to indicate when it's done.

Add graceful termination to executor

This makes sure that the executor cancels any profile runs based on
a per execution timeout or the server itself shutting down.
@JAORMX
Copy link
Contributor Author

JAORMX commented Nov 14, 2023

superceded by #1654

@JAORMX JAORMX closed this Nov 14, 2023
@JAORMX JAORMX deleted the go-executor-go branch April 12, 2024 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants