remove RX as the primary pipe for gRPC communications #8281
Conversation
remove RX as the primary pipe for gRPC communications
1: GrpcWorkerChannelTests spun up duplicate channels 2: suppress write failures (async void)
… TestFunctionRpcService and GrpcWorkerChannelTests to match
@mgravell apologies for the delay on this. We'll get eyes on this ASAP. Would it be possible to have this rebased to deal with the conflicts? Thanks!
Will do
Tagging @safihamid @paulbatum - fyi @fabiocav / @brettsam - would be good to get test E2E on private stamp before merging this PR
# Conflicts:
#	src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs
#	test/WebJobs.Script.Tests/Workers/Rpc/GrpcWorkerChannelTests.cs
#	test/WebJobs.Script.Tests/Workers/Rpc/TestFunctionRpcService.cs
@fabiocav I believe that's merged; it looks like some cheese moved around the topic of "function load collections", which made the tests merge a bit messy, but: should work
// advertise that we support multiple streams, and hint at a number; with this flag, we allow
// clients to connect multiple back-hauls *with the same workerid*, and rely on the internal
// plumbing to make sure we don't process everything N times
initRequest.Capabilities.Add(RpcWorkerConstants.MultiStream, "10"); // TODO: make this configurable
Can we make the number ("10") a readonly constant for now?
Is the TODO for this PR or future work? If it's future work, can we create a GitHub issue for it and mention it here?
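For illustration, the reviewer's suggestion might look something like the sketch below. All names other than `RpcWorkerConstants.MultiStream` are hypothetical stand-ins for the real host types, not the actual change:

```csharp
using System.Collections.Generic;

// Sketch only: hoist the magic "10" into a named constant instead of an
// inline literal. Names below are hypothetical stand-ins.
public static class RpcWorkerConstantsSketch
{
    public const string MultiStream = "MultiStream";

    // The hint previously inlined as "10"; still TODO: make configurable.
    public const string MaxParallelStreamsHint = "10";
}

public class InitRequestSketch
{
    public Dictionary<string, string> Capabilities { get; } = new();
}

public static class CapabilityDemo
{
    public static InitRequestSketch Build()
    {
        var initRequest = new InitRequestSketch();
        // Call site now references the named constant rather than "10".
        initRequest.Capabilities.Add(
            RpcWorkerConstantsSketch.MultiStream,
            RpcWorkerConstantsSketch.MaxParallelStreamsHint);
        return initRequest;
    }
}
```

Pulling the literal into a constant keeps the capability value in one place if it later becomes configuration-driven.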
I only have very general questions here. I think we should get this run on a private stamp to make sure we have no cold start regressions.
I also want to take one more pass through to think through things like restarts and failures -- this is tricky!
@mgravell anything we can do to help you get this PR merged?
@mgravell indeed, this is high priority on our side, so whatever we can do to help, please let us know.
I looked at this earlier on a call - it looks like the main pending item is testing for cold start regressions, with code tidying here not being the blocker - or am I missing a thread where that's already in play? Happy to help on the code side.
I think it's a mix of both; it would be great to get PR feedback addressed, but that's not a huge blocker. @safihamid and @pragnagopa can you help with cold start testing?
Ok, I will get cold start comparison numbers for this by end of next week. I hope most PR feedback is addressed by then.
Co-authored-by: Lilian Kasem <likasem@microsoft.com>
Co-authored-by: Lilian Kasem <likasem@microsoft.com>
…unctions-host into mgravell/remove-rx
INCOMPLETE - still 2 new failing tests
# Conflicts:
#	src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs
#	test/WebJobs.Script.Tests/Workers/Rpc/GrpcWorkerChannelTests.cs
Merge is in progress; fighting
No noticeable cold start changes with this PR in my private stamp, as long as we are using the right language worker during placeholder mode (i.e. use a Node placeholder for a Node app, etc.)
Most failures here are due to more merges since the PR - 1 left that I couldn't get through tonight:
I'm not set up for end-to-end on my laptop so I can't debug readily, but something is amiss there with services - will poke more tomorrow.
… new system Here we're simulating latency in the outbound pipe as it was intended to behave when everything was posting through the script manager via RX. Also makes the latency non-static and more test friendly.
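A non-static, per-instance latency hook for a test RPC service might look roughly like this. This is a sketch under assumed names (`TestRpcServiceSketch`, `OutboundLatency`), not the actual TestFunctionRpcService code:

```csharp
using System;
using System.Threading.Tasks;

// Sketch of a test-only RPC service with configurable outbound latency.
public class TestRpcServiceSketch
{
    // Instance property rather than a static field: each test can tune its
    // own latency independently (the "non-static and more test friendly" point).
    public TimeSpan OutboundLatency { get; set; } = TimeSpan.Zero;

    public async Task SendAsync(string message, Func<string, Task> writeToStream)
    {
        if (OutboundLatency > TimeSpan.Zero)
        {
            // Simulate a slow outbound pipe before the write completes.
            await Task.Delay(OutboundLatency);
        }
        await writeToStream(message);
    }
}
```

Keeping the delay per-instance avoids cross-test interference when tests run in parallel.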
just noticed there's a TODO here, should probably resolve before merge! (+ any other TODOs I may have missed). If not addressed, you should create an issue for any items we're not handling in this PR
Unsure if this covers #8604 completely or not - hope so!
…SendsInvocationCancelRequest
Gentle reminder that we should try and get this in before more conflicts pile up against the old script manager approach.
Let's get it in -- no objections from me. Although I do see conflicts again...
@brettsam merged once more from dev and addressed one test race (an existing issue, but hey, improvements!)
Thanks, @NickCraver. Good to have this merged once conflicts are resolved.
@fabiocav I can't see merge conflicts locally or in the PR UX, any idea what it's conflicting with?
@NickCraver odd, I don't see the usual list, just the conflict warning in GH. If you can merge locally, go for it!
I beat the GitHub UI!
Issue describing the changes in this PR
resolves #8280; performance locally is a 5x+ speedup in throughput
the main crux of the new flow is described in src/WebJobs.Script.Grpc/Eventing/GrpcEventExtensions.cs, but in short: we establish a per-worker `Channel<InboundGrpcEvent>` and `Channel<OutboundGrpcEvent>` to handle the backlog; instead of publishing to RX, we publish to the writer of the relevant queue. Separately, we have an async worker dequeue data from the queues and process accordingly. This is much more direct, avoids a lot of RX machinery, and creates isolation between workers.

Pull request checklist
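The per-worker channel flow described in the PR summary can be sketched roughly as follows. This is a minimal illustration of the producer/consumer pattern using `System.Threading.Channels`, with hypothetical names, not the actual GrpcEventExtensions code:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the real OutboundGrpcEvent type.
public record OutboundGrpcEvent(string WorkerId, string Payload);

public class GrpcWorkerChannelSketch
{
    // Per-worker unbounded queue; SingleReader because exactly one async
    // pump drains it, which is what replaces the RX subscription.
    private readonly Channel<OutboundGrpcEvent> _outbound =
        Channel.CreateUnbounded<OutboundGrpcEvent>(
            new UnboundedChannelOptions { SingleReader = true });

    // Publishing is a direct, non-blocking write to the channel writer
    // instead of a push through RX machinery.
    public bool Publish(OutboundGrpcEvent evt) => _outbound.Writer.TryWrite(evt);

    // A single async worker dequeues events and forwards them to the
    // gRPC stream, giving each worker an isolated backlog.
    public async Task PumpAsync(Func<OutboundGrpcEvent, Task> sendToStream)
    {
        await foreach (var evt in _outbound.Reader.ReadAllAsync())
        {
            await sendToStream(evt);
        }
    }

    public void Complete() => _outbound.Writer.Complete();
}
```

Because each worker owns its own channel and pump loop, a slow or failing worker can no longer stall the shared event pipeline the way a shared RX stream could.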
release_notes.md
Additional information
Additional PR information