-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement client interceptors for grpc-native-core #59
Conversation
Can you give a high-level explanation of this implementation? How do the different classes interact? What is the common code path for a call using one or more interceptors? |
At a high level, this PR inserts an InterceptingCall between the JS client and the C-core. Before: After: client_interceptors.js presents the API interceptors must be written to, and the mechanism for executing a chain of interceptors. The interceptors are chained together dynamically, allowing them to be provided either during client construction or during any RPC invocation. All interception work is completed before submitting a batch to
Inbound
Logic previously in client.js for creating the client_batch and the callback provided to A typical code path with two interceptors (InterceptorFoo and InterceptorBar) would look like this. This is the outbound path for a unary call performing the Client.makeUnaryRequest client.js At this point the _joinOpsCustom client_interceptors.js Inbound operations take a similar path but the interceptor order is reversed, we call the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of the logic in client_batch.js
may be more trouble than it's worth. The primary purpose of using a batch instead of sending each op separately is as an optimization to coalesce events internally. But now that interceptors can introduce asynchronous operations that delay ops individually, coalescing ops into a batch that is started when the last op is ready may actually negate that advantage. So, I think it would be better to simply always call startBatch
with each individual op when the interceptor makes it ready. If that turns out that that is a significant performance hit, I think we can make up for it by optimizing it differently, such as by using a UV prepare handle to coalece all ops that are started within a single event loop cycle.
Client.prototype.makeUnaryRequest = function(method, serialize, deserialize, | ||
argument, metadata, options, | ||
callback) { | ||
Client.prototype.makeUnaryRequest = function(method_descriptor, argument, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is part of the public API; we can't change the method signature. The same goes for the other methods on the Client
prototype. However, I don't think that's a major problem; you can pass that method descriptor information in the options
argument.
* @constructor | ||
*/ | ||
function BatchDefinition(batch_ops, trigger_ops, outbound_handler, | ||
inbound_handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document the method signatures of outbound_handler
and inbound_handler
.
@@ -0,0 +1,572 @@ | |||
/** | |||
* @license | |||
* Copyright 2015 gRPC authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copyright notices on new files should be 2018 (or 2017, since the file was actually written last year).
* interceptor (if it applies to the method) | ||
* @constructor | ||
*/ | ||
var InterceptorProvider = function(get_interceptor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an object type here just to wrap a single function? Why not just define the InterceptorProvider
type as a function with a signature that matches get_interceptor
's signature?
var batch_registry = new BatchRegistry(); | ||
var intercepting_call = client_interceptors.getInterceptingCall( | ||
call_constructor, interceptors, batch_registry, args.options); | ||
var emitter = new ClientUnaryCall(intercepting_call); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an InterceptingCall
is being passed to the ClientUnaryCall
here, then the ClientUnaryCall
documentation should be change to reflect that. The same goes for the other surface API call types.
How is progress going on this? Anything I can do to help get this merged? |
Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement. After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.
Regards, |
@Crevil Re-writing this to eliminate the batching of operations is going to take some time, and I'll also need to assess the performance impact for our use cases. I hope to have the requested changes addressed in the next two weeks. |
Sound great @drobertduke . We have been looking forward to this change and I would just check up if we could assist in getting it forward. |
Update: I hope to have a new version of this PR posted next week |
In the meantime, it looks like you actually created this PR before we switched to the CNCF CLA, so you'll have to sign that before this can be merged. |
Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement. After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.
Regards, |
I've re-written the PR to send operations as they become available instead of trying to match the same batches the client was making originally. We've deployed this new code to production in a limited capacity and do not see a performance penalty from this change vs earlier versions of this PR. The new interaction pattern with I've signed the CLA now, thanks for the reminder. |
var batch = { | ||
[grpc.opType.SEND_MESSAGE]: serialize(message), | ||
}; | ||
call.startBatch(batch, function () {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I reconcile multiple batches (each with their own errors potentially) vs a single callback provided by the consumer? I've skipped the callback in early batches in favor of using it when receiving a status as seen in the client streaming RPC but will this obscure useful errors from the consumer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that what you are doing is correct. I think that every instance of multiple operations in a batch where the single callback actually matters includes a status
operation. And I'm pretty sure that the status operation is always guaranteed to complete eventually. And I'm also pretty sure that if there is an error with another operation, it is always because of some circumstance that would cause a non-OK status. So in the end, you will want to surface the status.
One thing you need to be careful of here to avoid spurious errors is that you have to start the SEND_MESSAGE
batch before you start the SEND_CLOSE_FROM_CLIENT
batch, because the latter indicates that no more messages will be sent and the write will fail if you do them in the wrong order. I haven't seen yet if you are ensuring that ordering elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-streaming operations are ordered in client.js, here's the unary case: https://github.com/grpc/grpc-node/pull/59/files#diff-86af358fb1304514580c24212de4f182R486
This doesn't guarantee against a badly-writtten interceptor forcing an incorrect order but I assume we don't need to guard against that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you consider an interceptor that does any asynchronous operations to be badly written? Because I'm pretty sure that an asychronous write interceptor is sufficient to cause those operations to be performed out of order if there is no guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will admit, perhaps there was more value to the batch processing code you had than I previously realized. But I think there are other good solutions to this problem. For example, if the individual InterceptingCall
methods have an additional callback parameter or return a promise, you can propagate "the corresponding batch has been started" back up the call stack to ensure that the batches are started in order. Another option could be to share more information across the interceptor stack. If there is a pending start
call that hasn't gone through all of the interceptors, don't send any messages, and if there are any pending sendMessage
operations in the interceptor stack, don't start the halfClose
batch.
In general, the guarantees we need here are weaker than what you get from batches. The start
operations have to happen at the same time as or before the sendMessage
operations, which have to happen at the same time as or before the halfClose
operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually brings up another question for me: if a sendMessage
interceptor on a streaming request takes an asnychronous action before passing the message to the next layer, is there any protection against messages getting passed down out of order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the sendMessageWithContext
actually answers that question, and I think it also provides a solution to the other problem. I am still worried about start
, and not just because of unary calls. For any call, sending a message before sending the metadata will not do the right thing, and may crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's desirable to prevent an interceptor from changing the order of streaming messages. Interceptors can modify messages, swap a message out for a new message, or short circuit the operation entirely to skip a message. Building an ordered list of messages as they are passed in by the client's consumer and then enforcing that order when starting batches would make certain use cases impossible, like circuit breaking on streaming messages.
re: enforcing the order of operations in non-streaming requests, I was expecting interceptor authors to do that work if they want to make a continuation asynchronous. Adding a mechanism to the framework to delay an operation until a required operation is ready could reduce the burden on interceptor authors with async continuations. But it could also lead to difficult-to-debug scenarios with silent failures:
- An interceptor author writes a bad interceptor which unexpectedly fails to call the SEND_MESSAGE continuation n% of the time.
- An operation-order-enforcement mechanism in the framework allows the SEND_METADATA operation to be sent but doesn't send CLOSE n% of the time because it is waiting on a SEND_MESSAGE that never comes.
- n% of all calls pile up without error until they time out or memory runs out.
The same bad interceptor would cause immediate failures with useful errors if the framework is not waiting on an order-enforcement mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of sendMessageWithContext
already enforces the order of messages by preventing the top-level stream from sending down each message until the previous message has been fully processed.
And from my point of view, it feels very surprising to users that this interceptor API is clearly written to support asynchronous operations, but if they try to actually use an asynchronous operation, it will screw up some requests unless they do significant extra work to explicitly track their own pending operations. On top of that, in the current library it is not possible for the user to call the API in a way that results in out-of-order calls to the C core. I think that is a valuable guarantee and I would be very hesitant to change it.
I've added a mechanism to define 'dependencies' for individual operations, check whether those dependencies are met and defer the batch if needed. The use case only applies to RPCs with synchronous requests (unary and serverStreaming). I'm guessing this implementation is fairly expensive. Some of the lodash calls can be optimized away but I'm leaving them in for now for clarity. I'll work on producing some benchmarks. |
I ran benchmarks on an m4 EC2 instance for three versions of grpc-native:
Each benchmark is the time it takes to issue and get responses to 50k simple unary RPC requests (with no interceptors) to a local server running in another process. I ran the benchmark three times for each version. BASE - 044102d
PR without order protection - f644c06
PR with order protection - d83d354
Looking at the median timing for each version we get these results (lower is better):
I was surprised by the performance improvement of the PR without order protection over the base. Looking at the
I won't have time to work on performance improvements (reducing |
A NodeJS implementation of client-side interceptors, as described in the proposal: grpc/proposal#14
* @param {string} path | ||
* @param {grpc.Client~CallOptions=} options Options object. | ||
*/ | ||
exports.getCall = function(channel, path, options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this function is only used in client_interceptors.js
, so it should be defined there.
OK, I can't say I've totally wrapped my head around all of the code here, but at this point I can't find any other problems, so I'm going to merge it. Thank you for your contribution, and your patience in working through the contribution process. Assuming nothing goes wrong, this will be published in the 1.11 release. |
Is there a timeline for when the 1.11 release is expected? Happy to test some of this out in our organization as well if that helps. |
A NodeJS implementation of client-side interceptors, as described in
the proposal: grpc/proposal#14