Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return transport errors to the caller #399

Merged
merged 7 commits into from
Mar 24, 2023
Merged

Return transport errors to the caller #399

merged 7 commits into from
Mar 24, 2023

Conversation

bruwozniak
Copy link
Contributor

resolves #389

This is just cherry picking relevant commits from https://github.com/tikue/tarpc/tree/gats and adding a test with a mock transport that errors out on sending.

@tikue tikue self-assigned this Mar 16, 2023
@tikue tikue self-requested a review March 16, 2023 19:38
Copy link
Collaborator

@tikue tikue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks so much! Just a couple of comments :)

tarpc/src/server.rs Outdated Show resolved Hide resolved
tarpc/src/client.rs Outdated Show resolved Hide resolved
@bruwozniak
Copy link
Contributor Author

@tikue Rebased and extended the test to all cases (I tried to compress it into one tabular test not to cause bloat). Please take anoter look.

let (tx, mut rx) = oneshot::channel();
let resp = send_request(&mut channel, "hi", tx, &mut rx).await;
assert!(dispatch.as_mut().poll(cx).is_pending());
for (error, cause) in vec![
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I'm sorry about this, I just realized I had left my comment about test coverage on the wrong enum. This test, which is testing RpcError::Send, was totally fine as you had it before.

What I meant to say, but failed to, was that RpcError had some other variants added in this PR, and those new variants could use test coverage—are they returned in the proper circumstances? For example, Disconnected was replaced by Shutdown and Receive errors.

The additional test cases I had in mind:

  • RpcError::Shutdown is returned when the dispatch task has shut down, preventing the client from:
    1. initiating an RPC
    2. completing an RPC
  • An RPC should be completed with RpcError::Receive if the dispatch task receives an error when attempting to receive a message from the transport. Failing to receive is a permanent error, which shouldn't change, but the dispatch task should still send the more specific Receive error to all pending RPCs (of which there could be a few).

I've added comments to the specific places in the code that correspond to these test cases. I actually think RpcError::Receive is unused in this PR, so you can remove it if you'd like, and I can file an issue to add it back later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries @tikue , it happens. I think now I added what you had in mind. I don't think it makes sense to delete what I already added though? It seems useful for the future and more tests usually don't hurt, WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, though I think these are really 5 different test cases (should be different #[test] fns). It's the match expression that leads me to think that — each error variant has different test logic.

I like the test coverage, but would prefer to put the test cases in different fns.

tarpc/src/client.rs Show resolved Hide resolved
tarpc/src/client.rs Show resolved Hide resolved
tarpc/src/client.rs Outdated Show resolved Hide resolved
tikue and others added 6 commits March 23, 2023 12:59
Previously, InFlightRequests required the client response type to be a
server response. However, this prevented injection of non-server
responses: for example, if the client fails to send a request, it should
complete the request with an IO error rather than a server error.
Previously, a client channel would immediately disconnect when
encountering an error in Transport::try_send. One kind of error that can
occur in try_send is message validation, e.g. validating a message is
not larger than a configured frame size. The problem with shutting down
the client immediately is that debuggability suffers: it can be hard to
understand what caused the client to fail. Also, these errors are not
always fatal, as with frame size limits, so complete shutdown was
extreme.

By bubbling up errors, it's now possible for the caller to
programmatically handle them. For example, the error could be walked
via anyhow::Error:

    2023-01-10T02:49:32.528939Z  WARN client: the client failed to send the request

    Caused by:
        0: could not write to the transport
        1: frame size too big
Copy link
Collaborator

@tikue tikue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 minor things and then I think we're good to merge :)

tarpc/src/client/in_flight_requests.rs Show resolved Hide resolved
let (tx, mut rx) = oneshot::channel();
let resp = send_request(&mut channel, "hi", tx, &mut rx).await;
assert!(dispatch.as_mut().poll(cx).is_pending());
for (error, cause) in vec![
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, though I think these are really 5 different test cases (should be different #[test] fns). It's the match expression that leads me to think that — each error variant has different test logic.

I like the test coverage, but would prefer to put the test cases in different fns.

@bruwozniak bruwozniak requested a review from tikue March 24, 2023 13:38
Copy link
Collaborator

@tikue tikue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for making this great improvement!

@tikue tikue merged commit 93f3880 into google:master Mar 24, 2023
@bruwozniak bruwozniak deleted the client_errors branch March 26, 2023 08:16
@bruwozniak
Copy link
Contributor Author

@tikue thanks you for very helpful guidance. May I ask for a release with this change so we can include it in our project?

@tikue
Copy link
Collaborator

tikue commented Mar 27, 2023

I published v0.32.0 immediately after merging this :)

tikue added a commit to tikue/tarpc that referenced this pull request Apr 17, 2023
* Make client::InFlightRequests generic over result.

Previously, InFlightRequests required the client response type to be a
server response. However, this prevented injection of non-server
responses: for example, if the client fails to send a request, it should
complete the request with an IO error rather than a server error.

* Gracefully handle client-side send errors.

Previously, a client channel would immediately disconnect when
encountering an error in Transport::try_send. One kind of error that can
occur in try_send is message validation, e.g. validating a message is
not larger than a configured frame size. The problem with shutting down
the client immediately is that debuggability suffers: it can be hard to
understand what caused the client to fail. Also, these errors are not
always fatal, as with frame size limits, so complete shutdown was
extreme.

By bubbling up errors, it's now possible for the caller to
programmatically handle them. For example, the error could be walked
via anyhow::Error:

```
    2023-01-10T02:49:32.528939Z  WARN client: the client failed to send the request

    Caused by:
        0: could not write to the transport
        1: frame size too big
```

* Some follow-up work: right now, read errors will bubble up to all pending RPCs. However, on the write side, only `start_send` bubbles up. `poll_ready`, `poll_flush`, and `poll_close` do not propagate back to pending RPCs. This is probably okay in most circumstances, because fatal write errors likely coincide with fatal read errors, which *do* propagate back to clients. But it might still be worth unifying this logic.

---------

Co-authored-by: Tim Kuehn <tikue@google.com>
tikue added a commit to tikue/tarpc that referenced this pull request Apr 17, 2023
* Make client::InFlightRequests generic over result.

Previously, InFlightRequests required the client response type to be a
server response. However, this prevented injection of non-server
responses: for example, if the client fails to send a request, it should
complete the request with an IO error rather than a server error.

* Gracefully handle client-side send errors.

Previously, a client channel would immediately disconnect when
encountering an error in Transport::try_send. One kind of error that can
occur in try_send is message validation, e.g. validating a message is
not larger than a configured frame size. The problem with shutting down
the client immediately is that debuggability suffers: it can be hard to
understand what caused the client to fail. Also, these errors are not
always fatal, as with frame size limits, so complete shutdown was
extreme.

By bubbling up errors, it's now possible for the caller to
programmatically handle them. For example, the error could be walked
via anyhow::Error:

```
    2023-01-10T02:49:32.528939Z  WARN client: the client failed to send the request

    Caused by:
        0: could not write to the transport
        1: frame size too big
```

* Some follow-up work: right now, read errors will bubble up to all pending RPCs. However, on the write side, only `start_send` bubbles up. `poll_ready`, `poll_flush`, and `poll_close` do not propagate back to pending RPCs. This is probably okay in most circumstances, because fatal write errors likely coincide with fatal read errors, which *do* propagate back to clients. But it might still be worth unifying this logic.

---------

Co-authored-by: Tim Kuehn <tikue@google.com>
tikue added a commit to tikue/tarpc that referenced this pull request Apr 17, 2023
* Make client::InFlightRequests generic over result.

Previously, InFlightRequests required the client response type to be a
server response. However, this prevented injection of non-server
responses: for example, if the client fails to send a request, it should
complete the request with an IO error rather than a server error.

* Gracefully handle client-side send errors.

Previously, a client channel would immediately disconnect when
encountering an error in Transport::try_send. One kind of error that can
occur in try_send is message validation, e.g. validating a message is
not larger than a configured frame size. The problem with shutting down
the client immediately is that debuggability suffers: it can be hard to
understand what caused the client to fail. Also, these errors are not
always fatal, as with frame size limits, so complete shutdown was
extreme.

By bubbling up errors, it's now possible for the caller to
programmatically handle them. For example, the error could be walked
via anyhow::Error:

```
    2023-01-10T02:49:32.528939Z  WARN client: the client failed to send the request

    Caused by:
        0: could not write to the transport
        1: frame size too big
```

* Some follow-up work: right now, read errors will bubble up to all pending RPCs. However, on the write side, only `start_send` bubbles up. `poll_ready`, `poll_flush`, and `poll_close` do not propagate back to pending RPCs. This is probably okay in most circumstances, because fatal write errors likely coincide with fatal read errors, which *do* propagate back to clients. But it might still be worth unifying this logic.

---------

Co-authored-by: Tim Kuehn <tikue@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No error details when exceeding of LengthDelimitedCodec's max frame length
2 participants