Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thrift: deal with TStruct serialization failures #744

Merged
merged 6 commits into from
Oct 11, 2019

Conversation

twilly
Copy link
Contributor

@twilly twilly commented Jul 3, 2019

A Thrift server handler can return success with a TStruct that may
return an error when writing a serialized form. This error is ignored,
which cause a invalid third argument to be written (or not written at
all).

We can better handle these kinds of errors earlier by buffering a
serialized TStruct and sending along a system error if the
serialization fails.

Fixes #743

@twilly twilly requested review from abhinav and prashantv July 3, 2019 23:28
@codecov
Copy link

codecov bot commented Jul 3, 2019

Codecov Report

Merging #744 into dev will increase coverage by 0.39%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #744      +/-   ##
==========================================
+ Coverage   87.78%   88.17%   +0.39%     
==========================================
  Files          40       40              
  Lines        4101     4101              
==========================================
+ Hits         3600     3616      +16     
+ Misses        382      370      -12     
+ Partials      119      115       -4
Impacted Files Coverage Δ
channel.go 88.46% <100%> (ø) ⬆️
mex.go 72.51% <0%> (-2.85%) ⬇️
inbound.go 82.19% <0%> (+1.57%) ⬆️
connection.go 88.38% <0%> (+4.79%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74b0fff...ca727e3. Read the comment docs.

*NOTE*: I need feedback on XXX lines.

A thrift server handler can return success with a TStruct that may
return an error when writing a serialized form. This error is ignored,
which cause a invalid third argument to be written (or not written at
all).

We can better handle these kinds of errors earlier by buffering a
serialized TStruct and sending along a system error if the serialization
fails.

Fixes #743.
@twilly twilly force-pushed the tristan/another-day-another-error branch from 7c16143 to b07b24f Compare July 3, 2019 23:34
thrift/server.go Outdated
@@ -185,6 +186,17 @@ func (s *Server) handle(origCtx context.Context, handler handler, method string,
return err
}

// Buffer the response write so we can send a system error to the client.
// It's too late to send a system error by the time we write Arg3.
var respStruct bytes.Buffer // XXX should we make a buffer pool?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a decision best made with data, but I expect pooling to yield a 20% improvement in latency.

thrift/server.go Outdated
var respStruct bytes.Buffer // XXX should we make a buffer pool?
wp = getProtocolWriter(&respStruct)
if err := resp.Write(wp.protocol); err != nil {
call.Response().SendSystemError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can return an err at which point it should be a tableflip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes with writing out the buffer, eg the socket died. Plenty of tableflip failure modes that will at least get a server side error.

Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid additional buffering if possible -- it's an easy fix, but it also has a performance impact that might not be necessary.

Have we tried other options:

  • writing directly to arg3, but then sending an error halfway through?
  • writing to arg3, but if it fails, logging an error, and blackholing that response (no more bytes, but the response is never completed. client will see a timeout)

If we have to buffer, then we probably should pool. We'll need benchmark numbers to compare, but considering most other things are pooled, it seems like this would have a visible performance impact.

thrift/server.go Outdated
@@ -185,6 +186,17 @@ func (s *Server) handle(origCtx context.Context, handler handler, method string,
return err
}

// Buffer the response write so we can send a system error to the client.
// It's too late to send a system error by the time we write Arg3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure about this? I'm quite sure Muttley can send an error halfway through a response, and the clients see the error.

@twilly
Copy link
Contributor Author

twilly commented Jul 4, 2019

writing directly to arg3, but then sending an error halfway through?

I'm still looking into this. I did a quick pass and wasn't sure we can terminate a half-written arg3 and then start another trailer arg.

writing to arg3, but if it fails, logging an error, and blackholing that response (no more bytes, but the response is never completed. client will see a timeout)

I'm trying to avoid this case. Timeouts don't tell the client what happened and the server knows what happened.

As discussed previously, this eliminates buffering of the responses. It
appears that TChannel can send system errors after partial writes to the
output stream.

This also adds a test for the partial writes case.
@abhinav
Copy link
Contributor

abhinav commented Oct 10, 2019

@prashantv I've removed the buffering of responses and added a test to verify
that the system error propagates for partial writes.

@abhinav abhinav requested a review from witriew October 10, 2019 17:02
thrift/server_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests!

err = writer.Close()
defer thriftProtocolPool.Put(wp)

if err := resp.Write(wp.protocol); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated to this diff but just thought of this) @twilly just as a safety check, let's verify that the passed in protocol here is a TBinaryProtocol so that users don't end up calling Write with a custom protocol and expect it to work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Created an internal ticket to track this.

@abhinav abhinav merged commit fff536c into dev Oct 11, 2019
@abhinav abhinav deleted the tristan/another-day-another-error branch October 11, 2019 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle errors when writing the thrift struct value returned by a TChanServer.Handle.
5 participants