Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-7105 - Fix server streaming #107

Merged
merged 4 commits into from
Apr 4, 2024

Conversation

stuqdog
Copy link
Member

@stuqdog stuqdog commented Apr 2, 2024

  • Simplifies and corrects the "end of stream" determining logic, removes "eos" arg
  • (flyby) parse next_message_length as a usize initially to reduce cases of try_into().unwrap()
  • (flyby) remove extraneous http import
  • (flyby) fix example code credentials template

Comment on lines -208 to -214
// 1-5 because those are the length header bytes for gRPC
to_add_bytes.clone_from_slice(&data[1..5]);
let mut next_message_length = u32::from_be_bytes(to_add_bytes);
// if this is all streaming calls we need to tell the server when we're done with
// the stream, otherwise neither side will know we're done, trailers will never be
// sent/processed, and we'll hang on the strream.
let it_was_all_a_stream = usize::try_from(next_message_length).unwrap() + 5 < data.len();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this logic to determine whether or not we were in a stream was only done once, at the top level loop. This made sense insofar as it would accurately tell us if we were in a streaming scenario (a fact that wouldn't change), but it wasn't ideal for determining when we were at end of stream (which we were using it for).

Comment on lines -247 to -254
eos: if !remaining.is_empty() {
// stream definitely isn't done if there's more to send
false
} else {
// if we intentionally sent an eos or the http request was inferrably
// a stream
eos || it_was_all_a_stream
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny enough, we had a bit of a "two wrongs do make a right" scenario here. We shouldn't be saying "stream done" every time we're in a stream (which we were doing before), but this actually led to client streaming working correctly instead because we were just defaulting to saying it was always eos, which seemed to be fine.

The issue seems to be that in unary calls, eos (as far as I can tell) is meaningless so the fact that we were always saying "false" for eos in those cases (because eos was being passed as false, and because it definitely wasn't a stream so it_was_all_a_stream evaluated to false) didn't really matter. In client streaming cases, because we weren't ever re-evaluating the it_was_all_a_stream value, we were actually passing "true" here at all times! Thus, everything worked out okay.

But in server streaming cases, it_was_all_a_stream was evaluating to false and so we were saying eos: false. This is exactly the same as in the unary case, which worked perfectly normally. My theory is that on the server side, eos is evaluated/matters in all streaming call cases (thus causing issues with our server streaming request despite the fact that the request is unary), but never in unary call cases (thus allowing unary requests to succeed despite the fact that eos was always false).

let stream = stream.clone();
let eos = remaining.is_empty();
Copy link
Member Author

@stuqdog stuqdog Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got bit here a bit by trying to be too close to Go. The Go logic doesn't seem to have a way of reliably saying that the stream is done, hence taking eos as an argument. We initially copied that in Rust, but it turns out we do have a good way to determine if the stream is done or not, since we receive a complete vec of bytes containing all the data of all the calls. Thus we can just check for that vec to be empty, removing the need for an eos value to ever be passed and reducing the risk of it being passed incorrectly.

Worth noting, the case where there is meaningful latency delay between messages being sent on the client side in a bidi streaming context remains a little uncertain. We seem to consistently receive all messages in a single vec, and thus can reliably infer when eos occurs. I'm not certain what will happen in cases where there's a heavy delay (is there blocking somewhere in rust to collect all the stream messages? will we end up making multiple call calls under the hood?), in part because (to my knowledge) we don't have any APIs that work that way yet. But if and when we do, some revisiting may be necessary. There's a comment to this effect in the file already.

Comment on lines +244 to +248
// note(ethan): the variable name that used to exist for determining `eos` was
// `it_was_all_a_stream`. Which isn't important at all, but
// `it_was_all_a_stream` is the best variable name I've ever shipped into
// production and it makes me sad to see it go, so I wanted to memorialize
// it somehow!
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭

@stuqdog stuqdog marked this pull request as ready for review April 2, 2024 18:52
@stuqdog stuqdog requested a review from a team as a code owner April 2, 2024 18:52
@stuqdog
Copy link
Member Author

stuqdog commented Apr 2, 2024

(note to reviewers): I'm still doing a liiittle bit more testing as I consider more cases so it's possible there will be future changes based on that. But, I have decently high confidence that this is correct and didn't want to prevent review from happening. If anything meaningful does change, I will be sure to ping reviewers and flag it.

Copy link
Member

@purplenicole730 purplenicole730 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand what was going on, but your explanations really helped. LGTM! I'm glad we're working towards using a more "correct" way of doing things

@stuqdog stuqdog merged commit bcf637a into viamrobotics:main Apr 4, 2024
4 checks passed
@stuqdog stuqdog deleted the fix-server-streaming branch April 4, 2024 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants