RSDK-7105 - Fix server streaming #107

stuqdog · 2024-04-02T18:31:29Z

Simplifies and corrects the "end of stream" determining logic, removes "eos" arg
(flyby) parse next_message_length as a usize initially to reduce cases of try_into().unwrap()
(flyby) remove extraneous http import
(flyby) fix example code credentials template

stuqdog · 2024-04-02T18:33:15Z

src/rpc/client_channel.rs

-        // 1-5 because those are the length header bytes for gRPC
-        to_add_bytes.clone_from_slice(&data[1..5]);
-        let mut next_message_length = u32::from_be_bytes(to_add_bytes);
-        // if this is all streaming calls we need to tell the server when we're done with
-        // the stream, otherwise neither side will know we're done, trailers will never be
-        // sent/processed, and we'll hang on the strream.
-        let it_was_all_a_stream = usize::try_from(next_message_length).unwrap() + 5 < data.len();


All of this logic to determine whether or not we were in a stream was only done once, at the top level loop. This made sense insofar as it would accurately tell us if we were in a streaming scenario (a fact that wouldn't change), but it wasn't ideal for determining when we were at end of stream (which we were using it for).

stuqdog · 2024-04-02T18:40:20Z

src/rpc/client_channel.rs

-                        eos: if !remaining.is_empty() {
-                            // stream definitely isn't done if there's more to send
-                            false
-                        } else {
-                            // if we intentionally sent an eos or the http request was inferrably
-                            // a stream
-                            eos || it_was_all_a_stream
-                        },


Funny enough, we had a bit of a "two wrongs do make a right" scenario here. We shouldn't be saying "stream done" every time we're in a stream (which we were doing before), but this actually led to client streaming working correctly instead because we were just defaulting to saying it was always eos, which seemed to be fine.

The issue seems to be that in unary calls, eos (as far as I can tell) is meaningless so the fact that we were always saying "false" for eos in those cases (because eos was being passed as false, and because it definitely wasn't a stream so it_was_all_a_stream evaluated to false) didn't really matter. In client streaming cases, because we weren't ever re-evaluating the it_was_all_a_stream value, we were actually passing "true" here at all times! Thus, everything worked out okay.

But in server streaming cases, it_was_all_a_stream was evaluating to false and so we were saying eos: false. This is exactly the same as in the unary case, which worked perfectly normally. My theory is that on the server side, eos is evaluated/matters in all streaming call cases (thus causing issues with our server streaming request despite the fact that the request is unary), but never in unary call cases (thus allowing unary requests to succeed despite the fact that eos was always false).

stuqdog · 2024-04-02T18:42:00Z

src/rpc/client_channel.rs

                let stream = stream.clone();
+                let eos = remaining.is_empty();


We got bit here a bit by trying to be too close to Go. The Go logic doesn't seem to have a way of reliably saying that the stream is done, hence taking eos as an argument. We initially copied that in Rust, but it turns out we do have a good way to determine if the stream is done or not, since we receive a complete vec of bytes containing all the data of all the calls. Thus we can just check for that vec to be empty, removing the need for an eos value to ever be passed and reducing the risk of it being passed incorrectly.

Worth noting, the case where there is meaningful latency delay between messages being sent on the client side in a bidi streaming context remains a little uncertain. We seem to consistently receive all messages in a single vec, and thus can reliably infer when eos occurs. I'm not certain what will happen in cases where there's a heavy delay (is there blocking somewhere in rust to collect all the stream messages? will we end up making multiple call calls under the hood?), in part because (to my knowledge) we don't have any APIs that work that way yet. But if and when we do, some revisiting may be necessary. There's a comment to this effect in the file already.

stuqdog · 2024-04-02T18:51:22Z

src/rpc/client_channel.rs

+                        // note(ethan): the variable name that used to exist for determining `eos` was
+                        // `it_was_all_a_stream`. Which isn't important at all, but
+                        // `it_was_all_a_stream` is the best variable name I've ever shipped into
+                        // production and it makes me sad to see it go, so I wanted to memorialize
+                        // it somehow!


stuqdog · 2024-04-02T18:53:24Z

(note to reviewers): I'm still doing a liiittle bit more testing as I consider more cases so it's possible there will be future changes based on that. But, I have decently high confidence that this is correct and didn't want to prevent review from happening. If anything meaningful does change, I will be sure to ping reviewers and flag it.

purplenicole730

It took me a while to understand what was going on, but your explanations really helped. LGTM! I'm glad we're working towards using a more "correct" way of doing things

stuqdog and others added 4 commits April 2, 2024 09:50

init commit

15b6ed8

Merge branch 'viamrobotics:main' into fix-server-streaming

52d2dd7

simplify eos logic

76428f3

it was all a dream...

aee50dc

stuqdog commented Apr 2, 2024

View reviewed changes

stuqdog marked this pull request as ready for review April 2, 2024 18:52

stuqdog requested a review from a team as a code owner April 2, 2024 18:52

stuqdog requested review from njooma and purplenicole730 April 2, 2024 18:52

purplenicole730 approved these changes Apr 3, 2024

View reviewed changes

njooma approved these changes Apr 3, 2024

View reviewed changes

stuqdog merged commit bcf637a into viamrobotics:main Apr 4, 2024
4 checks passed

stuqdog deleted the fix-server-streaming branch April 4, 2024 12:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RSDK-7105 - Fix server streaming #107

RSDK-7105 - Fix server streaming #107

stuqdog commented Apr 2, 2024

stuqdog Apr 2, 2024

stuqdog Apr 2, 2024

stuqdog Apr 2, 2024 •

edited

Loading

stuqdog Apr 2, 2024

stuqdog commented Apr 2, 2024

purplenicole730 left a comment

RSDK-7105 - Fix server streaming #107

RSDK-7105 - Fix server streaming #107

Conversation

stuqdog commented Apr 2, 2024

stuqdog Apr 2, 2024

Choose a reason for hiding this comment

stuqdog Apr 2, 2024

Choose a reason for hiding this comment

stuqdog Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

stuqdog Apr 2, 2024

Choose a reason for hiding this comment

stuqdog commented Apr 2, 2024

purplenicole730 left a comment

Choose a reason for hiding this comment

stuqdog Apr 2, 2024 •

edited

Loading