-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RSDK-6515 Add dial timeout and stats #188
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if there are other metrics we're interested in grabbing in this code. @cheukt and I spoke a bit offline about somehow reporting these metrics back to the signaling server, but I think debug logs are fine for now until we know the shape of the data we're trying to collect.
lib/src/rpc/dial.dart
Outdated
if (disableWebRtc) { | ||
return _dialDirectGrpc(address, opts, sessionCallback); | ||
chan = _dialDirectGrpc(address, opts, sessionCallback).timeout(opts.timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sense that this might not be the idiomatic way to provide a timeout to a Future
, @clintpurser / @njooma . I went down quite a rabbit-hole of using Completer
s to try to guard wherever we call dial
in the SDK, but what I've done here does seem to limit the amount of time we're able to await
the Future<ClientChannelBase>
... Hitting the timeout creates a "context deadline exceeded" error that seems to break app, though, so would love to talk offline about how to catch
that and gracefully report a timeout in connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idiomatically i think this is the correct way to add a timeout to a future, you can still await it normally this way and it will throw a TimeException upon timing out. Alternatively you can add a method into the .timeout()
call to handle how you prefer, i believe using this would resolve your deadline exceeded exception, but we most likely want to catch that somewhere as well. Happy to dig in with you if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool; I think an onTimeout
function will likely be the way to go. Looking into it now, but may need some advice from other Flutterers to get a sense of how to not break the mobile/example apps. Thanks for the input!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to wait until the others weigh in since I don't have the necessary background information, but looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some suggested changes, but nothing blocking
@@ -129,16 +134,23 @@ Future<ClientChannelBase> dial(String address, DialOptions? options, String Func | |||
} catch (e) { | |||
_logger.d('Error dialing with mDNS; falling back to other methods', error: e); | |||
} | |||
mdnsSW.stop(); | |||
_logger.d('STATS: mDNS discovery took ${mdnsSW.elapsed}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would love to capture this data somehow so we could historically track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah NetCode will be working on that eventually (gathering this data and tracking it to get a sense of how our connection establishment performance looks over time). For now, I think we're ok just debug-logging a number of measurements in this dial code and in other connection establishment paths.
lib/src/rpc/dial.dart
Outdated
if (disableWebRtc) { | ||
return _dialDirectGrpc(address, opts, sessionCallback); | ||
chan = _dialDirectGrpc(address, opts, sessionCallback).timeout(opts.timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idiomatically i think this is the correct way to add a timeout to a future, you can still await it normally this way and it will throw a TimeException upon timing out. Alternatively you can add a method into the .timeout()
call to handle how you prefer, i believe using this would resolve your deadline exceeded exception, but we most likely want to catch that somewhere as well. Happy to dig in with you if you want.
can you add a column to the table that specifies which log you were looking at to get those times? and can flutter use structured logging? would be nice to have the specific data separated out from the message |
Yep can add that column! Updating the logging a little bit now. I'm not sure about structured logging without an external library like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! can you add a ticket for flutter sdk to maybe add structured logging
https://viam.atlassian.net/browse/RSDK-6515
Adds a
timeout
field toDialOptions
that defaults to 10s. Adds a few basic metrics to dialing collected through debug logs.mDNS does not seem to work well with the flutter SDK, so I disabled it in testing. Note that mDNS is also disabled in the RC app, too, as its apparent advantage in "limited connectivity" situations is not really relevant with a phone, as phones are usually operating with some kind of network connection.
Here are some statistics I collected with the new instrumentation that led me to choose a 10s default dial timeout (about 10x "ICE connected" times across different subnets). These statistics are collected across 100 runs in both "Situation"s.
cc @dgottlieb , @maximpertsov , @ale7714