
Start measuring Ping latency closer to sending request and remove Stopwatch instance #1714

Merged

Conversation


@eduardobr eduardobr commented Mar 28, 2021

Inspired by ValueStopwatch (https://github.com/dotnet/extensions/blob/master/src/Shared/src/ValueStopwatch/ValueStopwatch.cs), I'm removing the Stopwatch instance from TimerMessage.

There's a change of behavior here:
We no longer start the latency measurement when the message is instantiated, but right before it is written to the network.
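The ValueStopwatch idea is to hold only a start timestamp and compute elapsed time on demand, instead of keeping a heap-allocated Stopwatch object alive per message. A minimal Python analogue of that pattern (using `time.perf_counter_ns()` in place of .NET's `Stopwatch.GetTimestamp()`; this is an illustrative sketch, not code from the PR):

```python
import time

class ValueStopwatch:
    """Lightweight analogue of .NET's ValueStopwatch: store only the
    start timestamp and compute elapsed time on demand, instead of
    keeping a full stopwatch object alive."""
    __slots__ = ("_start",)

    def __init__(self):
        # time.perf_counter_ns() is a monotonic, high-resolution counter,
        # roughly analogous to Stopwatch.GetTimestamp() in .NET.
        self._start = time.perf_counter_ns()

    def elapsed_ms(self) -> float:
        return (time.perf_counter_ns() - self._start) / 1_000_000

sw = ValueStopwatch()
time.sleep(0.01)
print(f"elapsed: {sw.elapsed_ms():.2f} ms")
```

Because the "stopwatch" is just a number, there is nothing to allocate and nothing for the GC to track.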

@eduardobr eduardobr force-pushed the feature/optimize-timermessage branch from 6552038 to 16a6124 on March 28, 2021 18:16
NickCraver pushed a commit that referenced this pull request May 22, 2021
Decreasing diff on #1714 to make things a bit easier to read.
@NickCraver

Pushed up the naming changes separately to make the changes here easier to see!


@NickCraver NickCraver left a comment


As I understand it, a subtle but important difference here is sensitivity to system clock changes. For example, if a system clock is off and gets corrected by an NTP sync (this happens pretty often), that correction introduces incorrect timings: the difference of 2 absolute timestamps, as opposed to a Stopwatch (which is relative), will not be correct. It could show too much time, or even negative time, when corrections occur.

As an example, let's say a command starts just after midnight:

[System time: 00:00:05.500] Command starts (`timingMessage.CreatedTimestamp` is set)
[System time: 00:00:06.000] NTP time corrects, it's actually [00:00:04] (clock drift was 2 seconds)
[System time: 00:00:05.000] Command finishes

...according to profiling, this command took -500ms, because that's what the diff of the 2 timestamps told us.
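The scenario above is the classic wall-clock vs. monotonic-clock distinction. A small Python illustration (the "wall clock" values are simulated to mirror the timeline above; the monotonic readings are real):

```python
import time

# Wall-clock timestamps move when NTP steps the system clock, so the
# difference of two readings can be wrong or even negative.
# Simulated readings around a -2 s NTP correction, as in the timeline:
start_wall = 5.5   # command starts at 00:00:05.500
end_wall = 5.0     # command finishes at 00:00:05.000 (clock was stepped back)
print(f"wall-clock diff: {end_wall - start_wall:+.1f} s")  # -0.5 s, i.e. "-500ms"

# A monotonic clock is immune to such steps: it only ever moves forward.
t0 = time.monotonic()
time.sleep(0.01)
t1 = time.monotonic()
assert t1 >= t0  # guaranteed, regardless of NTP activity
print(f"monotonic diff: {t1 - t0:.3f} s")
```

The concern, then, is only a problem if the new code diffs wall-clock timestamps; diffing two readings of a monotonic counter is equivalent to what Stopwatch does.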

I agree on the potential efficiency win here, but I think we are trading away correctness (on systems without a high-resolution performance counter), which is a hard sell when it comes to profiling. But maybe the number of systems affected would be so low that we're okay with that.

Thoughts? Suggestions?

@eduardobr
Copy link
Contributor Author

eduardobr commented May 23, 2021

Hi Nick,
I was just checking the source of Stopwatch, and we can see that the same static GetTimestamp is used internally both when Start is called and when we read the Elapsed property of a running Stopwatch.
Considering this, the change here calculates time exactly the same way; it just moved from Stopwatch.Start() to Stopwatch.GetTimestamp().
Unless I'm missing some other bit?
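In other words, `Stopwatch.Start()` records `GetTimestamp()` internally and `Elapsed` is computed as `(GetTimestamp() - start) / Frequency`, so calling the static method directly reads the exact same monotonic counter; only the object allocation disappears. A Python sketch of that tick arithmetic (`time.perf_counter_ns()` stands in for the tick counter, returning nanoseconds directly so no Frequency division is needed; illustrative only):

```python
import time

NS_PER_MS = 1_000_000

# "Static timestamp" style, as in the PR: capture raw ticks, diff later.
start_ticks = time.perf_counter_ns()   # analogous to Stopwatch.GetTimestamp()
time.sleep(0.005)
end_ticks = time.perf_counter_ns()

# Same arithmetic a running stopwatch performs internally for Elapsed:
elapsed_ms = (end_ticks - start_ticks) / NS_PER_MS
print(f"elapsed: {elapsed_ms:.2f} ms")
```

Both styles read the same clock source, so the measured duration is identical either way; only the per-message allocation is saved.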

Other than this, what I really wanted when I was looking at this code was to understand when the Ping() latency measurement starts. In some projects I'm working on, I can see the calculated latency being much higher than the actual round trips to Redis when measured from outside the application. I suspect the reason is that the application was running in containers with extremely low CPU (0.2 of a CPU core), and the code path to issue a Ping() call was slower than the actual round trip to Redis while the application was busy processing other things in parallel.
So I wanted a latency measurement that starts as late as possible before the request leaves the application for the network, and finishes as early as possible when the response comes back.
But I think that requires a larger refactoring to achieve, and it would need to be announced so people don't think their network suddenly became faster for some reason.
I would be happy to try something if you think all of this makes sense.
If the intention of the TimerMessage class really is to measure everything, code execution included, then maybe we can have a GetLatency(), or use a different, new class (not TimerMessage) only for Ping, because it's documented to return latency.

Update: at least the start time for measuring latency could be made quite precise if we capture it in TimerMessage.WriteImpl instead, no?

Update 2: going deeper into the library code, I just noticed that TimerMessage is only used for Pings, so I've changed it there already to call GetTimestamp in WriteImpl().
I made a quick and dirty test running 10k pings in a for loop, storing the time taken from the TimerMessage constructor to WriteImpl: the max was a surprising 27ms, with the average being 0.05ms. Those delays shouldn't be present in the latency measurement anymore.
I hope it all makes sense.
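The shape of that quick test can be sketched as follows. This is a hypothetical re-creation in Python, not the actual benchmark: the `created`/`written` capture points stand in for the TimerMessage constructor and WriteImpl, and there is no real queueing work between them here.

```python
import time

# Measure the in-process delay between "message created" and "message
# written" over many iterations, then report the average and maximum.
delays_ms = []
for _ in range(10_000):
    created = time.perf_counter_ns()   # stand-in: TimerMessage constructor
    # ... in the real library, the command is queued and scheduled here ...
    written = time.perf_counter_ns()   # stand-in: WriteImpl hits the network
    delays_ms.append((written - created) / 1_000_000)

print(f"avg={sum(delays_ms) / len(delays_ms):.4f} ms, max={max(delays_ms):.2f} ms")
# Starting the latency clock at "written" excludes this in-process delay
# from the reported ping latency.
```

Under CPU starvation (e.g. a 0.2-core container), the gap between the two capture points is exactly where the reported outliers would come from.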

@eduardobr eduardobr force-pushed the feature/optimize-timermessage branch from 0b7a81c to 1eeaf4c on May 23, 2021 18:06
@eduardobr eduardobr force-pushed the feature/optimize-timermessage branch from 1eeaf4c to e4328e4 on May 23, 2021 19:12
@eduardobr eduardobr changed the title Reduce allocation on TimerMessage by reusing Message.CreatedTimestamp Start measuring Ping latency closer to sending request and remove Stopwatch instance May 23, 2021

@NickCraver NickCraver left a comment


Looking good - thank you!

@NickCraver NickCraver merged commit 69a17b1 into StackExchange:main May 25, 2021
NickCraver pushed a commit that referenced this pull request May 25, 2021
Adding #1714 to release notes