This repository has been archived by the owner on May 4, 2018. It is now read-only.

Windows: improve timer precision #1165

Closed
wants to merge 1 commit into from

Conversation

orangemocha

Improving timing precision by using QueryPerformanceCounter.

This is part of the fix for Node.js' test-timers-first-fire.js.
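
For readers, a minimal sketch of the idea (illustrative only, not the actual patch): derive the loop clock from QueryPerformanceCounter instead of the default ~15.6 ms GetTickCount tick.

```c
#include <windows.h>
#include <stdint.h>

static LARGE_INTEGER hrtime_frequency;  /* ticks per second, fixed at boot */

static void hrtime_init(void) {
  /* Cannot fail on Windows XP or later. */
  QueryPerformanceFrequency(&hrtime_frequency);
}

/* Milliseconds since an arbitrary origin, at QPC resolution (typically well
 * under a microsecond) instead of the ~15.6 ms default timer tick. */
static uint64_t hrtime_ms(void) {
  LARGE_INTEGER counter;
  QueryPerformanceCounter(&counter);
  return (uint64_t) ((double) counter.QuadPart * 1000.0 /
                     hrtime_frequency.QuadPart);
}
```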

@Nodejs-Jenkins

Thank you for contributing this pull request! Here are a few pointers to make sure your submission will be considered for inclusion.

The following committers were not found in the CLA:

  • Alexis Campailla

You can fix all these things without opening another issue.

Please see CONTRIBUTING.md for more information

@piscisaureus

This solution is kind of obvious. But in the past we deliberately didn't go for the QueryPerformanceCounter solution here because it turned out to have a significant performance impact. IIRC this effect was much more apparent when multiple node processes were running at the same time. That said, I tested mostly on what we would now consider an "old laptop" - an HP laptop with a 2.8 GHz dual-core Intel T9600 processor running Windows 7. Maybe it's time for a re-evaluation.

An alternative would be to use timeGetTime (v8 does this too), and optionally use timeBeginPeriod to set the clock resolution to 1ms.
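
A sketch of that alternative for comparison (illustrative; link against winmm.lib): timeBeginPeriod raises the system-wide timer resolution, and timeGetTime then reports milliseconds at that granularity.

```c
#include <windows.h>
#include <mmsystem.h>  /* timeBeginPeriod, timeEndPeriod, timeGetTime */

static void enable_1ms_resolution(void) {
  /* Raises the timer resolution for the whole system; must be paired with
   * a matching timeEndPeriod(1) before exit. */
  timeBeginPeriod(1);
}

static DWORD sample_time_ms(void) {
  /* 32-bit millisecond counter; wraps roughly every 49.7 days. */
  return timeGetTime();
}

static void restore_resolution(void) {
  timeEndPeriod(1);
}
```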

@retep998
Contributor

Here are some numbers I obtained on Windows 8.1 x64, with a Phenom II 2.5 GHz processor:

  • 26 nanoseconds per timeGetTime call
  • 32 nanoseconds per QueryPerformanceCounter call
  • 3 nanoseconds per GetTickCount call

@orangemocha
Author

Thanks for the feedback. QueryPerformanceCounter's latency is indeed dependent on the OS version and the specific system hardware. My latest commit enables the high-precision solution only when a reasonably fast implementation is detected.
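
The detection described above could look roughly like this (a sketch; the call count and threshold are made-up values, not the ones in the commit):

```c
#include <windows.h>

static int use_qpc;  /* consulted by the time-update code */

static void detect_fast_qpc(void) {
  LARGE_INTEGER freq, start, end, tmp;
  const int calls = 1000;
  double ns_per_call;
  int i;

  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&start);
  for (i = 0; i < calls; i++)
    QueryPerformanceCounter(&tmp);
  QueryPerformanceCounter(&end);

  ns_per_call = (double) (end.QuadPart - start.QuadPart) /
                freq.QuadPart * 1e9 / calls;

  /* Example threshold: only use the high-precision path if a QPC call is
   * cheap (say, under ~100 ns); otherwise fall back to GetTickCount. */
  use_qpc = ns_per_call < 100.0;
}
```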

@indutny
Contributor

indutny commented Mar 3, 2014

I don't really enjoy this: the solution is timing-dependent, and people may suffer from expecting higher resolution and getting lower if the system was very busy at process startup.

@orangemocha is it possible to statically detect this? On which configurations is it faster?

@orangemocha
Author

Thank you for your feedback. Unfortunately there is no reliable static way of determining the type of Qpc implementation. In summary, different OS versions use different strategies and newer ones adapt to the hardware available, even using performance measurements. I don’t think it’s a good idea for us to try to guess that logic, and there is no API to do it. Foreseeably, newer systems will have improved Qpc implementations and the need for the GetTickCount approach will go away over time.

So IMHO the performance measurement is the most reliable way to take advantage of fast high-precision timer implementations when they are available. I agree it’s not the prettiest thing, but I think it’s either this or sticking to GetTickCount and a 15.6ms resolution. Which approach do you like better? (This is a non-rhetorical question). What are the known scenarios for which we/people care about timer precision in node?

A different question: is the impact of even a slow Qpc implementation (think in the order of 1 microsecond) so terrible? We call uv_update_time once per iteration of the event loop, and during timer manipulations (e.g. listOnTimeout). I could be wrong but this doesn’t seem like a huge deal.

Regarding your comment about busy startup time, I could try to improve the reliability of the initial measurement (e.g. by briefly raising the thread priority).

@indutny
Contributor

indutny commented Mar 4, 2014

Ok, this makes sense. What do you think about adding a configuration option for it? Preferably as an API method. If this sounds good to you, I guess we could just remove these hacks and just use QPC, if not configured otherwise.

@orangemocha
Author

I ran the http benchmarks on 2 different machines. The results can be summarized as follows:

  1. Win7, fast Qpc: uv_update_time with Qpc yields +24% throughput.
  2. Win2k8R2, slow Qpc: uv_update_time with Qpc yields +10% throughput.

The increased average throughput is largely due to the benchmarks in http\chunked.js. For reasons unclear to me, the increased precision yields dramatic improvements in these tests (up to +300%).
Excluding the http\chunked tests, the average throughput delta observed is +1.2% on machine 1), and -2.3% on machine 2).

Based on these results, and assuming the http benchmarks are a good indicator, there doesn’t seem to be a significant performance degradation using QueryPerformanceCounter.

I updated the PR with the commit that uses QueryPerformanceCounter unconditionally.

@orangemocha
Author

For the record, I gathered these measurements using the http benchmarks, with this command:
node benchmark\common.js http
I ran this all locally on each machine.
I had to replace wrk with apache benchmark because wrk doesn't build on Windows. I ran the tests 4 times per configuration and averaged the results. The standard deviation of the 4 runs is at most 1.8%.

@orangemocha
Author

Minor change to keep last_tick_count for ABI compatibility, in case we ever want to roll this back.

@saghul
Contributor

saghul commented Aug 9, 2014

What is the status of this? Do we land it, discard it?

@orangemocha
Author

There hasn't been much discussion about this AFAIK.

My vote would be for landing it. v0.12.0 would be a good time to get some real world testing and get feedback on any negative performance impact. If that was the case I'd be on call for fixing it for the next minor release.

What's the decision process? If we decide to land it, I'll update the commit to resolve the merge conflict.

@txdv
Contributor

txdv commented Aug 13, 2014

What commit?

@saghul
Contributor

saghul commented Aug 13, 2014

> There hasn't been much discussion about this AFAIK.
>
> My vote would be for landing it. v0.12.0 would be a good time to get some real world testing and get feedback on any negative performance impact. If that was the case I'd be on call for fixing it for the next minor release.
>
> What's the decision process? If we decide to land it, I'll update the commit to resolve the merge conflict.

Well, the decision process is that we trust each other on the respective areas of expertise :-) I can read the code, but clearly you and @piscisaureus know a lot more about it. If you guys think we should land it I'm happy to do so.

If anyone else knows about this and has some feedback to share, now would be a good time :-)

> What commit?

The one on this PR: https://github.com/MSOpenTech/libuv/commit/fed2906fed784cbced7de8ab9908be60af70fb2c

@orangemocha
Author

Updated to solve merge conflicts. @piscisaureus are you ok with this?

@@ -316,7 +316,8 @@ RB_HEAD(uv_timer_tree_s, uv_timer_s);
HANDLE iocp; \
/* The current time according to the event loop. in msecs. */ \
uint64_t time; \
/* GetTickCount() result when the event loop time was last updated. */ \
/* This is unused, but we are keeping it to maintain ABI compatibility. */ \
/* TODO: Remove in v0.14. */ \
Contributor

I don't think we care about ABI compatibility at this point, v0.10 and master diverged so much it's already broken in tens of places.

Author

Right, but actually this was intended to maintain ABI compatibility in case we decide to revert this change in v0.12.x.

Contributor

Ah, I see. Can you please clarify the comment a bit? " ...maintain ABI compatibility in case of revert" or something like that.

Author

Done.

@piscisaureus

@orangemocha

I am nervous about what our Chinese XP bootlegger user base is going to think, but let's give it a spin. But before you land:

  • Do you have any idea how often the QueryPerformanceCounter() reported value wraps around? What are typical values for QueryPerformanceFrequency()?
  • I am nervous about relying on uv__hrtime for precise loop timings because it uses floating point math and performance counter values can get extremely large. Can we switch back to using integer math? (See the sketch after this list.)
  • This comment (and the surrounding logic) needs updating:

    libuv/src/win/timer.c

    Lines 210 to 217 in a4f0350

    /* A timeout value of UINT_MAX means infinite, so that's no good. But
    * more importantly, there's always the risk that GetTickCount wraps.
    * uv_update_time can detect this, but we must make sure that the
    * tick counter never overflows twice between two subsequent
    * uv_update_time calls. We do this by never sleeping more than half
    * the time it takes to wrap the counter - which is huge overkill,
    * but hey, it's not so bad to wake up every 25 days.
    */
  • The timer-fudge-forward logic should no longer be necessary. Remove the uv__time_forward function as well as its invocations (e.g.

    libuv/src/win/core.c

    Lines 320 to 324 in a4f0350

    /* We're sure that at least `timeout` milliseconds have expired, but
    * this may not be reflected yet in the GetTickCount() return value.
    * Therefore we ensure it's taken into account here.
    */
    uv__time_forward(loop, timeout);
    ).
  • Can we assume that the QueryPerformanceCounter() result only ever goes up? If so, I would like to see the if statement here removed: https://github.com/joyent/libuv/pull/1165/files#diff-7a69559815db50b6df21eb36dbae4333R40
  • For clarity, I would like to see the inline number (1000000) replaced by a constant or preprocessor define: https://github.com/joyent/libuv/pull/1165/files#diff-7a69559815db50b6df21eb36dbae4333R33
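
A sketch of the integer-math conversion requested above (names are illustrative, not the PR's actual code): splitting the division keeps the intermediate values within 64 bits without resorting to floating point, and the magic number becomes a named constant.

```c
#include <windows.h>
#include <stdint.h>

#define NANOSEC ((uint64_t) 1000000000)  /* named constant, no inline literal */

static LARGE_INTEGER perf_frequency;  /* set once via QueryPerformanceFrequency */

/* Convert a performance-counter reading to nanoseconds using integer math
 * only. (t % f) < f, and QPC frequencies are at most a few GHz, so the
 * intermediate product (t % f) * NANOSEC cannot overflow 64 bits. */
static uint64_t ticks_to_ns(int64_t ticks) {
  uint64_t t = (uint64_t) ticks;
  uint64_t f = (uint64_t) perf_frequency.QuadPart;
  return (t / f) * NANOSEC + (t % f) * NANOSEC / f;
}
```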

@retep998
Contributor

A quick test on my machine reported 14318180 for the frequency, so 40,853 years until a wrap around.
Even if we had a system that went overkill with nanosecond resolution, that would be 584 years until a wrap around.
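
For reference, the arithmetic behind those figures (treating the counter as a full unsigned 64-bit value):

$$
t_{\mathrm{wrap}} = \frac{2^{64}}{f},\qquad
\frac{2^{64}}{14{,}318{,}180\ \mathrm{Hz}} \approx 1.29\times10^{12}\ \mathrm{s} \approx 4.1\times10^{4}\ \mathrm{years},\qquad
\frac{2^{64}}{10^{9}\ \mathrm{Hz}} \approx 1.84\times10^{10}\ \mathrm{s} \approx 585\ \mathrm{years}.
$$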

@retep998
Contributor

Some googling found that someone has a system which reports 3591630000 for the frequency, which would leave them with only 162 years until a wrap around. Assuming this is a worst case scenario tied to the CPU frequency, I have no reason to believe CPU frequencies will increase significantly to the point where the wrap around becomes short enough to worry about.

@piscisaureus

Thanks, @retep998. Useful.

@saghul saghul added this to the v0.12 milestone Aug 20, 2014
@orangemocha
Author

Thanks, @retep998 and @piscisaureus, definitely useful!

From MSDN:
How often does QPC roll over?
Not less than 100 years from the most recent system boot, and potentially longer based on the underlying hardware timer used. For most applications, rollover isn't a concern.

Regarding switching to integer math, I actually think that floating point math is more reliable here, given that there are no guarantees on the range of magnitude of the performance counter frequency.

In fact, to be on the safer side, I think we could make uv_update_time call QueryPerformanceCounter directly, and do the int conversion after converting to the millisecond unit. See

return (uint64_t) ((double) counter.QuadPart / hrtime_frequency_);

In the worst case scenario, counter.QuadPart wraps every 100 years. We are converting to nanoseconds, and on a system with 1 microsecond resolution, this means multiplying by 1000. The conversion to uint64_t would then overflow every 100 years / 1000 ≈ 36 days. Now the fallacy in this reasoning is that for the 64-bit counter to wrap every 100 years (≈ 2^32 seconds), the system would have to have <1 ns resolution. But just to err on the safe side, I will modify uv_update_time to call QueryPerformanceCounter directly and do its own conversion. Sounds good?

Then yes to all your other points. Regarding uv__time_forward, do you know what scenario required this trick? I wonder if we can still have a problem because of rounding to the nearest millisecond.

@saghul
Contributor

saghul commented Sep 4, 2014

If nothing unexpected happens I plan to release libuv 1.0.0 by the end of next week. Can you guys spend some cycles on this? @piscisaureus @orangemocha

@orangemocha
Author

Sorry, I have been swamped. Yes. I'll get to it by early next week.

@saghul
Contributor

saghul commented Sep 4, 2014

Thanks Alexis!

@piscisaureus

Cc @janjongboon

@piscisaureus

Er, @janjongboom

@orangemocha
Author

Updated to address @piscisaureus ' feedback.

@orangemocha
Author

I have a question related to improving timer precision, though it shouldn't block this PR. Shouldn't uv_timer_start call uv_update_time before computing the timer's due time?

/* This is unused, but we are keeping it to maintain ABI compatibility */ \
/* in case we need to revert uv_update_time to using GetTickCount */ \
/* in v0.12.x. */ \
/* TODO: Remove in v0.14. */ \
Contributor

We now have some reserved fields for backwards ABI emergencies; we can remove this, I'd say.

@saghul
Contributor

saghul commented Sep 9, 2014

> I have a question related to improving timer precision, though it shouldn't block this PR. Shouldn't uv_timer_start call uv_update_time before computing the timer's due time?

Not sure. On Unix we don't. It's done once per loop iteration, so unless some code stalled the loop we should be OK I think.

@orangemocha
Author

Thanks @saghul ! Updated.

@saghul
Contributor

saghul commented Sep 9, 2014

Thanks Alexis! LGTM. @piscisaureus any further comment?

@piscisaureus

@saghul, sorry, yes.

  • the uv__time_forward() function is still there. Remove it?
  • we are now updating the time too often; there is no need to update the time in uv_poll and uv_poll_ex because uv_run also does it twice per loop iteration.

@orangemocha
Author

@piscisaureus I am confused. uv__time_forward was removed. Maybe you are looking at an outdated commit? https://github.com/joyent/libuv/pull/1165/files

I replaced the invocations of uv__time_forward() with uv_update_time(). If I just removed the invocations, I would get a couple of unit test failures (timer_from_check and timer_run_once). The point of uv__time_forward() seemed to be that we wanted the time elapsed to be reflected in the loop time, and we need to at least update the time for that to be the case.
timer_from_check is doing timer manipulation inside the check_cb and hence depends on the loop time being updated. Is that still a valid scenario? That's part of the reason why I was asking if we should call uv_update_time inside uv_timer_start.

@saghul
Contributor

saghul commented Sep 15, 2014

I had a look at this and I believe @orangemocha is right and we need to update the time after we blocked for I/O; otherwise any timer started after polling for I/O will get the time wrong, as the timer_from_check test shows.

FWIW we also do it for Unix: https://github.com/joyent/libuv/blob/master/src/unix/linux-core.c#L203-L207

Now, one can also say that the call to uv_update_time could be removed from here: https://github.com/joyent/libuv/blob/master/src/win/core.c#L414 since it will have already been updated by the polling function.

@saghul
Contributor

saghul commented Sep 15, 2014

@orangemocha: Moreover, I think we should always update the time after GetQueuedCompletionStatus, because if we stopped polling due to an I/O operation and then schedule a timer in a check callback, we'd get the time wrong because it was never updated.

@orangemocha
Author

Updated to call uv_update_time() unconditionally after GetQueuedCompletionStatus. I also made a change in test/test-timer.c, because now we are precise/fast enough that we were sometimes missing that 1 millisecond timer in the test.

@orangemocha
Author

@piscisaureus ' concern of updating the time too often is not unjustified though... now we update it multiple times and it might not even be needed. What do you guys think about updating it lazily, as in https://github.com/MSOpenTech/libuv/commit/b09abd7e849d9820c101489051bfb4aabb30a34b ?

@saghul
Contributor

saghul commented Sep 16, 2014

> Updated to call uv_update_time() unconditionally after GetQueuedCompletionStatus. I also made a change in test/test-timer.c, because now we are precise/fast enough that we were sometimes missing that 1 millisecond timer in the test.

Uh, how so? When running with UV_RUN_ONCE the loop will block for 1ms (in this case) and then, after I/O it will run timers again: https://github.com/joyent/libuv/blob/master/src/win/core.c#L405-L415 How can we miss it?

@saghul
Contributor

saghul commented Sep 16, 2014

> @piscisaureus' concern of updating the time too often is not unjustified though... now we update it multiple times and it might not even be needed. What do you guys think about updating it lazily, as in MSOpenTech@b09abd7 ?

While that might be an improvement, can we do it as a follow-up to this PR? Just so that we don't lose track of things :-)

Now, about the patch: I think it makes the functions look a bit backwards: uv_update_time is expected to, well, update the time, but now it doesn't. And uv_now, which is supposed to return the current time, is the one actually mutating it. It feels a bit off.

The idea of updating the time lazily sounds good though. One idea that crossed my mind would be to somehow "mark" the state of the loop, so we know we are in the section after polling for I/O, and only update the time if you start a timer or run uv_now, for example.
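
A toy sketch of that "mark the loop" idea (all names are hypothetical, not libuv API): after polling for I/O, only flag the cached time as stale, and refresh it the first time something actually needs it, e.g. uv_now or starting a timer.

```c
#include <stdint.h>

extern uint64_t hrtime_ms(void);  /* e.g. the QPC-based clock sketched earlier */

typedef struct {
  uint64_t time;      /* cached loop time in ms */
  int time_is_stale;  /* hypothetical flag set after polling for I/O */
} toy_loop_t;

/* Called right after the I/O poll returns: don't pay for a clock read yet. */
static void mark_time_stale(toy_loop_t* loop) {
  loop->time_is_stale = 1;
}

/* Called from uv_now()/uv_timer_start()-like paths: refresh only if needed. */
static uint64_t lazy_now(toy_loop_t* loop) {
  if (loop->time_is_stale) {
    loop->time = hrtime_ms();
    loop->time_is_stale = 0;
  }
  return loop->time;
}
```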

@orangemocha
Author

Definitely ok to do the lazy update as a follow up.

@orangemocha
Author

Regarding the test timer_run_once, I see what's going on. We are polling with a timeout equal to timer->due_time - loop->time; however, when GetQueuedCompletionStatus returns and we update the time, the new loop->time is equal to timer->due_time - 1. That is, GetQueuedCompletionStatus occasionally times out a bit early (I am guessing this could be due to rounding errors, and it never seems higher than 1 ms).

I can think of the following solutions:
a) Always add 1 to delta in uv__next_timeout()
b) Allow timers to fire 1 millisecond early
c) Upon timeout, set the loop time to the expected due time. Basically, bring back uv__time_forward().

IMO, c) is the best option. Rather than letting timers fire always late or always early, distort time a little bit when the wait ends a bit early.
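
For illustration, option c) could look roughly like this (a hypothetical helper in the spirit of the old uv__time_forward, not the final patch):

```c
#include <stdint.h>

/* After GetQueuedCompletionStatus() times out, the freshly sampled loop time
 * can land 1 ms short of the timer's due time. Nudge it forward so timers due
 * at `expected` still fire on this iteration.
 * loop_time points at the loop's cached time in ms (e.g. &loop->time). */
static void time_forward(uint64_t* loop_time, uint64_t expected) {
  if (*loop_time < expected)
    *loop_time = expected;
}
```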

@saghul
Contributor

saghul commented Sep 16, 2014

@orangemocha yeah, c) looks like the best approach IMHO.

@piscisaureus

History repeating itself. So whatever, c) seems best to me too.
I don't have anything against lazy updates, but can we just do the same thing on Windows as we do on Unix? I never imagined myself saying this, but there's so much back and forth for such a seemingly simple problem...

@saghul
Contributor

saghul commented Sep 17, 2014

> History repeating itself. So whatever, c) seems best to me too.
> I don't have anything against lazy updates, but can we just do the same thing on Windows as we do on Unix? I never imagined myself saying this, but there's so much back and forth for such a seemingly simple problem...

Now the Unix and Windows code does look consistent... sans the time forward thing. For some reason Unix doesn't seem to suffer from that issue (returning a bit earlier than expected, that is).

PS: Once Alexis addresses this last thing I plan to land it and tag rc1.

Improving timing precision by using QueryPerformanceCounter.

This is part of the fix for Node.js' test-timers-first-fire.js.
@orangemocha
Author

Updated.

@saghul
Contributor

saghul commented Sep 17, 2014

Thanks Alexis! I'll merge it later today.

@saghul
Contributor

saghul commented Sep 17, 2014

Landed in ✨ 6ced8c2 ✨. Thanks Alexis!

@saghul saghul closed this Sep 17, 2014
@orangemocha
Author

Yay, thanks!
