This repository has been archived by the owner on May 4, 2018. It is now read-only.

Windows: improve timer precision #1165

Closed
wants to merge 1 commit into from

Conversation

orangemocha

Improving timing precision by using QueryPerformanceCounter.

This is part of the fix for Node.js' test-timers-first-fire.js.
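
For readers, a minimal sketch of the idea (illustrative only, not the actual patch): derive the loop clock from QueryPerformanceCounter instead of the default ~15.6 ms GetTickCount tick.

```c
#include <windows.h>
#include <stdint.h>

static LARGE_INTEGER hrtime_frequency;  /* ticks per second, fixed at boot */

static void hrtime_init(void) {
  /* Cannot fail on Windows XP or later. */
  QueryPerformanceFrequency(&hrtime_frequency);
}

/* Milliseconds since an arbitrary origin, at QPC resolution (typically well
 * under a microsecond) instead of the ~15.6 ms default timer tick. */
static uint64_t hrtime_ms(void) {
  LARGE_INTEGER counter;
  QueryPerformanceCounter(&counter);
  return (uint64_t) ((double) counter.QuadPart * 1000.0 /
                     hrtime_frequency.QuadPart);
}
```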

@Nodejs-Jenkins

Thank you for contributing this pull request! Here are a few pointers to make sure your submission will be considered for inclusion.

The following committers were not found in the CLA:

  • Alexis Campailla

You can fix all these things without opening another issue.

Please see CONTRIBUTING.md for more information

@piscisaureus

This solution is kind of obvious. But in the past we deliberately didn't go for the QueryPerformanceCounter solution here because it turned out to have a significant performance impact. IIRC this effect was much more apparent when multiple node processes were running at the same time. That said, I tested mostly on what we would now consider an "old laptop" - an HP laptop with a 2.8 GHz dual-core Intel T9600 processor running Windows 7. Maybe it's time for a re-evaluation.

An alternative would be to use timeGetTime (v8 does this too), and optionally use timeBeginPeriod to set the clock resolution to 1ms.
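
A sketch of that alternative for comparison (illustrative; link against winmm.lib): timeBeginPeriod raises the system-wide timer resolution, and timeGetTime then reports milliseconds at that granularity.

```c
#include <windows.h>
#include <mmsystem.h>  /* timeBeginPeriod, timeEndPeriod, timeGetTime */

static void enable_1ms_resolution(void) {
  /* Raises the timer resolution for the whole system; must be paired with
   * a matching timeEndPeriod(1) before exit. */
  timeBeginPeriod(1);
}

static DWORD sample_time_ms(void) {
  /* 32-bit millisecond counter; wraps roughly every 49.7 days. */
  return timeGetTime();
}

static void restore_resolution(void) {
  timeEndPeriod(1);
}
```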

@retep998
Contributor

Here are some numbers I obtained on Windows 8.1 x64, with a Phenom II 2.5 GHz processor:

  • 26 nanoseconds per timeGetTime call
  • 32 nanoseconds per QueryPerformanceCounter call
  • 3 nanoseconds per GetTickCount call

@orangemocha
Author

Thanks for the feedback. QueryPerformanceCounter's latency is indeed dependent on the OS version and the specific system hardware. My latest commit enables the high-precision solution only when a reasonably fast implementation is detected.
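
The detection described above could look roughly like this (a sketch; the call count and threshold are made-up values, not the ones in the commit):

```c
#include <windows.h>

static int use_qpc;  /* consulted by the time-update code */

static void detect_fast_qpc(void) {
  LARGE_INTEGER freq, start, end, tmp;
  const int calls = 1000;
  double ns_per_call;
  int i;

  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&start);
  for (i = 0; i < calls; i++)
    QueryPerformanceCounter(&tmp);
  QueryPerformanceCounter(&end);

  ns_per_call = (double) (end.QuadPart - start.QuadPart) /
                freq.QuadPart * 1e9 / calls;

  /* Example threshold: only use the high-precision path if a QPC call is
   * cheap (say, under ~100 ns); otherwise fall back to GetTickCount. */
  use_qpc = ns_per_call < 100.0;
}
```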

@indutny
Contributor

indutny commented Mar 3, 2014

I don't really enjoy this: the solution is timing-dependent, and people may suffer from expecting higher resolution and getting lower if the system was very busy at process startup.

@orangemocha is it possible to statically detect this? On which configurations is it faster?

@orangemocha
Author

Thank you for your feedback. Unfortunately there is no reliable static way of determining the type of Qpc implementation. In summary, different OS versions use different strategies and newer ones adapt to the hardware available, even using performance measurements. I don’t think it’s a good idea for us to try to guess that logic, and there is no API to do it. Foreseeably, newer systems will have improved Qpc implementations and the need for the GetTickCount approach will go away over time.

So IMHO the performance measurement is the most reliable way to take advantage of fast high-precision timer implementations when they are available. I agree it’s not the prettiest thing, but I think it’s either this or sticking to GetTickCount and a 15.6ms resolution. Which approach do you like better? (This is a non-rhetorical question). What are the known scenarios for which we/people care about timer precision in node?

A different question: is the impact of even a slow Qpc implementation (think in the order of 1 microsecond) so terrible? We call uv_update_time once per iteration of the event loop, and during timer manipulations (e.g. listOnTimeout). I could be wrong but this doesn’t seem like a huge deal.

Regarding your comment about busy startup time, I could try to improve the reliability of the initial measurement (e.g. by briefly raising the thread priority).

@indutny
Contributor

indutny commented Mar 4, 2014

Ok, this makes sense. What do you think about adding a configuration option for it? Preferably as an API method. If this sounds good to you, I guess we could just remove these hacks and just use QPC, if not configured otherwise.

@orangemocha
Author

I ran the http benchmarks on 2 different machines. The results can be summarized as follows:

  1. Win7, fast Qpc: uv_update_time with Qpc yields +24% throughput.
  2. Win2k8R2, slow Qpc: uv_update_time with Qpc yields +10% throughput.

The increased average throughput is largely due to the benchmarks in http\chunked.js. For reasons unclear to me, the increased precision yields dramatic improvements in these tests (up to +300%).
Excluding the http\chunked tests, the average throughput delta observed is +1.2% on machine 1), and -2.3% on machine 2).

Based on these results, and assuming the http benchmarks are a good indicator, there doesn’t seem to be a significant performance degradation using QueryPerformanceCounter.

I updated the PR with the commit that uses QueryPerformanceCounter unconditionally.

@orangemocha
Author

For the record, I gathered these measurements using the http benchmarks, with this command:
node benchmark\common.js http
I ran this all locally on each machine.
I had to replace wrk with apache benchmark because wrk doesn't build on Windows. I ran the tests 4 times per configuration and averaged the results. The standard deviation of the 4 runs is at most 1.8%.

@orangemocha
Author

Minor change to keep last_tick_count for ABI compatibility, in case we ever want to roll this back.

@saghul
Contributor

saghul commented Aug 9, 2014

What is the status of this? Do we land it, discard it?

@orangemocha
Author

There hasn't been much discussion about this AFAIK.

My vote would be for landing it. v0.12.0 would be a good time to get some real world testing and get feedback on any negative performance impact. If that was the case I'd be on call for fixing it for the next minor release.

What's the decision process? If we decide to land it, I'll update the commit to resolve the merge conflict.

@txdv
Contributor

txdv commented Aug 13, 2014

What commit?

@saghul
Contributor

saghul commented Aug 13, 2014

> There hasn't been much discussion about this AFAIK.
>
> My vote would be for landing it. v0.12.0 would be a good time to get some real world testing and get feedback on any negative performance impact. If that was the case I'd be on call for fixing it for the next minor release.
>
> What's the decision process? If we decide to land it, I'll update the commit to resolve the merge conflict.

Well, the decision process is that we trust each other on the respective areas of expertise :-) I can read the code, but clearly you and @piscisaureus know a lot more about it. If you guys think we should land it I'm happy to do so.

If anyone else knows about this and has some feedback to share, now would be a good time :-)

> What commit?

The one on this PR: https://github.com/MSOpenTech/libuv/commit/fed2906fed784cbced7de8ab9908be60af70fb2c

@orangemocha
Author

Updated to solve merge conflicts. @piscisaureus are you ok with this?

@@ -316,7 +316,8 @@ RB_HEAD(uv_timer_tree_s, uv_timer_s);
HANDLE iocp; \
/* The current time according to the event loop. in msecs. */ \
uint64_t time; \
/* GetTickCount() result when the event loop time was last updated. */ \
/* This is unused, but we are keeping it to maintain ABI compatibility. */ \
/* TODO: Remove in v0.14. */ \
Contributor

I don't think we care about ABI compatibility at this point, v0.10 and master diverged so much it's already broken in tens of places.

Author

Right, but actually this was intended to maintain ABI compatibility in case we decide to revert this change in v0.12.x.

Contributor

Ah, I see. Can you please clarify the comment a bit? " ...maintain ABI compatibility in case of revert" or something like that.

Author

Done.

@piscisaureus

@orangemocha

I am nervous about what our Chinese XP bootlegger user base is going to think, but let's give it a spin. But before you land:

  • Do you have any idea how often the QueryPerformanceCounter() reported value wraps around? What are typical values for QueryPerformanceFrequency()?
  • I am nervous about relying on uv__hrtime for precise loop timings because it uses floating point math and performance counter values can get extremely large. Can we switch back to using integer math? (See the sketch after this list.)
  • This comment (and the surrounding logic) needs updating:

    libuv/src/win/timer.c

    Lines 210 to 217 in a4f0350

    /* A timeout value of UINT_MAX means infinite, so that's no good. But
    * more importantly, there's always the risk that GetTickCount wraps.
    * uv_update_time can detect this, but we must make sure that the
    * tick counter never overflows twice between two subsequent
    * uv_update_time calls. We do this by never sleeping more than half
    * the time it takes to wrap the counter - which is huge overkill,
    * but hey, it's not so bad to wake up every 25 days.
    */
  • The timer-fudge-forward logic should no longer be necessary. Remove the uv__time_forward function as well as its invocations (e.g.

    libuv/src/win/core.c

    Lines 320 to 324 in a4f0350

    /* We're sure that at least `timeout` milliseconds have expired, but
    * this may not be reflected yet in the GetTickCount() return value.
    * Therefore we ensure it's taken into account here.
    */
    uv__time_forward(loop, timeout);
    ).
  • Can we assume that the QueryPerformanceCounter() result only ever goes up? If so, I would like to see the if statement here removed: https://github.com/joyent/libuv/pull/1165/files#diff-7a69559815db50b6df21eb36dbae4333R40
  • For clarity, I would like to see the inline number (1000000) replaced by a constant or preprocessor define: https://github.com/joyent/libuv/pull/1165/files#diff-7a69559815db50b6df21eb36dbae4333R33
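
A sketch of the integer-math conversion requested above (names are illustrative, not the PR's actual code): splitting the division keeps the intermediate values within 64 bits without resorting to floating point, and the magic number becomes a named constant.

```c
#include <windows.h>
#include <stdint.h>

#define NANOSEC ((uint64_t) 1000000000)  /* named constant, no inline literal */

static LARGE_INTEGER perf_frequency;  /* set once via QueryPerformanceFrequency */

/* Convert a performance-counter reading to nanoseconds using integer math
 * only. (t % f) < f, and QPC frequencies are at most a few GHz, so the
 * intermediate product (t % f) * NANOSEC cannot overflow 64 bits. */
static uint64_t ticks_to_ns(int64_t ticks) {
  uint64_t t = (uint64_t) ticks;
  uint64_t f = (uint64_t) perf_frequency.QuadPart;
  return (t / f) * NANOSEC + (t % f) * NANOSEC / f;
}
```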

@retep998
Contributor

A quick test on my machine reported 14318180 for the frequency, so 40,853 years until a wrap around.
Even if we had a system that went overkill with nanosecond resolution, that would be 584 years until a wrap around.
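
For reference, the arithmetic behind those figures (treating the counter as a full unsigned 64-bit value):

$$
t_{\mathrm{wrap}} = \frac{2^{64}}{f},\qquad
\frac{2^{64}}{14{,}318{,}180\ \mathrm{Hz}} \approx 1.29\times10^{12}\ \mathrm{s} \approx 4.1\times10^{4}\ \mathrm{years},\qquad
\frac{2^{64}}{10^{9}\ \mathrm{Hz}} \approx 1.84\times10^{10}\ \mathrm{s} \approx 585\ \mathrm{years}.
$$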

@retep998
Contributor

Some googling found that someone has a system which reports 3591630000 for the frequency, which would leave them with only 162 years until a wrap around. Assuming this is a worst case scenario tied to the CPU frequency, I have no reason to believe CPU frequencies will increase significantly to the point where the wrap around becomes short enough to worry about.

@piscisaureus

Thanks, @retep998. Useful.

@saghul saghul added this to the v0.12 milestone Aug 20, 2014
@orangemocha
Author

Thanks, @retep998 and @piscisaureus, definitely useful!

From MSDN:
How often does QPC roll over?
Not less than 100 years from the most recent system boot, and potentially longer based on the underlying hardware timer used. For most applications, rollover isn't a concern.

Regarding switching to integer math, I actually think that floating point math is more reliable here, given that there are no guarantees on the range of magnitude of the performance counter frequency.

In fact, to be on the safer side, I think we could make uv_update_time call QueryPerformanceCounter directly, and do the int conversion after converting to the millisecond unit. See

return (uint64_t) ((double) counter.QuadPart / hrtime_frequency_);

In the worst case scenario, counter.QuadPart wraps every 100 years. We are converting to nanoseconds, and on a system with 1 microsecond resolution, this means multiplying by 1000. The conversion to uint64_t would then overflow every 100 years / 1000 ≈ 36 days. Now the fallacy in this reasoning is that for the 64-bit counter to wrap every 100 years (≈ 2^32 seconds), the system would have to have <1 ns resolution. But just to err on the safe side, I will modify uv_update_time to call QueryPerformanceCounter directly and do its own conversion. Sounds good?

Then yes to all your other points. Regarding uv__time_forward, do you know what scenario required this trick? I wonder if we can still have a problem because of rounding to the nearest millisecond.

@saghul
Contributor

saghul commented Sep 4, 2014

If nothing unexpected happens I plan to release libuv 1.0.0 by the end of next week. Can you guys spend some cycles on this? @piscisaureus @orangemocha

@orangemocha
Author

Sorry, I have been swamped. Yes. I'll get to it by early next week.

@saghul
Contributor

saghul commented Sep 4, 2014

Thanks Alexis!

@piscisaureus

Cc @janjongboon

@piscisaureus

Er, @janjongboom

@orangemocha
Author

Updated to address @piscisaureus ' feedback.

@orangemocha
Author

I have a question related to improving timer precision, though it shouldn't block this PR. Shouldn't uv_timer_start call uv_update_time before computing the timer's due time?

/* This is unused, but we are keeping it to maintain ABI compatibility */ \
/* in case we need to revert uv_update_time to using GetTickCount */ \
/* in v0.12.x. */ \
/* TODO: Remove in v0.14. */ \
Contributor

We now have some reserved fields for backwards ABI emergencies; we can remove this, I'd say.

@saghul
Contributor

saghul commented Sep 9, 2014

> I have a question related to improving timer precision, though it shouldn't block this PR. Shouldn't uv_timer_start call uv_update_time before computing the timer's due time?

Not sure. On Unix we don't. It's done once per loop iteration, so unless some code stalled the loop we should be OK I think.

@orangemocha
Author

Thanks @saghul ! Updated.

@saghul
Contributor

saghul commented Sep 9, 2014

Thanks Alexis! LGTM. @piscisaureus any further comment?

@piscisaureus

@saghul, sorry, yes.

  • the uv__time_forward() function is still there. Remove it?
  • we are now updating the time too often; there is no need to update the time in uv_poll and uv_poll_ex because uv_run also does it twice per loop iteration.

@orangemocha
Author

@piscisaureus I am confused. uv__time_forward was removed. Maybe you are looking at an outdated commit? https://github.com/joyent/libuv/pull/1165/files

I replaced the invocations of uv__time_forward() with uv_update_time(). If I just removed the invocations, I would get a couple of unit test failures (timer_from_check and timer_run_once). The point of uv__time_forward() seemed to be that we wanted the time elapsed to be reflected in the loop time, and we need to at least update the time for that to be the case.
timer_from_check is doing timer manipulation inside the check_cb and hence depends on the loop time being updated. Is that still a valid scenario? That's part of the reason why I was asking if we should call uv_update_time inside uv_timer_start.

@saghul
Contributor

saghul commented Sep 15, 2014

I had a look at this and I believe @orangemocha is right and we need to update the time after we blocked for I/O; otherwise any timer started after polling for I/O will get the time wrong, as the timer_from_check test shows.

FWIW we also do it for Unix: https://github.com/joyent/libuv/blob/master/src/unix/linux-core.c#L203-L207

Now, one can also say that the call to uv_update_time could be removed from here: https://github.com/joyent/libuv/blob/master/src/win/core.c#L414 since it will have already been updated by the polling function.

@saghul
Contributor

saghul commented Sep 15, 2014

@orangemocha: Moreover, I think we should always update the time after GetQueuedCompletionStatus, because if we stopped polling due to an I/O operation and then schedule a timer in a check callback, we'd get the time wrong because it was never updated.

@orangemocha
Author

Updated to call uv_update_time() unconditionally after GetQueuedCompletionStatus. I also made a change in test/test-timer.c, because now we are precise/fast enough that we were sometimes missing that 1 millisecond timer in the test.

@orangemocha
Author

@piscisaureus ' concern of updating the time too often is not unjustified though... now we update it multiple times and it might not even be needed. What do you guys think about updating it lazily, as in https://github.com/MSOpenTech/libuv/commit/b09abd7e849d9820c101489051bfb4aabb30a34b ?

@saghul
Contributor

saghul commented Sep 16, 2014

> Updated to call uv_update_time() unconditionally after GetQueuedCompletionStatus. I also made a change in test/test-timer.c, because now we are precise/fast enough that we were sometimes missing that 1 millisecond timer in the test.

Uh, how so? When running with UV_RUN_ONCE the loop will block for 1ms (in this case) and then, after I/O it will run timers again: https://github.com/joyent/libuv/blob/master/src/win/core.c#L405-L415 How can we miss it?

@saghul
Contributor

saghul commented Sep 16, 2014

> @piscisaureus' concern of updating the time too often is not unjustified though... now we update it multiple times and it might not even be needed. What do you guys think about updating it lazily, as in MSOpenTech@b09abd7 ?

While that might be an improvement, can we do it as a follow-up to this PR? Just so that we don't lose track of things :-)

Now, about the patch: I think it makes the functions look a bit backwards: uv_update_time is expected to, well, update the time, but now it doesn't. And uv_now, which is supposed to return the current time, is the one actually mutating it. It feels a bit off.

The idea of updating the time lazily sounds good though. One idea that crossed my mind would be to somehow "mark" the state of the loop, so we know we are in the section after polling for I/O, and only update the time if you start a timer or run uv_now, for example.
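
A toy sketch of that "mark the loop" idea (all names are hypothetical, not libuv API): after polling for I/O, only flag the cached time as stale, and refresh it the first time something actually needs it, e.g. uv_now or starting a timer.

```c
#include <stdint.h>

extern uint64_t hrtime_ms(void);  /* e.g. the QPC-based clock sketched earlier */

typedef struct {
  uint64_t time;      /* cached loop time in ms */
  int time_is_stale;  /* hypothetical flag set after polling for I/O */
} toy_loop_t;

/* Called right after the I/O poll returns: don't pay for a clock read yet. */
static void mark_time_stale(toy_loop_t* loop) {
  loop->time_is_stale = 1;
}

/* Called from uv_now()/uv_timer_start()-like paths: refresh only if needed. */
static uint64_t lazy_now(toy_loop_t* loop) {
  if (loop->time_is_stale) {
    loop->time = hrtime_ms();
    loop->time_is_stale = 0;
  }
  return loop->time;
}
```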

@orangemocha
Author

Definitely ok to do the lazy update as a follow up.

@orangemocha
Author

Regarding the test timer_run_once, I see what's going on. We are polling with a timeout equal to timer->due_time - loop->time; however, when GetQueuedCompletionStatus returns and we update the time, the new loop->time is equal to timer->due_time - 1. That is, GetQueuedCompletionStatus occasionally times out a bit early (I am guessing this could be due to rounding errors, and it never seems higher than 1 ms).

I can think of the following solutions:
a) Always add 1 to delta in uv__next_timeout()
b) Allow timers to fire 1 millisecond early
c) Upon timeout, set the loop time to the expected due time. Basically, bring back uv__time_forward().

IMO, c) is the best option. Rather than letting timers fire always late or always early, distort time a little bit when the wait ends a bit early.
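
For illustration, option c) could look roughly like this (a hypothetical helper in the spirit of the old uv__time_forward, not the final patch):

```c
#include <stdint.h>

/* After GetQueuedCompletionStatus() times out, the freshly sampled loop time
 * can land 1 ms short of the timer's due time. Nudge it forward so timers due
 * at `expected` still fire on this iteration.
 * loop_time points at the loop's cached time in ms (e.g. &loop->time). */
static void time_forward(uint64_t* loop_time, uint64_t expected) {
  if (*loop_time < expected)
    *loop_time = expected;
}
```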

@saghul
Contributor

saghul commented Sep 16, 2014

@orangemocha yeah, c) looks like the best approach IMHO.

@piscisaureus

History repeating itself. So whatever, c) seems best to me too.
I don't have anything against lazy updates, but can we just do the same thing on Windows as we do on Unix? I never imagined myself saying this, but there's so much back and forth for such a seemingly simple problem...

@saghul
Contributor

saghul commented Sep 17, 2014

> History repeating itself. So whatever, c) seems best to me too.
> I don't have anything against lazy updates, but can we just do the same thing on Windows as we do on Unix? I never imagined myself saying this, but there's so much back and forth for such a seemingly simple problem...

Now the Unix and Windows code does look consistent... sans the time forward thing. For some reason Unix doesn't seem to suffer from that issue (returning a bit earlier than expected, that is).

PS: Once Alexis addresses this last thing I plan to land it and tag rc1.

Improving timing precision by using QueryPerformanceCounter.

This is part of the fix for Node.js' test-timers-first-fire.js.
@orangemocha
Author

Updated.

@saghul
Contributor

saghul commented Sep 17, 2014

Thanks Alexis! I'll merge it later today.

@saghul
Contributor

saghul commented Sep 17, 2014

Landed in ✨ 6ced8c2 ✨. Thanks Alexis!

@saghul saghul closed this Sep 17, 2014
@orangemocha
Author

Yay, thanks!
