Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

timers: float delays can cause 100% cpu usage. #8067

Conversation

chrisdickinson
Copy link

The changes in befbbad switch all comparisons over to unsigned
integer comparisons, but in certain cases list.msecs will be fractional.
If this is the case, the timer will never be removed from the list.
This fixes the issue by casting msecs to an integer by rounding up
or down.

Fixes #8065.


I might need some advice on how best to test this. The below script will intermittently freeze and block the event loop, but I can't reliably trigger this behavior 100% of the time -- there is a small percentage of times where it'll run to completion.

setTimeout(function() {
  console.log('ummm')
}, 100.3)

setTimeout(function() {
  console.log('ERRRR')
}, 4.3)

setTimeout(function() {
  console.log('ok')
}, 100)

The changes in befbbad switch all comparisons over to unsigned
integer comparisons, but in certain cases list.msecs will be fractional.
If this is the case, the timer will never be removed from the list.
This fixes the issue by casting msecs to an integer by rounding up
or down.

Fixes 8065.
@misterdjules
Copy link

The actual issue is that when we call start with a timeout argument whole value is >= 0 and < 1 (note the use of the strict inequality), it creates an infinite recursion that follows the following pattern:

  1. Some time has passed, listOnTimeout is called.
  2. listOnTimeout determines that the remaining time for the timer is a non-integer value > 0 and < 1.
  3. TimerWrap::Start is called and the timeout value is rounded to 0 by the double to integer coercion.
  4. uv__run_timers runs and if (handle->timeout > loop->time) returns false for this timer, since in this case handle->timeout is actually equal to loop->time. Thus, the callback listOnTimeout is called again, and we go back to step 2, but _without having updated the event loop's time with uv_update_time. It's like time is frozen for the next calls to listOnTimeout, and the same pattern repeats indefinitely.

Before befbbad, listOnTimeout was using Date.now() (from v8), whose value is not cached and does not need to be updated to reflect the actual current time.

This problem doesn't happen on master because of 93b0624. Applying this change to v0.10.30 fixes the issue, and it doesn't break any test. What is your opinion about this @tjfontaine?

@chrisdickinson I am not sure using Math.round is what we want. The documentation about setTimeout describes it as following:
To schedule execution of a one-time callback after delay milliseconds.
I think I've heard @tjfontaine saying that calling the callback after the delay should be a strong guarantee. The test that comes with 93b0624 seems to suggest the same:
assert.ok(delta > 0, 'Timer fired early');
Using Math.round, however, we could call the timer callback before its delay.

We could also fix this issue without too much impact on other code by clamping the timeout parameter that is passed to TimerWrap:Start and making sure its value is at least 1.

My preference goes to backporting 93b0624 . @chrisdickinson @tjfontaine What do you guys think? I'd rather proceed with caution with this one, since I'm the one who broke 0.10.30.

Last but not least, thank you @chrisdickinson for all your help!

@misterdjules
Copy link

@chrisdickinson I thought about a strategy to test this issue reliably, but I haven't come up with anything yet. Please let me know if you find out anything. Thank you again!

@chrisdickinson
Copy link
Author

@misterdjules That all looks correct to me -- thanks for the great writeup! So, there should be some protection against (0... 1) values making it into msecs, but! I think you're correct. I've switched to ceil'ing the result, earlier in the function, to make sure that msecs is outside of that (0... 1) danger zone.

@misterdjules
Copy link

@chrisdickinson Great, that gives us another candidate for a fix.

What I like about the solution that backports 93b0624 is that it seems more general.

For instance, I think that your fix would leave _unrefActive vulnerable to the same issue, it would need to ceil msecs in the same way as insert. I haven't written a test for that though, I'm going to do that now.

To a lesser extent, ceiling msecs in listOnTimeout fixes the issue until another code path (e.g: a future bug fix) that starts timers stored in this list does not ceil the value. Without a proper test, it is even more likely to happen without being detected.

@misterdjules
Copy link

@chrisdickinson I've added a repro for the same issue as #8065, but this time for internal timers. With the fix that ceils msecs in insert, the issue is not fixed for internal timers. As expected, we would need to ceil msecs in _unrefActive too.

With the backport of 93b0624, the issue with internal timers is fixed too.

@jbergstroem
Copy link
Member

I can consistently reproduce this in my app for v0.10.30 (ran into it just this morning). Will possibly cause a lot of headaches since this version is a recommended upgrade based on the v8 issue.

@tjfontaine
Copy link

We need to backport 93b0624 as well, I'm sorry I missed that during review

@misterdjules
Copy link

@tjfontaine OK, I'll send you a PR that backports 93b0624 asap.

@misterdjules
Copy link

Just for the record, PR submitted here: #8073 .

@chrisdickinson
Copy link
Author

Closing this in favor of #8073.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants