Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timers: fix subsequent enroll calls not working #19936

Closed
wants to merge 2 commits into from

Conversation

apapirovski
Copy link
Member

@apapirovski apapirovski commented Apr 11, 2018

This PR resolves one recently introduced bug and one long-standing incompatibility with browser behaviour:

  1. A bug was introduced in a recent semver-major PR which makes it impossible to enroll an already enrolled object. This resolves the issue by keeping the validation function at the top but moving the assignment of the potentially modified msecs below.

  2. Node.js currently doesn't match browser behaviour when it comes to intervals that throw during their execution. In all browsers, the interval continues running but in Node.js it will never be rescheduled after a throw nor will the destroy async hook for that interval fire.

Edit: The second change has been backed out now.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

A bug was introduced in nodejs#17704 which meant that subsequent
calls to enroll would unset the new _idleTimeout and the
enrolled object could never again function as a timer.
@apapirovski apapirovski added this to the 10.0.0 milestone Apr 11, 2018
@nodejs-github-bot nodejs-github-bot added the timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. label Apr 11, 2018
Copy link
Member

@benjamingr benjamingr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 11, 2018
@jasnell
Copy link
Member

jasnell commented Apr 11, 2018

Point 2 regarding throwing might push this into semver-major territory.

@apapirovski
Copy link
Member Author

@jasnell It's marked as dont-land on all the other versions for now. The 1st change is fixing a semver-major commit that hasn't landed in any releases yet.

For the second change, I'm fine if we want to have it be semver-major. Or we could also see how it does in 10.x and backport if there's a reason to do so. (It does bring us in line with the browsers and the spec.)

@Fishrock123 Fishrock123 added the semver-major PRs that contain breaking changes and should be released in the next major version. label Apr 11, 2018
Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok uh I really think these should be in separate PRs:

PR 1 (or separate? whichever)

  • Commit 1, fixes an obvious bug of mine which I'm really surprised there wasn't a test for.
  • The part that makes the destroy hook fire should probably be fine and seem like a good idea... Maybe I'm for getting something though, since I'm pretty sure me and Trevor had it set like that for some reason.

PR 2

  • The browser compat part seems like a breaking change to me. Seems safer to separate it out.
    • As in, technically it's a breaking change - but I think you might need to be doing pretty bad already to actually get caught by it.

};

timers.enroll(enrollObj, 1);
timers.enroll(enrollObj, 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More reliable: put the setTimeout before this and then just re-enroll with 1. The list order should guarantee the call order is as we expect without depending on libuv behavior.

Copy link
Member Author

@apapirovski apapirovski Apr 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but I want to indirectly test that it's getting the new value rather than just keeping the old one without depending on checking _idleTimeout. That could be another possible bug that someone could introduce.

@apapirovski
Copy link
Member Author

apapirovski commented Apr 12, 2018

Ok uh I really think these should be in separate PRs:

One of them fixes a breaking change and the other one, I suppose, is a breaking change (although I would call it a bug fix). Feels kind of unnecessary to open another PR now...

@apapirovski
Copy link
Member Author

ping @nodejs/tsc since this would ideally make it into 10.x

@benjamingr
Copy link
Member

I definitely think this should make it to v10 as semver-major, given a clean CITGM run

@apapirovski
Copy link
Member Author

given a clean CITGM run

Think the one above was ok but I'm not 100% certain what's expected and what's not these days, given my absence for the past month and a bit.

The CI was fine too.

@benjamingr
Copy link
Member

@apapirovski from what I can tell both were fine, it's just common phrasing to indicate that I'm +1 on landing and that that depends on the CI and citgm being green (which they currently are).

As in - if we make changes here (which I don't believe we need to at the moment tbh but others might disagree) we should run CITGM again.

@apapirovski
Copy link
Member Author

apapirovski commented Apr 12, 2018

As in - if we make changes here (which I don't believe we need to at the moment tbh but others might disagree) we should run CITGM again.

Haha we got our wires crossed. I think I got what you were saying but was trying to express I'm not 100% certain if the CITGM is actually fine or not since I'm not sure what the baseline is these days. 😆 (Indirectly prodding others to verify it for me...)

@Fishrock123
Copy link
Contributor

One of them fixes a breaking change and the other one, I suppose, is a breaking change (although I would call it a bug fix). Feels kind of unnecessary to open another PR now...

Not separating out breaking changes makes backporting significantly more difficult.

@apapirovski
Copy link
Member Author

Not separating out breaking changes makes backporting significantly more difficult.

Yeah but there's nothing to back-port here... one is a fix for a semver-major PR (which is landing in 10.x) and the other is semver-major itself. I would understand if they were going into different release lines but right now there's no chance of that happening.

@Fishrock123
Copy link
Contributor

Oh heh. True.

@Trott
Copy link
Member

Trott commented Apr 12, 2018

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Needs one more TSC approval to land, I think.

@benjamingr
Copy link
Member

@Trott

LGTM. Needs one more TSC approval to land, I think.

Just wondering - Why?

@Trott
Copy link
Member

Trott commented Apr 12, 2018

Just wondering - Why?

Breaking changes require sign-off from two TSC members.

From https://github.com/nodejs/node/blob/565fd50b4ac81c8c3fca6a08ffcbc59a4b7b2504/COLLABORATOR_GUIDE.md#consensus-seeking:

Note that breaking changes (that is, pull requests that require an increase in
the major version number, known as semver-major changes) must be elevated for
review by the TSC
. This does not necessarily mean that the
PR must be put onto the TSC meeting agenda. If multiple TSC members approve
(LGTM) the PR and no Collaborators oppose the PR, it should be landed. Where
there is disagreement among TSC members or objections from one or more
Collaborators, semver-major pull requests may be put on the TSC meeting
agenda.

(Also: If that text is not sufficiently clear, then that's my fault and let's fix that. I'm 100% certain that the "multiple TSC members" is meant as "approval by at least 2 TSC members is required before landing" because I'm the one who came up with the original language. #7955)

@Fishrock123
Copy link
Contributor

Fishrock123 commented Apr 12, 2018

I'd like to think about the breaking change here a bit more still - we already don't follow browser behavior in several places and we aren't obligated to. Our platform is quite different from the browser for long running tasks and I'd like to consider the implications in people's applications a bit more.

@Trott
Copy link
Member

Trott commented Apr 13, 2018

I'd like to think about the breaking change here a bit more still - we already don't follow browser behavior in several places and we aren't obligated to. Our platform is quite different from the browser for long running tasks and I'd like to consider the implications in people's applications a bit more.

@Fishrock123 If you want to delay landing this to give more time to think about it, please put a red-X "Request Changes" on it. Otherwise, someone using git node land on this isn't going to get notified that it is in fact not yet ready to land.

@Trott
Copy link
Member

Trott commented Apr 13, 2018

(Also: It might be helpful if you could give a ballpark for how much longer you'd like to think. Days? Weeks? Something else?)

@apapirovski
Copy link
Member Author

apapirovski commented Apr 13, 2018

Our platform is quite different from the browser for long running tasks and I'd like to consider the implications in people's applications a bit more.

I don't think the current behaviour is intuitive. There's no indication that the timer won't get rescheduled just because the callback threw an error. In fact, it's mostly an implementation detail.

If someone is swallowing uncaughtException (which they probably shouldn't) or is using domains, then this interval just silently stops executing. In a situation where such an interval is meant to do some async work that is network dependant an error could be as simple as getting the wrong response from another service or something. Doesn't mean the interval should stop executing.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced our current behavior is better for Node.js.
However, I've not seen any good reasoning why we should adopt the browser way of handling errors in this regard, apart from "to increase compatibility". Can you please make a case for this change apart from "to increase compatibility"?

I'd prefer for the enroll() bug to be on a different PR.

@mcollina
Copy link
Member

If someone is swallowing uncaughtException (which they probably shouldn't) or is using domains, then this interval just silently stops executing. In a situation where such an interval is meant to do some async work that is network dependant an error could be as simple as getting the wrong response from another service or something. Doesn't mean the interval should stop executing.

Keep executing it can also cause memory leaks, and sudden behavior changes and bugs exposed in not-well-tested path in code. This is a very edge case: basically we are saying that there is a programmer error (goes to uncaughtException) and the user did not crash their application. This story looks very much like a memory leak to me.
Rescheduling can definitely cause a memory or file descriptor leak: as an example, the setInterval could potentially create a new socket each time.
I do not see how doing this improve things for our users in this condition.

@apapirovski
Copy link
Member Author

I fundamentally disagree that compatibility with the spec and browsers is a bad thing.

Anyway, I'll back out the setInterval rescheduling change from this PR and open a separate one for it.

@mcollina
Copy link
Member

You got me wrong. I think that increasing compatibility is a good thing but I’m not convinced that potentially adding memory leaks to our users code is worth it.

@apapirovski apapirovski removed the semver-major PRs that contain breaking changes and should be released in the next major version. label Apr 13, 2018
@apapirovski apapirovski changed the title timers: resolve two edge-case bugs timers: fix subsequent enroll calls not working Apr 13, 2018
@apapirovski
Copy link
Member Author

@mcollina I backed out the change and left the bug fix. Do you mind rescinding the requested changes?

Opening another two PRs with two possible solutions to the interval issue.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@apapirovski
Copy link
Member Author

Ok, opened two separate PRs for the setInterval issue:

non-semver-major that fixes the destroy hook not being called and can be backported across all release lines: #20001
semver-major from this PR: #20002

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM

@apapirovski
Copy link
Member Author

CI: https://ci.nodejs.org/job/node-test-pull-request/14277/

(one commit was taken out, let's make sure it wasn't crucial to tests passing)

@jasnell
Copy link
Member

jasnell commented Apr 14, 2018

CI failure is unrelated.

@BridgeAR
Copy link
Member

Seems like the test is flaky:

https://ci.nodejs.org/job/node-test-binary-arm/434/RUN_SUBSET=1,label=pi1-docker/console

05:14:26 not ok 227 parallel/test-timers-enroll-second-time
05:14:26   ---
05:14:26   duration_ms: 3.79
05:14:26   severity: fail
05:14:26   exitcode: 1
05:14:26   stack: |-
05:14:26     (node:14533) [DEP0095] DeprecationWarning: timers.enroll() is deprecated. Please use setTimeout instead.
05:14:26     assert.js:250
05:14:26         throw err;
05:14:26         ^
05:14:26     
05:14:26     AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
05:14:26     
05:14:26       assert(didCall)
05:14:26     
05:14:26         at Object.enrollObj._onTimeout.common.mustCall (/home/iojs/build/workspace/node-test-binary-arm/test/parallel/test-timers-enroll-second-time.js:10:37)
05:14:26         at Object._onTimeout (/home/iojs/build/workspace/node-test-binary-arm/test/common/index.js:467:15)
05:14:26         at ontimeout (timers.js:427:11)
05:14:26         at tryOnTimeout (timers.js:289:5)
05:14:26         at listOnTimeout (timers.js:252:5)
05:14:26         at Timer.processTimers (timers.js:212:10)

@BridgeAR BridgeAR removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 16, 2018
@apapirovski
Copy link
Member Author

Didn't want to check _idleTimeout but I guess there's no other option to avoid flakiness (other than huge timeout values and/or putting it in sequential). Should be fixed.

New CI: https://ci.nodejs.org/job/node-test-pull-request/14323/

@apapirovski apapirovski added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 16, 2018
@apapirovski
Copy link
Member Author

Landed in 734eb17

@jasnell this should get into v10.x if at all possible to prevent a regression.

@apapirovski apapirovski deleted the patch-timers-bugs branch April 17, 2018 14:03
apapirovski added a commit that referenced this pull request Apr 17, 2018
A bug was introduced in #17704 which meant that subsequent
calls to enroll would unset the new _idleTimeout and the
enrolled object could never again function as a timer.

PR-URL: #19936
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
jasnell pushed a commit that referenced this pull request Apr 17, 2018
A bug was introduced in #17704 which meant that subsequent
calls to enroll would unset the new _idleTimeout and the
enrolled object could never again function as a timer.

PR-URL: #19936
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Copy link

@Johnnyboy87 Johnnyboy87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

``

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants