Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timers: fix priority queue removeAt #24322

Conversation

apapirovski
Copy link
Member

@apapirovski apapirovski commented Nov 12, 2018

The current algorithm for removeAt is pretty incorrect and only works when we continually call shift to rebalance the tree. (Don't ask me what I was thinking... in retrospect this makes very little sense.)

Fixes: #24320
Fixes: #24362

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@apapirovski apapirovski added the timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. label Nov 12, 2018
@apapirovski apapirovski requested a review from cjihrig November 12, 2018 17:09
Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm not really familiar with this part of the code, I can confirm that it fixes my issue in #24320.

@apapirovski apapirovski force-pushed the apapirovski/fix-binary-heap-remove-at branch from 65e4fbc to 2203dcc Compare November 12, 2018 17:18
@apapirovski
Copy link
Member Author

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mareksrom
Copy link

Hello, I would suggest small change in removeAt and it is necessary to integrate updateAt like in my PR, but updates version - that must be used in timers.js instead of precolateDown(1)... calling your removeAt with out of bounds pos would cause inconsistent data and updateAt can be used as part of removeAt because it only updates moved last node... The rest looks good👍

  removeAt(pos) {
    const heap = this[kHeap];
    const size = this[kSize] -1
    if (size > 0 && pos <= size) {
      this[kSize] = size;
      heap[pos] = heap[size + 1];
      heap[size + 1] = undefined;
      this.updateAt(pos);
    }
  }

  updateAt(pos) {
    const heap = this[kHeap];
    if (pos > 1 && this[kCompare](heap[pos / 2 | 0], heap[pos]) > 0)
      this.percolateUp(pos);
    else
      this.percolateDown(pos);
  }  

@mareksrom
Copy link

It would be nice to be released in 11.1.1😊

@mareksrom
Copy link

One more check - pos > 0 in removeAt...

removeAt(pos) {
    const heap = this[kHeap];
    const size = this[kSize] -1
    if (pos > 0 && size > 0 && pos <= size) {
      this[kSize] = size;
      heap[pos] = heap[size + 1];
      heap[size + 1] = undefined;
      this.updateAt(pos);
    }
  }

@apapirovski
Copy link
Member Author

@mareksrom Can you elaborate on some of these?

One more check - pos > 0 in removeAt...

Why is this needed? The expectation here should be that no one calls removeAt with invalid values. This isn't a user-facing API and as such this check seems superfluous.

Hello, I would suggest small change in removeAt and it is necessary to integrate updateAt like in my PR, but updates version - that must be used in timers.js instead of precolateDown(1)

Why is this change necessary? The position of that node is always 1 due to the semantics of where it's called from and this is the correct algorithm for an increase key operation.

Happy to adjust if you can elaborate on what I'm missing or provide a test case.

@mareksrom
Copy link

Regarding the checks - of course if we expect noone calls it with wrong argument - it does not have to be there, just used check that is already there
'''
if (size > 0 && pos <= size)
'''
and improved it to make the method safe...

regadring the updateAt I didn't test it, of course the list is peeked from queue and at this point it is root of binary heap, but then runNextTicks can be executed and the queue I think could be changed, e.g. the timer re-scheduled... Maybe this cannot happen, I didn't try to produce test case, I can try, but using updateAt is safe all the ways with just 1 compare overhead:)

@mareksrom
Copy link

It looks like clearTimeout/unenroll during runNextTicks could remove queue root while it is peeked and if rescheduled percolateDown(1) would not do the things right... To prepare this test case will take some time...

@apapirovski
Copy link
Member Author

It looks like clearTimeout/unenroll during runNextTicks could remove queue root while it is peeked and if rescheduled percolateDown(1) would not do the things right... To prepare this test case will take some time...

There are two scenarios:

  1. We remove a timeout from that queue and it's not the last item, it doesn't affect the priority queue in any way and we can still call percolateDown(1).

  2. We remove a timeout from that list and it's the last item, then we also remove the list from the timers lists and from the priority queue. In that case nothing more needs to be done.

This is performance sensitive code so we should make sure we understand what's going on and do the minimal work required to handle all the cases correctly.

@mareksrom
Copy link

I know both of then... I saw timers code first time yesterday, so not so skilled in it, so sorry for bothering:) Probably you are right that under normal circumstances user code is not able to add the timer list to the root of priority queue while the root is being processed, however it could happen in case of e.g. NTP synchronisation when moving time back, than inserted timer list could have expiry earlier than currently processed list and things could go wrong... For me, there is too many conditions with open space of running next tick code to rely on percolateDown😊 I see there some other minor issues with timers while investigating the code, I open new issue when I collect it down...

@apapirovski
Copy link
Member Author

apapirovski commented Nov 13, 2018

however it could happen in case of e.g. NTP synchronisation when moving time back, than inserted timer list could have expiry earlier than currently processed list and things could go wrong

We rely on a monotonic clock so if there are skips back, we would also have problems at the C++ layer. There's also no need to apologize — I appreciate you spending the time on this, I simply want to make sure, if we're changing the code, we understand why we're doing it and not doing it "just in case".

I see there some other minor issues with timers while investigating the code, I open new issue when I collect it down...

That would be great. Thanks!

@BridgeAR
Copy link
Member

I would like to fast track this to pull this into the v11.2 release.

Collaborators please 👍 if that's alright for you.

@apapirovski
Copy link
Member Author

@BridgeAR if you're not getting the release out before tomorrow at 9am PST then this can land organically by then anyway. If the release does need to go out before then, perhaps directly ping the people that have approved this.

@starkwang
Copy link
Contributor

Does this PR also fix #24362 ?

@apapirovski
Copy link
Member Author

@starkwang Yup

Trott pushed a commit to Trott/io.js that referenced this pull request Nov 15, 2018
PR-URL: nodejs#24322
Fixes: nodejs#24320
Fixes: nodejs#24362
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@Trott
Copy link
Member

Trott commented Nov 15, 2018

Landed in 9ca5c52

@Trott Trott closed this Nov 15, 2018
@BridgeAR BridgeAR added the notable-change PRs with changes that should be highlighted in changelogs. label Nov 15, 2018
BridgeAR pushed a commit that referenced this pull request Nov 15, 2018
PR-URL: #24322
Fixes: #24320
Fixes: #24362
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Weijia Wang <starkwang@126.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
targos pushed a commit that referenced this pull request Nov 15, 2018
Notable changes:

* deps:
  * A new experimental HTTP parser (`llhttp`) is now supported.
    #24059
* timers:
  * Fixed an issue that could cause setTimeout to stop working as
    expected. #24322
* Windows
  * A crashing process will now show the names of stack frames if the
    node.pdb file is available.
    #23822
  * Continued effort to improve the installer's new stage that installs
    native build tools.
    #23987,
    #24348
  * child_process:
    * On Windows the `windowsHide` option default was restored to
      `false`. This means `detached` child processes and GUI apps will
      once again start in a new window.
      #24034
* Added new collaborators:
  * [oyyd](https://github.com/oyyd) - Ouyang Yadong.
    #24300
  * [psmarshall](https://github.com/psmarshall) - Peter Marshall.
    #24170
  * [shisama](https://github.com/shisama) - Masashi Hirano.
    #24136
targos pushed a commit that referenced this pull request Nov 15, 2018
Notable changes:

* deps:
  * A new experimental HTTP parser (`llhttp`) is now supported.
    #24059
* timers:
  * Fixed an issue that could cause setTimeout to stop working as
    expected. #24322
* Windows
  * A crashing process will now show the names of stack frames if the
    node.pdb file is available.
    #23822
  * Continued effort to improve the installer's new stage that installs
    native build tools.
    #23987,
    #24348
  * child_process:
    * On Windows the `windowsHide` option default was restored to
      `false`. This means `detached` child processes and GUI apps will
      once again start in a new window.
      #24034
* Added new collaborators:
  * [oyyd](https://github.com/oyyd) - Ouyang Yadong.
    #24300
  * [psmarshall](https://github.com/psmarshall) - Peter Marshall.
    #24170
  * [shisama](https://github.com/shisama) - Masashi Hirano.
    #24136

PR-URL: #24350
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notable-change PRs with changes that should be highlighted in changelogs. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

timers wrong behavior timers: timer starvation