Skip to content
This repository has been archived by the owner on Oct 7, 2020. It is now read-only.

[Converge] timers: Avoid linear scan in _unrefActive. #23

Closed
jasnell opened this issue May 21, 2015 · 29 comments
Closed

[Converge] timers: Avoid linear scan in _unrefActive. #23

jasnell opened this issue May 21, 2015 · 29 comments

Comments

@jasnell
Copy link
Member

jasnell commented May 21, 2015

See: nodejs/node-v0.x-archive@934bfe2
/cc @misterdjules

Summary of original commit message:

Before this change, _unrefActive would keep the unrefList sorted when
adding a new timer.

Because _unrefActive is called extremely frequently, this linear scan
(O(n) at worse) would make _unrefActive show high in the list of
contributors when profiling CPU usage.

This commit changes _unrefActive so that it doesn't try to keep the
unrefList sorted. The insertion thus happens in constant time."

We need to reconcile this against recent io.js changes within timers.js. Is the original sorted unrefList still there. This patch will be need to be ported if so.

@Fishrock123
Copy link
Contributor

_unrefActive() hasn't been touched, this could probably be ported to io.js.. I had been meaning to look into doing it simpler but hadn't yet.

Edit: hmm, more git conflicts than I had expected.

@Fishrock123
Copy link
Contributor

Also:

@piscisaureus
Copy link
Contributor

This patch was deliberately left out of io.js, because I felt that Timer.unref() should be really be O(1) - which it always had been prior to nodejs/node@f46ad01

@Fishrock123
Copy link
Contributor

@piscisaureus But isn't _unrefActive() for internal timeouts that are explicitly unrefed on creation? What is the point of unref()-ing those? I admit it is a little hacky to effectively have two types of timers..

@bnoordhuis
Copy link
Member

@Fishrock123 timers._unrefActive() executes an O(n) operation (a linear scan of the active internal timers) and is normally called repeatedly. For example, a timer that is associated with a net.Socket calls it every time data is sent or received.

@Fishrock123
Copy link
Contributor

@bnoordhuis understood. I don't really get what's so bad about the patch(es) in question though, although I am quite sure they are not perfect. Running O(n) at the timeout phase is a bit weird and usually undesired, but it does make some sense for timeouts that usually shouldn't timeout.

What's the other option, making it a sorted list again and using an (average) O(n log n) ((I think?)) search? Won't a timing wheel still be somewhat O(n log n) at best too?

Btw, how far did you get on that timers re-write? Even if it's not done I may be interested in seeing how things could be changed. :)

@bnoordhuis
Copy link
Member

Running O(n) at the timeout phase is a bit weird and usually undesired, but it does make some sense for timeouts that usually shouldn't timeout.

It's not O(n) on expiry, it's O(n) every time something happens that resets (moves back) the timeout. A socket can generate hundreds or thousands of timers._unrefActive() calls during its lifetime. Multiply that by the number of active sockets (thousands, usually) and you can see why that is a problem.

I've repeatedly had timers._unrefActive() show up as the biggest cost center in some of our (StrongLoop's) customers' applications. I reported that as nodejs/node-v0.x-archive#8160 and that is what eventually led up to nodejs/node-v0.x-archive@934bfe2. See the discussion in the issue for pros and cons on the current fix.

What's the other option, making it a sorted list again and using an (average) O(n log n) ((I think?)) search? Won't a timing wheel still be somewhat O(n log n) at best too?

A timer wheel has the benefit that insertion and deletion (by far most common operations) are O(1). Even a logarithmic data structure is still an improvement over the current O(n^2) approach.

Btw, how far did you get on that timers re-write? Even if it's not done I may be interested in seeing how things could be changed. :)

I had a working prototype based on a binary heap but I scrapped it because it didn't preserve insertion order. (I probably could have worked around that with a monotonic counter but I didn't want to go there.)

It should be in a branch in my fork but I'm not sure which. https://github.com/bnoordhuis/io.js/commits/timer-heap contains just the heap implementation.

@Fishrock123
Copy link
Contributor

It's not O(n) on expiry, it's O(n) every time something happens that resets (moves back) the timeout.

Do you mean the current state in io.js, or with nodejs/node-v0.x-archive@934bfe2? (I think I understood that was currently the case in io.js, but not with Julien's patch.)

@bnoordhuis
Copy link
Member

Oh, I think I understand your 'Running O(n) at the timeout phase is a bit weird' comment now.

Yes, you're right: io.js is O(n) reset + O(1) expiry whereas joyent/node is the other way around, O(1) reset + O(n) expiry. Both approaches are terrible, only under different workloads.

@jasnell
Copy link
Member Author

jasnell commented May 27, 2015

Ok, so in that case we need to come to a decision: which approach is less terrible and should we port this particular set of patches or not?

@jasnell
Copy link
Member Author

jasnell commented May 27, 2015

TSC discussion today: hold off landing these. timers need new impl. Leaving this open for now but tagging as do-not-land
@misterdjules ... please weigh in here if you have specific concerns.

@rvagg
Copy link
Member

rvagg commented May 27, 2015

we may need to consider a backup plan for this, timers has the potential to be a deep rabbit hole and it'd be nice if a "rewrite" doesn't end up deferring a release

@jasnell
Copy link
Member Author

jasnell commented May 28, 2015

I don't think there's a risk of that in this case. Unless one of the other
joyent/node maintainers objects, it's likely safe to stick with the current
io.js impl on this until the rewrite is done.
On May 27, 2015 4:48 PM, "Rod Vagg" notifications@github.com wrote:

we may need to consider a backup plan for this, timers has the potential
to be a deep rabbit hole and it'd be nice if a "rewrite" doesn't end up
deferring a release


Reply to this email directly or view it on GitHub
nodejs/node#23 (comment).

@Fishrock123
Copy link
Contributor

we may need to consider a backup plan for this, timers has the potential to be a deep rabbit hole and it'd be nice if a "rewrite" doesn't end up deferring a release

So, I won't be doing a re-write I don't think. An actual re-write should touch parts I don't really understand (like full _handle management)

Either current io.js or current node.js is bad. I do think Julien's commit alleviates the more common case however, but for a tradeoff that is philosophically worse.

@rvagg is there any way we can get some build-ish machine for running perf profiling on? A problem I'll have here is a lack of access to the perf tools linux has.

@rvagg
Copy link
Member

rvagg commented May 28, 2015

@Fishrock123 no problems, ssh root@45.55.209.17, it's all yours, just let me know when you're done with it. Ping me if you need setup help, @trevnorris probably has some gists with instructions on perf tooling for Linux too so you can hassle him if you need help on that end.

@Fishrock123
Copy link
Contributor

@rvagg what sort of machine is that btw? And what dir do you want me to use for this? Some help would probably be great once I get into it, I'd mostly be going off stuff I've heard from @bnoordhuis in the past.

@rvagg
Copy link
Member

rvagg commented May 28, 2015

I've given you Ubuntu 15.04 since I believe that the most recent kernel is best for this kind of work. Use whatever directory and user you like, I'd just probably use root in its home directory for this, the machine is exclusive to your use and is throwawayable. I've installed git and build-essential so you should have the basics that you need for getting a build going.

@rvagg
Copy link
Member

rvagg commented May 28, 2015

Here's an older one of @trevnorris's gists on the topic which includes instructions on getting and using a recent version of perf: https://gist.github.com/trevnorris/7712539 - update for a more recent kernel version for download.

@Fishrock123
Copy link
Contributor

👍 Thanks!

@misterdjules
Copy link

I'm back from vacation and I finally had some time to take a look at this.

There is a lot of background info in nodejs/node-v0.x-archive#8751 and nodejs/node-v0.x-archive#8160. The goal originally was to fix issues such as nodejs/node-v0.x-archive#6447 and https://groups.google.com/forum/#!searchin/nodejs/_unrefActive/nodejs/Uc-0BOCicyU/9UcFbcBsdK4J.

I also had documented this work here: https://github.com/joyent/node/wiki/Optimizing-_unrefActive, although this document is slightly outdated. For instance it doesn't mention what we ended up landing.

The unordered list implementation was known to be theoretically inferior, and an implementation based on a binary heap had been tested. The binary heap implementation was performing slightly worse than the unordered list but still much better than the original implementation.

The main issue with the heap implementation was that at that time we were not able to make a decision on how to integrate an internal heap implementation into the code base. My implementation was using @tjfontaine's binary heap module as a proof of concept, but I couldn't get any input from other contributors on whether or not we should add that as an internal module.

A timer wheel implementation was left out initially because we thought that it might be difficult to tune and we wanted to explore what the performance was with an unordered list and a binary heap first.

The unordered list implementation was solving what was thought to be the most common use case, and seemed to solve the problem for users (see https://groups.google.com/d/msg/nodejs/Uc-0BOCicyU/mh21O3MGj94J) and we agreed to land it as a temporary solution until we could come up with a better solution either with a heap, a timer wheel or anything else.

As far as I know, we had just a small number of users who complained about _unrefActive's performance, so we have only very few data points on how users would be impacted by any change. That makes it difficult to make any informed decision. However currently it seems that:

  1. Only a very small number of users were impacted by _unrefActive's performance, and nodejs/node-v0.x-archive@934bfe2 solved that problem for them.
  2. nodejs/node-v0.x-archive@934bfe2 didn't cause any performance regression for the rest of users.

If that still holds true then I would suggest to:

  1. Land this change and continue the work on a heap based implementation (either based on my existing work or started from scratch) and any other way we can improve the performance of internal timeouts.
  2. improve the way we measure the impact of such changes on actual users, so that our decision process is driven a bit more data driven. I don't know how to do this except by trying to leverage our current communication channels with Node.js users and ask for feedback.

However, if anyone feels strongly against this change, I would not push for landing it given our lack of objective data about how that would impact users.

@Fishrock123
Copy link
Contributor

I also had documented this work here: https://github.com/joyent/node/wiki/Optimizing-_unrefActive, although this document is slightly outdated. For instance it doesn't mention what we ended up landing.

Right, I've looked at that.

The main issue with the heap implementation was that at that time we were not able to make a decision on how to integrate an internal heap implementation into the code base. My implementation was using @tjfontaine's binary heap module as a proof of concept, but I couldn't get any input from other contributors on whether or not we should add that as an internal module.

We have fully internal modules in io.js, unaccessible to users (.. accessible with a flag for testing purposes only.)

i.e. require('internal/heap')

@misterdjules Seems like that solves the main problem?

A timer wheel implementation was left out initially because we thought that it might be difficult to tune and we wanted to explore what the performance was with an unordered list and a binary heap first.

I've read up on them, it probably would be fairly difficult to tune without far better benchmarking than we currently have.

However, if anyone feels strongly against this change, I would not push for landing it given our lack of objective data about how that would impact users.

We can hold of to landing this last then, if everything else lands, but a heap isn't ready, we can temporarily land it. I should still have more time to work on using & improving on Ben's heap this week, so we'll see how far I can get.

@rvagg
Copy link
Member

rvagg commented Jun 3, 2015

I think it's safe to remove this off the tsc-agenda for now so y'all can bikeshed here, feel free to re-add it if you'd like to re-escalate it for further discussion

@rvagg rvagg removed the tsc-agenda label Jun 3, 2015
@misterdjules
Copy link

@Fishrock123

We have fully internal modules in io.js, unaccessible to users (.. accessible with a flag for testing purposes only.)

i.e. require('internal/heap')

@misterdjules Seems like that solves the main problem?

The problem with moving my heap-based implementation forward was not technical, it was a lack of feedback. Without a LGTM or feedback about what would need to change from anyone I was not able to make any progress on it.

Instead, I got some feedback on the unordered list implementation and since that was already an improvement (based on my benchmarks and users feedback) I made the decision to push to land it and hoped to get more feedback on the heap-based implementation later.

@Fishrock123
Copy link
Contributor

The problem with moving my heap-based implementation forward was not technical, it was a lack of feedback. Without a LGTM or feedback about what would need to change from anyone I was not able to make any progress on it.

Ok, cool. I'll try to take a look at it soon.

(Progress might not be be super fast, I'm kinda moving to BC (west coast canada) for over a month next week, and attending nodeconf)

@Fishrock123
Copy link
Contributor

Progress! https://github.com/Fishrock123/io.js/commits/improve-unrefactive has given these preliminary results: https://gist.github.com/Fishrock123/23950d26346dc8077a08

I'll work on compiling more benchmark data and improving the impl more, but it seems like there is an improvement.

@Fishrock123
Copy link
Contributor

Updated my gist of results with more data. Now with data from node's unsorted array patches (both of them) applied onto io.js.

The results seem to show that the unsorted version is faster than a sorted array even in the worst case because of how these timeouts are used.

The heap version is massively outperforming both on the worst-case test, and seems acceptably fast in the common case. Next step will be to make some benchmarks that use more real-world like code and see how those perform.

@jasnell
Copy link
Member Author

jasnell commented Jul 9, 2015

👍 great to see this progressing!

@rvagg
Copy link
Member

rvagg commented Aug 24, 2015

@Fishrock123 what's the status of this? Can it go on the 4.0.0 milestone or is it too much work for now?

@Fishrock123
Copy link
Contributor

nodejs/node#268

misterdjules pushed a commit to nodejs/node that referenced this issue Sep 2, 2015
Before this change, _unrefActive would keep the unrefList sorted when
adding a new timer.

Because _unrefActive is called extremely frequently, this linear scan
(O(n) at worse) would make _unrefActive show high in the list of
contributors when profiling CPU usage.

This commit changes _unrefActive so that it doesn't try to keep the
unrefList sorted. The insertion thus happens in constant time.

However, when a timer expires, unrefTimeout has to go through the whole
unrefList because it's not ordered anymore.

It is usually not large enough to have a significant impact on
performance because:
- Most of the time, the timers will be removed before unrefTimeout is
  called because their users (sockets mainly) cancel them when an I/O
  operation takes place.
- If they're not, it means that some I/O took a long time to happen, and
  the initiator of subsequents I/O operations that would add more timers
  has to wait for them to complete.

With this change, _unrefActive does not show as a significant
contributor in CPU profiling reports anymore.

Fixes: nodejs/node-v0.x-archive#8160

Signed-off-by: Timothy J Fontaine <tjfontaine@gmail.com>

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
misterdjules pushed a commit to nodejs/node that referenced this issue Sep 2, 2015
Commit 934bfe2 had introduced a
regression where node would crash trying to access a null unref timer if
a given unref timer's callback would remove other unref timers set to
fire in the future.

More generally, it makes the unrefTimeout function more solid by not
mutating the unrefList while traversing it.

Fixes: nodejs/node-v0.x-archive#8897

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 added a commit to nodejs/node that referenced this issue Sep 2, 2015
This commit addresses most of the review comments in
#2540, which are kept in this
separate commit so as to better preserve the prior two patches as they
landed in 0.12.

This commit:
- Fixes a bug with unrefActive timers and disposed domains.
- Fixes a bug with unrolling an unrefActive timer from another.
- Adds a test for both above bugs.
- Improves check logic, making it stricter, simpler, or both.
- Optimizes nicer with a smaller, separate function for the try/catch.

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
misterdjules pushed a commit to nodejs/node that referenced this issue Sep 3, 2015
Before this change, _unrefActive would keep the unrefList sorted when
adding a new timer.

Because _unrefActive is called extremely frequently, this linear scan
(O(n) at worse) would make _unrefActive show high in the list of
contributors when profiling CPU usage.

This commit changes _unrefActive so that it doesn't try to keep the
unrefList sorted. The insertion thus happens in constant time.

However, when a timer expires, unrefTimeout has to go through the whole
unrefList because it's not ordered anymore.

It is usually not large enough to have a significant impact on
performance because:
- Most of the time, the timers will be removed before unrefTimeout is
  called because their users (sockets mainly) cancel them when an I/O
  operation takes place.
- If they're not, it means that some I/O took a long time to happen, and
  the initiator of subsequents I/O operations that would add more timers
  has to wait for them to complete.

With this change, _unrefActive does not show as a significant
contributor in CPU profiling reports anymore.

Fixes: nodejs/node-v0.x-archive#8160

Signed-off-by: Timothy J Fontaine <tjfontaine@gmail.com>

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
misterdjules pushed a commit to nodejs/node that referenced this issue Sep 3, 2015
Commit 934bfe2 had introduced a
regression where node would crash trying to access a null unref timer if
a given unref timer's callback would remove other unref timers set to
fire in the future.

More generally, it makes the unrefTimeout function more solid by not
mutating the unrefList while traversing it.

Fixes: nodejs/node-v0.x-archive#8897

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 added a commit to nodejs/node that referenced this issue Sep 3, 2015
This commit addresses most of the review comments in
#2540, which are kept in this
separate commit so as to better preserve the prior two patches as they
landed in 0.12.

This commit:
- Fixes a bug with unrefActive timers and disposed domains.
- Fixes a bug with unrolling an unrefActive timer from another.
- Adds a test for both above bugs.
- Improves check logic, making it stricter, simpler, or both.
- Optimizes nicer with a smaller, separate function for the try/catch.

Fixes: nodejs/node-convergence-archive#23
Ref: #268
PR-URL: #2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue Sep 3, 2015
Before this change, _unrefActive would keep the unrefList sorted when
adding a new timer.

Because _unrefActive is called extremely frequently, this linear scan
(O(n) at worse) would make _unrefActive show high in the list of
contributors when profiling CPU usage.

This commit changes _unrefActive so that it doesn't try to keep the
unrefList sorted. The insertion thus happens in constant time.

However, when a timer expires, unrefTimeout has to go through the whole
unrefList because it's not ordered anymore.

It is usually not large enough to have a significant impact on
performance because:
- Most of the time, the timers will be removed before unrefTimeout is
  called because their users (sockets mainly) cancel them when an I/O
  operation takes place.
- If they're not, it means that some I/O took a long time to happen, and
  the initiator of subsequents I/O operations that would add more timers
  has to wait for them to complete.

With this change, _unrefActive does not show as a significant
contributor in CPU profiling reports anymore.

Fixes: nodejs/node-v0.x-archive#8160

Signed-off-by: Timothy J Fontaine <tjfontaine@gmail.com>

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: nodejs#268
PR-URL: nodejs#2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue Sep 3, 2015
Commit 934bfe2 had introduced a
regression where node would crash trying to access a null unref timer if
a given unref timer's callback would remove other unref timers set to
fire in the future.

More generally, it makes the unrefTimeout function more solid by not
mutating the unrefList while traversing it.

Fixes: nodejs/node-v0.x-archive#8897

Conflicts:
	lib/timers.js

Fixes: nodejs/node-convergence-archive#23
Ref: nodejs#268
PR-URL: nodejs#2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 added a commit to Fishrock123/node that referenced this issue Sep 3, 2015
This commit addresses most of the review comments in
nodejs#2540, which are kept in this
separate commit so as to better preserve the prior two patches as they
landed in 0.12.

This commit:
- Fixes a bug with unrefActive timers and disposed domains.
- Fixes a bug with unrolling an unrefActive timer from another.
- Adds a test for both above bugs.
- Improves check logic, making it stricter, simpler, or both.
- Optimizes nicer with a smaller, separate function for the try/catch.

Fixes: nodejs/node-convergence-archive#23
Ref: nodejs#268
PR-URL: nodejs#2540
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants