Why node.js server takes longer for each new set of requests #1179

Closed
ORESoftware opened this issue Mar 22, 2018 · 19 comments

@ORESoftware

This regards Node.js versions 9+

I have a simple Node.js Express server running, and I issued this command history:

round 1:

$ node server.js

$ node /Users/alexamil/perf.test.js
passed in these millis: 3405
$ node /Users/alexamil/perf.test.js
passed in these millis: 4010
$ node /Users/alexamil/perf.test.js
passed in these millis: 5290
$ node /Users/alexamil/perf.test.js
passed in these millis: 21170

round 2:

$ node server.js

$ node /Users/alexamil/perf.test.js
passed in these millis: 4407
$ node /Users/alexamil/perf.test.js
passed in these millis: 4865
$ node /Users/alexamil/perf.test.js
passed in these millis: 4874
$ node /Users/alexamil/perf.test.js
passed in these millis: 20808

I am just curious what the explanation is for the server taking longer each time, even after it has finished all requests. For example, I run perf.test.js and it completes all of its requests. Then I manually run the same command again, about 1 or 2 seconds after the previous run finished, yet the total time still increases. There is also a big jump on the fourth run.

The express server is like so:

'use strict';

import {ErrorRequestHandler} from "express";
import * as express from 'express';
import haven from 'domain-haven';

const app = express();

app.use(function (req, res, next) {
  // parse the haven payload from the query string (req.havenData presumably relies
  // on a Request augmentation that isn't shown here)
  req.havenData = JSON.parse(req.query.haven || '{}');
  next();
});

if (process.env.use_haven === 'yes') {
  console.log('using haven');
  app.use(haven());
}


app.use(function(req,res,next){
  res.json({success: true});
});

app.use(function (err, req, res, next) {
  // final error handler: log the error, wait 10 ms, then respond if nothing has been sent yet
  err && console.error(err.message || err);
  if (!res.headersSent) {
    setTimeout(function () {
      if (!res.headersSent) {
        res.json({error: 'hit final error middleware', value: err && err.stack || err || null})
      }
    }, 10);
  }
});

app.listen(6969, function () {
  console.log('app is listening.');
});

The test code is like so:

import http = require('http');
import async = require('async');
import request = require('request');
import {HavenData} from "./simple-server";
import assert = require('assert');
import util = require('util');

// 5000 tasks, each issuing a single GET request against the local server
const tasks = Array.apply(null, Array(5000)).map(function () {
  return function (cb: Function) {
    
    const opts = {
      json: true,
      qs: {haven: null as any}
    };
    
    request.get('http://localhost:6969', opts, function (err, resp, v) {
      
      if (err) {
        return cb(err);
      }
      
      cb(null);
    });
    
  }
});

const now = Date.now();

// run the tasks with at most 15 requests in flight at a time
async.parallelLimit(tasks, 15, function (err) {
  if (err) throw err;
  console.log('passed in these millis:', Date.now() - now);
});
@gireeshpunathil
Member

Reproducible with this small piece of code on macOS. I tested on Linux, AIX, and Windows and did not observe this behavior. @ORESoftware - can you confirm you are on a Mac, to make sure we are attacking the same issue?

const h = require('http');
h.createServer((q, s) => {
  s.end('g')
}).listen(6969, () => {
  const now = Date.now()
  let count = 0
  // issue 5000 GET requests back-to-back, then print the elapsed time
  function foo() {
    if(count++ === 5000) {
      process.stdout.write(`time in millis: ${Date.now() - now}\n`)
      process.exit(0)
    }
    h.get('http://localhost:6969', foo)
  }
  foo()
})

The remarkable difference in the profiles is:

good (total ticks: 7150):

   ticks parent  name
   3698   51.7%  T ___channel_get_opt

bad (total ticks: 10537):

   ticks parent  name
   7193   68.3%  T ___channel_get_opt
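
Output in this shape comes from V8's built-in sampling profiler. A minimal sketch of the usual invocation, assuming the repro above is saved as 1179.js (the exact commands behind the numbers above aren't shown in the thread):

$ node --prof 1179.js                      # writes an isolate-*.log tick log in the working directory
$ node --prof-process isolate-0x*-v8.log   # turns the log into the ticks/percent summary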

#find /usr/include/ | xargs grep "___channel_get_opt" 2> /dev/null
/usr/include//sys/syscall.h:#define SYS___channel_get_opt 513

Looks like this is an opaque Darwin kernel routine; no documentation is available.

None of the TCP system calls seem to carry the burden or reflect the delay - probably because connect and accept are non-blocking, they don't spend visible time while ___channel_get_opt is taking up CPU.
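
To double-check where the kernel time goes, a per-syscall count can be collected with dtruss, macOS's DTrace wrapper - a sketch, assuming SIP permits tracing and the repro is saved as 1179.js:

$ sudo dtruss -c -f node 1179.js 2> syscall-counts.txt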

The latency increases only when the tests are run back-to-back (lots of sockets pile up in netstat -na | wc -l). If you wait for them to subside, the issue is not present - hinting at some optimization at the I/O channel level.
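
One way to watch that pileup between runs is to count sockets lingering in TIME_WAIT (a sketch, assuming the BSD netstat that ships with macOS):

$ netstat -na | grep -c TIME_WAIT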

I don't have an inference or recommendation. /cc @nodejs/performance @bnoordhuis

@bnoordhuis
Member

Performance benchmarking on macOS is a waste of time. If it reproduces on Linux or Windows, I'd be interested to see it.

@gibfahn
Member

gibfahn commented Mar 23, 2018

Performance benchmarking on macOS is a waste of time.

Why (if you'll excuse my curiosity)?

@bnoordhuis
Member

Apple's kernel team spends very little effort on making things fast. That's at least the impression I get from reading the xnu sources.

@shellberg

Unfortunate, but this has also been reported before.

@ORESoftware
Author

ORESoftware commented Mar 23, 2018

Yeah, I am on macOS... I can try it on my Ubuntu machine and see if I get the same thing, but I haven't done that yet. The behavior seems consistent, though. It seemed strange to me that after all requests were done, the server would be sitting at 0 requests, and then a second later it would get a new set of 5000 requests (no more than 15 at a time) and take longer.

@gireeshpunathil
Member

@ORESoftware - were you able to make any observations in Linux?

@ORESoftware
Author

Not yet, but I will try it in a week.

@gireeshpunathil
Member

@ORESoftware - let me know if anything is outstanding on this.

@ORESoftware
Author

Let me try this on my Linux machine right now.

@gireeshpunathil
Member

ping @ORESoftware - were you able to validate in Linux?

@ORESoftware
Author

Oh yes, give me another day; I will test this tomorrow. I still haven't gotten to it, but I will have time tomorrow.

@kingnebby

I don't know if it helps, but I can reproduce the T ___channel_get_opt issue with this.

// spin an empty 0 ms interval for ~100 iterations, then stop
let intervalLimit = 100
let runCount = 0
const intervalId = setInterval(() => {
  if (runCount++ > intervalLimit) {
    clearInterval(intervalId)
    return
  }
}, 0)

Profile Output

[Summary]:
   ticks  total  nonlib   name
      4    2.4%    2.4%  JavaScript
    157   92.4%   95.7%  C++
    ...

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
     89   52.4%  T ___channel_get_opt
  • Node.js 10.5.0
  • Darwin Kernel Version 16.7.0

@kingnebby

kingnebby commented Jun 23, 2018

OK, I'm pretty convinced this is just setInterval(). I painstakingly went through my dependencies and excluded modules/services that contained setInterval(). This really just came down to restify (the smallest part of our app) and some custom functions that were scanning the file system or doing heartbeat kinds of stuff.

Anyhow, in a performance run with about 2000 ticks, the two outputs are as follows:

// With Intervals
    538   20.9%   21.1%  T ___channel_get_opt

// Without Intervals
     58    3.9%  T ___channel_get_opt
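
For anyone who wants to repeat the comparison, a minimal sketch of the A/B setup described above (hypothetical file names, not the commenter's actual app) - profile each variant separately and compare the ___channel_get_opt share:

// with-interval.js - keeps an empty 0 ms interval firing for roughly one second
const id = setInterval(() => {}, 0);
setTimeout(() => clearInterval(id), 1000);

// without-interval.js - idles for the same wall-clock time with a single timer
setTimeout(() => {}, 1000);

Running each file under node --prof and processing the resulting logs should show whether the interval-driven variant accounts for the extra ___channel_get_opt ticks.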

By the way, I ran the same script above on Ubuntu 14 and got a similar picture, but with syscall instead of T ___channel_get_opt.

No idea what a 'solution' would look like, if any exists.

@gireeshpunathil
Member

@kingnebby - thanks for pitching in. However, I am not convinced by your theory around setInterval. Are you able to run your code and see degradation in consecutive runs? If so, by how much? Can you please provide that output?

Mine is here (macOS):

#time node 1179.js

real	0m0.237s
user	0m0.079s
sys	0m0.022s
#time node 1179.js

real	0m0.226s
user	0m0.081s
sys	0m0.025s
#time node 1179.js

real	0m0.229s
user	0m0.075s
sys	0m0.021s
#time node 1179.js

real	0m0.231s
user	0m0.076s
sys	0m0.024s
#

Though this is a little slower than Linux overall, the values are consistent across runs, and there is no cascading effect of the previous run on the next run - which was the originally reported problem.

@kingnebby

kingnebby commented Jun 23, 2018

@gireeshpunathil Indeed - I don't know how it would impact the original issue. I stumbled across ___channel_get_opt in my own performance tests and was just trying to understand what it is and whether or not it affects our respective issues.

If it doesn't help solve the original issue then feel free to ignore it :)

@ORESoftware
Author

Working on finally replicating the problem on Linux... give me 10 minutes.

@ORESoftware
Author

ORESoftware commented Jun 23, 2018

Finally got to this; sorry it took so long.
The problem only seems to occur on macOS, not Ubuntu.

I created a project that can run this test:
https://github.com/ORESoftware/node-perf

You can run:

npm install -g @oresoftware/node.perf

then:

$ node_perf_server
$ node_perf_test
$ node_perf_test
$ node_perf_test
 # ...
$ node_perf_test

etc.

On Linux, each time the test is run, the time required goes down slightly. On macOS, it goes up dramatically over successive runs. Weird.

@andresdiaz29

andresdiaz29 commented Oct 6, 2024

Hi everyone, I am facing the same issue on Windows with Node 20+. When I call an endpoint that runs a long loop a few times, the execution time roughly triples after the third request.
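
A minimal sketch of the scenario being described (hypothetical route and loop bound - the commenter's actual endpoint isn't shown), in case anyone wants to try reproducing it:

const express = require('express');
const app = express();

app.get('/work', (req, res) => {
  // CPU-bound loop standing in for the "long loop" in the report
  const start = Date.now();
  let acc = 0;
  for (let i = 0; i < 1e9; i++) acc += i;
  res.json({ acc, ms: Date.now() - start });
});

app.listen(3000, () => console.log('listening on 3000'));

Hitting /work several times in a row and comparing the reported ms values would show whether later requests really take longer.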

