Improve allocation speed and reduce memory copying to increase HTTP pars... #199

Closed
wants to merge 1 commit

Conversation

rudi-cilibrasi

...er performance by 1-2 (and occasionally more) percent in many cases, such as benchmark/http/end-vs-write-end.js, based on a slab allocator TODO by Ben.

@bnoordhuis
Member

Before we go into specifics, am I right to understand that you ran the benchmarks and saw an improvement? Can you post the numbers? I would expect a difference only with benchmarks that split the request headers over multiple TCP chunks and I'm not sure if we actually have those; if not, you are more than welcome to add some. :-)

I should mention that the http benchmarks tend to have fairly high variance. I usually benchmark using median-of-three or median-of-five, with a few seconds cool-off between each run. If the difference is only 1 or 2%, it's statistically insignificant and probably just noise (because of the aforementioned variance) unless it's consistent across many (10, 20 or more) runs.
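The median-of-N procedure Ben describes is simple enough to sketch; the helper below is hypothetical and not part of the io.js benchmark suite.

```javascript
'use strict';

// Hypothetical helper illustrating the median-of-N benchmarking approach
// described above; not part of the io.js benchmark suite.
function median(results) {
  var sorted = results.slice().sort(function(a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]
                                 : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median-of-five: keep the middle of five runs so a single outlier run
// (caused by the variance mentioned above) cannot skew the result.
console.log(median([9277, 9113, 9830, 9631, 8540])); // prints 9277
```

Taking the median rather than the mean is what makes a single noisy run harmless.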

@rudi-cilibrasi
Author

Yes, I saw the fairly high variance. My guess is that the new allocator is a
bit faster (1-2%) in many cases and much faster (2-10%) in a few, but I would
appreciate some confirmation. I ran many benchmarks dozens of times, though
only by hand, so soon I will try the automatic averaging you suggested. I was
already guessing (without enough reading) that we were not hitting the very
chunked case, but I did see some improvement in chunked.js in particular. You
are right that that was the original main focus of the micro-optimization.
Before I post the numbers that I saw improve, I want to explain the four
sources of speedup I can imagine; the first is limited as you said, but I
think the other three can still apply.

  1. split request headers over multiple chunks (limited as you said)
  2. allocate 5 blocks per large block new[] allocation
  3. recycle freed blocks so they are reused rather than deallocated/allocated
  4. reduced variation in memory chunk size
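Points 2-4 can be sketched in JavaScript (the actual patch lives in the C++ http_parser bindings; the names and sizes below are hypothetical, for illustration only):

```javascript
'use strict';

// Hypothetical sketch of points 2-4 above: carve several fixed-size blocks
// out of one large allocation, and keep freed blocks on a free list so
// they are reused instead of deallocated and reallocated.
var BLOCK_SIZE = 4096;
var BLOCKS_PER_SLAB = 5;   // point 2: 5 blocks per large allocation

function BlockPool() {
  this.free = [];          // point 3: recycled blocks wait here
}

BlockPool.prototype.alloc = function() {
  if (this.free.length === 0) {
    // One large allocation, sliced into uniform blocks (point 4:
    // uniform size means less variation for the allocator to handle).
    var slab = Buffer.alloc(BLOCK_SIZE * BLOCKS_PER_SLAB);
    for (var i = 0; i < BLOCKS_PER_SLAB; i++)
      this.free.push(slab.slice(i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE));
  }
  return this.free.pop();
};

BlockPool.prototype.release = function(block) {
  this.free.push(block);   // recycle rather than free
};

var pool = new BlockPool();
var a = pool.alloc();
pool.release(a);
var b = pool.alloc();
console.log(a === b);      // prints true: the freed block was reused
```

The point of the free list is that a release followed by an alloc touches no allocator at all, which is exactly the hot path in a parser that churns through many same-sized buffers.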

I am not sure, but I think the speedup is there in most cases even without
the multiple TCP chunks; it is just not very significant usually. In certain
cases it is. For example, running

node benchmark/http/end-vs-write-end.js | grep buf

originally yields these measurements:
http/end-vs-write-end.js type=buf kb=64 c=100 method=write: 9277
http/end-vs-write-end.js type=buf kb=64 c=100 method=end : 9113
http/end-vs-write-end.js type=buf kb=128 c=100 method=write: 8540
http/end-vs-write-end.js type=buf kb=128 c=100 method=end : 8456

and after the patch yields

http/end-vs-write-end.js type=buf kb=64 c=100 method=write: 9830
http/end-vs-write-end.js type=buf kb=64 c=100 method=end : 9631
http/end-vs-write-end.js type=buf kb=128 c=100 method=write: 8552
http/end-vs-write-end.js type=buf kb=128 c=100 method=end : 8405

I guess that is pretty typical. It is roughly equivalent in many cases,
sometimes 1-4% better, and in some cases 5-10% better.

I should mention that the TCP fragmentation case was the main reason for
the patch, so I would love to see such a benchmark, but I am not yet sure
how to write one easily.

Thanks for the feedback.


@bnoordhuis
Member

I'm running on battery power right now - not an ideal environment for benchmarking - so I can't really confirm if it's faster. I did some quick tests yesterday but the results were inconclusive.

I had a look at the existing benchmarks and I don't think any of them really stress the 'split headers' case. I would suggest adding one that:

  1. Start an HTTP server on a UNIX socket / pipe instead of TCP to avoid loopback TCP ACK latency. On Linux, that introduces 40 ms latency per TCP packet; see orthecreedence/cl-async#98 for an example of the disastrous effect that can have on throughput.
  2. Start the client in a sub-process and use the net module to write raw HTTP. If you are feeling adventurous, you can use the raw bindings to remove the overhead of the net streams layer. There are examples in benchmark/net.

By the way, if you just want to run the buf tests, you can do so like this:

$ node benchmark/http/end-vs-write-end.js type=buf

@rudi-cilibrasi
Author

Thank you for these great tips. I have already started reading the
benchmark/net examples you mentioned. It looks doable to me, but it may
take me some time because I have not worked with node networking at the
byte level as much as I have in C. I really appreciate the git tips you and
Colin gave me, and I have already practiced using them for the first time
in this pull request. Please let me know if I made any mistakes in my first
commit squash operation. I will work towards putting together a better
performance test and adding it to the branch.


@rudi-cilibrasi
Author

I pushed the new node JavaScript client and server to the benchmarks
directory. I noticed something confusing and mentioned it in the commit
log. I look forward to your advice and reaction. Best regards,

Rudi


@novacrazy

I think having a tiny memory pool used in just one area is a bad idea, especially with the implementation in the same file. It would probably improve performance some, but if memory pools are to be used in io.js, I think it should be done everywhere, with a larger subsystem for handling them.

Perhaps something like boost::pool, which is tried and tested. Otherwise, it is reasonable to trust the OS to handle memory quickly enough, or just to optimize the current code to reduce copying and reallocations.

@rudi-cilibrasi
Author

I wanted to let you know that the leak exists in v0.12 without the new code
I introduced.

In general I agree with you about memory allocation code that is overly
specific. More general is better, and I would want to check out and compare
boost::pool and similar were this to go further. I am more interested in
the apparent memory leak when running the new benchmark code against the
old v0.12 release without my additions to http_parser. I found some old bug
reports against earlier versions that suggest there may be a leak in the
http_parser, and that concerns me, since io.js is so network-oriented and
fast enough for a memory leak to make a big difference in server stability.


@bnoordhuis
Member

@rudi-cilibrasi Can you make the benchmarks part of the benchmark suite? Take a look at the files in (for example) benchmark/net for guidance; the goal is to make them fully automatic. I'll leave some comments on the PR itself.

if (has_keep_alive) {
WriteWithCRLF(channel, "Connection: keep-alive");
}
WriteWithCRLF(channel, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
Member

Please wrap at 80 columns and please use single quotes. (The latter is not a hard requirement but it's the norm in core and code that deviates always looks a little grating.)

This allocator attempts to increase performance by a few
percent in benchmark/http/end-vs-write-end.js and other cases.
This experimental effort was initiated by a TODO comment from Ben.
@Fishrock123
Contributor

ping @bnoordhuis

@rudi-cilibrasi
Author

I'm happy to let this one go if you like, since the performance boost was uncertain last I checked and it is a bit more complex. If somebody likes it or gets good test results, I am happy to see it go in as well. My feeling is that my other PR, #228, is higher quality and hopefully more likely to go in now, but I'm still waiting for more feedback.

@mscdex mscdex added the http Issues or PRs related to the http subsystem. label Mar 22, 2015
@brendanashworth brendanashworth added the c++ Issues and PRs that require attention from people who are familiar with C++. label May 1, 2015
@Fishrock123
Copy link
Contributor

Since this doesn't look like it's going anywhere, going to close in light of #1457, let me know if it should be re-opened.

@ChALkeR ChALkeR added the performance Issues and PRs related to the performance of Node.js. label Feb 16, 2016
jasongin added a commit to jasongin/nodejs that referenced this pull request Mar 25, 2017
 - Delete the node_api_*internal.h headers
 - Don't crash when casting numbers when
   V8_ENABLE_CHECKS is defined
 - Return 0 when getting NaN value as an integer
 - Always set callback data internal field to prevent
   crashing when V8_ENABLE_CHECKS is defined
 - Style: replace `if (ptr)` with `if (ptr != nullptr)`
 - Fix extern "C" in node_api_async.h
 - Minor doc changes
boingoing pushed a commit to boingoing/node that referenced this pull request Apr 6, 2017
 - Delete the node_api_*internal.h headers
 - Don't crash when casting numbers when
   V8_ENABLE_CHECKS is defined
 - Return 0 when getting NaN value as an integer
 - Always set callback data internal field to prevent
   crashing when V8_ENABLE_CHECKS is defined
 - Style: replace `if (ptr)` with `if (ptr != nullptr)`
 - Fix extern "C" in node_api_async.h
 - Minor doc changes