-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tls: fix leak of WriteWrap+TLSWrap combination #9626
Conversation
Writing data to TLSWrap instance during handshake will result in it being queued in `write_item_queue_`. This queue won't get cleared up until the end of the handshake. Technically, it gets cleared on `~TLSWrap` invocation, however this won't ever happen because every `WriteWrap` holds a reference to the `TLSWrap` through JS object, meaning that they are doomed to be alive for eternity. To breach this dreadful contract a knight shall embark from the `close` function to kill the dragon of memory leak with his magic spear of `destroySSL`. `destroySSL` cleans up `write_item_queue_` and frees `SSL` structure, both are good for memory usage. PR-URL: nodejs#9586 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
I've added the ctc-agenda tag to discuss how quickly we should be cutting v4 and v6 if this lands. If there is urgency we don't exactly have time to wait until the next LTS meeting |
Thx @indutny 👍 . We are eagerly waiting for this release. It would be awesome, if there could be a possibility to release the patch at an earlier timeframe. |
@indutny you need to help us come to an understanding of the severity of this so we can judge how urgent it is for LTS release lines to get pushed out, i.e. whether we need to pull our next planned release forward from the 6th of December. My guess here is that it's not super urgent as you'd need to either be making lots of TLS client connections with this pattern of write-before-handshake-complete during the lifetime of a process or writing lots of data in that brief window or a combination of both situations. So it's either going to be unlikely to be experienced by a lot of people or if it is the leak will be small enough to not require urgent action on our end. |
@rvagg this is a pretty serious bug in my opinion. Doing TLS client requests with some ETIMEOUT (which are usually inevitable) will result in constant leak of memory. This resulted in many "out of memory" crashes for us. I don't have any data about either leaks or crashes at other companies. |
@nodejs/lts we need to make a decision here. I'm of the opinion that we continue with our current schedule and roll this in to updates for v4 and v6 on the 6th as planned. That's only 3 weeks away. Doing it earlier will either mean rushing it out with less time for RCs or overlapping with Node Interactive. @thealphanerd @jasnell @Fishrock123 what think you? |
@thealphanerd @jasnell @Fishrock123 I would like to emphasize how critical this is for my company. It practically stops adoption of v4, because it crashes in production. |
I personally would like to see this land in a v7 release ASAP. We can then cut an R.C. with this commit included next week. This would then give 2 weeks for the change to bake in both the R.C. and the v7 release before we officially release. @indutny does that seem reasonable? |
@thealphanerd RC should be fine for our use case. @subeeshcbabu could you confirm this? |
Yeah |
We are constantly able to reproduce this leak, in the event of any connect timeout. I am surprised that others have not reported this yet. |
+1 This leak is a huge problem for our app. It can't stay alive more than 1 day... app starts up ~300mb and grows to 1.4gb with just a few hours of traffic until it's restarted. Really looking forward to landing this fix! It'll be huge for those using node at scale! Thanks @indutny. |
Thanks for the fix, this is causing huge problems in our app with restarts every hour. Looking forward to trying out the patch. |
+1 very useful for stability and nicely implemented change. |
👍 could use this as well. Thanks! |
+1 useful one! |
}); | ||
|
||
c.write('hello', common.mustCall((err) => { | ||
assert.equal(err.code, 'ECANCELED'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer strictEqual
+1 |
Just want to reiterate the impact of the current behavior. Across all our application pools we are seeing roughly 400-700 restarts a day due to this. If we could get a rc by early next week it could smooth our holiday traffic considerably. |
CI Looks good. BSD Failure expected @thefourtheye as this is a backport I'm not going to land the nit, but have someone submitting a first PR against master to get it done 🎉 bf09078028 |
This is a security release impacting Windows 10 users. Notable changes: * crypto: The `Decipher` methods `setAuthTag()` and `setAAD` now return `this`. (Kirill Fomichev) #9398 * dns: Implemented `{ttl: true}` for `resolve4()` and `resolve6()`. (Ben Noordhuis) #9296 & #9296 * libuv: Upgrade to v1.10.1 (cjihrig) #9647 - Fixed a potential buffer overflow when writing data to console on Windows 10. (CVE-2016-9551) * process: Added a new `external` property to the data returned by `memoryUsage()`. (Fedor Indutny) #9587 * tls: Fixed a memory leak when writes were queued on TLS connection that was destroyed during handshake. (Fedor Indutny) #9626 * V8 (dep): Upgrade to v5.4.500.43 (Michaël Zasso) #9697 * v8: The data returned by `getHeapStatistics()` now includes three new fields: `malloced_memory`, `peak_malloced_memory`, and `does_zap_garbage`. (Gareth Ellis) #8610 PR-URL: #9745
This is a security release impacting Windows 10 users. Notable changes: * crypto: The `Decipher` methods `setAuthTag()` and `setAAD` now return `this`. (Kirill Fomichev) #9398 * dns: Implemented `{ttl: true}` for `resolve4()` and `resolve6()`. (Ben Noordhuis) #9296 & #9296 * libuv: Upgrade to v1.10.1 (cjihrig) #9647 - Fixed a potential buffer overflow when writing data to console on Windows 10. (CVE-2016-9551) * process: Added a new `external` property to the data returned by `memoryUsage()`. (Fedor Indutny) #9587 * tls: Fixed a memory leak when writes were queued on TLS connection that was destroyed during handshake. (Fedor Indutny) #9626 * V8 (dep): Upgrade to v5.4.500.43 (Michaël Zasso) #9697 * v8: The data returned by `getHeapStatistics()` now includes three new fields: `malloced_memory`, `peak_malloced_memory`, and `does_zap_garbage`. (Gareth Ellis) #8610 PR-URL: #9745
Is there an ETA on when this will land on v4? |
the rc went out yesterday. the intent is to ship it on the 6th of December On Wed, Nov 23, 2016, 2:24 PM Adri Van Houdt notifications@github.com
|
@AdriVanHoudt it landed on v4 already, and is released in the latest RC. |
Ok cool! 👌 I'm seeing a memory leak in production only (yey) and hoping this will fix it |
Please let us know if it will or won't |
@indutny I can try running the RC this weekend, problem is only production and high amounts of traffic. Is the RC stable, I know what it stands for but there are no like major things that are not working right? |
the rc included a handful of semver minor changes… but none that should cause an issue.
We have not seen anything worrisome in CI yet
|
I looked at them and seems fine indeed, I even found a commit I did in there haha, let me run in staging for a few hours and then production and I'll come back if everything is on fire or my memory is smooth sailing ;) |
I think it is fair to say that the issue is fixed (graph is memory, last drop is reboot with new rc version) Thank you so much for finding and fixing this @indutny! And for the backport to v4 @thealphanerd ! 🚀 |
Hooray! |
This is a security release impacting Windows 10 users. Notable changes: * crypto: The `Decipher` methods `setAuthTag()` and `setAAD` now return `this`. (Kirill Fomichev) nodejs/node#9398 * dns: Implemented `{ttl: true}` for `resolve4()` and `resolve6()`. (Ben Noordhuis) nodejs/node#9296 & nodejs/node#9296 * libuv: Upgrade to v1.10.1 (cjihrig) nodejs/node#9647 - Fixed a potential buffer overflow when writing data to console on Windows 10. (CVE-2016-9551) * process: Added a new `external` property to the data returned by `memoryUsage()`. (Fedor Indutny) nodejs/node#9587 * tls: Fixed a memory leak when writes were queued on TLS connection that was destroyed during handshake. (Fedor Indutny) nodejs/node#9626 * V8 (dep): Upgrade to v5.4.500.43 (Michaël Zasso) nodejs/node#9697 * v8: The data returned by `getHeapStatistics()` now includes three new fields: `malloced_memory`, `peak_malloced_memory`, and `does_zap_garbage`. (Gareth Ellis) nodejs/node#8610 Signed-off-by: Ilkka Myller <ilkka.myller@nodefield.com>
Checklist
make -j8 test
(UNIX), orvcbuild test nosign
(Windows) passesAffected core subsystem(s)
tls
Description of change
Writing data to TLSWrap instance during handshake will result in it
being queued in
write_item_queue_
. This queue won't get cleared upuntil the end of the handshake.
Technically, it gets cleared on
~TLSWrap
invocation, however thiswon't ever happen because every
WriteWrap
holds a reference to theTLSWrap
through JS object, meaning that they are doomed to be alivefor eternity.
To breach this dreadful contract a knight shall embark from the
close
function to kill the dragon of memory leak with his magicspear of
destroySSL
.destroySSL
cleans upwrite_item_queue_
and freesSSL
structure,both are good for memory usage.
PR-URL: #9586
Reviewed-By: Ben Noordhuis info@bnoordhuis.nl
Backport of #9586, same PR should apply to v6.x too.
cc @thealphanerd @nodejs/lts