[BUG] Getting constant Client network socket disconnected before secure TLS connection was established #4652

Open
2 tasks done
hanzlamateen opened this issue Mar 31, 2022 · 20 comments
Labels
Bug thing that needs fixing Needs Triage needs review for next steps Release 8.x work is associated with a specific npm 8 release

Comments

@hanzlamateen

Is there an existing issue for this?

  • I have searched the existing issues

This issue exists in the latest npm version

  • I am using the latest npm

Current Behavior

When trying to install packages using npm install --production=false --loglevel notice --legacy-peer-deps, we constantly get the error mentioned in the title. The package that fails is different each time, so no specific package is causing the issue.

npm WARN deprecated read-package-tree@5.3.1: The functionality that this package provided is now in @npmcli/arborist
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN deprecated rollup-plugin-inject@3.0.2: This package has been deprecated and is no longer maintained. Please use @rollup/plugin-inject.
npm WARN deprecated querystring@0.2.0: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
npm WARN deprecated feathers-logger@0.3.2: This module is no longer maintained. See https://github.com/feathersjs-ecosystem/feathers-logger#migrating for more information on using a standard logger
npm WARN deprecated emailjs-com@3.2.0: The SDK name changed to @emailjs/browser
npm WARN deprecated superagent@7.1.2: Deprecated due to bug in CI build https://github.com/visionmedia/superagent/pull/1677\#issuecomment-1081361876
npm WARN deprecated uuid@3.3.2: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated core-js@2.6.12: core-js@<3.4 is no longer maintained and not recommended for usage due to the number of issues. Because of the V8 engine whims, feature detection in old core-js versions could cause a slowdown up to 100x even if nothing is polyfilled. Please, upgrade your dependencies to the actual version of core-js.
npm ERR! code ECONNRESET
npm ERR! errno ECONNRESET
npm ERR! network request to https://registry.npmjs.org/remove-trailing-separator/-/remove-trailing-separator-1.1.0.tgz failed, reason: Client network socket disconnected before secure TLS connection was established
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network 
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2022-03-31T11_49_10_091Z-debug-0.log

Here are the changes I have tried so far:

  1. Added npm install -g npm, still not working

  2. Added following before npm i, still not working
    npm config set fetch-retry-maxtimeout 9999999 -g

  3. Updated npm i command to include timeout, still not working
    npm install --production=false --loglevel notice --legacy-peer-deps --timeout=9999999

  4. Installed VPN and connected to other regions, still not working.
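
For reference, the retry and timeout attempts above correspond to npm's fetch-* settings (plus maxsockets), which can also live in a project-level .npmrc instead of being set globally. A minimal sketch, assuming a POSIX shell; the helper name write_npmrc and every value below are illustrative assumptions, not settings confirmed to fix this issue:

```shell
# Hypothetical helper: write a project-level .npmrc into directory $1.
# The keys are real npm config options; the values are illustrative only.
write_npmrc() {
  cat > "$1/.npmrc" <<'EOF'
fetch-retries=5
fetch-retry-mintimeout=20000
fetch-retry-maxtimeout=120000
fetch-timeout=300000
maxsockets=1
EOF
}

# Example (not run here):
#   write_npmrc .
#   npm install --production=false --loglevel notice --legacy-peer-deps
```

Keeping the settings in the project's .npmrc makes them apply to CI runs as well, without a separate npm config set step.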

Expected Behavior

Install packages without failing

Steps To Reproduce

Using a project with workspaces
No cache available
npm install --production=false --loglevel notice --legacy-peer-deps

Environment

  • npm: 8.5.5
  • Node.js: 17.8.0
  • OS Name: Ubuntu 20.04
@hanzlamateen hanzlamateen added Bug thing that needs fixing Needs Triage needs review for next steps Release 8.x work is associated with a specific npm 8 release labels Mar 31, 2022
@nigams

nigams commented Apr 1, 2022

Same issue here. Started seeing it after upgrading to Node 16.14.x. The URL reported in the ECONNRESET failure message is random (changes).

I see this on MacOS as well as Linux (docker image).
No cache involved.
NPM: 8.3.1, 8.5.0

In case it helps someone, there are two potential workarounds (the second one being sketchy).

  1. If you can connect to the NPM registry without a proxy server, then set NO_PROXY env var to registry.npmjs.org.
  2. If you have to use a proxy server to reach the registry.npmjs.org, then the only workaround that works is to go back to NPM 6.14.11. I am not sure though of the full impact of such a downgrade.

I would think this is being experienced by many people and would therefore request that it be looked at urgently.

@soulchild

soulchild commented Apr 4, 2022

A couple of months ago we started seeing network errors as well, mostly ECONNRESET, when using npm behind a proxy. I was able to reproduce the behaviour on my local development machine when using the proxy as well (Squid 4.6.1).

Recent npm upgrades received via Docker Node.js base images are a probable cause, because our infrastructure hasn't changed significantly. It started in February with only 1-2 scheduled builds per week failing, now 9/10 builds are failing with network errors.

I'm not 100% sure, but I feel that npm ci instead of npm install has a higher rate of success. The whole thing feels like some sort of timing issue. Maybe ci just doesn't make as many network requests as install or something like that?

@nlf
Contributor

nlf commented Apr 8, 2022

if all of you who are experiencing this error are proxy users, then i have a good idea of what's going on.

is anyone seeing this not using a proxy?

@hanzlamateen
Author

if all of you who are experiencing this error are proxy users, then i have a good idea of what's going on.

is anyone seeing this not using a proxy?

I am not using any proxy

@nigams

nigams commented Apr 8, 2022

I have not seen this issue even once when not using a proxy.

By not using a proxy I mean, "proxy" and "https-proxy" are not set in the npm config, and importantly, HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy variables are not set in the system environment.
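
The "no proxy" definition above can be checked before an install. A minimal sketch, assuming a POSIX shell; the function name list_proxy_env is a hypothetical helper, and it only covers the four environment variables named above:

```shell
# Hypothetical helper: print one NAME=value line for each proxy-related
# environment variable that is set and non-empty; print nothing otherwise.
list_proxy_env() {
  for name in HTTP_PROXY HTTPS_PROXY http_proxy https_proxy; do
    # Indirect variable lookup that works in plain POSIX sh.
    eval "value=\${$name-}"
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$name" "$value"
    fi
  done
}

# Example (not run here):
#   list_proxy_env
# npm's own settings can be inspected separately:
#   npm config get proxy
#   npm config get https-proxy
```

An empty result from the helper, together with unset proxy and https-proxy values in npm's config, matches the situation described above.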

@gvolluz

gvolluz commented Apr 20, 2022

I can confirm having had this issue for a while now, not using any proxy, and it sometimes seems to be fixed by setting NO_PROXY, e.g.

before_script:
- npm config set noproxy registry.npmjs.org

or

before_script:
- npm i some packages --noproxy registry.npmjs.org

but not every time ... or I just was lucky with 6 jobs this morning and now a 7th is failing always with the usual error:

npm ERR! code ECONNRESET
npm ERR! errno ECONNRESET
npm ERR! network request to https://registry.npmjs.org/debug/-/debug-4.3.4.tgz failed, reason: socket hang up

@swinster

Just found this issue - I posted to #3078 (comment) although that issue is closed so perhaps here is better.

I'm seeing this issue without a proxy. More detail in the post above.

@swinster

swinster commented May 13, 2022

I re-ran things, this time using npm ci --noproxy registry.npmjs.org, but still no go:

pi@raspberrypi:/opt/zigbee2mqtt $ npm ci  --noproxy registry.npmjs.org
npm ERR! code ECONNRESET
npm ERR! errno ECONNRESET
npm ERR! network request to https://registry.npmjs.org/ms/-/ms-2.0.0.tgz failed, reason: Client network socket disconnected before secure TLS connection was established
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network 
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/pi/.npm/_logs/2022-05-13T14_15_13_187Z-debug-0.log

I decided to leave a PCAP running on my edge router during this attempt. The PCAP was taken on the LAN side so that I could filter on the originating IP address of the Raspberry Pi.

The initial connection started just fine. We see DNS queries for registry.npmjs.org, and answers of:

Answers
    registry.npmjs.org: type A, class IN, addr 104.16.19.35
    registry.npmjs.org: type A, class IN, addr 104.16.24.35
    registry.npmjs.org: type A, class IN, addr 104.16.20.35
    registry.npmjs.org: type A, class IN, addr 104.16.18.35
    registry.npmjs.org: type A, class IN, addr 104.16.27.35
    registry.npmjs.org: type A, class IN, addr 104.16.23.35
    registry.npmjs.org: type A, class IN, addr 104.16.25.35
    registry.npmjs.org: type A, class IN, addr 104.16.17.35
    registry.npmjs.org: type A, class IN, addr 104.16.22.35
    registry.npmjs.org: type A, class IN, addr 104.16.21.35
    registry.npmjs.org: type A, class IN, addr 104.16.16.35
    registry.npmjs.org: type A, class IN, addr 104.16.26.35

npm makes a single TLS connection to the first host on the list, namely 104.16.19.35. All seems to be OK for a bit:

e.g:

image

After about 5 mins, there are multiple other TLS connection attempts to the same host using different sockets. Each of these attempts is refused and reset (RST) by the host, 104.16.19.35; however, data still flows over the original socket.

![image](https://user-images.githubusercontent.com/8142964/168335835-45d31ad9-48d5-4f22-850e-7ead97684edc.png)

Soon after, we then see a bunch more DNS requests for the same thing (and answers):

image

Then the Pi (npm) opens multiple TLS connections in a "second wave" to all these hosts (based on the DNS answers):

image

And even multiple TLS connections to the same host. E.g. filtering on ip.addr == 104.16.18.35, we see the following, and data seems to flow over these multiple connections/sockets:

image

Concentrating on just this host (104.16.18.35), we see a little bit later a socket opened, but no TLS connection attempted, so is then closed a few seconds later:

image

However, other sockets are opened and TLS connections are attempted but are then reset by the host:

image

image

image

image

This pattern seems to be similar to all the hosts where connections were made during the "second wave".

The initial good TLS connection to 104.16.19.35 is eventually reset by npm on the Pi rather than the host, after keeping the socket open for 1 minute with no activity.

image

I'm not sure whether this connection management, within npm or on the npm registry host servers, is by design or not.

@swinster

swinster commented May 14, 2022

FWIW, I finally managed to get npm ci to complete - unfortunately, I didn't PCAP this working process.

I decided to reduce the number of concurrent sockets used by the npm client via the --maxsockets x parameter, however, rerunning npm ci straight after failure simply caused the process to hang and the process was eventually killed automatically by the OS. I had to clear the npm cache, and then rerun npm ci (with my params) to get things to work. E.g.:

pi@raspberrypi:/opt/zigbee2mqtt $ npm cache clean -f                                           
npm WARN using --force Recommended protections disabled.

pi@raspberrypi:/opt/zigbee2mqtt $ npm ci  --noproxy registry.npmjs.org --maxsockets 1          

added 821 packages, and audited 822 packages in 29m

77 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Maybe this helps someone, and maybe it helps the devs figure out why the TLS connections keep getting reset by the registry server on TLS Client Hello.
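
The sequence that worked above (clear the cache, then install with a reduced socket count) can be wrapped in a small script. A minimal sketch, assuming a POSIX shell; the function name clean_install and the NPM override variable are hypothetical conveniences, not part of npm:

```shell
# NPM can be overridden (for example NPM=echo) to preview the commands
# without running them; by default the real npm binary is used.
NPM=${NPM:-npm}

# Hypothetical helper: clear the npm cache, then run a clean install
# limited to $1 concurrent sockets (default 1).
clean_install() {
  sockets=${1:-1}
  "$NPM" cache clean --force &&
    "$NPM" ci --maxsockets "$sockets"
}

# Example (not run here):
#   clean_install 1
```

With a single socket the install is serialized, so a long run time like the 29 minutes reported above is to be expected.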

@gerardabello

gerardabello commented Jun 2, 2022

I had the same problem, tried both npm i and npm ci multiple times, getting the same error.

Using npm ci --maxsockets 1 worked on the first try.

npm 7.24.0
node v16.13.0
macOS 11.6

@soulchild

Somehow the situation improved greatly in the last couple of weeks in my case. I have had only a single build breaking due to "network problems" (ECONNRESET). I suspect newer versions of npm have had some sort of fix for this issue…?

@FlorianSW

We're seeing a similar issue happening in our environment, as well, and from the symptoms I assume this is the same issue as described here. Some more context, as it may help to investigate:

  • we used node 14 with npm 6.x in this environment without seeing any connection resets so far.
  • we use a http proxy to connect to https://registry.npmjs.org (using HTTP CONNECT), set by the https_proxy environment variable

After upgrading nodejs to 16 with npm 8.11.x we see builds failing (up to 90% of the time, but at least 50% of the time) with the mentioned error message.
We tried setting maxsockets to 1, which did not solve this issue unfortunately.
We also tried setting the fetch-timeout to 30 seconds (which was the non-configurable default in npm 6 if I'm not mistaken). This was an attempt to fix this after going through the changes of some components of the npm network libraries.

To get more visibility into this issue, I set up an intermediate proxy using tinyproxy, so that the connection flow looks like this:
npm -> tinyproxy -> upstream proxy (the one we usually use) -> npmjs.org

After running npm ci again with this setup, I found that tinyproxy rejects new connections pretty early, with the reason that the maximum number of connections (100 by default) had already been reached. I had already set the tinyproxy Timeout setting to 1 second, meaning that idle connections should be closed after 1 second. I'm not sure if that is of any help, though. Such an idle timeout exists for our upstream proxy as well, but is set to slightly over 60 seconds.
I'm confused about that, as maxsockets should, from my point of view, prevent having more than 1 concurrent connection to the same host/port, shouldn't it?

As a result, however, builds with tinyproxy between npm and the upstream proxy succeed more often (needless to say, this takes a significant amount of additional time, as connections need to be retried after the current 100 connections have drained). Our upstream proxy does not have a quota for concurrent connections (apart from a hard limit on the connections the proxy can open). This might indicate that the issue is related to the number of connections opened and held open by npm or the underlying network stack. Sadly, I do not have enough knowledge of the underlying code of npm and node to make such an assumption, or even prove that it is right :(

I'm a bit out of ideas on what the issue here might be. So far, we did not see this issue either without a proxy or with npm 6. If I can provide any more information that might be of help to investigate this issue, please let me know.

@FlorianSW

We further investigated the issue. In our case, when using a downstream proxy which supports https for the connection from npm to the first proxy, we no longer see this issue. We used mitmproxy (mitmdump), which results in the following network chain:
npm -> mitmdump(https) -> upstream proxy(https) -> npmjs.org(https)

Running npm ci without any changed settings resulted in a successful run.

After that, I traced the packets on the machine where npm runs (while directly using the upstream proxy, so no mitmdump running anymore) to find more information about when connections are aborted by a reset. Here is an excerpt of a connection which was made successfully and was later reset:
image

From my limited understanding, it looks like the connection is closed by the proxy while the client still wants to send an encrypted alert (the contents of which we unfortunately don't know).

In a trace with npm 6 and the same proxy setup, this does not happen. In that case, the connection is correctly closed with a FIN, FIN ACK flow, as one would expect. I think there are two issues here: the proxy server rejects any further traffic after the connection was closed from its side, and npm behaves differently between these two versions, though I'm not sure why.

With this behaviour in mind, I searched a bit and was able to find issue nodejs/node#23128 in nodejs. I'm not sure if that is related to the issue we are seeing or not, I just wanted to mention it. The issue is pretty old as well.

This comment will probably confuse more than it really helps, however, I wanted to give our findings here in this issue as well :)

@melnikovdv

Had the same problem with npm install in docker lts-alpine (3.16). Btw, on OS X everything was ok.

Using --maxsockets 1 worked on the first try.

npm 8.11.0
node v16.15.1

@thecrowkeep

also getting this issue on

software version
windows 11 10.0.22621 Build 22621
nvm-windows 1.1.11
node 20.0.0
npm 9.6.4

no proxy, sending a curl to the relevant requests works fine.
i tried npm i -g yarn --maxsockets 1 to no avail.
i've also tried mirror registries like https://registry.yarnpkg.com but that shouldn't matter since like i said, i can open the requests just fine in curl.

i did find a solution for my needs, which is just downgrading node to 14.20.1 and using npm v6.14.17
but that obviously doesn't resolve the bug in the latest version, so not a real solution.

debug log
0 verbose cli C:\Program Files\nodejs\node.exe C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js
1 info using npm@9.6.4
2 info using node@v20.0.0
3 timing npm:load:whichnode Completed in 1ms
4 timing config:load:defaults Completed in 1ms
5 timing config:load:file:C:\Users\{user}\AppData\Roaming\nvm\v20.0.0\node_modules\npm\npmrc Completed in 0ms
6 timing config:load:builtin Completed in 0ms
7 timing config:load:cli Completed in 1ms
8 timing config:load:env Completed in 1ms
9 timing config:load:project Completed in 0ms
10 timing config:load:file:C:\Users\{user}\.npmrc Completed in 1ms
11 timing config:load:user Completed in 1ms
12 timing config:load:file:C:\Program Files\nodejs\etc\npmrc Completed in 0ms
13 timing config:load:global Completed in 0ms
14 timing config:load:setEnvs Completed in 1ms
15 timing config:load Completed in 5ms
16 timing npm:load:configload Completed in 5ms
17 timing npm:load:mkdirpcache Completed in 0ms
18 timing npm:load:mkdirplogs Completed in 0ms
19 verbose title npm i yarn
20 verbose argv "i" "--global" "yarn"
21 timing npm:load:setTitle Completed in 1ms
22 timing config:load:flatten Completed in 2ms
23 timing npm:load:display Completed in 2ms
24 verbose logfile logs-max:10 dir:C:\Users\{user}\AppData\Local\npm-cache\_logs\2023-04-19T19_37_33_000Z-
25 verbose logfile C:\Users\{user}\AppData\Local\npm-cache\_logs\2023-04-19T19_37_33_000Z-debug-0.log
26 timing npm:load:logFile Completed in 6ms
27 timing npm:load:timers Completed in 0ms
28 timing npm:load:configScope Completed in 0ms
29 timing npm:load Completed in 15ms
30 timing config:load:flatten Completed in 0ms
31 timing arborist:ctor Completed in 0ms
32 silly logfile start cleaning logs, removing 2 files
33 timing idealTree:init Completed in 8ms
34 timing idealTree:userRequests Completed in 3ms
35 silly idealTree buildDeps
36 silly fetch manifest yarn@*
37 silly logfile done cleaning log files
38 silly placeDep ROOT yarn@ OK for:  want: *
39 timing idealTree:#root Completed in 88956ms
40 timing idealTree:node_modules/yarn Completed in 0ms
41 timing idealTree:buildDeps Completed in 88957ms
42 timing idealTree:fixDepFlags Completed in 0ms
43 timing idealTree Completed in 88968ms
44 timing command:i Completed in 88975ms
45 verbose type system
46 verbose stack FetchError: request to https://registry.npmjs.org/yarn failed, reason: Socket connection timeout
46 verbose stack     at ClientRequest.<anonymous> (C:\Users\{user}\AppData\Roaming\nvm\v20.0.0\node_modules\npm\node_modules\minipass-fetch\lib\index.js:130:14)
46 verbose stack     at ClientRequest.emit (node:events:511:28)
46 verbose stack     at TLSSocket.socketErrorListener (node:_http_client:495:9)
46 verbose stack     at TLSSocket.emit (node:events:523:35)
46 verbose stack     at emitErrorNT (node:internal/streams/destroy:151:8)
46 verbose stack     at emitErrorCloseNT (node:internal/streams/destroy:116:3)
46 verbose stack     at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
47 verbose cwd E:\projects\{project}\{branch}
48 verbose Windows_NT 10.0.22621
49 verbose node v20.0.0
50 verbose npm  v9.6.4
51 error code ERR_SOCKET_CONNECTION_TIMEOUT
52 error errno ERR_SOCKET_CONNECTION_TIMEOUT
53 error request to https://registry.npmjs.org/yarn failed, reason: Socket connection timeout
54 verbose exit 1
55 timing npm Completed in 89007ms
56 verbose unfinished npm timer reify 1681933053032
57 verbose unfinished npm timer reify:loadTrees 1681933053035
58 verbose code 1
59 error A complete log of this run can be found in: C:\Users\{user}\AppData\Local\npm-cache\_logs\2023-04-19T19_37_33_000Z-debug-0.log

@thecrowkeep

nevermind, the solution is to cancel your service with spectrum/charter. turned out i couldn't ping the ipv6 route for registry.npmjs.org when using their internet service.

lucky for me a competitor finally became available in my area

though it would be nice if npm used ipv4 as fallback if ipv6 fails
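
One way to approximate an IPv4 preference is Node's --dns-result-order=ipv4first option, passed through NODE_OPTIONS. A minimal sketch, assuming a POSIX shell and a Node.js version that supports this option; whether npm's own network stack honors it in every version is an assumption, not something confirmed in this thread, and prefer_ipv4 is a hypothetical helper name:

```shell
# Hypothetical helper: add --dns-result-order=ipv4first to NODE_OPTIONS
# unless a --dns-result-order value is already present.
prefer_ipv4() {
  case " ${NODE_OPTIONS-} " in
    *" --dns-result-order="*) ;;  # an explicit order is already set; keep it
    *) NODE_OPTIONS="${NODE_OPTIONS:+$NODE_OPTIONS }--dns-result-order=ipv4first" ;;
  esac
  export NODE_OPTIONS
}

# Example (not run here):
#   prefer_ipv4
#   npm i -g yarn
# To compare IPv4 and IPv6 reachability of the registry directly:
#   curl -4 -sS -o /dev/null https://registry.npmjs.org/
#   curl -6 -sS -o /dev/null https://registry.npmjs.org/
```

If the curl -6 request fails while curl -4 succeeds, the broken IPv6 route described above is the likely cause.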

@len0xx

len0xx commented Jun 6, 2023

I closed all of the ports (except for 22, 80 and 443) on my Ubuntu server and npm stopped working. Thank you @swinster for your Wireshark screenshots, which showed that npm uses port 59376. Opening it on my server helped and npm started working again.

@stasberkov

I am using nodejs 18.19.1 and npm 10.2.4 and am also getting this network socket error. It reproduces on every clean install. If you retry, the install succeeds. --maxsockets 1 helps for the first install. It would be nice to have a proper solution to avoid such kludges. I don't have any proxies in my network.
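
Until there is a proper fix, the "retry and it works" behaviour can be automated in CI. A minimal sketch, assuming a POSIX shell; retry is a hypothetical helper, and the attempt count and delay in the example are arbitrary:

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example (not run here):
#   retry 3 10 npm ci --maxsockets 1
```

This only hides the symptom; it does not address why the first connection attempts are reset.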

@petersutter5

Updating Docker Desktop to 4.29.0 fixed the issue for me. I didn't need --maxsockets.

@corwestermaniddink

This is still an issue, in the core.

The NPM dev team should have a look at it with all the information above.

I encounter this issue, for example, on an Azure Container Registry build agent with a QEMU linux/arm64 docker buildx image build. I tried the following approaches:

  1. npm config set registry http://registry.npmjs.org/ (but npmjs.org redirects you to HTTPS, because since October 2021 it is mandatory to connect through HTTPS/TLS 1.2)
  2. npm ci --maxsockets 1 is maybe a workaround for this issue, but not THE solution. This workaround also increases the total npm ci execution time.
  3. Make sure that no devDependencies are retrieved, to keep connections to npmjs.org to a minimum, with --production.
