Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Transport needs a limit on total connections (per host) #6785

Closed
calmh opened this issue Nov 18, 2013 · 28 comments
Closed

net/http: Transport needs a limit on total connections (per host) #6785

calmh opened this issue Nov 18, 2013 · 28 comments
Labels
FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Suggested Issues that may be good for new contributors looking for work to do.
Milestone

Comments

@calmh
Copy link
Contributor

calmh commented Nov 18, 2013

(go1.2rc4)

We're working on a reverse HTTP proxy that does some "stuff" (that's not
relevant) and proxies towards one or more backend HTTP servers. During testing I ran
into connection failures apparently due to running out of file descriptors. In narrowing
this down I reduced the problem to;

The "null" reverse proxy (doing nothing but proxying, using the
httputil.ReverseProxy):

http://play.golang.org/p/p1g4bpTZ_g

A trivial HTTP server acting as the backend behind the above proxy; it simply counts the
number of connections and responds successfully:

http://play.golang.org/p/F7W-vbXEUt

Running both of these, I have the proxy on :8080 forwarding to the server at :8090. To
test it, I use wrk (https://github.com/wg/wrk);

jb@jborg-mbp:~ $ wrk -c 1 -t 1 http://localhost:8080/
...

Granted, this isn't a super realistic reproduction of the real world since the latency
on localhost is minimal. I can't prove the same thing _can't_ happen in production
though.

Using one connection (-c 1) up to about three, this works perfectly. The server side
sees a bunch of requests over one to three connections, i.e. the number of backend
connections from the proxy matches the number of incoming connections. At around -c 4
and upwards, it blows up. The proxy doesn't manage to recycle connections quickly enough
and starts doing regular Dial:s at a rate of thousands per second, resulting in

...
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested
address
...

from the proxy code and of course HTTP errors as seen by wrk. 

My theory, after going through the http.Transport, is that when the number of requests/s
to the proxy goes up, the small amount of bookkeeping that is required to recycle a
connection (putIdleConn etc) starts taking just long enough that the next request in the
pipeline gets in before a connection is idle. The Transport starts Dial:ing, adding more
connections to take care of, and it explodes chaotically. I would prefer if it blocked
and awaited an idle connection instead of Dial:ing at some point.

Line 92, "// TODO: tunable on global max cached connections" seems relevant,
although in my case the tunable should probably be max connections per host instead of
globally.

I'll probably take a stab at implementing something like that to fix this (i.e.
rewriting Transport to limit connections), since I couldn't figure out a way around it
by just using the available API:s. Unless I seem to have misunderstood something
obvious...

//jb
@rsc
Copy link
Contributor

rsc commented Nov 27, 2013

Comment 1:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 2:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 3:

Labels changed: added repo-main.

@davecheney
Copy link
Contributor

Comment 4:

Labels changed: added suggested, removed priority-triage.

Status changed to LongTerm.

@bradfitz
Copy link
Contributor

Comment 5:

Issue #8536 has been merged into this issue.

@gopherbot
Copy link
Contributor

Comment 6 by surajn.vnit:

Just wanted to check if this is planned anytime soon. Ran into similar issue - at scale
looks like there is high contention on the semaphore for PutIdleConn and GetIdleConn. 
316 @ 0x415989 0x415a0b 0x429eee 0x42a130 0x49ad56 0x45056e 0x450ab8 0x44ef26 0x43390d
0x4330db 0x43456f 0x433bbf 0x400e4f 0x415c20
#   0x42a130    sync.runtime_Semacquire+0x30            /usr/local/go/src/pkg/runtime/sema.goc:199
#   0x49ad56    sync.(*Mutex).Lock+0xd6             /usr/local/go/src/pkg/sync/mutex.go:66
#   0x45056e    net/http.(*Transport).getIdleConn+0x9e      /usr/local/go/src/pkg/net/http/transport.go:402
#   0x450ab8    net/http.(*Transport).getConn+0xa8      /usr/local/go/src/pkg/net/http/transport.go:452
#   0x44ef26    net/http.(*Transport).RoundTrip+0x416       /usr/local/go/src/pkg/net/http/transport.go:201
#   0x43390d    net/http.send+0x43d             /usr/local/go/src/pkg/net/http/client.go:195
#   0x4330db    net/http.(*Client).send+0x15b           /usr/local/go/src/pkg/net/http/client.go:118
#   0x43456f    net/http.(*Client).doFollowingRedirects+0x97f   /usr/local/go/src/pkg/net/http/client.go:343
#   0x433bbf    net/http.(*Client).Get+0xaf         /usr/local/go/src/pkg/net/http/client.go:275
#   0x400e4f    main.startClient+0x16f              /home/adsteam/go_workspace/src/suraj.narkhede/stresstester/main.go:48
347 @ 0x415989 0x415a0b 0x429eee 0x42a130 0x49ad56 0x44fd5e 0x455b87 0x454c1b 0x454931
0x483291 0x480f6d 0x400f0b 0x415c20
#   0x42a130    sync.runtime_Semacquire+0x30        /usr/local/go/src/pkg/runtime/sema.goc:199
#   0x49ad56    sync.(*Mutex).Lock+0xd6         /usr/local/go/src/pkg/sync/mutex.go:66
#   0x44fd5e    net/http.(*Transport).putIdleConn+0xce  /usr/local/go/src/pkg/net/http/transport.go:342
#   0x455b87    net/http.func·023+0x97         /usr/local/go/src/pkg/net/http/transport.go:853
#   0x454c1b    net/http.(*bodyEOFSignal).condfn+0xab   /usr/local/go/src/pkg/net/http/transport.go:1161
#   0x454931    net/http.(*bodyEOFSignal).Read+0x321    /usr/local/go/src/pkg/net/http/transport.go:1133
#   0x483291    io/ioutil.devNull.ReadFrom+0xb1     /usr/local/go/src/pkg/io/ioutil/ioutil.go:151
#   0x480f6d    io.Copy+0x13d               /usr/local/go/src/pkg/io/io.go:349
#   0x400f0b    main.startClient+0x22b          /home/adsteam/go_workspace/src/suraj.narkhede/stresstester/main.go:55
I have written a simple load tool in go (to evaluate go's http client performance) -
http://play.golang.org/p/MA-a2ZAVNe.

@gopherbot
Copy link
Contributor

Comment 7 by bgmerrell:

jb, it sounds like you were working on something very similar to what I am trying to do.
 Did you ever come up with a work around?

@calmh calmh added longterm Suggested Issues that may be good for new contributors looking for work to do. labels Sep 29, 2014
@marbemac
Copy link

Wondering if any progress has been made on this? I'm seeing similar results when running synthetic benchmarks - does this performance profile hold in production/real scenarios? If so, does using go as a serious reverse proxy become impractical?

@bradfitz
Copy link
Contributor

@marbemac, are you asking about max connections per host (what this bug is about), or the separate issue of mutex contention brought up in Comment 6? If the latter, I'd prefer to move it to a new bug.

@marbemac
Copy link

@bradfitz I'm definitely seeing the first issue, I'm not sure if the issue in comment 6 is related. Basically I'm relatively new to go, and am trying to evaluate wether I can use it to build a decently fast reverse proxy. Running wrk against a go http server that simply returns a hello world string, I get the following (very unscientific) results:

With no proxy
wrk -t2 -c100 -d15s http://127.0.0.1:3000
29k requests/second, 0% failure

With nginx in front
wrk -t2 -c100 -d15s http://127.0.0.1:4205
28k requests/second, 0% failure

With NewSingleHostReverseProxy in front (https://play.golang.org/p/vk33Eg4QOS)
wrk -t2 -c100 -d15s http://127.0.0.1:4204
800 requests/second, 20% failure

Is this typical, or am I doing something wrong in the go code / testing setup. I'm running this on a Macbook Pro, max file descriptors have been increased.

@mentat
Copy link

mentat commented Dec 24, 2014

I'm seeing a similar issue with a go program on both OSX and Linux. With only one concurrent request the proxy is stable, but when adding more it starts to break. My implementation is very simple.

@marbemac
Copy link

@mentat Try playing with these two settings. I'm seeing some improvement with:

runtime.GOMAXPROCS(runtime.NumCPU())
http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100

I'm curious to hear if it helps in your case, let me know.

@mentat
Copy link

mentat commented Dec 25, 2014

Thanks @marbemac I've pushed out a new version with these settings and will let you know how it does with real-world traffic.

@tj
Copy link

tj commented Jan 27, 2015

Hit this issue in prod, blocking would definitely be preferable to exhausting sockets

@cmdrkeene
Copy link

Hitting this issue as well in production. Was there a better solution here than crank up your MaxIdleConns to fit your workload?

@buth
Copy link

buth commented Oct 8, 2015

The persistConn objects used by the Transport object define a closech to listen for the closing of the underlying connections.

https://github.com/golang/go/blob/master/src/net/http/transport.go#L821

If this is reliable, it seems like you could use it as the basis for a release on a (per-host) semaphore that dialConn would need to aquire in order to create a new connection. As a start on a proposal for limiting total per-host connections, does this sound way off?

@alexellis
Copy link

alexellis commented Jan 27, 2017

I think I'm running into the same issue with a proxy lsof is showing hundreds of "ESTABLISHED" connections even when a response has been sent back to the client:

https://github.com/alexellis/faas/blob/master/watchdog/main.go

Can anyone confirm? @bradfitz

@tleyden
Copy link

tleyden commented Jan 27, 2017

@alexellis don't you just need to add an r.Body.Close() to close the io.ReadCloser? Sent you a PR: openfaas/faas#13

alexellis pushed a commit to openfaas/faas that referenced this issue Jan 27, 2017
@rogchap
Copy link

rogchap commented Feb 2, 2017

@tleyden From the Go http docs:

https://golang.org/pkg/net/http/#Request

// The Server will close the request body. The ServeHTTP
// Handler does not need to.
Body io.ReadCloser

@alexellis
Copy link

@tleyden @rogchap I added a Close call to both ends (client and server were both Golang) which fixed the issue.

@rogchap
Copy link

rogchap commented Feb 2, 2017

@alexellis Yup, you have to call Close on the client. Glad you got it working.

@bradfitz bradfitz added FeatureRequest Issues asking for a new feature that does not need a proposal. and removed LongTerm labels May 24, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 24, 2017
@ldemailly
Copy link

ldemailly commented Jul 14, 2017

Should I file a separate issue or is this the place to mention: I have code that does:

	ht := http.DefaultTransport.(*http.Transport)
	ht.MaxIdleConns = numConnections
	ht.MaxIdleConnsPerHost = numConnections

and then establishes numConnections simultaneous (go routines) back to back http connections (load testing tool)

if I check the number of sockets, it works as expected where I have numConnections established sockets for numConnections input with smaller numbers (between 1 and 9 or 10) but for 11 and up I have more sockets than the limit:

# with up to 10 it works
$ fortio -qps 0 -c 10 http://localhost:8080
$ netstat -na |awk ' $4 == "::1.8080" { print $0}'|wc -l
10
# with 16 there are more sockets:
$ echo `netstat -na |awk ' $4 == "::1.8080" { print $5}'| sort`
::1.56603 ::1.56604 ::1.56605 ::1.56606 ::1.56607 ::1.56608 ::1.56609 ::1.56610 ::1.56611 ::1.56612 ::1.56613 ::1.56615 ::1.56616 ::1.56617 ::1.56618 ::1.56619 ::1.56621 ::1.56622 ::1.56623 ::1.56624
$ netstat -na |awk ' $4 == "::1.8080" { print $0}'|wc -l
      20

this happens both on go version go1.8.3 darwin/amd64 and go version go1.8.1 linux/amd64
the full code to repro is go get istio.io/istio/devel/fortio/cmd/fortio (branch ldemailly-bench6 )

beside doing my own connection pooling is there a way to guarantee a maximum/exact number of socket used ? (and why do I get a few more than I ask for)

@rsc
Copy link
Contributor

rsc commented Nov 22, 2017

Given that this has been like this since Go 1.2 and still not fixed it seems likely we should just close it. But moving to Go 1.11 at least.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@bradfitz
Copy link
Contributor

I don't want to close this. There are a number of related bugs I'd like to all address.

@luciferous
Copy link

Looks like Transport.MaxConnsPerHost (new in Go1.11) addresses this.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2019

Agree that Transport.MaxConnsPerHost fixes this.

@rsc rsc closed this as completed Jun 28, 2019
@golang golang locked and limited conversation to collaborators Jun 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Suggested Issues that may be good for new contributors looking for work to do.
Projects
None yet
Development

No branches or pull requests