Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Connections not fully closed #21

Closed
Stebalien opened this issue Sep 29, 2017 · 8 comments · Fixed by libp2p/go-peerstream#17
Closed

Connections not fully closed #21

Stebalien opened this issue Sep 29, 2017 · 8 comments · Fixed by libp2p/go-peerstream#17
Assignees
Labels
P0 Critical: Tackled by core team ASAP

Comments

@Stebalien
Copy link
Member

  • Lots of ServeHTTP go routines are getting left around.
  • Lots of half-closed TCP connections aren't getting closed.
@Stebalien Stebalien self-assigned this Sep 29, 2017
@whyrusleeping whyrusleeping added P0 Critical: Tackled by core team ASAP status/ready Ready to be worked labels Oct 17, 2017
@Stebalien
Copy link
Member Author

Note: Unlike the other connection closing issue, this was not a symptom of that secio change and/or a lack of connection closing. This is still a big.

@ghost
Copy link

ghost commented Nov 26, 2017

This is after 1.5w uptime on mercury:

> curl ipfs.io/ipfs/QmNV4836azrGSWYmoJKk3bFFUsDA26fEBqyPiMJDMaRZ7X | stackparse --summary | tac
gx/ipfs/QmdicniYXy49AFCSVTZeYmo997kT5XJRzZLokzJCrywR2U/go-ws-transport.(*listener).ServeHTTP                   40477
gx/ipfs/QmVxf27kucSvCLiCq6dAXjDU2WG3xZN9ae7Ny6osroP28u/yamux.(*Stream).Read                                    1823
internal/poll.runtime_pollWait                                                                                 1299
gx/ipfs/QmVxf27kucSvCLiCq6dAXjDU2WG3xZN9ae7Ny6osroP28u/yamux.(*Session).AcceptStream                           1262
gx/ipfs/QmVxf27kucSvCLiCq6dAXjDU2WG3xZN9ae7Ny6osroP28u/yamux.(*Session).send                                   1262
github.com/ipfs/go-ipfs/exchange/bitswap.(*msgQueue).runQueue                                                  1260
gx/ipfs/QmVxf27kucSvCLiCq6dAXjDU2WG3xZN9ae7Ny6osroP28u/yamux.(*Session).keepalive                              1238
gx/ipfs/QmTKsRYeY4simJyf37K93juSq75Lo8MVCDJ7owjmf46u8W/go-context/io.(*ctxReader).Read                         997
gx/ipfs/QmUUSLfvihARhCxxgnjW4hmycJpPvzNu12Aaz6JWVdfnLg/go-libp2p-floodsub.(*PubSub).handleSendingMessages      437
gx/ipfs/QmVxf27kucSvCLiCq6dAXjDU2WG3xZN9ae7Ny6osroP28u/yamux.(*Session).Ping                                   24
gx/ipfs/Qmd7RwckBhtLfLCNDLgUrxmi6qs4Ldxjb6qwX7XmKZ8VCm/go-multiplex.(*Multiplex).Accept                        22
gx/ipfs/Qmd7RwckBhtLfLCNDLgUrxmi6qs4Ldxjb6qwX7XmKZ8VCm/go-multiplex.(*Stream).waitForData                      14
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/briantigerchow/pubsub.(*PubSub).start                 11
gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess/context.CloseAfterContext.func1               10
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).taskWorker                                                 8
gx/ipfs/QmdQFrFnPrKRQtpeHKjZ3cVNwxmGKKS2TvhJTuN9C9yduh/go-libp2p-swarm.(*Swarm).addConnListener.func2          4
gx/ipfs/QmTi4629yyHJ8qW9sXFjvxJpYcN499tHhERLZYdUqwRU9i/go-libp2p-conn.(*listener).Accept                       4
sync.runtime_SemacquireMutex                                                                                   4
main.merge.func1                                                                                               3
github.com/ipfs/go-ipfs/core/corehttp.Serve                                                                    2
gx/ipfs/QmX3QZ5jHEPidwUrymXV1iSCSUhdGxj15sm2gP4jKMef7B/client_golang/prometheus.computeApproximateRequestSize  2
gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess/periodic.callOnTicker.func1                   2
main.daemonFunc                                                                                                1
gx/ipfs/QmPgDWmTmuzvP7QE5zwo1TmjbJme9pmZHNujB2453jkCTr/go-libp2p-peerstore/queue.(*ChanQueue).process.func1    1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).compactionError                 1
gx/ipfs/QmWRBYr99v8sjrpbyNWMuGkQekn7b9ELoLSCe8Ny7Nxain/go-libp2p-kad-dht.(*dhtQueryRunner).spawnWorkers        1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).mCompaction                     1
github.com/ipfs/go-ipfs/namesys/republisher.(*Republisher).Run                                                 1
github.com/ipfs/go-ipfs/core/corerepo.PeriodicGC                                                               1
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideWorker                                              1
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideCollector                                           1
gx/ipfs/QmUTshC2PP4ZDqkrFfDU4JGJFMWjYnunxPgkQ6ZCA2hGqh/go-ds-flatfs.(*Datastore).walk                          1
gx/ipfs/QmS4L8WB9RLZLu9YbS19cHJVjQnuvTyGaGKs75DtmX4Jyo/go-reuseport/singlepoll.worker                          1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).tCompaction                     1
gx/ipfs/QmWRBYr99v8sjrpbyNWMuGkQekn7b9ELoLSCe8Ny7Nxain/go-libp2p-kad-dht.(*dhtQueryRunner).Run                 1
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).providerQueryManager                                       1
gx/ipfs/QmWRBYr99v8sjrpbyNWMuGkQekn7b9ELoLSCe8Ny7Nxain/go-libp2p-kad-dht.(*IpfsDHT).Provide                    1
github.com/ipfs/go-ipfs/blocks/blockstore.bloomCached.func1                                                    1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).mpoolDrain                      1
gx/ipfs/QmVSase1JP7cq9QkPT46oNwdp9pT6kBkG3oqS14y3QcZjG/go-datastore/query.ResultsWithChan.func1                1
gx/ipfs/QmZfwmhbcgSDGqGaoMMYx8jxBGauZw75zPjnZAyfwPso7M/go-libp2p-secio.readMsgCtx                              1
syscall.Syscall6                                                                                               1
github.com/ipfs/go-ipfs/mfs.(*Republisher).Run                                                                 1
runtime/pprof.writeGoroutineStacks                                                                             1
syscall.Syscall                                                                                                1
github.com/ipfs/go-ipfs/commands.(*ChannelMarshaler).Read                                                      1
github.com/ipfs/go-ipfs/commands/http.internalHandler.ServeHTTP.func2                                          1
sync.runtime_Semacquire                                                                                        1
github.com/ipfs/go-ipfs/exchange/reprovide.(*Reprovider).muteTrigger.func1                                     1
gx/ipfs/QmUUSLfvihARhCxxgnjW4hmycJpPvzNu12Aaz6JWVdfnLg/go-libp2p-floodsub.(*PubSub).processLoop                1
gx/ipfs/QmdicniYXy49AFCSVTZeYmo997kT5XJRzZLokzJCrywR2U/go-ws-transport.(*listener).Accept                      1
github.com/ipfs/go-ipfs/exchange/bitswap.New.func2                                                             1
gx/ipfs/Qmf7GSJ4omRJsvA9uzTqzbnVhq4RWLPzjzW4xJzUta4dKE/go-libp2p-circuit.(*RelayListener).Accept               1
github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).rebroadcastWorker                                          1
gx/ipfs/QmWRBYr99v8sjrpbyNWMuGkQekn7b9ELoLSCe8Ny7Nxain/go-libp2p-kad-dht.(*IpfsDHT).Bootstrap.func1            1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb.(*DB).jWriter                         1
runtime.gopark                                                                                                 1
github.com/ipfs/go-ipfs/exchange/bitswap.(*WantManager).Run                                                    1
github.com/ipfs/go-ipfs/exchange/bitswap/decision.(*Engine).nextEnvelope                                       1
gx/ipfs/QmWRBYr99v8sjrpbyNWMuGkQekn7b9ELoLSCe8Ny7Nxain/go-libp2p-kad-dht/providers.(*ProviderManager).run      1
main.daemonFunc.func1                                                                                          1
gx/ipfs/QmXZYkfBN1cABhBZRaEwLzgEB5B3nAGiJYCmhWbDW3cDus/go-peerstream.(*Swarm).connGarbageCollect               1
main.(*IntrHandler).Handle.func1                                                                               1
gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess.(*process).CloseAfterChildren                 1
gx/ipfs/QmSpJByNKFX1sCsHBEp3R73FL4NF6FnQTEGyNAXHm2GS52/go-log.(*MirrorWriter).logRoutine                       1
gx/ipfs/QmbBhyDKsY4mbY6xsKt3qu9Y7FPvMJ6qbD8AMjYYvPRw1g/goleveldb/leveldb/util.(*BufferPool).drain              1
os/signal.signal_recv                                                                                          1
github.com/ipfs/go-ipfs/blocks/blockstore.(*blockstore).AllKeysChan.func1                                      1

@Stebalien
Copy link
Member Author

This is really an infuriating bug.

  1. I expect all of these connections must be using the go-multiplex multiplexer (because JS doesn't support yamux).
  2. I don't see any active go-multiplex sessions.

This means that either:

  1. They were never started.
  2. They were successfully closed.

If no multiplex session had been started, I would have expected something to call Close on the connection (killing the ServeHTTP loop).

Otherwise, if the sessions had been closed, they should have called Close on the underlying connection which should also have stopped this loop.


Is there anything in the logs about "failed to convert nconn.{Local,Remote}Addr: ..."?

Stebalien added a commit that referenced this issue Nov 29, 2017
Maybe related to #21. Probably not.
@Stebalien
Copy link
Member Author

So I have to keep checking this..., this is stuck waiting on the context, not Upgrade (so it does successfully accept the connection).

@ghost
Copy link

ghost commented Dec 3, 2017

Is there anything in the logs about "failed to convert nconn.{Local,Remote}Addr: ..."?

Yes a few, it's all just earth dialing out though. Right now there's no significant leak happening though. We should look again when there is.

$ ./provsn exec all 'docker logs ipfs |& grep nconn'
earth: 06:20:32.570  INFO     swarm2: got error on dial to /ip4/218.93.29.234/tcp/42180: <peer.ID SoLer2> --> <peer.ID dY2iJ6> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 08:00:49.849  INFO     swarm2: got error on dial to /ip4/89.2.169.174/tcp/43502: <peer.ID SoLer2> --> <peer.ID RejvYq> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 11:26:13.003  INFO     swarm2: got error on dial to /ip4/188.163.24.134/tcp/10882: <peer.ID SoLer2> --> <peer.ID b9cFYH> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 13:13:50.852  INFO     swarm2: got error on dial to /ip4/172.58.139.223/tcp/51105: <peer.ID SoLer2> --> <peer.ID THM9m1> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 13:13:51.351  INFO     swarm2: got error on dial to /ip4/24.5.84.9/tcp/4001: <peer.ID SoLer2> --> <peer.ID bQbGCL> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 14:26:45.124  INFO     swarm2: got error on dial to /ip4/36.83.133.118/tcp/55257: <peer.ID SoLer2> --> <peer.ID NYMf7r> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 14:55:59.680  INFO     swarm2: got error on dial to /ip4/91.213.44.235/tcp/52835: <peer.ID SoLer2> --> <peer.ID SVsNvD> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 14:55:59.680  INFO     swarm2: got error on dial to /ip4/91.213.44.235/tcp/60505: <peer.ID SoLer2> --> <peer.ID SVsNvD> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 16:51:12.488  INFO     swarm2: got error on dial to /ip4/2.93.45.11/tcp/59933: <peer.ID SoLer2> --> <peer.ID Wp8xkw> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 16:51:58.897  INFO     swarm2: got error on dial to /ip4/60.181.14.82/tcp/7922: <peer.ID SoLer2> --> <peer.ID Ww4ezg> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318
earth: 22:03:58.286  INFO     swarm2: got error on dial to /ip4/52.52.198.66/tcp/4001: <peer.ID SoLer2> --> <peer.ID Py8YyK> dial attempt failed: failed to convert nconn.RemoteAddr: nil multiaddr swarm_dial.go:318

@Stebalien
Copy link
Member Author

Stebalien commented Dec 3, 2017

So, this appears to happen when a node connects, disconnects, and then reconnects.

To reproduce:

> iptb init --ws -n 2
> iptb start 1
# repeatedly call:
> iptb start 0 && iptb connect 0 1 && iptb run 0 ipfs swarm disconnect /ip4/127.0.0.1/tcp/43015/ws/ipfs/QmQaebe5jbtHxVbCgRQfkeNLeqPThDPBcV9FLmS2Vd7c7R && iptb connect 0 1 && iptb stop 0

IPFS node 1 will leak ServeHTTP calls.

@Stebalien
Copy link
Member Author

Yes a few, it's all just earth dialing out though. Right now there's no significant leak happening though. We should look again when there is.

I should have fixed any leak there in #27.

@Stebalien
Copy link
Member Author

So, this was actually a bug in go-peerstream. It was failing to close connections after failing to negotiate a stream muxer.

I tried to also fix this in this package by modifying ServeHTTP to select on r.Context().Done() and the local ctx but that doesn't work because the request's context doesn't get canceled until after ServeHTTP ends. Unfortunately, I'm pretty sure that there is no way to fix this from this package.

So, the new rule is, if you get an error on a connection, you must close it.

@ghost ghost removed the status/ready Ready to be worked label Dec 4, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P0 Critical: Tackled by core team ASAP
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants