This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave (1.0.3) ran out of memory #1452

Closed
MaximeHeckel opened this issue Sep 21, 2015 · 5 comments

@MaximeHeckel

We recently encountered an out-of-memory issue with weave 1.0.3, the release that was supposed to fix it.
Here are the full logs of what happened:

https://gist.github.com/MaximeHeckel/e551bf4648b9539f46fb

It happened twice today while we were trying to add -initpeercount to our weave-daemon to handle connection issues.

@rade
Member

rade commented Sep 21, 2015

Any idea for how long weave was running before it died?

Is the (dead) container still around? If so, please run docker inspect on it.

@MaximeHeckel
Author

It had just restarted, so it had not been running long.
No, sadly the node was cleaned up and terminated almost an hour ago.

@rade
Member

rade commented Sep 23, 2015

Some more info which may be relevant:

That host had a peer count value of 7 and 7 nodes were deployed.
This happened another time on another node (exact same logs), which had a peer count value of 5 (because only 5 nodes were launched at that time).

I’m testing on DO instances with 1 GB of memory (what most of our users run), so I guess it must have hit the limit.

@rade rade added the bug label Sep 23, 2015
@rade rade added this to the 1.1.1 milestone Sep 23, 2015
@bboreham
Contributor

Some things are evident from inspection of the log:

  • the allocation which fails is for 320 bytes, so
  • some goroutine numbers are in the 600,000s, but there are only 70 goroutines listed, which suggests a lot of churn
  • there are only 8 connection attempts listed in this log excerpt
  • none of the other goroutines appear to be doing anything surprising
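
The comparison between goroutine IDs and the number of goroutines listed can be reproduced mechanically. The following is a minimal sketch, not part of weave: `summarizeDump` is a hypothetical helper that scans a Go stack dump for `goroutine N [state]:` header lines, and the sample text is made up rather than taken from the gist. Since the Go runtime hands out goroutine IDs in roughly increasing order, a highest ID far above the number of listed goroutines is a hint that many goroutines have been created and have already exited.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// goroutineHeader matches the header line of each goroutine in a Go
// stack dump, for example "goroutine 42 [running]:".
var goroutineHeader = regexp.MustCompile(`^goroutine (\d+) \[`)

// summarizeDump (hypothetical helper) returns how many goroutines are
// listed in dump and the highest goroutine ID among them.
func summarizeDump(dump string) (listed int, maxID uint64) {
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		m := goroutineHeader.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a goroutine header line
		}
		id, err := strconv.ParseUint(m[1], 10, 64)
		if err != nil {
			continue // ID does not fit in uint64; skip it
		}
		listed++
		if id > maxID {
			maxID = id
		}
	}
	return listed, maxID
}

func main() {
	// Made-up excerpt for illustration only.
	sample := "goroutine 1 [chan receive]:\nmain.main()\n\n" +
		"goroutine 17 [select]:\nmain.worker()\n\n" +
		"goroutine 600123 [IO wait]:\nmain.reader()\n"
	listed, maxID := summarizeDump(sample)
	fmt.Printf("listed=%d highestID=%d\n", listed, maxID)
}
```

A large gap between the two numbers does not say which code path is responsible, only that churn is likely.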

@bboreham
Contributor

Places where weave 1.0.3 creates goroutines:

  • LocalConnection.Start() - logs "completed handshake; using protocol version ..."
  • LocalConnection.Shutdown() - no logging
  • LocalConnection.run() - always comes after "completed handshake" log as above
  • ConnectionMaker.Start() - no logging; only called once
  • ConnectionMaker.connectToTargets() - logs "attempting connection"
  • Forwarder.Start() - no logging
  • ForwarderDF.Start() - no logging
  • GossipSender.Start() - no logging
  • LocalPeer.Start() - no logging; never exits; only called once
  • Router.sniff() - logs "Sniffing traffic"; never exits
  • Router.listenTCP() - no logging; never exits; only called once
  • Router.listenUDP() - no logging; only called once
  • Routes.Start() - no logging; never exits; only called once
  • Allocator.Start() - no logging; never exits; only called once

So, if something is causing a lot of goroutines to be created and destroyed with no logging, it must be one of:

  • LocalConnection.Shutdown()
  • Forwarder.Start()
  • ForwarderDF.Start()
  • GossipSender.Start()
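
One way to narrow this down would be to count goroutine starts per call site. The sketch below is an assumption about how that could be done, not code from weave: `spawnCounter` and its methods are hypothetical, and the labels are plain strings that merely reuse the names from the list above. Each `go` statement at a suspect site would be replaced by a call to `Go` with a label, and the counts dumped periodically or on shutdown.

```go
package main

import (
	"fmt"
	"sync"
)

// spawnCounter (hypothetical) records how many goroutines were started
// under each label.
type spawnCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newSpawnCounter() *spawnCounter {
	return &spawnCounter{counts: make(map[string]int)}
}

// Go increments the counter for label and then runs fn in a new goroutine.
func (c *spawnCounter) Go(label string, fn func()) {
	c.mu.Lock()
	c.counts[label]++
	c.mu.Unlock()
	go fn()
}

// Snapshot returns a copy of the current counts, safe to read or print.
func (c *spawnCounter) Snapshot() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int, len(c.counts))
	for k, v := range c.counts {
		out[k] = v
	}
	return out
}

func main() {
	c := newSpawnCounter()
	var wg sync.WaitGroup

	// Simulate one call site that churns and one that does not.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		c.Go("GossipSender.Start", wg.Done)
	}
	wg.Add(1)
	c.Go("Forwarder.Start", wg.Done)

	wg.Wait()
	fmt.Println(c.Snapshot())
}
```

A mutex-protected map keeps the sketch simple; the lock is held only for the increment, so the overhead per spawn should be small compared with starting the goroutine itself.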

@rade rade modified the milestone: 1.1.1 Oct 4, 2015
@rade rade modified the milestone: 1.3.0 Oct 29, 2015
@rade rade self-assigned this Nov 18, 2015
@rade rade modified the milestones: 1.3.1, 1.4.0 Nov 18, 2015