Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent wantmanager from leaking goroutines (and memory) #1356

Merged
merged 2 commits into from
Jun 12, 2015

Conversation

whyrusleeping
Copy link
Member

got tired of not being able to figure out the spdy problem. So I went to something that I could solve.

the context for the runQueue is actually just the context for the entire wantmanager. It never really gets cancelled before the daemon shuts down. I added a timeout to the Connect and Send calls within the runQueue to prevent this from happening.

This still leaves one problem unsolved, what do we do if we cannot connect to a given peer, but bitswap insists on sending them a message? (line 157)

@GitCop
Copy link

GitCop commented Jun 11, 2015

There were the following issues with your Pull Request

  • Commit: c6b9422
    • Invalid signoff. Commit message must end with License: MIT
      Signed-off-by: .* <.*>

Guidelines are available at: https://github.com/ipfs/community/blob/master/docs/commit-message.md Your feedback on GitCop is welcome on the following issue: ipfs/infra#23


This message was auto-generated by https://gitcop.com

@whyrusleeping whyrusleeping added the status/in-progress In progress label Jun 11, 2015
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@whyrusleeping
Copy link
Member Author

this failure: https://travis-ci.org/ipfs/go-ipfs/jobs/66410801 (ive seen it before, its unrelated) worries me. I'm not sure what it means.

// this includes looking them up in the dht
// dialing them, and handshaking
conctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood that it's important to have a timeout here, but 1min is way too short. 15-30min is more ok. these timeouts affect people in the worst of situations, where a pageload could take >5min. It's awful, yes, but that's the rest of the world. We must work there too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, ten minutes then? tcp times out at 5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Thu, Jun 11, 2015 at 01:26:40PM -0700, Jeromy Johnson wrote:

+func (mq *msgQueue) doWork(ctx context.Context) {

  • // allow a minute for connections
  • // this includes looking them up in the dht
  • // dialing them, and handshaking
  • conctx, cancel := context.WithTimeout(ctx, time.Minute)
  • defer cancel()

alright, ten minutes then? tcp times out at 5.

Do we need explicit sub-part timeouts at all? Can't the caller
specify their own timeout when they submit a task? Looking into the
context handling here, runQueue gets launched with one context, and it
passes that context down to doWork. But maybe the doWork context
should be a union of the runQueue context and a second context that
gets passed in via the channel? With a check in runQueue to see if
any context-related doWork errors were due to a msgQueue-context
timeout/cancel (in which msgQueue should close down) or the
channel-passed context (in which case pass that error back to the
caller?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need explicit sub-part timeouts at all? Can't the caller
specify their own timeout when they submit a task?

we could if all the callers/roots had timeouts enforced somehow. we dont do this perfectly yet.

But maybe the doWork context
should be a union of the runQueue context and a second context that
gets passed in via the channel?

could use https://github.com/jbenet/go-context/blob/master/ext/dagctx.go for this.

@whyrusleeping whyrusleeping force-pushed the fix/bitswap-leak branch 2 times, most recently from 9e8e6e5 to e014a66 Compare June 11, 2015 23:43
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
jbenet added a commit that referenced this pull request Jun 12, 2015
prevent wantmanager from leaking goroutines (and memory)
@jbenet jbenet merged commit 41785f3 into master Jun 12, 2015
@jbenet jbenet removed the status/in-progress In progress label Jun 12, 2015
@jbenet jbenet deleted the fix/bitswap-leak branch June 12, 2015 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants