Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FindProviders needs a timeout #400

Closed
whyrusleeping opened this issue Dec 2, 2014 · 10 comments
Closed

FindProviders needs a timeout #400

whyrusleeping opened this issue Dec 2, 2014 · 10 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@whyrusleeping
Copy link
Member

In some 'real world' testing ive been find a large number of goroutines stuck waiting for 'FindProviders' requests to come back, looking into this, There is no timeout set on those requests (sent off into goroutines) so they wait forever for a request from the service. This causes an ugly memory leak of goroutines. What do you guys think an appropriate way to handle this is?

@whyrusleeping whyrusleeping added the kind/bug A bug in existing code (including security flaws) label Dec 2, 2014
@btc
Copy link
Contributor

btc commented Dec 2, 2014

Would passing the context down into the function work? Or does the callstack hit a wall where a message is passed over a channel?

@whyrusleeping
Copy link
Member Author

Well, we do have a context passed down. but the passed in context is not guaranteed to ever be cancelled

@btc
Copy link
Contributor

btc commented Dec 2, 2014

The language won't prevent you from extending FindProviders with its own timeout, but early termination is not really its call to make. That's the caller's responsibility.

There are three types of contexts

  1. request-scoped contexts that only exist during a single request

  2. system-scoped contexts that persist throughout the lifetime of the program (async workers respect these)

  3. orphaned TODO/Background contexts These have no rightful place in the system except for the top of main.

Are we seeing instances of 2 and 3 along the callpath to this subsystem where they shouldn't be?

@whyrusleeping
Copy link
Member Author

yeah, we are seeing 2 and 3 in the FindProviders calls

@btc
Copy link
Contributor

btc commented Dec 2, 2014

yeah, we are seeing 2 and 3 in the FindProviders calls

I didn't see 2 or 3 in the FindProviders/GetProviders calls. Maybe I'm misunderstanding what you're seeing. In any case, I took a look at code and fixed a couple things that stood out.

1
Functions called by FindProviderAsync were not respecting the context. These two commits address this by ensuring all functions inside of FPAsync respect it to the best of their abilities:

2
GetProviders was implemented using the promise pattern but with an unbuffered response channel. This causes the sender to hang when receivers aren't around to pick up the value. When receivers respect a context, this is a problem. A subtle detail, but one that is important for avoiding leaked goroutines. Addressed here:

image

@jbenet
Copy link
Member

jbenet commented Dec 5, 2014

The language won't prevent you from extending FindProviders with its own timeout, but early termination is not really its call to make. That's the caller's responsibility.

Agreed fully. This is particularly important when wanting to work in high-latency-low-bandwidth networks (mobile everywhere outside the US).

GetProviders was implemented using the promise pattern but with an unbuffered response channel.

Shouldn't this really be:

select {
case pm.getprovs <- gp:
  select {
  case resp := <- gp.resp:
    return resp
  case <-ctx.Done():
    return nil
  }

case <-ctx.Done():
  return nil
}

Or to make it more readable:

// send out async get providers request
select {
case <-ctx.Done():
  return nil
case pm.getprovs <- gp:
}

// receive a request response
select {
case <-ctx.Done():
  return nil

case resp := <- gp.resp:
  return resp
}

@btc
Copy link
Contributor

btc commented Dec 5, 2014

Shouldn't this really be

resp is buffered and is locally/stack-allocated with this function as the only sender, so it acts like a dropbox. sender will never block. that's the crux of the change. i wish i knew how to make this clearer. perhaps through some conventional name? perhaps with another comment?

@jbenet
Copy link
Member

jbenet commented Dec 5, 2014

resp is buffered and is locally/stack-allocated with this function as the only sender

no this function is the only reader. whatever is at the other side of pm.getprovs decides when to send into this channel. If it never sends anything, this goroutine blocks forever.

@btc
Copy link
Contributor

btc commented Dec 6, 2014

no this function is the only reader. whatever is at the other side of pm.getprovs decides when to send into this channel. If it never sends anything, this goroutine blocks forever.

my mistake. will fix using this one:

// send out async get providers request
select {
case <-ctx.Done():
  return nil
case pm.getprovs <- gp:
}

// receive a request response
select {
case <-ctx.Done():
  return nil

case resp := <- gp.resp:
  return resp
}

@jbenet
Copy link
Member

jbenet commented Jan 5, 2015

Is this fixed on master? pls reopen if not.

@jbenet jbenet closed this as completed Jan 5, 2015
ariescodescream pushed a commit to ariescodescream/go-ipfs that referenced this issue Oct 23, 2021
feat: allow disabling value and provider storage/messages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

No branches or pull requests

3 participants