bitswap: clean up ledgers when disconnecting #3437
// TODO: release ledger
e.lock.Lock()
defer e.lock.Unlock()
l, ok := e.ledgerMap[p]
You are not locking l.lk here, and again we have a situation with two locks. That shouts deadlock to me.
Added the ledger lock. Regarding the locking concerns, these locks are well scoped: the engine lock is either always taken first, or not held at all while taking the ledger lock. And the engine lock is never taken while holding a ledger lock.
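The ordering rule described in this comment can be sketched as follows. This is a simplified illustration, not the PR's code: the `ledger` and `engine` types, the `bytesSent` field, and the `removeLedger` and `addSent` helpers are hypothetical stand-ins, and peers are plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger and engine are simplified stand-ins (assumptions for illustration).
type ledger struct {
	lk        sync.Mutex
	bytesSent int
}

type engine struct {
	lock      sync.Mutex
	ledgerMap map[string]*ledger
}

// Pattern 1: the engine lock is taken first, then the ledger lock.
func (e *engine) removeLedger(p string) {
	e.lock.Lock()
	defer e.lock.Unlock()

	l, ok := e.ledgerMap[p]
	if !ok {
		return
	}
	l.lk.Lock()
	defer l.lk.Unlock()
	delete(e.ledgerMap, p)
}

// Pattern 2: the engine lock is released before the ledger lock is taken.
func (e *engine) addSent(p string, n int) {
	e.lock.Lock()
	l, ok := e.ledgerMap[p]
	if !ok {
		l = &ledger{}
		e.ledgerMap[p] = l
	}
	e.lock.Unlock()

	l.lk.Lock()
	defer l.lk.Unlock()
	l.bytesSent += n
}

func main() {
	e := &engine{ledgerMap: make(map[string]*ledger)}
	e.addSent("peerA", 10)
	e.removeLedger("peerA")
	fmt.Println(len(e.ledgerMap))
}
```

Neither helper ever acquires `e.lock` while holding `l.lk`, so the two locks cannot be taken in opposite orders by different goroutines.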
Force-pushed from 73be519 to 3c34085.
Other concern: what if PeerConnected gets the ledger instance but can't acquire the ledger lock because it is held by PeerDisconnected? Then PeerConnected will increase the value on a ledger that is no longer in the ledger map.
@Kubuxu hrm... for that to happen, One option is to not use
I know that it is something that might happen very rarely or never, but those edge cases add up and create hard-to-track-down bugs and instability. If the chance of this bug occurring is 0.001% per run, then the chance that it occurs at least once across 100,000 runs is more than 60%, and if we don't stop introducing possible bugs like that, go-ipfs will always be unstable and unreliable.
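The arithmetic behind this kind of estimate is 1 - (1 - p)^n for n independent runs with per-run probability p. A quick check, assuming independent runs and a per-run probability of 1 in 100,000 (the helper name `atLeastOnce` is made up for this sketch):

```go
package main

import (
	"fmt"
	"math"
)

// atLeastOnce returns the probability that an event with per-run
// probability p happens at least once in n independent runs.
func atLeastOnce(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	// p = 1e-5 is 0.001%, i.e. 1 in 100,000.
	fmt.Printf("%.1f%%\n", 100*atLeastOnce(1e-5, 100000))
}
```

With these inputs the result is roughly 63%, because (1 - 1/n)^n approaches 1/e as n grows.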
@Kubuxu Right, so I think the solution is to make
Force-pushed from 3c34085 to efb2e39.
So now it is thread safe, but does function
Also, I am still not a fan of those two locks, as some unrelated change can introduce a deadlock (locking the engine while holding some ledger's lock) and we might not catch it when we introduce it. We should really look into actor-oriented communication and how good or bad it would be.
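For comparison, an actor-style version of the same bookkeeping might look like the sketch below: a single goroutine owns the reference counts and everything else talks to it over a channel, so there are no mutexes to order. The `event` type and `ledgerActor` function are hypothetical and far simpler than a real bitswap engine.

```go
package main

import "fmt"

// event is a hypothetical message: connect=true means the peer connected,
// connect=false means it disconnected.
type event struct {
	peer    string
	connect bool
}

// ledgerActor is the only goroutine that touches refs, so no lock is
// needed. When events is closed it sends the final state on done.
func ledgerActor(events <-chan event, done chan<- map[string]int) {
	refs := make(map[string]int)
	for ev := range events {
		if ev.connect {
			refs[ev.peer]++
			continue
		}
		if refs[ev.peer] <= 1 {
			delete(refs, ev.peer) // last connection gone: drop the entry
		} else {
			refs[ev.peer]--
		}
	}
	done <- refs
}

func main() {
	events := make(chan event)
	done := make(chan map[string]int)
	go ledgerActor(events, done)

	events <- event{peer: "peerA", connect: true}
	events <- event{peer: "peerA", connect: true}
	events <- event{peer: "peerA", connect: false}
	close(events)

	fmt.Println(<-done)
}
```

The cost of this design is that every read of ledger state also has to go through the owning goroutine, which is where such code tends to grow extra channels and goroutines.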
Force-pushed from efb2e39 to 22ba9bb.
I rebased it to run coverage on it.
It isn't tested anywhere; it might be worth doing that.
@Kubuxu this part of the code used to be actor oriented, and it was vastly more complicated and much more difficult to get working properly, in addition to requiring a very large number of coroutines to get running.
On Fri, Dec 9, 2016, 09:46 Jakub Sztandera wrote:
> It isn't tested anywhere, it might be worth to do that.
I am positive that we can make it clean and not so complicated with enough layers of sugarcoating. I am just almost sure that we will introduce a deadlock around this place sooner or later, and that it won't be diagnosed for a long time because reproducing it will be almost impossible. Also, for someone to report a deadlock like this one, they would have to: 1. encounter the deadlock, 2. not try resetting the node, 3. capture a goroutine dump, and 4. have us find the goroutines blocked on this lock. I miss Java's features in this regard. This change LGTM if I get some tests. In case of not directly sharness tested features I would like the
OK, it shows up as having no coverage because of the lack of cross-package coverage testing.
Force-pushed from 22ba9bb to 5064976.
Can I add the RFM label here? Let's continue the locking discussion in #3506.
License: MIT Signed-off-by: Jeromy <why@ipfs.io>
License: MIT Signed-off-by: Jeromy <jeromyj@gmail.com>
Force-pushed from 5064976 to 331e60b.