Single mode sync #93

Merged
merged 14 commits into from
Jul 31, 2019

@magik6k magik6k commented Jul 26, 2019

WIP, DO NOT MERGE yet.

This PR will remove the SyncMode distinction from Syncer.

This is a hack to try to improve the situation when we need to catch up with other peers for whatever reason.

Issues this has:


Case:

  • Spawn 3 nodes (M, A, B)
  • Start mining on M
  • Connect A-B (Syncer enters CaughtUp mode)
  • With a few blocks mined, connect M-A
  • Sometimes block sync on B will fail with:
2019-07-26T18:04:34.489+0200	ERROR	chain	chain/sync.go:596	failed to get blocks: no usable connection to peer
github.com/filecoin-project/go-lotus/chain.(*Syncer).collectChainCaughtUp
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/sync.go:596
github.com/filecoin-project/go-lotus/chain.(*Syncer).SyncCaughtUp
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/sync.go:481
github.com/filecoin-project/go-lotus/chain.(*Syncer).InformNewHead.func1
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/sync.go:162
2019-07-26T18:04:34.489+0200	ERROR	chain	chain/blocksync.go:109	encountered error while responding to block sync request: blockstore: block not found
github.com/filecoin-project/go-lotus/chain.(*BlockSyncService).processRequest
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/blocksync.go:109
github.com/filecoin-project/go-lotus/chain.(*BlockSyncService).HandleStream
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/blocksync.go:93
github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1
	/home/magik6k/.opt/go/pkg/mod/github.com/libp2p/go-libp2p@v0.2.0/p2p/host/basic/basic_host.go:393
panic: aaa

This is probably because we try to fetch these blocks from A, which hasn't gotten them from M yet. We probably shouldn't forward info about new tipsets without fetching them first.
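
The suspected race can be sketched with a toy model. This is only an illustration under stated assumptions: `node`, `fetch`, and `relay` are hypothetical names, not the go-lotus Syncer or BlockSyncService API, and the "network" is just in-memory maps. It shows that relaying a head before fetching it leaves the next peer with nothing to fetch from us, while fetching first avoids that.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a peer with a local blockstore.
type node struct {
	name   string
	blocks map[string]bool
}

func newNode(name string, have ...string) *node {
	n := &node{name: name, blocks: map[string]bool{}}
	for _, b := range have {
		n.blocks[b] = true
	}
	return n
}

var errNotFound = errors.New("block not found")

// fetch copies block id from peer "from" into n, failing if the peer lacks it.
func (n *node) fetch(from *node, id string) error {
	if !from.blocks[id] {
		return errNotFound
	}
	n.blocks[id] = true
	return nil
}

// relay models node a hearing about a new head from src and announcing it to dst.
// dst reacts to the announcement by fetching the block from a.
// With fetchFirst set, a fetches the block from src before announcing.
func relay(a, src, dst *node, id string, fetchFirst bool) error {
	if fetchFirst {
		if err := a.fetch(src, id); err != nil {
			return err
		}
	}
	return dst.fetch(a, id)
}

func main() {
	m := newNode("M", "blk1")

	// A forwards the announcement without having the block: B's fetch fails.
	fmt.Println("relay before fetch:", relay(newNode("A"), m, newNode("B"), "blk1", false))

	// A fetches from M first, then forwards: B's fetch succeeds.
	fmt.Println("fetch before relay:", relay(newNode("A"), m, newNode("B"), "blk1", true))
}
```

In this model the first call returns the "block not found" error and the second returns nil, which mirrors the failure seen on B in the log above.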

@magik6k magik6k changed the title Fix/catch up many Fix some catch-up mode sync issues Jul 26, 2019
@magik6k magik6k changed the title Fix some catch-up mode sync issues Single mode sync Jul 26, 2019
@magik6k magik6k added the chain label Jul 27, 2019
@magik6k magik6k force-pushed the fix/catch-up-many branch 3 times, most recently from a7acc97 to 5c216fe Compare July 30, 2019 17:27
To: cg.receivers[m],
From: cg.banker,

Nonce: atomic.AddUint64(&cg.bankerNonce, 1) - 1,

.>

@whyrusleeping whyrusleeping left a comment

The new sync looks correct to me. I'll review the tests more in a bit to see if everything I'd be worried about is covered.

@whyrusleeping

So that was probably the most annoying bug I've been the cause and fixer of in a long while.

We were putting messages into the datastore using the cborIpldStore, and then fetching them by the Cid we get from message.Cid(). The problem was that the Cid returned by cst.Put() was different from the one returned by message.Cid(). The cborIpldStore puts things to disk using the cbor-ipld codec only, no matter what codec the thing we're putting wants to use. The problem with that is that we're currently (in lotus) using the filecoin compact serialization proposal, which has since been changed in the spec. So if we had fixed the types serialization first, we wouldn't have hit this bug. If I had been lazy and not implemented FCS when I did, we wouldn't have hit this bug... sigh.

In any case, I'm really sorry @magik6k that you had to beat your head against that one...
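
A minimal sketch of the mismatch described above, using only the Go standard library. Everything here is an assumption for illustration: `store`, `message`, and `cidKey` are hypothetical stand-ins rather than the real cborIpldStore or lotus types, the "other" codec value is a placeholder for whatever message.Cid() actually used, and sha2-256 stands in for the real hash function. The point is only that a store which always keys by its own codec will miss lookups made with a Cid built from a different codec.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const (
	codecDagCBOR byte = 0x71 // multicodec code for dag-cbor: what the toy store always uses
	codecOther   byte = 0x55 // placeholder (multicodec "raw") for the codec the type itself uses
)

// cidKey builds binary CIDv1-style bytes: version, codec, then a sha2-256 multihash
// (0x12 = sha2-256, 0x20 = 32-byte digest). Both codec values fit in one varint byte.
func cidKey(codec byte, data []byte) string {
	sum := sha256.Sum256(data)
	return string(append([]byte{0x01, codec, 0x12, 0x20}, sum[:]...))
}

// message is a hypothetical type that computes its own Cid with its own codec.
type message struct{ payload []byte }

func (m message) Cid() string { return cidKey(codecOther, m.payload) }

// store is a hypothetical store that ignores the object's codec and always uses dag-cbor.
type store map[string][]byte

func (s store) Put(m message) string {
	k := cidKey(codecDagCBOR, m.payload)
	s[k] = m.payload
	return k
}

func (s store) Get(k string) ([]byte, bool) {
	v, ok := s[k]
	return v, ok
}

func main() {
	s := store{}
	m := message{payload: []byte("hello")}

	putCid := s.Put(m)
	_, okPut := s.Get(putCid)
	_, okMsg := s.Get(m.Cid())

	fmt.Println("lookup by Put's Cid:", okPut)     // true
	fmt.Println("lookup by message.Cid():", okMsg) // false
}
```

One way to avoid this class of bug in such a model is to always use the Cid returned by Put, or to make sure both sides agree on the codec before storing.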

@whyrusleeping

Oh, bonus points: the different codecs only caused a few characters to change at the beginning of the Cids; the ends were the same. And I was using the ends to pick out the Cid where it was being printed in debug logs in different places. So I thought I was going crazy: when I searched for the ends of the Cids, they would show up in all the right places (and what I was missing was that the prefixes were different...)
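
Why only the prefix changes can be seen by building the string form by hand. The sketch below is an assumption-laden illustration, not the go-cid implementation: `cidV1String` is a hypothetical helper, the two codec values (dag-cbor and raw) are examples that may not be the pair involved here, and sha2-256 is used as the hash. A CIDv1 string is a multibase prefix followed by the encoded bytes of version, codec, and multihash, so changing the codec only touches the second byte and therefore only the first few base32 characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

const (
	codecDagCBOR = 0x71 // multicodec code for dag-cbor
	codecRaw     = 0x55 // multicodec code for raw, used here as an example second codec
)

// Lowercase RFC 4648 base32 without padding, which multibase marks with a leading "b".
var lowerBase32 = base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)

// cidV1String encodes version (1), codec, and a sha2-256 multihash as a base32 CIDv1 string.
// The single-byte codec only works for codes below 0x80, which holds for both constants above.
func cidV1String(codec byte, data []byte) string {
	sum := sha256.Sum256(data)
	buf := []byte{0x01, codec, 0x12, 0x20} // version, codec, sha2-256 code, digest length
	buf = append(buf, sum[:]...)
	return "b" + lowerBase32.EncodeToString(buf)
}

func main() {
	msg := []byte("example message bytes")
	a := cidV1String(codecDagCBOR, msg)
	b := cidV1String(codecRaw, msg)

	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("identical:", a == b)
	fmt.Println("same after first 4 chars:", a[4:] == b[4:])
}
```

With these two codecs the strings start with "bafy" and "bafk" respectively and agree everywhere after that, so searching logs by the tail of a Cid matches both.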
