Skip to content
This repository has been archived by the owner on Oct 4, 2019. It is now read-only.

frankenstein2.1: core,eth,p2p #619

Merged
merged 156 commits into from
Jun 8, 2018
Merged

frankenstein2.1: core,eth,p2p #619

merged 156 commits into from
Jun 8, 2018

Conversation

whilei
Copy link
Contributor

@whilei whilei commented Jun 4, 2018

A huge amount of these changes are integrations "merged" by hand from ethereum/go-ethereum. Unfortunately, cherry-picking and merging (which would have credited original authors appropriately), was impossible because the code bases have diverged significantly.

So, thanks to the fine folks that did the brunt of the heavy lifting. Included by not limited to karalabe, fjl, arachnid,... I'll do my best to document and cite primary sources as much as possible.


Rel at least #603, #604, #605.

A rough reference for sources cited:

reviewed? original author change summary notes link
karalabe eth/dwonloader: moving pivot and capped memory usage ethereum/go-ethereum@62de04c
karalabe eth/downloader: termiante state sync cleanly ethereum/go-ethereum@4cf1ece
fjl eth/downloader: separate state sync from queue ethereum/go-ethereum@0042f13#diff-0a2825161004a664237e6781f3ed25df
rjl493456442 eth/downloader: wait for all fetcher goroutines to exit before terminating ethereum/go-ethereum@de2a7bb
rjl493456442 eth/downloader: flush state sync data before exit ethereum/go-ethereum@1100e8b#diff-0a2825161004a664237e6781f3ed25df
karalabe eth/downloader: save and load trie sync progress this PR includes code from, except this last line where it actually gets stored. ethereum/go-ethereum@ca64a12#diff-c09439241eaa8e1f9a1f085817441925R470. AFAIK this is a nice-to-have but not critical; we could think about doing the same thing but with just a slightly different db key (since we don't have rawdb).
karalabe eth/downloader: don't require state for ancestor lookups this PR does not include this. ethereum/go-ethereum@96dad6b#diff-0a2825161004a664237e6781f3ed25df
fjl eth/downloader: improve deliverNodeData ethereum/go-ethereum@f1069a3#diff-0a2825161004a664237e6781f3ed25df
karalabe eth: propagate blocks and transactions async ethereum/go-ethereum@d9cee2c#diff-74cea5b79d38ad655dd2c0b1279bc7e7
shazow eth, les: allow exceeding maxPeers for trusted peers ethereum/go-ethereum@2e9c8fd#diff-74cea5b79d38ad655dd2c0b1279bc7e7
karalabe core, trie: intermediate mempool between trie and database 👓 ethereum/go-ethereum@55599ee#diff-74cea5b79d38ad655dd2c0b1279bc7e7
tailingchen eth/fetcher: check the origin of filter tasks ethereum/go-ethereum@40a3856#diff-74cea5b79d38ad655dd2c0b1279bc7e7
karalabe eth: use maxpeers from p2p layer instead of extra config ethereum/go-ethereum@b0ca1b6#diff-74cea5b79d38ad655dd2c0b1279bc7e7
tailingchen core, light: send chain events using event.Feed` only bits and pieces of this one ethereum/go-ethereum@bf1e263#diff-74cea5b79d38ad655dd2c0b1279bc7e7. may be worth cutting out?
karalabe eth: don't import propagated blocks during fastsync ethereum/go-ethereum@afdd23b#diff-74cea5b79d38ad655dd2c0b1279bc7e7
Smilenator eth/fetcher: reuse variables for hash and number ethereum/go-ethereum@40a2c52#diff-8d1ac8d3b29ddc2fd9af1a5401a6165f

And a few other other changes I had made earlier that were off-topic, but not worth throwing into their own PRs, IMO.

whilei and others added 30 commits May 28, 2018 14:25
solution: if block number is lower than local with same td, use it

otherwise ok to flip coin if same
solution: use 'bc' instead of 'self'
NOTE: bc.HasHeader, bc.GetHeader, ... etc funcs SHOULD use hash, uint64 signature
b/c more safer. HOWEVER, I just made the incoming implementation use hash
because it was easier and fast and a POC, so far.
solution: fix interfaces and adapt to compromise between
incoming and existing patterns
via karalabe's un-cherry-pickable 4cf1ece5ba7c28e9ef7edabe0f53e5ae1fe37b76
solution: remove that part of test, matches other Unsubscribe tests now
=== RUN   TestDeliverHeadersHang/protocol_64_mode_LIGHT
fatal error: concurrent map iteration and map write

goroutine 984 [running]:
runtime.throw(0x459e978, 0x26)
    /usr/local/Cellar/go/1.9.2/libexec/src/runtime/panic.go:605 +0x95 fp=0xc420497c70 sp=0xc420497c50 pc=0x402dc25
runtime.mapiternext(0xc420497de8)
    /usr/local/Cellar/go/1.9.2/libexec/src/runtime/hashmap.go:778 +0x6f1 fp=0xc420497d08 sp=0xc420497c70 pc=0x400c2d1
runtime.mapiterinit(0x44f3f00, 0xc4207b3590, 0xc420497de8)
    /usr/local/Cellar/go/1.9.2/libexec/src/runtime/hashmap.go:768 +0x270 fp=0xc420497d70 sp=0xc420497d08 pc=0x400b980
github.com/ethereumproject/go-ethereum/eth/downloader.(*peerSet).medianRTT(0xc420138d00, 0x0)
    /Users/ia/gocode/src/github.com/ethereumproject/go-ethereum/eth/downloader/peer.go:566 +0x120 fp=0xc420497e58 sp=0xc420497d70 pc=0x4431fa0
github.com/ethereumproject/go-ethereum/eth/downloader.(*Downloader).qosTuner(0xc420800180)
    /Users/ia/gocode/src/github.com/ethereumproject/go-ethereum/eth/downloader/downloader.go:1624 +0x50 fp=0xc420497fd8 sp=0xc420497e58 pc=0x442e400
runtime.goexit()
    /usr/local/Cellar/go/1.9.2/libexec/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc420497fe0 sp=0xc420497fd8 pc=0x405dc01
created by github.com/ethereumproject/go-ethereum/eth/downloader.New
    /Users/ia/gocode/src/github.com/ethereumproject/go-ethereum/eth/downloader/downloader.go:250 +0xa16

goroutine 1 [chan receive]:

solution: add a peerset un/lock in test
solution: use go test fatal instead of test-killing panic
which means that a client beyond subsequent forks will attempt
to sync with nodes that have not proceeded beyond forks
which our client assumes, causing invalid peers and more-than-necessary
attempts to sync incompatible chains

solution: included requiredHash's for hard forks which have
already occurred
solution: add ephemeral people-name aliases for each peer
solution: check for dl syncing as condition for prefix
defer peerSub.Unsubscribe()
defer func() {
cerr := s.commit(true)
if err == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I love this, b/c an error can be silently dropped.

👓 PTAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there are a lot of different kind of errors that it could be, including a commit error. Which seems important.

return
}
// Quickly validate the header and propagate the block if it passes
switch err := f.validateBlock(block, parent); err {
switch err := f.verifyHeader(block.Header()); err {
Copy link
Contributor Author

@whilei whilei Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick note that in case you were wondering about the where the hell the parent went from this signature -- the validator fn passed in from the handler sets this per the blockchain validator, moving the responsibility for parent-aware validation logic back to the chain validator itself.

	validator := func(header *types.Header) error {
		return manager.blockchain.Validator().ValidateHeader(header, manager.blockchain.GetHeader(header.ParentHash), true)
	}

eth/handler.go#168

)

var (
forkChallengeTimeout = 15 * time.Second // Time allowance for a node to reply to the DAO handshake challenge
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a new idea (it used to just be hardcoded in the early handler handle() fn requiredHash check piece, just now set to more visible global var. However, it used to be 5 seconds. And you're right, 15 seconds does seem a little high. Open idea.

if err != nil && pm.badBlockReportingEnabled && core.IsValidateError(err) {
go sendBadBlockReport(blocks[i], err)
go sendBadBlockReport(blocks[res.Index], err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open idea on this. Do we actually ever use or need this? If not, we can get rid of this whole function.

)
if next <= current {
infos, _ := json.MarshalIndent(p.Peer.Info(), "", " ")
glog.V(logger.Warn).Infof("%v: GetBlockHeaders skip overflow attack (current %v, skip %v, next %v)\nMalicious peer infos: %s", p, current, query.Skip, next, infos)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👁 Here's worth noticing.


// peerSet represents the collection of active peers currently participating in
// the Ethereum sub-protocol.
type peerSet struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peerSet stuff just moved to own file.

eth/sync.go Outdated
break
case <-forceSync.C:
// Force a sync even if not enough peers are present
go pm.synchronise(pm.peers.BestPeer())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is harmless to call this, even though the downloader might already by syncing. Not sure if it warrants or should have a conditional on if !pm.downloader.Synchronising() or not.

And but the forceSync thing is note worthy too.

eth/sync.go Outdated
// bad block) rolled back a fast sync node below the sync point. In this case
// however it's safe to reenable fast sync.
atomic.StoreUint32(&pm.fastSync, 1)
mode = downloader.FastSync
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a new behavior; if you run geth --fast and don't progress thru downloading full blocks before restarting, then if you run just plain geth, fast sync will resume even if you don't pass the flag. This got me. ⚡️ 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange behaviour. So in such case to reastart geth with full sync I need to delete db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As-is, yes. Or just continue syncing until you reach a pivot point, then it will still automatically switch to full sync and then, like usual, always full sync from there on out.

I agree that it's kind of strange, but maybe that's because we've become so used to being very careful about the --fast flag.

And it is kind of a nice thing to have; for example last week I accidentally made a friend have to restart syncing from 0 because my command instructions for him left the --fast off and he started accidentally full syncing around block 3mm.

Overall, I guess there would be fewer cases when someone wants to start full syncing from a random block in the middle of the chain, compared to when someone starts with --fast and then forgets to use the flag on a restart later.

We could consider adding a --full flag later, maybe? 😜

for intervaled status logs

solution: condtional on mode
Copy link
Member

@r8d8 r8d8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM;
with some minor issues

@whilei
Copy link
Contributor Author

whilei commented Jun 7, 2018

@r8d8 and @tzdybal These ones in particular, please double double check ;)

Mostly I want to be extra sure that state is definitely persisted. At the pivot, during fast, all the times. 👓 Anything with the word trie in it.


@whilei
Copy link
Contributor Author

whilei commented Jun 7, 2018

And @r8d8 did you have other comments or issues besides #619 (review)?

solution: add 'gitlike' to avaible list, and refactor parser
@tzdybal
Copy link
Contributor

tzdybal commented Jun 7, 2018

I think we should exclude the #612 related code from this PR - it's not strictly related, and it's still under discussion.

@tzdybal
Copy link
Contributor

tzdybal commented Jun 8, 2018

Reviewed 3 of 10 files at r1, 11 of 145 files at r3, 43 of 156 files at r5, 2 of 2 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 2 of 2 files at r9.
Review status: all files reviewed at latest revision, 49 unresolved discussions.


cmd/geth/main.go, line 257 at r9 (raw file):

		if ctx.IsSet(SputnikVMFlag.Name) {
			if core.SputnikVMExists {
				log.Printf("Using the SputnikVM Ethereum Virtual Machine implementation ...")

Why it's removed?


core/blockchain.go, line 1503 at r9 (raw file):

	// If TDs are the same, randomize.
	if tdCompare == 0 {
		// Reduces the vulnerability to bcish mining.

Lol, so it was changed with simple find and replace, not a refactoring tool :D


core/state/sync.go, line 30 at r7 (raw file):

Previously, whilei (ia) wrote…

AFAIK had been a purely vestigal unnecessary casting from an earlier set of changes, which probably should have been removed at the time of that change.

TODO(whilei): grep the history and point to relevant change. Because we want to be very sure that state sync is working properly.

👓 PTAL.

This seems like a result of series of simple, safe, refactorings (rename: trie.TrieSync to trie.Sync, remove unnecessary alias).


eth/handler.go, line 54 at r7 (raw file):

Previously, whilei (ia) wrote…

This is not a new idea (it used to just be hardcoded in the early handler handle() fn requiredHash check piece, just now set to more visible global var. However, it used to be 5 seconds. And you're right, 15 seconds does seem a little high. Open idea.

15 seconds seems a bit crazy for me, even for nodes with crazy slow network and IO.


eth/handler.go, line 195 at r7 (raw file):

Previously, whilei (ia) wrote…

Open idea on this. Do we actually ever use or need this? If not, we can get rid of this whole function.

I'm opting for removing this. It's not needed, the target URL is dead (at least I cant resolve DNS for it).


eth/handler.go, line 757 at r9 (raw file):

				if !pm.downloader.Synchronising() {
					glog.V(logger.Info).Infof("Peer %s: localTD=%v (<) peerTrueTD=%v, synchronising", p.id, localTd, trueTD)
					//go pm.synchronise(pm.peers.BestPeer())

Safe to remove this commented out line?


eth/sync.go, line 152 at r7 (raw file):

Previously, whilei (ia) wrote…

It is harmless to call this, even though the downloader might already by syncing. Not sure if it warrants or should have a conditional on if !pm.downloader.Synchronising() or not.

And but the forceSync thing is note worthy too.

pm.synchronise is not synchronised. There is a lot of atomic operations in it, on multiple variables, and it leads to various data hazards. Maybe conditions make it actually safe, but it's too complex to analyse.

IMHO we should add if !pm.downloader.Synchronising() guard.


eth/sync.go, line 212 at r9 (raw file):

		glog.V(logger.Info).Infoln("Fast sync complete, auto disabling")
		glog.D(logger.Info).Infoln("Fast sync complete, auto disabling")
		atomic.StoreUint32(&pm.fastSync, 0)

IMHO this is wrong use of atomic variable.

This should be corrected to:

if atomic.CompareAndSwapUint32(&pm.fastSync, 1, 0) {
    glog.V(logger.Info).Infoln("Fast sync complete, auto disabling")
    glog.D(logger.Info).Infoln("Fast sync complete, auto disabling")
}

eth/sync.go, line 214 at r9 (raw file):

		atomic.StoreUint32(&pm.fastSync, 0)
	}
	atomic.StoreUint32(&pm.acceptsTxs, 1) // Mark initial sync done

say twice, say twice?

atomic.StoreUint32(&pm.acceptsTxs, 1) // Mark initial sync done


eth/downloader/downloader.go, line 393 at r7 (raw file):

Previously, whilei (ia) wrote…

IMO it's almost always better to return an error when possible over a bool. It just keeps options open for descriptive and aware error handling and state management.

👍


eth/downloader/downloader.go, line 1029 at r7 (raw file):

Previously, whilei (ia) wrote…

Here's something to maybe look at later. Massive signatures like this are becoming a red flag for me.

👍 refactoring required (may be in separate PR)


eth/downloader/statesync.go, line 284 at r7 (raw file):

Previously, whilei (ia) wrote…

And there are a lot of different kind of errors that it could be, including a commit error. Which seems important.

IMHO only commit error can be dropped.

Do we really need to commit, even if err != nil?


Comments from Reviewable

@tzdybal
Copy link
Contributor

tzdybal commented Jun 8, 2018

Made my comments. I think we can make fixes after merging, in separate PR, if you prefer that.

@whilei
Copy link
Contributor Author

whilei commented Jun 8, 2018

cmd/geth/main.go, line 257 at r9 (raw file):

Previously, tzdybal (Tomasz Zdybał) wrote…

Why it's removed?

I just herded the log line to the place where the other configuration log line ares.

@@ -776,8 +781,23 @@ func logChainConfiguration(ctx *cli.Context, config *core.SufficientChainConfig)

        glog.V(logger.Info).Infof("Using %d configured bootnodes", len(config.ParsedBootstrap))
        glog.D(logger.Warn).Infof("Using %d configured bootnodes", len(config.ParsedBootstrap))
+
+       glog.V(logger.Info).Infof("Use Sputnik EVM: %s", logger.ColorGreen(fmt.Sprintf("%v", core.UseSputnikVM)))
+       glog.D(logger.Warn).Infof("Use Sputnik EVM: %s", logger.ColorGreen(fmt.Sprintf("%v", core.UseSputnikVM)))


Comments from Reviewable

@whilei
Copy link
Contributor Author

whilei commented Jun 8, 2018

core/blockchain.go, line 1503 at r9 (raw file):

Previously, tzdybal (Tomasz Zdybał) wrote…

Lol, so it was changed with simple find and replace, not a refactoring tool :D

Not to worry, there was a tool. I was just moving fast and didn't take the time to exclude all proper comments (turns out self shows up a fair amount in prose ;)


Comments from Reviewable

@whilei
Copy link
Contributor Author

whilei commented Jun 8, 2018

eth/handler.go, line 54 at r7 (raw file):

Previously, tzdybal (Tomasz Zdybał) wrote…

15 seconds seems a bit crazy for me, even for nodes with crazy slow network and IO.

Agreed. IMO this is directly related to the requiredHash question... because if we're going to possibly invest large chunks of time waiting for this information, it seems like we'd want to use all available information to only invest in that check when it's directly relevant to us.

I'm going to leave this as-is for now, and then we'll just integrate a change or leave-be with that PR #612.


Comments from Reviewable

@whilei
Copy link
Contributor Author

whilei commented Jun 8, 2018

eth/handler.go, line 195 at r7 (raw file):

Previously, tzdybal (Tomasz Zdybał) wrote…

I'm opting for removing this. It's not needed, the target URL is dead (at least I cant resolve DNS for it).

Great. I will make a separate PR ;)


Comments from Reviewable

solution: check if dler is already syncing. just log debug if busy
solution: use CompareAndSwap instead of load and store
@whilei whilei merged commit ad99cbf into master Jun 8, 2018
@soc1c soc1c deleted the frankenstein/sync branch June 19, 2019 12:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants