Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: remove unnecessary fields in log #17106

Merged
merged 5 commits into from
Feb 21, 2019

Conversation

rjl493456442
Copy link
Member

@rjl493456442 rjl493456442 commented Jun 30, 2018

This PR drop some unnecessary fields in Recepit, TxLookup and TransactionLog structs to save database storage.

@rjl493456442 rjl493456442 changed the title core: remove unnecessary fields in log WIP core: remove unnecessary fields in log Jun 30, 2018
@rjl493456442 rjl493456442 force-pushed the downsize_log branch 2 times, most recently from 19c7446 to 87f5938 Compare July 2, 2018 07:34
@karalabe
Copy link
Member

karalabe commented Jul 2, 2018

The interesting question we need to figure out is how to do the upgrade path. The code as is in theory works, but is neither forward nor backward compatible.

The way we did seamless upgrades until now was to support both formats (for a few geth releases) and run a background thread that makes the conversion. I think it would be important to at least implement support for the combo-format. The database upgrade is not that important since pruning might require a resync anyway, but its still important for auto-updating nodes to remain operational.

So, what would be essential for this PR is to expand the rawdb.ReadReceipts method, so it tries to decode the receipt in the new format... but if it fails, it will try to decode the old format too before erroring out. This would ensure that a node which starts running this PR on top of an old database will remain operational.

Downgrade of course is not possible, so this PR needs a major version bump, but seamless upgrade (or at least continuous operation) is essential.

@karalabe karalabe added this to the 1.9.0 milestone Jul 2, 2018
core/types/log.go Outdated Show resolved Hide resolved
@AlexeyAkhunov
Copy link
Contributor

Is there a corresponding PR for receipts? Because you can remove some fields there too, including bloom filters. Here is what I have done in Turbo-geth: AlexeyAkhunov@017a9e8

@rjl493456442
Copy link
Member Author

@AlexeyAkhunov Thank you for the reference. Will add the optimization to my PR!

@rjl493456442 rjl493456442 force-pushed the downsize_log branch 4 times, most recently from aa441f0 to bfad48f Compare July 6, 2018 12:30
eth/backend.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor

holiman commented Jan 9, 2019

This is problematic..
So with the db versioning,

  • If I start this PR with 4 on top of old db, 0, it will not show me a log that it does indeed update the db, but it will write an rlp 4. Subsequent tries will discover the same id.
  • If I then go back and use an older version, it will for some reason read that 4 as a 0, and just try to overwrite it again with a 3.
  • If I then start this PR again, it will read that attempted 3 as a 0, and again just silently overwrite it.

So the db versioning does not work, at all :(

@rjl493456442 rjl493456442 changed the title WIP core: remove unnecessary fields in log core: remove unnecessary fields in log Jan 22, 2019
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to include the db version in the log output, even if skipDbVersionCheck is used -- having it in the logs will help us debug potential errors if people switch back and forth between the broken one which overwrites the version number with nil.

I suggest the following modification:

diff --git a/eth/backend.go b/eth/backend.go
index 2a9d56c5c..0b3625c41 100644
--- a/eth/backend.go
+++ b/eth/backend.go
@@ -139,16 +139,20 @@ func New(ctx *node.ServiceContext, config *Config) (*Ethereum, error) {
 		bloomIndexer:   NewBloomIndexer(chainDb, params.BloomBitsBlocks, params.BloomConfirms),
 	}
 
-	log.Info("Initialising Ethereum protocol", "versions", ProtocolVersions, "network", config.NetworkId)
+	bcVersion := rawdb.ReadDatabaseVersion(chainDb)
+	var dbVer = "<nil>"
+	if bcVersion != nil {
+		dbVer = fmt.Sprintf("%d", *bcVersion)
+	}
+	log.Info("Initialising Ethereum protocol", "versions", ProtocolVersions, "network", config.NetworkId, "db version", dbVer)
 
 	if !config.SkipBcVersionCheck {
-		bcVersion := rawdb.ReadDatabaseVersion(chainDb)
 		if bcVersion != nil && *bcVersion > core.BlockChainVersion {
 			return nil, fmt.Errorf("database version is v%d, Geth %s only supports v%d", *bcVersion, params.VersionWithMeta, core.BlockChainVersion)
-		} else if bcVersion != nil && *bcVersion < core.BlockChainVersion {
-			log.Warn("Upgrade blockchain database version", "from", *bcVersion, "to", core.BlockChainVersion)
+		}else if bcVersion == nil || *bcVersion < core.BlockChainVersion{
+			log.Warn("Upgrading blockchain database version", "from", dbVer, "to", core.BlockChainVersion)
+			rawdb.WriteDatabaseVersion(chainDb, core.BlockChainVersion)
 		}
-		rawdb.WriteDatabaseVersion(chainDb, core.BlockChainVersion)
 	}
 	var (
 		vmConfig = vm.Config{

Db versioning now seems to work fine, I've tested with nil, lower, same and higher version numbers.

return nil, common.Hash{}, 0, 0
}
return receipts[receiptIndex], blockHash, blockNumber, receiptIndex
receipts := ReadReceipts(db, blockHash, *blockNumber)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadReceipts method iterates over the data and returns an []receipts, and this method iterates over those again to pick out the one we're interested in. Would it be worthwhile to instead have a readReceipt method that only returns the one we're interested it, or is that just an unnecessary optimisation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since normally a block only contains 200 transactions, so I think it won't hit too much performance if we iterate receipt slice twice.
We can have a similar implementation as ReadRecepits(read blob from db, decode, assemble log), but kind of redundant.

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

core/rawdb/schema.go Outdated Show resolved Hide resolved
Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@karalabe karalabe merged commit 7fd0cca into ethereum:master Feb 21, 2019
@Matthalp-zz
Copy link
Contributor

@karalabe @holiman Was there a reason not to remove the txHash field on storedReceipts (32 bytes) instead of the index on (Legacy)TxLookupEntry (8 bytes)?

@rjl493456442
Copy link
Member Author

@matthalp My original idea for keeping txhash in receipt is: If we delete txhash, we need to read blockBody to assemble txhash field during retrieve receipt. But I think blockBody and Receipts are not necessarily bound. For example, in some scenarios, receive may be stored, but blockBody is not stored. It is kind of weird that receipts content should rely on the blockBody to assemble.

For the txLookupEntry, BlockBody and Transaction, these three stuff is truly bound. It doesn't make any sense if we only store the txLookupEntry while relative BlockBody is missing.

But these all are my own understanding.

@Matthalp-zz
Copy link
Contributor

For example, in some scenarios, receive may be stored, but blockBody is not stored.

@rjl493456442 Could you point me to an example? I don't believe we store receipts unless we have assume we will have its corresponding body (even in light). My mental model is that if we have the block body, we should have the block header and if we have the block receipts we should have the block body. The logic in blockchain.go also makes similar assumptions at times.

It is kind of weird that receipts content should rely on the blockBody to assemble.

From what I can tell, almost all of the use cases were txhash is frequently used are in close proximity to a block body where that information could be recovered. While there could be some overhead of doing an independent lookup, this can be resolves with a good caching layer as the work in #19200 is working towards.

kiku-jw pushed a commit to kiku-jw/go-ethereum that referenced this pull request Mar 29, 2019
…ereum#17106)

* core: remove unnecessary fields in log

* core: bump blockchain database version

* core, les: remove unnecessary fields in txlookup

* eth: print db version explicitly

* core/rawdb: drop txlookup entry struct wrapper
fjl added a commit that referenced this pull request Jul 7, 2021
The encoding of Log and LogForStorage is exactly the same
now. After tracking it down it seems like #17106 changed the
storage schema of logs to be the same as the consensus
encoding.

Support for the legacy format was dropped in #22852 and if
I'm not wrong there's no reason anymore to have these two
equivalent types.

Since the RLP encoding simply contains the first three fields
of Log, we can also avoid creating a temporary struct for
encoding/decoding, and use the rlp:"-" tag in Log instead.

Note: this is an API change in core/types. We decided it's OK
to make this change because LogForStorage is an implementation
detail of go-ethereum and the type has zero uses outside of
package core/types.

Co-authored-by: Felix Lange <fjl@twurst.com>
sidhujag pushed a commit to sidhujag/go-ethereum that referenced this pull request Jul 10, 2021
The encoding of Log and LogForStorage is exactly the same
now. After tracking it down it seems like ethereum#17106 changed the
storage schema of logs to be the same as the consensus
encoding.

Support for the legacy format was dropped in ethereum#22852 and if
I'm not wrong there's no reason anymore to have these two
equivalent types.

Since the RLP encoding simply contains the first three fields
of Log, we can also avoid creating a temporary struct for
encoding/decoding, and use the rlp:"-" tag in Log instead.

Note: this is an API change in core/types. We decided it's OK
to make this change because LogForStorage is an implementation
detail of go-ethereum and the type has zero uses outside of
package core/types.

Co-authored-by: Felix Lange <fjl@twurst.com>
atif-konasl pushed a commit to frozeman/pandora-execution-engine that referenced this pull request Oct 15, 2021
The encoding of Log and LogForStorage is exactly the same
now. After tracking it down it seems like ethereum#17106 changed the
storage schema of logs to be the same as the consensus
encoding.

Support for the legacy format was dropped in ethereum#22852 and if
I'm not wrong there's no reason anymore to have these two
equivalent types.

Since the RLP encoding simply contains the first three fields
of Log, we can also avoid creating a temporary struct for
encoding/decoding, and use the rlp:"-" tag in Log instead.

Note: this is an API change in core/types. We decided it's OK
to make this change because LogForStorage is an implementation
detail of go-ethereum and the type has zero uses outside of
package core/types.

Co-authored-by: Felix Lange <fjl@twurst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants