Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core, eth, les, light: avoid storing computable receipt metadata #19345

Merged
merged 3 commits into from
Apr 16, 2019

Conversation

Matthalp-zz
Copy link
Contributor

@Matthalp-zz Matthalp-zz commented Mar 27, 2019

These changes omit storing the transaction hash (TxHash), gas
cost (GasCost), and contract address (ContractAddress) fields
within its corresponding the transaction receipt. The big wins come
from not storing the transaction hash and contract address since it is
not amenable to the snappy compression used by LevelDB, as
@karalabe and @rjl493456442 have pointed out.

The storage savings comes at the expense of additional processing
time to load receipts from storage. Specifically this comes from
additional reads needed to load the receipts' corresponding block
body to populate some fields as well as the chain configuration
needed to compute the contract address for contract creation
transactions.

However, this performance overhead should not affect the majority
of go-ethereum users. The two main places this read is needed is
for (1) subscribers watching block data during reorganizations and
(2) users querying JSON-RPC for transaction receipts. Substantial
block reorganizations do not happen often enough for (1) to be
significantly affected and while there are some use cases that
heavily use (2), the overheads should not be too horrible even when
traversing the entire historical chain.

Note that there here are some situations where all of the receipt
metadata is not populated, such as in light clients. The new code
supports this behavior by not failing when all of the metadata
cannot be constructed.

@Matthalp-zz
Copy link
Contributor Author

@karalabe @holiman I did the bare minimum to exclude these fields. It appears to be able to sync just fine through block 1M, so figured we could run it on the benchmarking systems and see what happens. If things look favorable, I'll move on to the complete integration (and making all the tests that now fail actually pass).

@fjl
Copy link
Contributor

fjl commented Mar 28, 2019

I think this won't do much for storage size because the TxHash will still be stored. The diff so far just makes it all-zero. Would be better to delete those fields from receiptStorageRLP.

@Matthalp-zz
Copy link
Contributor Author

@fjl This actually still reduces the storage size because a singe byte (the RLP empty string representation) is stored instead of the 32 additional bytes for the actual hash (and variable number of bytes for the gas cost). This was done as a "quick an dirty" test to see what the savings will be. If the savings is seen to be considerable then I'll put forth the effort to do this the right way.

@holiman
Copy link
Contributor

holiman commented Mar 28, 2019

@fjl nope, @matthalp is correct! It's not quite finished, but 138G vs 149G.

Screenshot_2019-03-28 EXP (52 56 200 108) VS Master (52 56 85 219) Datadog(3)

@Matthalp-zz Matthalp-zz force-pushed the optimize-receipt-storage branch from e65b8f2 to b1a232c Compare March 29, 2019 00:37
@Matthalp-zz Matthalp-zz changed the title [BENCHMARK] Store transaction receipts without txHash and gasCost core/*: omit txHash and gasCost when storing receipts Mar 29, 2019
@Matthalp-zz Matthalp-zz force-pushed the optimize-receipt-storage branch from b1a232c to f87136c Compare March 29, 2019 00:54
@Matthalp-zz
Copy link
Contributor Author

I want to compare eth_getTransactionReceipt for every transaction on the canonical chain against an unmodified client before giving the go ahead to review.

@Matthalp-zz Matthalp-zz force-pushed the optimize-receipt-storage branch 2 times, most recently from bac14a6 to d774042 Compare March 29, 2019 02:53
@Matthalp-zz Matthalp-zz requested a review from zsfelfoldi as a code owner March 29, 2019 02:53
@Matthalp-zz Matthalp-zz force-pushed the optimize-receipt-storage branch from d774042 to 0930195 Compare March 29, 2019 03:00
@Matthalp-zz
Copy link
Contributor Author

@zsfelfoldi It looks like the light protocol complicates this change a bit. I plan to debug this last test case tomorrow, but would appreciate any feedback you have.

core/types/receipt.go Outdated Show resolved Hide resolved
@karalabe
Copy link
Member

FYI Screenshot from 2019-03-29 09-38-50

@Matthalp-zz
Copy link
Contributor Author

@karable Interpolating the y-axis of the graph looks like a savings of about 12 GB. Is that right?

@karalabe
Copy link
Member

Ah, sorry, didn't leave the markers in. The saving it ~14GB.

@Matthalp-zz Matthalp-zz force-pushed the optimize-receipt-storage branch 11 times, most recently from 5efcc1c to efcce69 Compare April 1, 2019 03:14
@karalabe karalabe force-pushed the optimize-receipt-storage branch from f3ccca9 to 6235665 Compare April 15, 2019 09:37
@Matthalp-zz Matthalp-zz requested a review from gballet as a code owner April 15, 2019 09:37
@karalabe
Copy link
Member

karalabe commented Apr 15, 2019

Hey all, so I've pushed a followup PR (figured it was easier than to go back and forth for a number of days for nitpicks cleanups). In essence, I've made the following fixes:

  • Originally in this PR, reading the receipts via rawdb and deriving the computed fields also entailed accessing the canonical hash of the genesis block and the chain configs from leveldb. These were pretty pointless data to access, since almost all our code is aware of the chain config, so I modified the code to pass this as an argument, not pull it from disk.
  • Deriving the computed fields (originally SetReceiptsData, renamed to Receipts.DeriveFields) was moved from rawdb to types at Gary's suggestion, since really it's not a data access functionality, rather tightly coupled to the receipts. I've also changed the signature to use only types.Transactions, not a full types.Body + I attached it to the types.Receipts instead of a dangling method.
  • There was an accidental bug introduced in the PR: when an empty block's receipts were read, the original code on master returned an empty slice, the new code returned nil. This is wrong, because nil represents the concept of missing receipts, whereas the empty one obviously represents receipts being available, but no transactions in the block. This cause les to fail some test. Fixed.
  • The fakepeer in downloader used the old ReadReceipts to deliver full receipts, but AFAIK we should only deliver consensus receipts there. Fixed to ReadRawReceipts, but we should test that it works correctly.

@karalabe karalabe force-pushed the optimize-receipt-storage branch from 6235665 to cd96c8e Compare April 15, 2019 10:16
@karalabe karalabe changed the title core/*: avoid storing computable receipt metadata core, eth, les, light: avoid storing computable receipt metadata Apr 15, 2019
@karalabe karalabe added this to the 1.9.0 milestone Apr 15, 2019
@karalabe karalabe force-pushed the optimize-receipt-storage branch from cd96c8e to b2b131d Compare April 15, 2019 10:25
les/handler.go Outdated Show resolved Hide resolved
@karalabe karalabe force-pushed the optimize-receipt-storage branch from b2b131d to 2a937ce Compare April 15, 2019 10:42
Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems kind of good to me now. Would be nice to have some more eyes on it before merging.

@karalabe karalabe requested a review from fjl April 15, 2019 10:50
Copy link
Member

@rjl493456442 rjl493456442 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Matthalp-zz
Copy link
Contributor Author

Thanks @karalabe and @rjl493456442 ! The follow up commits make sense to me.

I just noticed DeriveFields was not setting receipt bloom filters, so I added a follow up commit on that.

There was an accidental bug introduced in the PR: when an empty block's receipts were read, the original code on master returned an empty slice, the new code returned nil. This is wrong, because nil represents the concept of missing receipts, whereas the empty one obviously represents receipts being available, but no transactions in the block.

@karalabe Thank you for catching this. As an aside, it's more idiomatic go to use nil slices instead of empty slices, but I understand the signaling problem here. I feel like this will not be the last time this kind of issue/bug comes up on the code base, so I would recommend modifying these methods to return two return values: ReadFoo() (<Foo>, bool) and then doing a check on the bool to know if this exists. I'm more than happy to do this.

@@ -310,6 +310,8 @@ func (r Receipts) DeriveFields(config *params.ChainConfig, hash common.Hash, num
r[i].BlockNumber = new(big.Int).SetUint64(number)
r[i].TransactionIndex = uint(i)

r[i].Bloom = CreateBloom(Receipts{r[i]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the blooms already handled during decoding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see it those are already covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karalabe You are right -- my bad. I hadn't had my morning ☕️ yet.

This does raise the question: where should we be computing the bloom? It's kind of weird to be computing these fields in different places, but at the same time the bloom filter is immediately computable when decoding and the overhead is negligible. What would you prefer?

@karalabe
Copy link
Member

karalabe commented Apr 15, 2019 via email

@Matthalp-zz
Copy link
Contributor Author

@karalabe Just pushed a commit to revert the bloom population in Receipt.DeriveFields and its associated test logic.

@karalabe karalabe force-pushed the optimize-receipt-storage branch from ed464e7 to ce9a289 Compare April 16, 2019 06:50
@karalabe karalabe merged commit 78d90c4 into ethereum:master Apr 16, 2019
@karalabe
Copy link
Member

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants