core, eth, les, light: avoid storing computable receipt metadata #19345
Conversation
@karalabe @holiman I did the bare minimum to exclude these fields. It appears to be able to sync just fine through block 1M, so I figured we could run it on the benchmarking systems and see what happens. If things look favorable, I'll move on to the complete integration (and making all the tests that now fail actually pass).
I think this won't do much for storage size because the TxHash will still be stored. The diff so far just makes it all-zero. Would be better to delete those fields from receiptStorageRLP.
@fjl This actually still reduces the storage size, because a single byte (the RLP empty-string representation) is stored instead of the 32 additional bytes for the actual hash (and a variable number of bytes for the gas cost). This was done as a "quick and dirty" test to see what the savings would be. If the savings turn out to be considerable, I'll put forth the effort to do this the right way.
@zsfelfoldi It looks like the light protocol complicates this change a bit. I plan to debug this last test case tomorrow, but would appreciate any feedback you have. |
@karalabe Interpolating the y-axis of the graph, it looks like a savings of about 12 GB. Is that right?
Ah, sorry, didn't leave the markers in. The saving is ~14 GB.
Hey all, so I've pushed a follow-up PR (figured it was easier than going back and forth for a number of days over nitpick cleanups). In essence, I've made the following fixes:
This PR seems kind of good to me now. Would be nice to have some more eyes on it before merging.
LGTM
Thanks @karalabe and @rjl493456442! The follow-up commits make sense to me. I just noticed
@karalabe Thank you for catching this. As an aside, it's more idiomatic Go to use |
core/types/receipt.go (Outdated)
@@ -310,6 +310,8 @@ func (r Receipts) DeriveFields(config *params.ChainConfig, hash common.Hash, num
 	r[i].BlockNumber = new(big.Int).SetUint64(number)
 	r[i].TransactionIndex = uint(i)
+
+	r[i].Bloom = CreateBloom(Receipts{r[i]})
Aren't the blooms already handled during decoding?
As far as I see it those are already covered.
@karalabe You are right -- my bad. I hadn't had my morning ☕️ yet.
This does raise the question: where should we be computing the bloom? It's kind of weird to be computing these fields in different places, but at the same time the bloom filter is immediately computable when decoding and the overhead is negligible. What would you prefer?
The bloom seems to fit best in decoding because it's consensus data. Safer not to screw up.
@karalabe Just pushed a commit to revert the bloom population in |
🎉
These changes omit storing the transaction hash (TxHash), gas cost (GasCost), and contract address (ContractAddress) fields within the corresponding transaction receipt. The big wins come from not storing the transaction hash and contract address, since they are not amenable to the snappy compression used by LevelDB, as @karalabe and @rjl493456442 have pointed out.
The storage savings come at the expense of additional processing time to load receipts from storage. Specifically, extra reads are needed to load the receipts' corresponding block body to populate some fields, as well as the chain configuration needed to compute the contract address for contract-creation transactions.
However, this performance overhead should not affect the majority of go-ethereum users. The two main places this read is needed are (1) subscribers watching block data during reorganizations and (2) users querying JSON-RPC for transaction receipts. Substantial block reorganizations do not happen often enough for (1) to be significantly affected, and while some use cases rely heavily on (2), the overhead should remain acceptable even when traversing the entire historical chain.
Note that there are some situations where not all of the receipt metadata can be populated, such as in light clients. The new code supports this behavior by not failing when some of the metadata cannot be constructed.