-
Notifications
You must be signed in to change notification settings - Fork 20.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core, core/rawdb, eth/sync: no tx indexing during snap sync #28703
Conversation
8235485
to
156e633
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from one minor flaw (?), this looks good to me. I have only eyed through it, not activel tested it.
eth/api_backend.go
Outdated
return tx, blockHash, blockNumber, index, nil | ||
lookup, tx, err := b.eth.blockchain.GetTransactionLookup(txHash) | ||
if err != nil { | ||
return nil, common.Hash{}, 0, 0, fmt.Errorf("tx is not existent or not indexed, %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a behavorial change for the eth_getTransactionByHash
RPC, which previously returned null
in this case. Which has a couple ramifications that should be considered:
- I imagine fair number of upstream projects will be "surprised" by this RPC now potentially returning errors. For example I'm pretty sure most wallets use this method to determine whether a newly broadcast tx has reached the public mempool.
- The error "leaks" configuration settings of the node (the index head and tail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, @fjl do you have any suggestion about the RPC behavioral change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example I'm pretty sure most wallets use this method to determine whether a newly broadcast tx has reached the public mempool.
If the tx is in the txpool, then it will be found for sure,
If the tx is not in the txpool yet, the error will be returned instead of returning nil, as you said.
I will ask team about their opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can't have this error. getTransactionByHash has to return null
when the tx is not known to the node. It can only return an error if there is an internal error accessing the database or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, it is an internal error from one perspective: the indexing is not done so the results are not reliable. From that perspective, the error message feels correct.
Tho I wouldn't add that much info since it's hard to interpret. Probably "transactions still indexing" or something along those lines is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible solution we could do is to make indexing part of the sync. I.e. after snap sync is "done", we could have a next phase with indexing, whilst eth_syncing does not return false.
Still, the quirk is that the current block currently kind of means that everything that's needed has been done. On the other hand, IMO a syncing node should not be just dropped into prod serving RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point is, we can't just return an error when the tx doesn't exist. We can return an error while we are still indexing, but the code here doesn't do that. If we decide to add an error during indexing, it would have to be identifiable by error code. This a decision affecting all API consumers and also proxy/LB implementations like Infura's. They all need to change their behavior to retry the lookup later or on a different node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed the code.
The error is only returned if the transactions are not fully indexed.
it would have to be identifiable by error code
It's on my todo list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I still don't like about it, is that this would be a geth-specific error that all API consumers must pay attention to if they want to react to it properly. It really only makes sense if we can make eth_syncing return a sync status at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
9e73805
to
81f6cef
Compare
FYI I confirmed here that this fixes the original issue: #28673 (comment) |
Triage discussion
|
91cca24
to
64196f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
…#28703) This change simplifies the logic for indexing transactions and enhances the UX when transaction is not found by returning more information to users. Transaction indexing is now considered as a part of the initial sync, and `eth.syncing` will thus be `true` if transaction indexing is not yet finished. API consumers can use the syncing status to determine if the node is ready to serve users.
return tx.MarshalBinary() | ||
} | ||
|
||
// GetTransactionReceipt returns the transaction receipt for the given transaction hash. | ||
func (s *TransactionAPI) GetTransactionReceipt(ctx context.Context, hash common.Hash) (map[string]interface{}, error) { | ||
tx, blockHash, blockNumber, index, err := s.b.GetTransaction(ctx, hash) | ||
if tx == nil || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could geth team reconsider this change?
It breaks hardhat.
An unexpected error occurred:
ProviderError: transaction indexing is in progress
at HttpProvider.request (/workspace/node_modules/hardhat/src/internal/core/providers/http.ts:96:21)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at async EIP1193JsonRpcClient.getTransactionReceipt (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/execution/jsonrpc-client.ts:569:22)
at async Promise.all (index 1)
at async monitorOnchainInteraction (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/execution/future-processor/handlers/monitor-onchain-interaction.ts:110:28)
at async FutureProcessor.processFuture (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/execution/future-processor/future-processor.ts:116:9)
at async ExecutionEngine._executeBatch (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/execution/execution-engine.ts:153:30)
at async ExecutionEngine.executeModule (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/execution/execution-engine.ts:114:25)
at async Deployer.deploy (/workspace/node_modules/@nomicfoundation/ignition-core/src/internal/deployer.ts:194:25)
at async SimpleTaskDefinition.action (/workspace/node_modules/@nomicfoundation/hardhat-ignition/src/index.ts:302:24)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also unhappy with this, and there have been quite a few reports from across the ecosystem where this PR caused subtle issues. However, we ultimately decided to go ahead with the PR because we also changed eth_syncing to take indexing into account. Not sure if that works for hardhat...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way is to return nil as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must to distinguish two different scenarios here:
- The requested transaction/receipt is existent but not indexed yet
- The requested transaction/receipt is unknown to node
Therefore, we explicitly return an error if the background indexing is still in progress
and the requested transaction is not indexed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so.
We must
It's a breaking change, geth is a dominant ethereum client, every breaking change needs to consider backward compatibility.
The requested transaction/receipt is existent but not indexed yet
The code just checks if the index is running, it means geth can't know if the transaction is indexing or not, what did I miss?
tx, blockHash, blockNumber, txIndex := rawdb.ReadTransaction(bc.db, hash)
if tx == nil {
progress, err := bc.TxIndexProgress()
if err != nil {
return nil, nil, nil
}
// The transaction indexing is not finished yet, returning an
// error to explicitly indicate it.
if !progress.Done() {
return nil, nil, errors.New("transaction indexing still in progress")
}
// The transaction is already indexed, the transaction is either
// not existent or not in the range of index, returning null.
return nil, nil, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both scenarios mentioned, no transaction index will be found. The error will simply state, Please try the RPC later once background indexing is fully complete.
Since Geth is a dominant Ethereum client, every breaking change must consider backward compatibility.
That's why we've modified the eth_syncing endpoint. Transaction indexing is also taken into account and will only return "synced" once all indexing is complete.
This pull request simplifies the logic for indexing transactions and enhances the UX when transaction is not found by returning more information to users.
Originally Geth will construct the transaction indexes when node is still in snap sync. This approach can ensure all the required transaction indexes are available once the sync is finished. However, it introduces lots of complexity to index transaction correctly.
In ticket #28673 an issue is found that when a node is initialized with an external ancient store and runs a snap sync with
history.transactions = 0
, the indexes of ancient part will be lost, or another word, these ancient blocks are never indexed.In order to simplify the entire process, this pull request changes the strategy to construct transaction indexes only once the snap sync is finished.
Although we lose the guarantee that transaction indexes are available immediately once the snap sync is finished, but Geth can return a message to users if the specified transaction is not found and transaction indexing is still going on in the RPC response to reduce confusion.
Besides, the transaction indexing is considered as a part of the initial sync,
eth.syncing
wil still be true if the background transaction indexing is not finished yet. API consumers can following the syncing status to determine if the node is usable.e.g. If the node is still syncing, the transaction indexing progress will also be attached.
I have tested the performance for indexing the entire ethereum mainnet chain, it will take roughly 2 hours. And for default setting(2.35m blocks), it takes 20 minutes. It means for the first ~2h time after initial sync, the transaction might not be available from RPC.
Transaction index performance
Index 2.35M blocks needs 20 minutes.
INFO [12-20|05:34:00.558] Indexed transactions blocks=2,350,000 txs=345,795,538 tail=16,474,969 elapsed=20m26.970s
Index the first 16M blocks needs 1h35m.
INFO [12-20|09:31:10.757] Indexed transactions blocks=16,475,763 txs=1,850,658,458 tail=0 elapsed=1h35m48.611s
Index the entire mainnet chain needs ~2h
Index progress info when tx is not found