A new ChainIndexer that subsumes the existing MsgIndex, EventIndex and TransactionIndex #12453
Comments
🧵 From Slack conversations: "It will take 9-10 days to backfill the ChainIndexer all the way back to FEVM, but it is a one-time cost, and you can copy the index over to other nodes, so you only need to run the backfill operation on one node." A few questions/thoughts on this:
Some notes from 2024-10-09 Lotus standup focused on the "~9 days to backfill a FEVM-archival node" topic:
I would like to strongly push back against the idea of using the old Index (which suffers from multiple problems which prompted this workstream in the first place) to build the new one. Also replied at:
Another archival node provider is Zondax; let me share details later today with Jenni.
When you say "backfilling" do you specifically mean backfilling the FEVM indexes only would take 9 days? Does this assume the node has already loaded all FEVM archival data since FEVM launch and is fully synced?
@eshon Yes, this assumes that the node has already loaded all FEVM archival data since FEVM launch and is fully synced.
Results from testing on a dedicated Protofire FEVM Archival node. This node is doing nothing other than syncing the chain.
1) Backfilling 1 month of epochs backwards from the current chain head. Takes ~12 hours.
2) Backfilling 1 month of epochs post FEVM launch. Takes ~10 hours.
3) Backfilling 1 month of epochs mid-way between FEVM launch and the current chain head. Takes ~13 hours.
I am now running the index "doctor"/validation on these to sanity check that the backfilled data is in line with the chain state.
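As a rough cross-check of the 9-10 day estimate, the per-month timings above can be extrapolated over the whole post-FEVM range. The sketch below assumes FEVM launched on 2023-03-14 and uses the 2024-10-09 standup date as the end point; both dates and the function name are assumptions for illustration, and real backfill time will vary with hardware and chain activity.

```python
from datetime import date

# Assumed dates (not taken from the measurements above).
FEVM_LAUNCH = date(2023, 3, 14)
AS_OF = date(2024, 10, 9)

DAYS_PER_MONTH_OF_EPOCHS = 30  # one "month" of epochs is treated as 30 days


def backfill_days(start: date, end: date, hours_per_month: float) -> float:
    """Extrapolate total backfill time in days from a per-month timing."""
    months = (end - start).days / DAYS_PER_MONTH_OF_EPOCHS
    return months * hours_per_month / 24


if __name__ == "__main__":
    # 10, 12 and 13 hours are the three measured timings listed above.
    for hours in (10, 12, 13):
        days = backfill_days(FEVM_LAUNCH, AS_OF, hours)
        print(f"{hours} h per month of epochs -> about {days:.1f} days")
```

Under these assumptions the extrapolation brackets the quoted 9-10 day figure, which suggests the per-month measurements and the Slack estimate are consistent with each other.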
@aarshkshah1992: can we get final numbers on chainindex.db size for the full archival node? I know there were some numbers here, but I'm not sure how many tipsets that is and I'd also like to get a larger time range. I want to be able to make a statement like "As of 202410, ChainIndexer will accumulate approximately XMiB per day of data, or XGiB per month" in #12600.
I'm seeing our docs already had a statement that "The ChainIndex will consume ~10GB of storage per month of tipsets (e.g., ~86400 epochs)". I guess that's all I need, but it would be good to have an official record of it in here like you have with backfill times in #12453 (comment).
Talked with Eva and the summary (in Notion) is shared with the team.
@BigLep We have yet to index the entire history all the way up to FEVM launch. We were waiting on the reviews to land/get addressed so we can be sure that we're using the same indexing code as users. Looks like the PR will be ready tomorrow (all reviews will have been addressed) -> will then kick off an indexing of the entire state and also get all the numbers you need here.
That does not sound correct. Where did you get it from? Can we please wait on the next round of archival node testing to get the final numbers? I'll make sure to document them here once we have them.
Ack, good to know. I can't recall / find where I got these numbers from. I was surprised to see them, so maybe I put them in as fillers. I don't remember. Anyways, I will put X placeholders for now and we'll update once official results have been published here.
Please see https://filecoinproject.slack.com/archives/CP50PPW2X/p1729413621133599. ~10 GB growth in the Index DB size per month is actually correct.
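To turn the ~10 GB per month figure into the per-day number asked for earlier in this thread, a quick calculation helps. This sketch assumes 30-second epochs (2,880 per day) and treats index growth as linear in the number of epochs; the constant and function names are illustrative only.

```python
EPOCHS_PER_DAY = 2_880     # 24 h * 3600 s / 30 s per epoch (assumed epoch time)
EPOCHS_PER_MONTH = 86_400  # the "month of tipsets" used in the docs statement
GB_PER_MONTH = 10.0        # approximate growth figure quoted in this thread


def index_growth_gb(epochs: int) -> float:
    """Estimate index growth in GB for a number of epochs, assuming linear growth."""
    return GB_PER_MONTH * epochs / EPOCHS_PER_MONTH


if __name__ == "__main__":
    print(f"per day:   ~{index_growth_gb(EPOCHS_PER_DAY):.2f} GB")
    print(f"per month: ~{index_growth_gb(EPOCHS_PER_MONTH):.0f} GB")
```

Actual growth depends on chain activity (message and event volume), so this is only a planning estimate.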
Summary
This issue is for the implementation of a new ChainIndexer in Lotus that will replace and subsume the existing MsgIndex, EventsIndex, and EthTxHashIndex, which are currently fragmented across multiple databases and have several known issues documented in filecoin-project/lotus#12293.
Key Features
The ChainIndexer offers the following key features:
Note: while the ChainIndexer is primarily focused on events and ETH RPC use cases, it benefits pre-FEVM as well. For example, StateSearchMsg and its various dependents will now have a shortcut to find the message.
Implementation Items
Tasks
Switch RPC APIs to use the Chain Index
- […] ChainIndexer instead of the MsgIndex, EthTxHashIndex and EventsIndex.
- EventFilterManager will read events from the ChainIndexer and prefill all registered filters rather than depending on the Indexer to do the pre-filling of filters.
- ChainIndexer will listen to Mpool message addition updates to index the corresponding ETH Tx Hash. The EthTxHashManager will no longer be used for this.
Read APIs Should Account for the Async Nature of Indexing
- […] T only indexes events in T-1 because of deferred execution.
ETH RPC APIs Should Only Expose Executed Tipsets and Messages
- […] T are executed in tipset T + 1.
- […] T are also executed in tipset T.
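Deferred execution means a tipset's messages are only run when a child tipset is built on top of it, so the current head has not been executed yet. A minimal sketch of that rule follows; the Tipset type and helper names are hypothetical and are not Lotus APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tipset:
    height: int  # epoch; gaps between consecutive heights model null rounds


def latest_executed(chain: List[Tipset]) -> Optional[Tipset]:
    """Return the newest tipset whose messages have been executed.

    `chain` is ordered oldest to newest and ends at the current head. The
    head's own messages run only once a child tipset exists, so the newest
    executed tipset is the head's parent.
    """
    return chain[-2] if len(chain) >= 2 else None


def is_exposable(ts: Tipset, chain: List[Tipset]) -> bool:
    """True if `ts` is at or below the newest executed tipset."""
    last = latest_executed(chain)
    return last is not None and ts.height <= last.height


if __name__ == "__main__":
    chain = [Tipset(10), Tipset(11), Tipset(13)]  # height 12 was a null round
    print(latest_executed(chain))
    print(is_exposable(Tipset(13), chain))
```

An API following this rule would treat the head's parent as the most recent tipset it can report results for.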
Removing Re-orged Tipsets That Are No Longer Part of the Canonical Chain
- ChainIndexer will periodically prune all permanently re-orged/reverted tipsets from the index. It can do this by simply pruning all tipsets at a height less than (current head - finality policy - some buffer).
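The pruning rule above can be sketched against SQLite as follows. The table layout, the 900-epoch finality value, and the 100-epoch buffer are assumptions chosen for illustration, not the actual ChainIndexer schema or configuration. The child table uses ON DELETE CASCADE so that deleting a tipset row also removes its dependent rows.

```python
import sqlite3

CHAIN_FINALITY = 900  # epochs; assumed finality policy value
BUFFER = 100          # epochs; hypothetical extra safety margin


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table index."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE tipset ("
        " id INTEGER PRIMARY KEY,"
        " height INTEGER NOT NULL,"
        " reverted INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE event ("
        " id INTEGER PRIMARY KEY,"
        " tipset_id INTEGER NOT NULL"
        "   REFERENCES tipset(id) ON DELETE CASCADE)"
    )
    # (id, height, reverted): an old reverted tipset, an old canonical one,
    # and a recent reverted one that is still inside the finality window.
    conn.executemany(
        "INSERT INTO tipset VALUES (?, ?, ?)",
        [(1, 100, 1), (2, 100, 0), (3, 4950, 1)],
    )
    conn.executemany(
        "INSERT INTO event (tipset_id) VALUES (?)",
        [(1,), (1,), (2,), (3,)],
    )
    conn.commit()
    return conn


def prune_reverted(conn: sqlite3.Connection, head_height: int) -> int:
    """Delete reverted tipsets below the cutoff; return how many were deleted."""
    cutoff = head_height - CHAIN_FINALITY - BUFFER
    cur = conn.execute(
        "DELETE FROM tipset WHERE reverted = 1 AND height < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = build_demo_db()
    print("tipsets pruned:", prune_reverted(db, head_height=5000))
    print("events left:", db.execute("SELECT COUNT(*) FROM event").fetchone()[0])
```

Restricting the delete to heights below the cutoff keeps recently reverted tipsets around, since a re-org inside the finality window could still make them canonical again.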
Garbage Collection
- ChainIndexer can perform periodic GC based on this configuration.
- […] ChainIndexer because of the use of FOREIGN KEY ... ON DELETE CASCADE, as described in SQLite Foreign Keys.
Snapshot Hydration
Automated Backfilling
- […] ChainIndex for which the corresponding state exists in the statestore.
- […] Observer with that tipset as the current head.
- […] Observer.
- ChainIndexer will observe the (Apply, Revert) path between its last non-reverted indexed tipset and the current heaviest tipset in the chainstore before processing real-time updates, effectively performing automated backfilling.
Simplify Indexing Config
Migration from Old Indices to the New ChainIndex
- […] lotus-shed utility that allows users to migrate existing indices to the new ChainIndexer database. This command should only be executed when the Lotus node is offline to ensure data consistency and avoid potential conflicts.
- […] ChainIndexer. This approach offers several benefits: