Block explorer data #297
@tarrencev expressed the wish to call contracts from the GraphQL client, so that he could get the latest state of a contract without having to index it. This can be done on Etherscan, so it seems reasonable to put it on the wishlist for our block explorer functionality. |
Rationale / Use Cases
I think Yaniv's original description covers this. We need this so that users can query blockchain-intrinsic data from Ethereum (blocks, transactions, accounts, transaction receipts). We also want subgraphs to be able to reference this kind of information. TODO: add use cases.
Requirements
Proposed User Experience
Querying block explorer data
Once started with …, users can access the data by going to …:

```graphql
{
  blocks(where: { number_gte: 0, number_lt: 1000 }, orderBy: number) {
    hash
    transactions(orderBy: gas, orderDirection: desc) {
      hash
      receipt { ... }
    }
  }
  accounts(where: { address: "0x..." }) {
    balance
  }
}
```

Referencing block explorer data
From a user's perspective, whether data comes from one subgraph or another should not matter. Assuming a field …, the query

```graphql
{
  domains {
    owner {
      balance
    }
  }
}
```

should just work™. This includes being able to introspect the …

From a subgraph developer's perspective, the main novelty is subgraph composition. Given a subgraph name or deployment ID, types from the subgraph with that name or deployment ID can be imported and referenced in the GraphQL schema using a new `@import` directive (GraphQL string literals use double quotes):

```graphql
@import(
  from: {
    name: "ethereum" # or id: "Qm..."
  }
  as: "Ethereum" # required prefix
)
type User @entity {
  account: Ethereum__Account!
}
```

Open Questions
Proposed Implementation
Graph Node
Open Questions
Proposed Documentation Updates
Proposed Tests / Acceptance Criteria
Tasks
|
Nice job putting this together! My input on some of the open questions:
We could reserve the double underscore in type names, so that it is always available for prefixing imported types with a namespace, i.e., …
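A minimal sketch of what reserving `__` could look like during schema validation, assuming imported types are named `<Prefix>__<Type>` as in `Ethereum__Account`. The helper names are hypothetical, not graph-node APIs. Note that the GraphQL specification already reserves names that *start* with `__` for introspection; this proposal concerns `__` anywhere in a type name.

```python
from typing import Optional, Tuple

# Hypothetical helpers illustrating a reserved namespace separator.
SEPARATOR = "__"

def validate_local_type_name(name: str) -> None:
    """Reject local type names that use the reserved separator."""
    if SEPARATOR in name:
        raise ValueError(f"type name {name!r} uses the reserved {SEPARATOR!r}")

def namespaced(prefix: str, type_name: str) -> str:
    """Prefix an imported type with its namespace, e.g. Ethereum__Account."""
    validate_local_type_name(prefix)
    validate_local_type_name(type_name)
    return f"{prefix}{SEPARATOR}{type_name}"

def split_namespaced(name: str) -> Tuple[Optional[str], str]:
    """Split 'Ethereum__Account' into ('Ethereum', 'Account'); local names get None."""
    prefix, sep, rest = name.partition(SEPARATOR)
    return (prefix, rest) if sep else (None, name)
```

Because no local type may contain `__`, splitting on the first occurrence is unambiguous.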
Other feedback/questions:
|
@Zerim Thanks for all the comments, I've incorporated them into the …
About the other feedback/questions:
|
This here causes me headaches:
To keep storage from blowing up, we can only afford to store entities when they change; for time-travel queries, we then need an efficient way to find the latest version of a given entity before some point in time. That's easy if we only store whatever we think the main chain is at any point in time, since blocks then have a total order. In the presence of uncled blocks, there are a lot of details to work out, and we have to look carefully at the kinds of queries we need to support for uncled blocks and see whether there are simpler ways to support time travel when uncles are involved. |
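A minimal in-memory sketch of the easy, main-chain-only case: each entity keeps its versions sorted by the block number at which it changed, and a time-travel lookup is a binary search. The class and method names are hypothetical, and the sketch deliberately ignores uncled blocks, which are exactly what breaks the total order it relies on.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Version = Optional[Dict[str, Any]]  # None marks a deletion

@dataclass
class VersionedEntity:
    """Every stored version of one entity, keyed by the block that changed it.

    Assumes a total order of blocks (main chain only): a version is stored
    once, when the entity changes, never once per block.
    """
    block_numbers: List[int] = field(default_factory=list)  # kept sorted
    versions: List[Version] = field(default_factory=list)

    def record(self, block_number: int, data: Version) -> None:
        """Append a change; changes must arrive in block order."""
        if self.block_numbers and block_number < self.block_numbers[-1]:
            raise ValueError("changes must be recorded in block order")
        if self.block_numbers and block_number == self.block_numbers[-1]:
            self.versions[-1] = data  # several changes in one block: keep the last
        else:
            self.block_numbers.append(block_number)
            self.versions.append(data)

    def as_of(self, block_number: int) -> Version:
        """Latest version at or before block_number, via binary search."""
        i = bisect.bisect_right(self.block_numbers, block_number)
        return self.versions[i - 1] if i > 0 else None
```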
@lutter I think there are two separate questions here:
I think we can follow my recommendation for 2 without it forcing a specific design on 1. |
@Zerim One thing I don't understand about uncles is that they only need to have valid block headers, which to me means that headers are all you can reliably query about uncled blocks. So uncles are not full blocks, and we should treat them as additional data attached to a block on the main chain. That wouldn't preclude us from supporting queries by block number that return information about uncles, but it does mean that uncles and blocks on the main chain are treated differently. |
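A sketch of this modeling, assuming uncles are stored as header-only records hanging off the later main-chain block that includes them, while a query by block number still returns both. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Header:
    hash: str
    number: int

@dataclass
class MainChainBlock:
    header: Header
    transaction_hashes: List[str] = field(default_factory=list)  # full block data
    uncles: List[Header] = field(default_factory=list)  # header-only data

def headers_at(chain: List[MainChainBlock], number: int) -> List[Header]:
    """Headers with a given block number: main-chain block first, then uncles.

    Uncles are found through the (later) main-chain blocks that include them.
    """
    main = [b.header for b in chain if b.header.number == number]
    uncles = [u for b in chain for u in b.uncles if u.number == number]
    return main + uncles
```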
@lutter That's correct: unless we had seen an uncle block when it was published, or a forked block before it was reorged, we would only have header information. That's why, for example, headers are all you see on Etherscan for forked blocks: https://etherscan.io/blocks_forked
There's a question of whether, if we have all the information for a block that is later forked or uncled, we should retroactively remove everything but the header to stay consistent with the other uncled blocks we know of. For very recent blocks, I'm inclined to keep around as much information as possible, and then maybe prune the remaining data once we're confident that the block is permanently forked or uncled. |
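One way to express the "keep recent data, prune later" idea is a depth-based policy, sketched below under the assumption that a forked block counts as permanent once it is some number of blocks behind the head. `FINALITY_DEPTH` and the other names are hypothetical; the discussion does not specify a depth.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical policy parameter: how far behind the head a forked block must
# be before we treat the fork as permanent. Not a value from the discussion.
FINALITY_DEPTH = 64

@dataclass
class ForkedBlock:
    number: int
    header: Dict[str, Any]
    body: Optional[Dict[str, Any]]  # transactions, receipts, ... if we saw them

def prune_forked(blocks: List[ForkedBlock], head_number: int,
                 depth: int = FINALITY_DEPTH) -> None:
    """Keep only the header of forked blocks that are deep enough below the head."""
    for block in blocks:
        if block.body is not None and head_number - block.number >= depth:
            block.body = None
```

After pruning, old forked blocks look like any other uncle (header only), which keeps the data consistent.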
I'm concerned about the proposed implementation strategy of essentially indexing all of the data contained in an archive Ethereum node. That's currently 3 TB of data when stored as compressed RLP in key-value storage. If we store it as heavily indexed relational data, it will be over 10 TB. That is a serious operational cost. I'd instead suggest implementing this subgraph by leveraging an archive node, instead of duplicating the data from it, and exposing only the queries that we can efficiently resolve through the JSON-RPC interface. Edit: I overstated the storage, because most of it probably corresponds to historical contract state, which we won't need. Still, I think the tradeoffs here are worth considering; the storage required will still be an order of magnitude above even the most demanding subgraphs that currently exist. |
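To illustrate resolving a field against the node instead of copying data, here is a sketch of how `accounts { balance }` might map onto the standard `eth_getBalance` JSON-RPC method. The function names are hypothetical and the HTTP transport is omitted; balances at historical blocks require an archive node.

```python
import json
from typing import Union

def balance_request(address: str, block: Union[int, str] = "latest",
                    request_id: int = 1) -> str:
    """Build an eth_getBalance JSON-RPC request body."""
    # Block numbers are sent as hex quantities; tags like "latest" pass through.
    block_param = hex(block) if isinstance(block, int) else block
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, block_param],
    })

def parse_balance(response_body: str) -> int:
    """Decode the hex-encoded balance (in wei) from a JSON-RPC response."""
    response = json.loads(response_body)
    if "error" in response:
        raise RuntimeError(response["error"])
    return int(response["result"], 16)
```

Each resolved field costs one RPC round trip, which is why this approach only fits queries that map cleanly onto JSON-RPC calls.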
@leoyvens I'll come up with an estimate of the storage this would occupy. A rough guess based on 10M blocks with 150 transactions each would involve maybe
That doesn't feel excessive. Having this data available in the local database would mean
The GraphQL API built into geth will take a while to make it into Parity (if it ever does). It would help with query and query-composition performance, but we can't wait for it. Unless block explorer data requires terabytes of storage, IMHO the benefits we get from ingesting all this data outweigh the storage cost. |
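For scale, the transaction count behind the rough guess of 10M blocks with 150 transactions each works out as follows, with $b$ standing for the (unknown) average bytes stored per transaction, including indexes:

```math
N_{\text{tx}} \approx 10^{7}\ \text{blocks} \times 150\ \tfrac{\text{tx}}{\text{block}} = 1.5 \times 10^{9}\ \text{transactions}, \qquad S \approx N_{\text{tx}} \cdot b = 1.5 \times 10^{9}\, b\ \text{bytes}
```

So every 100 bytes per transaction row adds roughly 150 GB; $b$ itself is an assumption that depends on the schema and indexes chosen.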
I agree that blocks and accounts are something we should just ingest and have great query performance for. However, I'd like to make the point that transaction receipts may be a step too far. Right now we have all transaction receipts loaded in our Graph nodes, taking up a total of 330 GB. We don't need to do this, and I intend to get rid of virtually all of them by doing what is described in this comment, 'Proposed Implementation' section. This raises the question of whether transaction receipts should be part of the subgraph discussed in this issue.

First, I'd like to separate the concept of a full block explorer subgraph from a blessed 'Ethereum subgraph' for subgraphs to compose with. For a full block explorer, entities such as …

For the Ethereum subgraph to be widely composed with, I agree it should be featureful and fast to query, but we also need to keep indexing costs down so that there is a good supply of index nodes and query prices stay low. I believe the entities …

By not including receipts, we could have every index node sync this data by default, allowing us to assume and leverage the data in graph-node's internals. |
@leoyvens I agree with that, although I expect a few transaction receipt fields to be crucial enough that we'll have to pull them in (I'm thinking of the gas information, for instance, which is split between the transaction and its receipt). |
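The gas split is easy to see when computing the fee a transaction paid: gas used is only in the receipt, while the price per gas is on the transaction (or, since EIP-1559, also in the receipt as `effectiveGasPrice`). A sketch over JSON-RPC-style objects, with a hypothetical function name:

```python
from typing import Any, Dict

def _quantity(value: Any) -> int:
    """JSON-RPC returns quantities as hex strings such as '0x5208'."""
    return int(value, 16) if isinstance(value, str) else int(value)

def transaction_fee_wei(tx: Dict[str, Any], receipt: Dict[str, Any]) -> int:
    """Fee paid = gas used (from the receipt) x price per gas.

    Post-EIP-1559 receipts carry effectiveGasPrice; older ones do not, so fall
    back to the transaction's gasPrice.
    """
    gas_used = _quantity(receipt["gasUsed"])
    price = receipt.get("effectiveGasPrice", tx.get("gasPrice"))
    if price is None:
        raise KeyError("no gas price on transaction or receipt")
    return gas_used * _quantity(price)
```

For a plain 21,000-gas transfer at 20 gwei, this gives 21,000 × 20 × 10⁹ wei.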
@leoyvens There is one argument for storing logs, though: almost every subgraph today defines entities that correspond to the event types and stores the events almost 1:1. If developers could simply reference already-existing event entities in the block explorer data, that would save everyone a ton of time and work. One aspect that slightly weakens this argument is that subgraphs typically store only a subset of events as entities, not all of them. |
@Jannis Having the gas spent is fine; my concern is the logs. The logs in a block explorer are not decoded, so having all logs is not the same as having a subgraph that ingests events with a proper schema. |
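To show what "not decoded" means: a raw log is just a list of 32-byte topics plus a data blob, and turning it into a typed entity requires knowing the event's ABI. The sketch below decodes an ERC-20 `Transfer` log; the topic constant is the keccak-256 hash of `Transfer(address,address,uint256)`, and the function name is hypothetical.

```python
from typing import Any, Dict

# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: Dict[str, Any]) -> Dict[str, Any]:
    """Decode a raw ERC-20 Transfer log; only possible because we know the ABI."""
    topics = log["topics"]
    # ERC-721 Transfer shares the signature but indexes tokenId (4 topics).
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        # Indexed address params are left-padded to 32 bytes; keep the last 20.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        # The non-indexed uint256 value is the 32-byte data field.
        "value": int(log["data"], 16),
    }
```

A block explorer storing every log would still need per-contract ABIs like this to expose meaningful event entities.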
Revised plan without subgraph composition.
Requirements
Proposed User Experience
Querying block explorer data
Once started with …, users can access the data by going to …:

```graphql
{
  blocks(where: { number_gte: 0, number_lt: 1000 }, orderBy: number) {
    hash
    transactions(orderBy: gas, orderDirection: desc) {
      hash
      receipt { ... }
    }
  }
  accounts(where: { address: "0x..." }) {
    balance
  }
}
```

Referencing block explorer data
From a user's perspective, whether data comes from one subgraph or another should not matter. Assuming a field …, the query

```graphql
{
  domains {
    owner {
      balance
    }
  }
}
```

should just work™. This includes being able to introspect the …

Proposed Implementation
Graph Node
Open Questions
Proposed Documentation Updates
Proposed Tests / Acceptance Criteria
Tasks
The plan is to implement this in different phases:
The estimates below are in days.
Phase 1 (Basic) [~7 days, target: Nov 27]
Phase 2 (Transactions) [~4.5 days, target: Dec 4]
Phase 2 (Accounts) [~12.5 days, target: Dec 20]
Phase 3 (Ingestion) [~5 days, target: Dec 31]
Details TBD. |
@Jannis From that plan, it sounds like we're back to having raw logs in the standard Ethereum subgraph; is that the case? |
@Jannis Has this project been tabled or abandoned? Right now I'm working on a project to retrieve all Uniswap transaction history for taxes, and being able to automatically query the gas fees paid on each transaction would be incredibly valuable. If it has, is there a workaround or alternate path forward to retrieve the gas used by each transaction without also having to make separate calls to Etherscan or another API? Thanks! |
It has been paused, but not abandoned permanently. Adding transaction indexing to the existing codebase wouldn't actually be that hard. The difficult part (more difficult than it looks in the plan) is having accurate account balances, because that will likely require replaying block rewards and internal transactions, which will make things extremely slow. |
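The sketch below shows why balances are the hard part: they can only be rebuilt by replaying every kind of value movement, and any kind that is skipped leaves balances silently wrong. It keeps state in a flat address-to-wei map (no state trie). The change kinds and names are illustrative assumptions, not an exhaustive model of Ethereum's state transitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class BalanceChange:
    """One value movement; the kinds listed are illustrative, not exhaustive."""
    kind: str                 # e.g. "transfer", "internal", "fee", "reward"
    sender: Optional[str]     # None when value is created (e.g. block rewards)
    recipient: Optional[str]  # None when value leaves circulation
    amount: int               # in wei

def replay(changes: Iterable[BalanceChange]) -> Dict[str, int]:
    """Rebuild balances in a flat address -> wei map by applying every change in order.

    Any kind of change that is not replayed (say, internal transactions, which
    only show up in execution traces) silently leaves balances wrong.
    """
    balances: Dict[str, int] = defaultdict(int)
    for change in changes:
        if change.sender is not None:
            balances[change.sender] -= change.amount
        if change.recipient is not None:
            balances[change.recipient] += change.amount
    return dict(balances)
```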
Thanks for the quick reply @Jannis. For our use case (and really all accounting applications), having access to the fees paid across all subgraphs is necessary. Is there currently a path forward within The Graph ecosystem to retrieve gas fees for transactions on a given subgraph? |
@Jannis Are you able to share what this has been paused in favor of? This seems fundamental to the success of The Graph. Transactions are where the action happens on-chain. The Graph is being described as the "Google of blockchains". Keeping with the analogy, returning only blocks is akin to Google returning only domains instead of actual webpages. |
@Jannis I wanted to follow up here on whether there is an ETA, or any workaround for retrieving transaction gas fees via The Graph. I did come across this subgraph, which claims to return fees, but its transactions don't return anything when queried: https://thegraph.com/explorer/subgraph/sistemico/eth-gas Thanks! |
It's very disappointing that we can't even get a response to a simple question here. It doesn't instill a lot of trust in The Graph ecosystem. We will be pursuing alternatives. |
Will this allow querying transactions for a specific address, with a given data parameter?

```graphql
{
  transactions(where: { to: "0x...", data: "0x1234abcd" }) {
    value
    from
    hash
  }
}
```
|
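One detail worth noting for this kind of filter: transaction calldata is a 4-byte function selector followed by ABI-encoded arguments, so an exact match on `data: "0x1234abcd"` would only match calls that take no arguments; a prefix match is probably what's intended. A sketch over JSON-RPC transaction objects (where calldata lives in the `input` field), with a hypothetical function name:

```python
from typing import Any, Dict, Iterable, List

def calls_to(transactions: Iterable[Dict[str, Any]], to: str,
             selector: str) -> List[Dict[str, Any]]:
    """Transactions sent to `to` whose calldata starts with a 4-byte selector."""
    to = to.lower()
    selector = selector.lower()
    return [
        tx for tx in transactions
        # Contract creations have to=None, so they never match.
        if (tx.get("to") or "").lower() == to
        and tx.get("input", "0x").lower().startswith(selector)
    ]
```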
You don't need to maintain the Merkle Patricia trie for the state, since you are consuming the data from a trusted source. So it's just the EVM plus flat state storage. Is it really extremely slow? It will be way faster than syncing a geth node with --syncmode=full. |
When is this going to be live? |
Block explorer data deserves first-class support in The Graph. Not every node needs to index block explorer data, but those that want to should be able to do so efficiently. Block explorer data includes indexing:
This should be possible without having to define a subgraph specifically for this data. Rather, this should be specified as a CLI argument when starting up the node.
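As a rough illustration of the CLI-argument approach, the sketch below parses a flag listing which kinds of block explorer data to index. The flag name `--index-block-explorer-data` is hypothetical and not an existing graph-node option; the data kinds are the ones named earlier in this thread (blocks, transactions, receipts, accounts).

```python
import argparse
from typing import List, Optional

# Hypothetical flag and data kinds; graph-node does not necessarily expose these.
DATA_KINDS = ("blocks", "transactions", "receipts", "accounts")

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="graph-node")
    parser.add_argument(
        "--index-block-explorer-data",
        nargs="+",
        choices=DATA_KINDS,
        default=[],  # nodes opt in; by default nothing extra is indexed
        metavar="KIND",
        help="block explorer data to index natively, without a subgraph",
    )
    return parser.parse_args(argv)
```

Making the flag opt-in matches the point that not every node needs to index this data.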
Questions: