Smarter use of eth_getLogs #1260
Conversation
As it is, this goes from a single call to way too many calls because `eth_log_filters` is naive, but it already avoids false positives.
Do a clever graph algorithm to make neither too few nor too many eth_getLogs calls.
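The core of that idea is to group the required (contract, event) pairs so that unrelated pairs never end up in the same request. A minimal sketch in plain Rust, using connected components over the bipartite contract/event graph (`split_filters` and `Node` are illustrative names, not graph-node's actual types, and the real splitting is more refined than plain connected components):

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// A node in the bipartite contract/event graph (illustrative types;
/// graph-node uses addresses and H256 event signatures here).
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Node {
    Contract(&'static str),
    Event(&'static str),
}

/// Group (contract, event) pairs into connected components. Each
/// component becomes one eth_getLogs filter, so contracts and events
/// from unrelated data sources never share a request.
fn split_filters(
    pairs: &[(&'static str, &'static str)],
) -> Vec<(BTreeSet<&'static str>, BTreeSet<&'static str>)> {
    // Build an undirected adjacency map between contracts and events.
    let mut adj: BTreeMap<Node, BTreeSet<Node>> = BTreeMap::new();
    for (c, e) in pairs {
        adj.entry(Node::Contract(c)).or_default().insert(Node::Event(e));
        adj.entry(Node::Event(e)).or_default().insert(Node::Contract(c));
    }

    let mut seen: BTreeSet<Node> = BTreeSet::new();
    let mut filters = Vec::new();
    for start in adj.keys() {
        if seen.contains(start) {
            continue;
        }
        // BFS over one connected component, collecting its contracts
        // and events into a single filter.
        let mut contracts = BTreeSet::new();
        let mut events = BTreeSet::new();
        let mut queue = VecDeque::from([start.clone()]);
        seen.insert(start.clone());
        while let Some(node) = queue.pop_front() {
            match node {
                Node::Contract(c) => { contracts.insert(c); }
                Node::Event(e) => { events.insert(e); }
            }
            for next in &adj[&node] {
                if seen.insert(next.clone()) {
                    queue.push_back(next.clone());
                }
            }
        }
        filters.push((contracts, events));
    }
    filters
}
```

Two data sources with disjoint contracts and events produce two independent calls; data sources sharing an event (or contract) collapse into one.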
This did not cause any bugs, it's just for the logs to look nice.
I was concerned the reduced parallel block ranges might make scanning logs slower for subgraphs with e.g. just one data source, so I benchmarked this PR with both Gravity (1 data source, 2 events) and Melon (9 data sources, 16 events, if I've counted correctly). Here are the results:

Gravity
Scanning past logs for Gravity against Alchemy, before and after:
Before: 02:41 mins
After: 02:53 mins

Melon
Scanning past logs for Melon against Alchemy, before and after:
Before: 3:36 mins
After: 3:10 mins

Conclusion
This is just a small sample, but it suggests the performance has degraded slightly for subgraphs with just one data source and few events, and improved slightly for subgraphs with many data sources and events. Overall the differences are small enough not to be a concern, especially since subgraphs with just one data source are rare, so we're likely going to see an improvement with most subgraphs.
Love it! I've got a few small comments but this is close to ready.
```rust
if self.contracts.len() == 1 {
    write!(
        f,
        "contract {}, {} events",
```
I noticed in the logs that the contract address is ellipsized in the middle (e.g. contract 0x1bfd…59ba). Could we make the logs contain the complete address so it's easier to copy and paste it?
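For reference, a sketch of printing a complete, un-ellipsized address from raw bytes (`full_address` is a hypothetical helper; with the ethereum-types `H160` used by web3, the `{:?}` Debug format should also print the full address, while `Display` is the ellipsized form):

```rust
/// Format a 20-byte address as the full 0x-prefixed hex string,
/// instead of the ellipsized "0x1bfd…59ba" form.
fn full_address(bytes: &[u8; 20]) -> String {
    let mut s = String::with_capacity(42); // "0x" + 40 hex digits
    s.push_str("0x");
    for b in bytes {
        s.push_str(&format!("{:02x}", b));
    }
    s
}
```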
Ah yeah, that's annoying.
```rust
} else if self.event_sigs.len() == 1 {
    write!(
        f,
        "event {}, {} contracts",
```
Same goes for the event signature here; it's also shortened.
```rust
use tiny_keccak::keccak256;
use web3::types::*;

use super::types::*;
use crate::prelude::*;

pub type EventSig = H256;
```
Any reason to shorten this name to EventSig? How about EventSignature? 😉
```rust
#[derive(Clone)]
pub struct EthGetLogsFilter {
    pub contracts: Vec<Address>,
    pub event_sigs: Vec<EventSig>,
```
Same here – I'd prefer event_signatures.
```rust
contracts_and_events_graph: GraphMap<LogFilterNode, (), petgraph::Undirected>,

// Event sigs with no associated address, matching on all addresses.
wildcard_events: HashSet<EventSig>,
```
I wonder if these could be represented as another LogFilterNode variant to avoid the special casing. It might even make the filter splitting algorithm more efficient?
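For illustration, the suggested shape might look like this (a sketch of the idea only, not the code as merged; `WildcardEvent` is a hypothetical variant name, and strings stand in for graph-node's address and signature types):

```rust
type Address = String;
type EventSignature = String;

/// Fold wildcard events into the node type itself instead of keeping
/// a separate HashSet alongside the graph.
#[derive(PartialEq, Eq, Hash)]
enum LogFilterNode {
    Contract(Address),
    Event(EventSignature),
    /// An event matched on every contract address.
    WildcardEvent(EventSignature),
}
```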
I preferred to keep this separate because imo it makes the code more obvious and readable.
I considered putting it into the graph but we'd also need a special contract vertex representing all contracts. There might be an optimization there if a wildcard event also appears tied to a contract, but the ideal behaviour is unclear. Wildcards are a niche feature anyhow, people usually want to control what contract is emitting an event so that random contracts can't put data into their subgraph.
That's true. It was just an idea.
```rust
self.contracts_and_events_graph
    .all_edges()
    .any(|(s, t, ())| {
        let contract = LogFilterNode::Contract(log.address.clone());
```
I'd create `contract` and `event` upfront to avoid creating them over and over again.
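The hoisted version could look like this (a simplified sketch with strings standing in for graph-node's address and signature types; `log_matches` is a hypothetical name):

```rust
#[derive(PartialEq)]
enum LogFilterNode {
    Contract(String),
    Event(String),
}

/// Check whether any edge of the contract/event graph matches the log.
/// `contract` and `event` are built once, before the closure, instead
/// of being re-allocated on every edge that `any` visits.
fn log_matches(edges: &[(LogFilterNode, LogFilterNode)], address: &str, sig: &str) -> bool {
    let contract = LogFilterNode::Contract(address.to_string());
    let event = LogFilterNode::Event(sig.to_string());
    edges.iter().any(|(s, t)| {
        (*s == contract && *t == event) || (*s == event && *t == contract)
    })
}
```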
Good catch.
@Jannis Great that you did some benchmarking! It does seem like Melon might have benefited from the parallelization; I don't know about Gravity, since nothing should have changed there. Previously we'd only parallelize if we hit an Alchemy timeout or the Infura log cap; mostly it was just single requests.
Currently we merge all log filters for a subgraph into a single large filter. This can be bad for two reasons:
This PR splits the eth_getLogs calls so that we don't request contract-event combinations we don't actually need, and even splits them up a bit more than necessary to get more parallel requests. I tested with the NuoNetwork subgraph for speed and correctness, and we no longer get timeouts on it, probably because we're splitting up the call for the thousands of dynamic data sources per event, which splits it into five parallel requests. I also tested that wildcard event signatures still work.
The ERC20 subgraph didn't spend too much time getting logs, though I didn't wait for it to get very far and I'm not sure what the situation was before. At least on a cold DB the slowest thing was downloading receipts, as usual.
I expect we'll be tweaking this based on the behaviour of specific subgraphs on specific Ethereum node providers, but for now I'll consider that this fixes #1130.