Pruning of subgraphs #3898
Conversation
Force-pushed from 42c86f4 to d1c28d8.
Force-pushed from 484da76 to aec2f55.
I'm not sure this is ready to review, but the CI seems to be failing on compilation issues.

Not sure what caused those issues - I reran the jobs and they now passed. And yes, this is ready for review. I've run it on the integration cluster and used it to prune all the subgraphs there to 10k blocks of history; some of them were pruned while they were still syncing, some after they had synced, and after letting the resulting set of subgraphs run for ~3 days, there are no POI discrepancies 😃
store/postgres/src/copy.rs (Outdated)

    // batch, but don't step up batch_size by more than 2x at once
    pub fn adapt(&mut self, duration: Duration) {
        let new_batch_size =
            self.size as f64 * TARGET_DURATION.as_millis() as f64 / duration.as_millis() as f64;
I see this code was just moved, but to make sure we avoid division by zero:

    - self.size as f64 * TARGET_DURATION.as_millis() as f64 / duration.as_millis() as f64;
    + self.size as f64 * TARGET_DURATION.as_millis() as f64 / (duration.as_millis() as f64 + 1.0);
Nice idea, added.
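For context, a minimal self-contained sketch of what such adaptive batch sizing with the `+ 1.0` guard could look like. The struct, the `TARGET_DURATION` value, and the clamping bounds here are assumptions for illustration, not the PR's actual code:

```rust
use std::time::Duration;

// Target time a single copy batch should take (assumed value).
const TARGET_DURATION: Duration = Duration::from_secs(5);

struct BatchSize {
    size: i64,
}

impl BatchSize {
    /// Adjust the batch size so the next batch takes roughly
    /// TARGET_DURATION, but never grow by more than 2x at once.
    /// Adding 1.0 to the divisor avoids division by zero when a
    /// batch finishes in under a millisecond.
    pub fn adapt(&mut self, duration: Duration) {
        let new_batch_size = self.size as f64 * TARGET_DURATION.as_millis() as f64
            / (duration.as_millis() as f64 + 1.0);
        self.size = (2 * self.size).min(new_batch_size.round() as i64).max(1);
    }
}
```

A batch that completes instantly only doubles the batch size instead of dividing by zero, while a slow batch shrinks it proportionally.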
/// Lock the row for `site` in `subgraph_deployment` for update. This lock
/// is used to coordinate the changes that the subgraph writer makes with
/// changes that other parts of the system, in particular pruning, make
// see also: deployment-lock-for-update
Since this doesn't return a handle, I assume it locks for the duration of the transaction? Worth adding a note or linking to relevant PG documentation.
Locks in PG are always held to the end of the txn - there's no way to release those internal locks early (pg_advisory_lock is a different story; those can span txns). In any event, I updated the comment.
/// Utility to copy relevant data out of a source table and into a new
/// destination table and replace the source table with the destination
/// table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth commenting somewhere on this file (maybe as a module-level comment) on the approach for the pruning implementation. In particular I'd naively expect it to be implemented by deleting rows, but I see we're copying to a new table first.
Good point. I expanded the comment for `prune_by_copying` to explain that better, since I envision that we might have a `prune_by_deleting` at some point too, which is better if we only prune a small amount of data.
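The tradeoff between the two strategies can be sketched in miniature. Everything below is illustrative - the toy row model and function names are assumptions, not the PR's implementation: copying the survivors into a fresh table does work proportional to the *retained* rows, while deleting in place does work proportional to the *removed* rows, so copying wins when most history is discarded.

```rust
/// A toy entity version, valid from block `from` up to (exclusive)
/// block `to`; `to == None` means the version is still current.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    from: i32,
    to: Option<i32>,
}

/// Prune by copying: build a new table containing only rows still
/// visible at or after `earliest_block`, then swap it in for the
/// original. Cost scales with the number of retained rows.
fn prune_by_copying(table: Vec<Row>, earliest_block: i32) -> Vec<Row> {
    table
        .into_iter()
        .filter(|r| r.to.map_or(true, |to| to > earliest_block))
        .collect()
}

/// Prune by deleting: remove dead rows in place. Cost scales with the
/// number of deleted rows, so this wins when little history is removed.
fn prune_by_deleting(table: &mut Vec<Row>, earliest_block: i32) {
    table.retain(|r| r.to.map_or(true, |to| to > earliest_block));
}
```

Both produce the same surviving set; only the amount of data touched differs.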
That progress report looks nice!
That was a footgun anyway and usually just caused inadvertent counting that was slow.
Also, fix handling of negative n_distinct value
Force-pushed from 8f20203 to fd13949.
Also, avoid a division by zero in an edge case
Force-pushed from fd13949 to 3253d4d.
This PR resolves issue #3665 and adds a command

    graphman prune <deployment>

that removes all history from a deployment before a given block. By default, that block is 10,000 blocks before the current subgraph head. After pruning, the deployment can only be queried at block numbers at least as high as that block. Here's what it looks like when pruning something:
Space savings from pruning can be dramatic, depending on the subgraph. A test database with ~ 60 subgraphs shrank by about 1/3, with the size of some subgraphs being reduced by 90%.
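The default prune block and the resulting query-side constraint described in the PR can be sketched as follows. The type and function names here are illustrative assumptions, not graph-node's actual API:

```rust
/// Minimal view of a pruned deployment (illustrative only).
struct Deployment {
    earliest_block: i32,
}

/// Default history to keep: 10,000 blocks before the subgraph head.
const DEFAULT_HISTORY_BLOCKS: i32 = 10_000;

fn default_prune_block(head: i32) -> i32 {
    (head - DEFAULT_HISTORY_BLOCKS).max(0)
}

/// After pruning, a query at `block` is only answerable if the
/// deployment still has history for that block.
fn can_query_at(d: &Deployment, block: i32) -> Result<(), String> {
    if block < d.earliest_block {
        Err(format!(
            "block {} was pruned; earliest queryable block is {}",
            block, d.earliest_block
        ))
    } else {
        Ok(())
    }
}
```

A query at or above the earliest retained block succeeds; anything older is rejected because its history no longer exists.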