
Writing in batches #4606

Merged: 27 commits merged into master from lutter/batch-write on May 18, 2023

Conversation

@lutter (Collaborator) commented May 9, 2023

This PR changes graph-node so that subgraphs write their changes in batches instead of at every block. During syncing, a batch of changes is only written if it is bigger than a certain size or older than a certain amount of time; while the batch is held, calls to transact_block_operations will append their changes to the current batch. Batching is controlled by the environment variables GRAPH_STORE_WRITE_BATCH_DURATION and GRAPH_STORE_WRITE_BATCH_SIZE. Setting either to 0 will turn off batching and force writes at every block.
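
As a rough illustration of that flush rule (a minimal sketch under the assumptions above, not the PR's actual code; the type and field names are made up), a batch is written as soon as it crosses either threshold, and setting either knob to 0 degenerates to writing at every block:

    use std::time::{Duration, Instant};

    struct BatchConfig {
        /// Parsed from GRAPH_STORE_WRITE_BATCH_SIZE (0 disables batching).
        max_size: usize,
        /// Parsed from GRAPH_STORE_WRITE_BATCH_DURATION (0 disables batching).
        max_age: Duration,
    }

    struct PendingBatch {
        started_at: Instant,
        size: usize,
    }

    impl BatchConfig {
        /// Should the batch currently being held be written now?
        fn should_flush(&self, batch: &PendingBatch) -> bool {
            // Either knob at 0 turns batching off: write at every block.
            if self.max_size == 0 || self.max_age.is_zero() {
                return true;
            }
            batch.size >= self.max_size || batch.started_at.elapsed() >= self.max_age
        }
    }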

The two most difficult parts of the PR are probably the logic in RowGroup.append_row, coupled with how entities are looked up in RowGroup.last_op and RowGroup.effective_ops (I am working on unit tests for that). The other complicated part of this PR is the locking logic in graph::store::postgres::writable::Queue, especially the interplay between start_writer and push_write.
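
To give a feel for what a last_op-style lookup does (only a sketch with made-up types, not the actual RowGroup code in this PR): ops are kept in block order, and the lookup walks backwards to find the newest op for a key that is not newer than the block being asked about.

    type BlockNumber = i32;

    enum Op {
        Insert { key: String, block: BlockNumber },
        Overwrite { key: String, block: BlockNumber },
        Remove { key: String, block: BlockNumber },
    }

    impl Op {
        fn key(&self) -> &str {
            match self {
                Op::Insert { key, .. } | Op::Overwrite { key, .. } | Op::Remove { key, .. } => key,
            }
        }
        fn block(&self) -> BlockNumber {
            match self {
                Op::Insert { block, .. } | Op::Overwrite { block, .. } | Op::Remove { block, .. } => *block,
            }
        }
    }

    struct Group {
        /// Ops appended in non-decreasing block order.
        rows: Vec<Op>,
    }

    impl Group {
        /// The newest op for `key` at or before block `at`, if any.
        fn last_op(&self, key: &str, at: BlockNumber) -> Option<&Op> {
            self.rows
                .iter()
                .rev()
                .filter(|op| op.block() <= at)
                .find(|op| op.key() == key)
        }
    }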

It's strongly recommended to look at this PR commit by commit; the first few commits, up to 'Introduce a data structure to capture a write' just lay some groundwork to make what comes after simpler/possible. After that, commits up to 'Attempt to extend existing write requests' prepare the code base for batches that span multiple blocks; the remaining commits implement the actual logic and control for combining multiple transact_block_operations into one batch.

@lutter (Collaborator Author) commented May 11, 2023

Rebased to latest master

@lutter (Collaborator Author) commented May 11, 2023

Rebased once more

@leoyvens (Collaborator) left a comment

Wonderful ✨

.rows
.last()
.map(|emod| emod.block() <= block)
.unwrap_or(true));
Collaborator

I'd make this a hard assert! since it's cheap to run.

Collaborator Author

I don't like panics in production, but here it was easy to make that a constraint_violation!
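
For readers following along, the shape of that change is roughly the following (illustrative only; StoreError here is a stand-in type, while the actual code builds the error with graph-node's constraint_violation! macro):

    #[derive(Debug)]
    struct StoreError(String);

    fn check_block_order(rows: &[(String, i32)], block: i32) -> Result<(), StoreError> {
        // Before: debug_assert!(rows.last().map(|r| r.1 <= block).unwrap_or(true));
        // After: report the violation as an error instead of panicking.
        if !rows.last().map(|r| r.1 <= block).unwrap_or(true) {
            return Err(StoreError(format!(
                "tried to append a row for block {} older than the last row in the group",
                block
            )));
        }
        Ok(())
    }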

/// The tracker relies on `update` being called in the order newest request
/// in the queue to oldest request so that reverts are seen before the
/// writes that they revert.
/// The best way to use the trtacker is to use the `fold_map` and `find`
Collaborator

Suggested change
/// The best way to use the trtacker is to use the `fold_map` and `find`
/// The best way to use the tracker is to use the `fold_map` and `find`

Collaborator Author

Fixed

key,
end: Some(end),
..
} if at < *end => EntityOp::Write { key, entity: data },
Collaborator

This looks correct, but it would be good to have a test that would fail if this were <=.

Collaborator Author

The unit tests in last_op also fail when this is turned into a <=
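
The boundary being tested is the usual half-open block range: a version closed at block end is visible at blocks strictly before end, so at < end is the right check. A tiny, illustrative test of that idea (not the PR's actual test or types):

    struct Version {
        start: i32,
        end: Option<i32>, // None means the version is still current
    }

    impl Version {
        /// Visible in the half-open range [start, end).
        fn visible_at(&self, at: i32) -> bool {
            self.start <= at && self.end.map_or(true, |end| at < end)
        }
    }

    #[test]
    fn closed_at_end_is_not_visible_at_end() {
        let v = Version { start: 5, end: Some(10) };
        assert!(v.visible_at(9));
        // This assertion fails if the check is written as `at <= end`.
        assert!(!v.visible_at(10));
    }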

.filter(move |emod| seen.insert(emod.id()))
.map(EntityOp::from)
.filter(move |emod| {
if emod.block() <= at {
Collaborator

Same here, a test that fails if this were <.

Collaborator Author

I added a commit with a bunch of unit tests, and checked that changing this to a < here makes them fail.
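
For context, the seen.insert dedup idiom in the snippet above works because walking the modifications newest-first means the first op kept per id is the latest one; HashSet::insert returns false for ids it has already seen. A stripped-down sketch (illustrative names, restricted to ops at or before the requested block, where the real effective_ops does more):

    use std::collections::HashSet;

    struct Mod {
        id: u64,
        block: i32,
    }

    /// The newest op per id among ops at or before `at`; `mods` is in block order.
    fn newest_per_id(mods: &[Mod], at: i32) -> Vec<&Mod> {
        let mut seen = HashSet::new();
        mods.iter()
            .rev() // newest first
            .filter(|m| m.block <= at)
            .filter(move |m| seen.insert(m.id))
            .collect()
    }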

@lutter (Collaborator Author) commented May 17, 2023

Rebased onto latest master

@leoyvens (Collaborator) left a comment

The test is very readable, thanks for adding that

@lutter lutter force-pushed the lutter/batch-write branch 2 times, most recently from 2f7f76a to 5992b54 on May 18, 2023 17:19
lutter added 17 commits May 18, 2023 10:46
- Columns are now returned in the order in which they are defined, not a
  random order
- In the common case where a column is present in all entities, we don't
  have to iterate over all entities
The mapping never changes, so it can be set once for the WritableStore, and
doesn't need to be passed in for every transact_block_operations
The contents of graph::components::store::write are somewhat provisional,
mostly to plumb using a `Batch` through the rest of the code. A later
commit will change this quite a bit.
We group all pending updates into runs by block.
When we start combining batches, we will want to mutate some changes in
place to avoid the data copies that we would need if we kept them separated
by the kind of change that is needed.
When changes to entities get combined, inserting a row can either create a
new entity, or update an existing version, and the number of rows inserted
is no longer an accurate count of new entities.

Instead of relying on data coming from the database, we now use the
in-memory representation of changes to determine how applying a batch
changes the number of entities in a deployment.
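
The bookkeeping described in this commit boils down to deriving the entity-count delta from the kind of each in-memory change rather than from the number of rows written; a hedged sketch with made-up types:

    enum EntityChange {
        Create,    // the entity did not exist before this batch
        Overwrite, // a new version of an entity that already exists
        Remove,    // the entity is deleted
    }

    /// Net change to the number of entities in a deployment when the batch lands.
    fn entity_count_delta(changes: &[EntityChange]) -> i64 {
        let mut delta = 0i64;
        for change in changes {
            match change {
                EntityChange::Create => delta += 1,
                EntityChange::Overwrite => {}
                EntityChange::Remove => delta -= 1,
            }
        }
        delta
    }
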
lutter added 10 commits May 18, 2023 10:46
The result of the append is a batch that, when written to the database, has
the same effect as writing the two batches in two separate transactions.
When a new write request gets queued, try to append it to an existing one
whenever possible, instead of creating a new request.
So far, new write requests were appended to existing ones only
opportunistically; this commit makes it so that we actually hold back
writing to allow processing to append to an existing write request.

Only write a batch if its size is beyond WRITE_BATCH_SIZE or if it is older
than WRITE_BATCH_DURATION. Since we don't have a way to wait on the batch
growing big enough, poll for that every 2s.

This also requires that stopping or flushing the WritableStore sets
`batch_writes` to `false`, mostly for tests, to ensure that all pending
writes get written out before the background writer shuts down.
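
Put together, the polling behaviour described in these commits looks roughly like the following synchronous sketch (the real writer is async and lives in graph::store::postgres::writable; names here are illustrative): the background writer checks the held batch every 2 seconds and writes it once it is big enough, old enough, or as soon as batching has been switched off by a flush or shutdown.

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::time::{Duration, Instant};

    struct Batch {
        created: Instant,
        size: usize,
    }

    struct Writer {
        batch_writes: AtomicBool, // cleared when flushing or stopping
        max_size: usize,          // WRITE_BATCH_SIZE
        max_age: Duration,        // WRITE_BATCH_DURATION
    }

    impl Writer {
        /// Block until `batch` should be written, polling every 2 seconds.
        fn wait_until_writable(&self, batch: &Batch) {
            loop {
                let big_enough = batch.size >= self.max_size;
                let old_enough = batch.created.elapsed() >= self.max_age;
                let batching_off = !self.batch_writes.load(Ordering::SeqCst);
                if big_enough || old_enough || batching_off {
                    return;
                }
                std::thread::sleep(Duration::from_secs(2));
            }
        }
    }
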
@lutter lutter merged commit 76619a6 into master May 18, 2023
@lutter lutter deleted the lutter/batch-write branch May 18, 2023 18:10