Chain DB: addBlock queue ineffective #655

Open
mrBliss opened this issue Jul 31, 2020 · 0 comments · May be fixed by IntersectMBO/ouroboros-network#2721
Comments

mrBliss (Contributor) commented Jul 31, 2020

@karknu has discovered that the BlockFetch client is able to quickly add the first block of the Shelley era to the ChainDB, but adding the second one takes 11s!

MsgRequestRange ChainRange from At (SlotNo {unSlotNo = 4492800}) to At (SlotNo {unSlotNo = 4492800})
MsgStartBatch ChainRange from At (SlotNo {unSlotNo = 4492800}) to At (SlotNo {unSlotNo = 4492800})
MsgBlock recved
MsgBlock verified after 0.000023s
MsgBlock written to disk after 0.085022s (0.085045s)
CompletedBlockFetch responseTime 0.172265s size 1013 ChainRange from At (SlotNo {unSlotNo = 4492800}) to At (SlotNo {unSlotNo = 4492800})

MsgRequestRange ChainRange from At (SlotNo {unSlotNo = 4492840}) to At (SlotNo {unSlotNo = 4492840})
MsgStartBatch ChainRange from At (SlotNo {unSlotNo = 4492840}) to At (SlotNo {unSlotNo = 4492840})
MsgBlock recved
MsgBlock verified after 0.000019s
MsgBlock written to disk after 11.361727s (11.361746s)
CompletedBlockFetch responseTime 11.522864s size 1013 ChainRange from At (SlotNo {unSlotNo = 4492840}) to At (SlotNo {unSlotNo = 4492840})

One might conclude from this that the first block was quick to validate while the second was slow. That would be counter-intuitive: the first block triggers the era transition, which requires an expensive translation of the ledger state, whereas validating the second block should be quick.

Note that when the BlockFetch client adds a block to the ChainDB, it should only block until the block has been written to disk, not until chain selection has been performed for that block.
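For reference, a simplified sketch of the two-stage promise involved; the names here are illustrative and may not match the actual ouroboros-consensus API.

```haskell
-- Simplified sketch (illustrative names, not the actual ouroboros-consensus
-- API) of the two-stage promise returned when a block is added to the ChainDB.
import Control.Concurrent.STM (TMVar, atomically, readTMVar)

data AddBlockPromise blk = AddBlockPromise
  { blockWrittenToDisk :: TMVar ()  -- filled once the block is in the VolatileDB
  , blockProcessed     :: TMVar ()  -- filled once chain selection has run
  }

-- The BlockFetch client should only have to wait for the first stage:
waitUntilWrittenToDisk :: AddBlockPromise blk -> IO ()
waitUntilWrittenToDisk promise =
  atomically (readTMVar (blockWrittenToDisk promise))
```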

With some extra tracing, we see:

addFetchedBlock At (SlotNo {unSlotNo = 4492800}) 0.096076772s
chainSelection: 16.176468239s
addFetchedBlock At (SlotNo {unSlotNo = 4492840}) 15.617731605s

This means that it is actually chain selection for the first block that takes long, not anything about the second block. Adding the second block is blocked because the first block is still being processed.

What's actually going on is the following:

  • The BlockFetch client adds the blocks it downloaded to a queue in the ChainDB with a maximum size of 10. This maximum size is there to provide back pressure; otherwise the number of blocks in memory could grow without bound.
  • A background thread in the ChainDB processes the blocks in this queue one by one in a loop. In each iteration of the loop, it writes the block to disk (unless we intentionally ignore it), delivers the promise the BlockFetch client is waiting on, then performs chain selection for the block, and finally delivers another promise indicating the block has been processed.
  • While the background thread is still doing chain selection for the first block, the second block has been added to the queue and the BlockFetch client is waiting for it to be written to disk, which won't happen until after the first block has been fully processed (see the sketch after this list).
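A minimal sketch of that single background loop (all names and types are simplified stand-ins, not the actual implementation) makes the blocking explicit: the disk write, and thus the promise, for the next block cannot happen before chain selection for the current block has finished.

```haskell
-- Illustrative sketch of the ChainDB's single background thread; all names
-- and types are simplified stand-ins for the real implementation.
import Control.Concurrent.STM
  (TBQueue, TMVar, atomically, putTMVar, readTBQueue)
import Control.Monad (forever)

data QueuedBlock blk = QueuedBlock
  { qbBlock         :: blk
  , qbWrittenToDisk :: TMVar ()  -- the promise the BlockFetch client waits on
  , qbProcessed     :: TMVar ()  -- filled after chain selection
  }

backgroundThread
  :: TBQueue (QueuedBlock blk)   -- bounded queue, e.g. capacity 10
  -> (blk -> IO ())              -- write the block to the VolatileDB
  -> (blk -> IO ())              -- run chain selection for the block
  -> IO ()
backgroundThread queue writeToDisk chainSelection = forever $ do
  QueuedBlock blk writtenToDisk processed <- atomically (readTBQueue queue)
  writeToDisk blk
  atomically (putTMVar writtenToDisk ())
  -- Chain selection happens *before* the next block is dequeued, so a slow
  -- chain selection here delays the disk write (and the promise) of the next
  -- block in the queue.
  chainSelection blk
  atomically (putTMVar processed ())
```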

This means that the effective overlap or pipelining is limited to 1 block, not the configured 10.

To fix this, there could be a separate queue for each step, i.e., one for writing blocks to disk and one for performing chain selection.

However, each extra queue adds synchronisation overhead, and the shorter the actual steps (writing to disk, chain selection) take, the larger that overhead is relative to the useful work. So adding the extra queue is not guaranteed to speed things up in all cases: it likely would in this case, but for bulk chain sync of mostly empty Byron blocks it might slow things down.

My plan: in practice we always wait for the block to be written to disk; we never want to just add the block to the queue without any waiting. So we can synchronously add the block to the VolatileDB and only then add it to the queue of blocks awaiting chain selection. This would also allow reordering out-of-order blocks in that queue using, e.g., an OrdPSQ.
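A sketch of what this could look like (the function and type names are hypothetical stand-ins; OrdPSQ is from the psqueues package):

```haskell
-- Sketch of the proposed approach; all names are hypothetical stand-ins.
import           Control.Concurrent.STM (TVar, atomically, modifyTVar')
import           Data.Word              (Word64)
import qualified Data.OrdPSQ            as PSQ

newtype BlockNo = BlockNo Word64 deriving (Eq, Ord, Show)

-- | Write the block to the VolatileDB synchronously, then enqueue it for
-- chain selection. Keying the queue on the hash and prioritising by block
-- number means out-of-order additions come out in order again.
addBlockSync
  :: Ord hash
  => (blk -> IO ())                     -- write the block to the VolatileDB
  -> TVar (PSQ.OrdPSQ hash BlockNo blk) -- blocks awaiting chain selection
  -> hash -> BlockNo -> blk
  -> IO ()
addBlockSync writeToVolatileDB awaitingChainSel hash blockNo blk = do
  writeToVolatileDB blk                        -- caller returns as soon as this is done
  atomically $ modifyTVar' awaitingChainSel $  -- hand over to the chain-selection thread
    PSQ.insert hash blockNo blk
```

The chain-selection thread would then repeatedly take the entry with the smallest block number (e.g. via PSQ.minView) and perform chain selection for it.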

mrBliss referenced this issue in IntersectMBO/ouroboros-network Nov 2, 2020, with the following commit message:
Fixes #2487.

Currently, the effective queue size when adding blocks to the ChainDB is 1 (for
why, see #2487). In this commit, we let the BlockFetch client add blocks fully
asynchronously to the ChainDB, which restores the effective queue size to the
configured value again, e.g., 10.

The BlockFetch client will no longer wait until the block has been written to
the VolatileDB (and thus also not until the block has been processed by chain
selection). The BlockFetch client can just hand over the block and continue
downloading with minimum delay. To make this possible, we change the behaviour
of `getIsFetched` and `getMaxSlotNo` to account for the blocks in the queue,
otherwise the BlockFetch client might try to redownload already-fetched blocks.

This is an alternative to #2489, which let the BlockFetch client write blocks to
the VolatileDB synchronously. The problem with that approach is that multiple
threads are writing to the VolatileDB, instead of a single background thread. We
have relied on the latter to simplify the VolatileDB w.r.t. consistency after
incomplete writes.
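The queue-aware lookups mentioned above could look roughly like this (a sketch with assumed types and names, not the actual implementation):

```haskell
-- Sketch: the fetched/max-slot queries consult both the VolatileDB and the
-- in-memory queue of not-yet-written blocks. All names are illustrative.
import           Control.Concurrent.STM (STM, TVar, readTVar)
import qualified Data.Map.Strict        as Map
import qualified Data.Set               as Set
import           Data.Word              (Word64)

newtype SlotNo = SlotNo Word64 deriving (Eq, Ord, Show)

data ChainDbEnv hash = ChainDbEnv
  { cdbVolatileDbHashes :: TVar (Set.Set hash)        -- hashes already on disk
  , cdbVolatileMaxSlot  :: TVar (Maybe SlotNo)        -- max slot already on disk
  , cdbQueuedBlocks     :: TVar (Map.Map hash SlotNo) -- blocks still in the queue
  }

-- | A block counts as fetched if it is on disk *or* still in the queue, so
-- the BlockFetch client will not try to download it again.
getIsFetched :: Ord hash => ChainDbEnv hash -> hash -> STM Bool
getIsFetched env h = do
  onDisk <- readTVar (cdbVolatileDbHashes env)
  queued <- readTVar (cdbQueuedBlocks env)
  pure (h `Set.member` onDisk || h `Map.member` queued)

-- | The maximum slot number likewise takes the queued blocks into account.
getMaxSlotNo :: ChainDbEnv hash -> STM (Maybe SlotNo)
getMaxSlotNo env = do
  onDisk <- readTVar (cdbVolatileMaxSlot env)
  queued <- readTVar (cdbQueuedBlocks env)
  let queuedMax
        | Map.null queued = Nothing
        | otherwise       = Just (maximum (Map.elems queued))
  pure (max onDisk queuedMax)
```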
dnadales transferred this issue from IntersectMBO/ouroboros-network Nov 30, 2023