Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - MissingBlockError for block k from tip #359

Open
karknu opened this issue Sep 21, 2023 · 15 comments · May be fixed by #1197
Open

[BUG] - MissingBlockError for block k from tip #359

karknu opened this issue Sep 21, 2023 · 15 comments · May be fixed by #1197
Assignees
Labels
bug Something isn't working known-deficiency

Comments

@karknu
Copy link
Contributor

karknu commented Sep 21, 2023

External

Summary
One of my nodes encountered a MissingBlockError error which caused the cardano-node to exit.
The “missing” block was for a block that was about k blocks from the tip which leads me to suspect that there is a race condition when you move blocks from the volatile db to the immutable db.
Log messages:

node-20230919170240.json:{"app":[],"at":"2023-09-19T17:03:33.04Z","data":{"connectionId":{"localAddress":{"address":"10.0.0.1","port":"3001"},"remoteAddress":{"address":"11.0.0.1","port":"35797"}},"kind":"MuxErrored","reason":"UnexpectedFailure (MissingBlockError (EmptySlot (RealPoint (SlotNo 103532636) d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6)))"},"env":"8.1.2:53422","host":"miami1","loc":null,"msg":"","ns":["cardano.node.InboundGovernor"],"pid":"2365086","sev":"Info","thread":"251"}
node-20230919170240.json:{"app":[],"at":"2023-09-19T17:03:33.04Z","data":{"connectionHandler":{"command":"ShutdownNode","context":"InboundError","kind":"Error","reason":"UnexpectedFailure (MissingBlockError (EmptySlot (RealPoint (SlotNo 103532636) d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6)))"},"connectionId":{"localAddress":{"address":"10.0.0.1","port":"3001"},"remoteAddress":{"address":"11.0.0.1","port":"35797"}},"kind":"ConnectionHandler"},"env":"8.1.2:53422","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ConnectionManager"],"pid":"2365086","sev":"Critical","thread":"62940"}
node-20230919170240.json:{"app":[],"at":"2023-09-19T17:03:49.17Z","data":{"kind":"DiffusionErrored","path":"ExceptionInHandler 11.0.0.1:35797 UnexpectedFailure (MissingBlockError (EmptySlot (RealPoint (SlotNo 103532636) d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6)))"},"env":"8.1.2:53422","host":"miami1","loc":null,"msg":"","ns":["cardano.node.DiffusionInitializationTracer"],"pid":"2365086","sev":"Critical","thread":"7"}

Data from dbsync:

select block_no,time,slot_no,hash from block where block_no=(select block_no+180*12 from block where hash='\xd2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6');
 block_no |        time         |  slot_no  |                                hash                                
----------+---------------------+-----------+--------------------------------------------------------------------
  9310372 | 2023-09-19 17:03:27 | 103576716 | \xad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc67f

The block k blocks ahead of the missing block was forged at 2023-09-19 17:03:27 , that is 6 seconds from the log entry .

Steps to reproduce
Not known, I've verified that the node normally manages to serve blocks around k blocks from the tip. I suspect this is a race condition so it may be easier to trigger in a simulation.

Expected behavior
I expect the cardano-node to reliably be able to server blocks from the chain even if the block happens to be k from the tip.

System info (please complete the following information):

  • OS Name: Debian
  • OS Version: 12
  • Consensus version: cardano-node 8.1.2 .
@dnadales dnadales added the bug Something isn't working label Sep 21, 2023
@amesgen
Copy link
Member

amesgen commented Sep 21, 2023

As the node did shut down, I guess this means that it was not thrown in eg the BlockFetch server, but rather in the ChainDB? Our ErrorPolicy explicitly tells the node to shut down on this error.

, ErrorPolicy $ \(e :: VolatileDBError blk) ->
case e of
VolatileDB.ApiMisuse{} -> Just ourBug
VolatileDB.UnexpectedFailure{} -> Just shutdownNode
, ErrorPolicy $ \(e :: ImmutableDBError blk) ->
case e of
ImmutableDB.ApiMisuse{} -> Just ourBug
ImmutableDB.UnexpectedFailure{} -> Just shutdownNode

The only I know that might be related is this: We have one race condition related to copying stuff to the ImmutableDB:

| IteratorBlockGCed (RealPoint blk)
-- ^ The block that was supposed to be streamed was garbage-collected from
-- the VolatileDB, but not added to the ImmutableDB.
--
-- This will only happen when streaming very old forks very slowly.
--
-- The following example illustrates a situation in which an iterator result
-- could be a 'IteratorBlockGCed' value. Suppose we start with an iterator
-- positioned at block @c@, where @[[ x ]]@ denotes a block in the
-- immutable DB:
--
-- @
-- iterator i
--
-- ... ⟶ [[ a ]] → [[ b ]] → [ c ] -> [ d ]
-- ──────────────────────╯ ╰────────────╯
-- Immutable DB Current chain
-- @
--
-- Suppose we switch to a longer fork that branches off from the immutable
-- tip ('[[b]]').
--
-- @
-- iterator i
--
-- ... ⟶ [[ a ]] → [[ b ]] → [ c ] -> [ d ]
-- ──────────────────────╯│
-- Immutable DB │
-- ╰-→ [ e ] -> [ f ] -> [ g ]
-- ╰─────────────────────╯
-- Current chain
-- @
--
-- Assume @k=2@. This means that block @e@ is the new immutable tip. If we
-- would call 'iteratorNext' on @i@ __after__ block @e@ is copied to the
-- immutable DB and @c@ and @d@ are garbage collected, then we will get
-- 'IteratorBlockGCed'.

But I have no idea if that is actually the cause.

@nfrisby
Copy link
Contributor

nfrisby commented Sep 21, 2023

Your block of interest is d2af962b11 in slot 103532636 at blockno 9308212.

The Cardano CLI converts the 2023-09-19T17:03:49Z timestamp from your log to slot 103576738.
The honest chain has two blocks on either side of 103576738.

fafcff9b8a0 in slot 103576729 at blockno 9310373
9923fe043 in slot 103576747 at blockno 9310374

If your node had selected fafcff9b8a0, then d2af962b11 was exactly your immutable tip (because 9310373 - 9308212 = k+1). (edit 3: this is right for the wrong reasons) Which does seem suspicious.

Edit: Actually, your first error message is 16 seconds earlier, ie slot 103576722. The youngest block as of that moment is

ad1fb12337 in slot 103576716 at blockno 9310372

for which your d2af962b11 would be the successor of the immutable tip (edit 3: this is wrong).

Edit 2: it's possible that fafcff9b8a0 had a better tiebreaker than another block with height 9310373 in a slot <= 103576722. In which case d2af962b11 might have been indeed the immutable tip. But this seems relatively unlikely. Perhaps the rest of your logs could clarify @karknu?

Edit 3: My counting above was off by one. Suppose k were 1. Then d2af962b11 would be your immutable tip when your selected block no was 9308212+k, not 9308212+k+1. Since 9308212+2160 = 9310372, then ad1fb12337 was the imm tip when ad1fb12337 (from the first Edit) was your tip.

@nfrisby
Copy link
Contributor

nfrisby commented Sep 21, 2023

Two theories:

  • Block was corrupted on disk (maybe your logs after the restart would have some indication along these lines @karknu ?). In which case this behavior is as expected (though some control-flow context on the error would have been very helpful!)
  • There is indeed some race condition. But nothing so far seems exceptional about your reported scenario, so why haven't we seen more reports of this?

That's as far as we've considered so far (triage).

@karknu
Copy link
Contributor Author

karknu commented Sep 21, 2023

@nfrisby Here is the output during startup. To me it looks like the immutable tip was d2af962b11 .

hread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.39Z","data":{"kind":"TraceOpenEvent.StartedOpeningImmutableDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":
"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.48Z","data":{"finalChunk":"4793","initialChunk":"4793","kind":"TraceImmutableDBEvent.StartedValidatingChunk"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":[
"cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.59Z","data":{"chunkNo":"4793","kind":"TraceImmutableDBEvent.ValidatedChunk"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"23
74905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.59Z","data":{"blockNo":"9308212","chunkNo":"4793","immTip":"d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6","kind":"TraceImmutableDBEvent.ValidatedLastLoc
ation"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.59Z","data":{"epoch":"4793","immtip":{"headerHash":"d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6","kind":"BlockPoint","slot":103532636},"kind":"TraceOpe
nEvent.OpenedImmutableDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:24.59Z","data":{"kind":"TraceOpenEvent.StartedOpeningVolatileDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"
Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:09:25.56Z","data":{"kind":"TraceOpenEvent.OpenedVolatileDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","t
hread":"7"}
{"app":[],"at":"2023-09-19T17:09:25.56Z","data":{"kind":"TraceOpenEvent.StartedOpeningLgrDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info"
,"thread":"7"}
{"app":[],"at":"2023-09-19T17:10:29.64Z","data":{"kind":"TraceLedgerReplayEvent.ReplayFromSnapshot","snapshot":{"kind":"snapshot"},"tip":"RealPoint (SlotNo 103515006) c676fdd70912add8269785eb9336c015fee570d41
88597ce0096f0b2743ee0d9"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:30.15Z","data":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":103515018,"tip":103532636},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.Cha
inDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:31.90Z","data":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":103528746,"tip":103532636},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.Cha
inDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:32.10Z","data":{"kind":"TraceOpenEvent.OpenedLgrDB"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread
":"7"}
{"app":[],"at":"2023-09-19T17:10:32.10Z","data":{"kind":"TraceFollowerEvent.StartedInitChainSelection"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","s
ev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:32.34Z","data":{"currentBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc6
7f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:35.80Z","data":{"kind":"LogValue","name":"before next, messages elided","value":{"contents":103532685,"tag":"PureI"}},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","
ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:35.80Z","data":{"currentBlock":"c649429b6614a39015a894bd72e93efa009629c99dcf3e7879eeef5c57ce4213@103542656","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc6
7f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:35.80Z","data":{"currentBlock":"5658585aa8ca52ae366cf4a13be1e9f2d0e4766a5bb35c284d4b86b44bc9ef08@103542715","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc6
7f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:35.81Z","data":{"currentBlock":"0d9120fa821d5842c71aaef77c894bc4097342becbf8a79028e10e60fd49ab91@103542728","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc6
7f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:39.24Z","data":{"kind":"LogValue","name":"before next, messages elided","value":{"contents":103542756,"tag":"PureI"}},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","
ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:39.24Z","data":{"currentBlock":"548885118bc1e3612b5a35bc9aaf20bbbee46eff4027f9f144d51a3d8c3b87c0@103552752","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc6
7f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:39.24Z","data":{"currentBlock":"3c3cf7308bb2fdda88beb7c3edfb652b69c762947cb5b5607f48a8bbab6e60e6@103552763","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEv
ent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc67f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:39.24Z","data":{"currentBlock":"d4ce8562f28f85b0e87d4e27118b0d721a62fe51af32826dea642afcd34fc99c@103552790","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEvent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc67f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:43.23Z","data":{"kind":"LogValue","name":"before next, messages elided","value":{"contents":103552813,"tag":"PureI"}},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}
{"app":[],"at":"2023-09-19T17:10:43.23Z","data":{"currentBlock":"e44a0fca5b7f49431e6c18139c3855c59b2f79211e84646d2ad9cbc3f566d207@103562809","kind":"TraceAddBlockEvent.AddBlockValidation.UpdateLedgerDbTraceEvent.StartedPushingBlockToTheLedgerDb","startingBlock":"11cea54f6bbb54418412d602f4559a3d0f448308cda19b6b7eee8be83ddab9cc@103532676","targetBlock":"ad1fb1233729b1ba3ee93fa315b299805e0d381af039367edcb596f236fdc67f@103576716"},"env":"8.1.2:d2d90","host":"miami1","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"2374905","sev":"Info","thread":"7"}

@karknu
Copy link
Contributor Author

karknu commented Sep 21, 2023

The is no trace of any forks (TraceAddBlockEvent.SwitchedToAFork) around 17:03.

@nfrisby
Copy link
Contributor

nfrisby commented Sep 21, 2023

Hmm. This log message

{
  "at": "2023-09-19T17:09:24.59Z",
  "data": {
    "blockNo": "9308212",
    "chunkNo": "4793",
    "immTip": "d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6",
    "kind": "TraceImmutableDBEvent.ValidatedLastLocation"
  }

indicates that

  • d2af962b1139168ba979d74ecc32e96a566e7ce0e6c327bdfdd5665d9fa813d6 was indeed your immtip at the time of the crash
  • That block was correctly written to the the ImmDB (since the chunk that contains it was successfully validated).

This suggests an hypothesis.

  • Fact: blocks are GC'd from the VolDB at least 60 seconds after they are copied to the ImmDB (I think that 60 second delay is supposed to eg justify not calling fsync). So it's unlikely that it was GC'd from the VolDB too soon.
  • But perhaps it was requested from the ImmDB (instead of from the VolDB) too soon after being copied to it.

@jasagredo
Copy link
Contributor

I was able to reproduce something similar to this error on my local setup. The setup is as follows:

  • An Ubuntu laptop running cardano-node 8.9.4 with ExperimentalProtocolsEnabled: true, with tcp interface exposed and socket for cardano-cli.
    cardano-node run ... --host-addr 0.0.0.0 --port 3001 --socket-path ./socket
    
  • A Windows workstation running cardano-node 8.9.4 in order to have an up to date immutable db.
    cardano-node run
    
  • The socket from the laption is tunneled via ssh to the workstation's WSL.
    ssh -L /home/javier/sock:/home/javier/Cardano/socket 192.168.1.50
    
  • In WSL, run a cardano-cli query to get the immutable tip:
    FROM=$(cardano-cli conway query tip --immutable-tip --socket-path ./sock --mainnet | jq .slot)
    echo "Immutable tip is $FROM"
    
  • In WSL, run multiple n2n-pg processes, which given a specific block, they request a number of subsequent blocks from the server:
    for i in {1..60}; do 
      ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10 & 
    done
    

Then the n2n-pg processes sometimes exits with:

n2n-pg: user error (Server doesn't know At (Block {blockPointSlot = SlotNo 128483330, blockPointHash = 86bebd45d35ca7e5eecf2d712179c6f62f6f4e667db579cfd47ba71c01df01a3}))

So the server is saying MsgIntersectNotFound (see

SenderEffect $ fail $ "Server doesn't know " <> show startPt
) on the exact block that the node told us that was the immutable tip. It seems if I spawn ~10 processes, things work fine:

$ FROM=$(./cardano-cli conway query tip --immutable-tip --socket-path ./socket --mainnet | jq .slot); printf "Immutable tip is $FROM\n"; for i in {1..10}; do ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10  & done
...
[1]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[2]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[3]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[4]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[5]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[6]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[7]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[8]   Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[9]-  Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10
[10]+  Done                    ./result/bin/n2n-pg --config config/config.json --db /mnt/c/Users/Javier/code/cardano/cardano-node/mainnet/db/immutable --server 192.168.1.50:3001 --start-from $FROM --num-blocks 10

It seems it has to do with timing, as sometimes I have been able to spawn over 30 instances and it would still work.

It also seems that the more processes I spawn, the more it takes for any of them to start streaming blocks. Perhaps there is some bound on the resources available in the server to attend all those requests? As I increase the number of processes it becomes more likely for the error to happen.

@jasagredo
Copy link
Contributor

I'm trying to see if this is to do with exhausting resources, by tracing the exceptions thrown by the node when attacked like this. I am being unable to reproduce it because it seems between the point at which I make the query and the moment the profiled (slow) node starts answering, enough time has passed for everything to work.

@jasagredo
Copy link
Contributor

I provided @johnalotoski a patched node (https://github.com/IntersectMBO/cardano-node/tree/js/clarify-missing-block) to try to shed some light by adding more data to the printed message. Will wait for results

@jasagredo
Copy link
Contributor

With the patch I provided, we can see the following information:

Jul 13 16:46:36 boot4-rel-us-west-2a-1 cardano[1650672]: TrMuxErrored (ConnectionId {localAddress = 172.31.27.105:3001, remoteAddress = 79.207.3.136:50232}) (UnexpectedFailure (MissingBlockError (EmptySlot (RealPoint (SlotNo 129279922) 67a1c9d987097961207b4c073907ef3f89181fd85db16ab831b3907510b21255) 5985 [RelativeSlot {relativeSlotChunkNo = 5985, relativeSlotChunkSize = ChunkSize {chunkCanContainEBB = True, numRegularBlocks = 21600}, relativeSlotIndex = 3923}] (Just (StrictSeq {fromStrict = fromList [0,0,0,56,56,56,56,56,56,56,56,56,56,56,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,112,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,168,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,224,280,280,280,280,280,280,280,336,336,336,336,392,392,392,448,448,504,504,504,504,504,560,560,560,560,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,672,672,672,672,672,672,672,672,672,672,672,672,672,672,672,672,672,672,672,728,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,784,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,840,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,896,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,952,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1008,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1064,1120,1120,1120,1120,1120,1120,1120,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1176,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1232,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1288,1344,1344,1344,1344,1344,1344,1344,1344,1344,1344,1344,1344,1400,1400,1400,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1456,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1512,1568,1568,1568,1568,1568,1568,1568,1624,1624,1624,1680,1680,1680,1680,1736,1736,1736,1736,1736,1736,1736,1736,1736,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1792,1848,1848,1848,1848,1848,1848,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1904,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,2016,2016,2016,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2072,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2128,2184,2184,2184,2184,2184,2184,2184,2184,2184,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2240,2296,2296,2296,2296,2296,2296,2296,2296,2296,2296,2296,2296,2296,2296,2352,2352,2352,2352,2352,2352,2352,2352,2352,2352,2352,2352,2352,2352,2408,2408,2408,2408,2408,2408,2408,2408,2408,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2464,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2520,2576,2576,2576,2576,2576,2576,2576,2576,2576,2576,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2632,2688,2688,2688,2688,2688,2688,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2744,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2800,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2856,2912,2912,2912,2912,2912,2968,2968,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3024,3080,3080,3080,3080,3080,3080,3136,3136,3136,3136,3136,3136,3136,3136,3136,3136,3136,3192,3192,3192,3192,3192,3192,3192,3192,3192,3192,3248,3304,3304,3304,3304,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3360,3416,3416,3472,3472,3472,3472,3472,3472,3472,3472,3528,3528,3528,3528,3528,3528,3528,3528,3528,3528,3528,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3584,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3640,3696,3696,3696,3696,3696,3696,3752,3752,3752,3752,3752,3752,3808,3808,3808,3808,3808,3808,3808,3808,3808,3808,3808,3808,3808,3808,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3864,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3920,3976,3976,3976,3976,4032,4032,4032,4088,4088,4088,4144,4144,4144,4144,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4200,4256,4256,4256,4256,4256,4256,4256,4256,4256,4256,4256,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4312,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4368,4424,4424,4424,4424,4424,4424,4424,4424,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4480,4536,4536,4536,4536,4536,4536,4536,4536,4536,4536,4536,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4592,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4648,4704,4704,4704,4760,4760,4760,4760,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4816,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4872,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4928,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,4984,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5040,5096,5096,5152,5152,5152,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5208,5264,5264,5264,5320,5376,5376,5376,5376,5432,5432,5432,5432,5432,5488,5488,5488,5488,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5544,5600,5656,5656,5656,5656,5712,5712,5712,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5768,5824,5824,5824,5824,5824,5824,5824,5824,5824,5824,5880,5880,5936,5936,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,5992,6048,6048,6048,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6104,6160,6160,6160,6160,6160,6160,6160,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6216,6272,6272,6272,6272,6272,6272,6272,6272,6272,6272,6272,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6328,6384,6384,6384,6384,6384,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6440,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6496,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6552,6608,6608,6608,6608,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6664,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6720,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6776,6832,6832,6832,6832,6832,6832,6832,6832,6832,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6888,6944,6944,6944,6944,6944,6944,7000,7000,7000,7000,7000,7000,7000,7000,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7056,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7112,7168,7168,7168,7168,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7224,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7280,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7336,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7392,7448,7448,7448,7448,7504,7504,7504,7504,7504,7504,7504,7504,7504,7504,7504,7560,7560,7560,7560,7560,7560,7560,7560,7560,7560,7616,7616,7616,7616,7616,7616,7616,7616,7672,7672,7672,7672,7672,7672,7672,7672,7672,7672,7672,7672,7672,7672,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7728,7784,7784,7784,7784,7840,7840,7840,7840,7840,7840,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7896,7952,7952,8008,8008,8008,8008,8008,8008,8008,8008,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8064,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8120,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8176,8232,8232,8232,8232,8232,8232,8232,8232,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8288,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8344,8400,8400,8400,8400,8400,8400,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8456,8512,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8568,8624,8624,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8680,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8736,8792,8792,8792,8792,8848,8848,8848,8848,8848,8848,8848,8904,8904,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,8960,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9016,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9072,9128,9128,9128,9128,9128,9128,9128,9128,9128,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9184,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9240,9296,9296,9296,9296,9296,9296,9296,9296,9296,9296,9296,9296,9296,9296,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9352,9408,9408,9408,9408,9408,9408,9464,9464,9464,9464,9464,9464,9464,9464,9464,9464,9520,9520,9520,9520,9520,9520,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9576,9632,9632,9632,9632,9632,9632,9632,9632,9632,9632,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9688,9744,9744,9744,9744,9744,9744,9744,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9800,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9856,9912,9968,10024,10024,10080,10080,10080,10080,10080,10080,10080,10080,10080,10080,10080,10080,10080,10080,10136,10136,10136,10136,10136,10136,10136,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10192,10248,10248,10248,10304,10304,10304,10304,10304,10304,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10360,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10416,10472,10472,10472,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10528,10584,10640]})))))

Just a first eyeball over it already shows several things:

  1. This is the error thrown
  2. We are in the CurrentChunkInfo case here
  3. 3923 offset is NOT in the list of current offsets.

@jasagredo
Copy link
Contributor

Correction, it is probably not 3923 that has to be on the sequence, but instead it has to have 3923 items in it. It has 3920.

@jasagredo
Copy link
Contributor

I do have a hypothesis.

First of all, when we try to get the information for a slot, we call this function:

https://github.com/IntersectMBO/ouroboros-consensus/blob/main/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ImmutableDB/Impl/Iterator.hs#L274-L308

This will first compute which offset should the block be in, then read that offset from the Index. To compute which offset the block would be in, we call chunkSlotForUnknownBlock which calls chunkSlotForRegularBlock:

https://github.com/IntersectMBO/ouroboros-consensus/blob/main/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ImmutableDB/Chunks/Layout.hs#L180-L190

Say we want the ChunkSlot for 129279922. We would get the chunk number 5985 and for the relative slot we would get 3922 which plus one for the chunkCanContainEBB is 3923. This is independent from our current knowledge of the store, just by guessing where this block should be.

Now we try to read this offset from the index by calling readOffsets:

https://github.com/IntersectMBO/ouroboros-consensus/blob/main/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ImmutableDB/Impl/Index/Cache.hs#L669-L698

This will consult the MVar of the cache, and then check if splitting the sequence at the desired point gives us two sequences which don't end and start with the same slot (which would indicate that it is a backfill'ed slot). This MVar has not enough items in the sequence, in fact it has 3920 which is pretty close to the right number but not quite.


Now two other observations: when is this value updated and how did we end up looking in the immutable db.

We ended up looking in the immutable DB because the desired point is no longer in the cdbChain, which means we have removeFromChain it in copyToImmutableDB background thread.

Before removing it from the chain, we have called appendBlock:
https://github.com/IntersectMBO/ouroboros-consensus/blob/main/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ChainDB/Impl/Background.hs#L162-L166

which in turn will call Index.appendOffsets:

https://github.com/IntersectMBO/ouroboros-consensus/blob/main/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ImmutableDB/Impl/Index/Cache.hs#L765-L776

So the values there will supposedly be updated. This is basically a race condition https://hackage.haskell.org/package/base-4.20.0.1/docs/Control-Concurrent-MVar.html#v:modifyMVar_

This function is only atomic if there are no other producers for this MVar. In other words, it cannot guarantee that, by the time modifyMVar_ gets the chance to write to the MVar, the value of the MVar has not been altered by a write operation from another thread.


My suggestions are the following:
a) Encapsulate the Cached index in a TVar, which will ensure concurrent updates are properly sequenced.
b) Instead of having one big MVar with the whole cache, have one for each field, as they have very different dynamics. This can maybe ensure we don't hit this race condition.

@amesgen
Copy link
Member

amesgen commented Jul 15, 2024

So the values there will supposedly be updated. This is basically a race condition https://hackage.haskell.org/package/base-4.20.0.1/docs/Control-Concurrent-MVar.html#v:modifyMVar_

This function is only atomic if there are no other producers for this MVar. In other words, it cannot guarantee that, by the time modifyMVar_ gets the chance to write to the MVar, the value of the MVar has not been altered by a write operation from another thread.

As you are quoting this part of the documentation: What other producers are there for cacheVar? Simply grepping for cacheVar doesn't seem to reveal any; every place uses modifyMVar/swapMVar or some equivalent construct that always first tries to take the MVar.

@jasagredo
Copy link
Contributor

Right indeed. I was discussing this with Joris. There doesn't seem to be a single put without a take. However there could be modifyMVars with outdated information, something like:

-- thread 1
do 
  sth <- modifyMVar ...
  modifyMVar ... $ \... -> ... sth ...

-- thread 2
do
  modifyMVar ...

If things don't align correctly this might result in outdated information ending up in the MVar, no?

@jasagredo
Copy link
Contributor

This ticket is blocked on the results from the deployed relays by @johnalotoski, with the patch in https://github.com/IntersectMBO/ouroboros-consensus/tree/js/tvar-cache

@jasagredo jasagredo linked a pull request Jul 31, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working known-deficiency
Projects
Status: 🏗 In progress
Development

Successfully merging a pull request may close this issue.

5 participants