Reorg protection can fail #2363

gmart7t2 · 2023-08-24T19:48:16Z

Ord only creates savepoints if the number of blocks indexed is a multiple of 10 (or if it's less than 10, for some reason) when it commits to the db.

If I start indexing from scratch, interrupt the process within the first 5k blocks, then resume indexing, it's quite likely that I don't get any savepoints created.

Let's say I interrupted it after indexing 1234 blocks. A commit is done at that point, but 1234 isn't a multiple of 10, so no savepoint is created. When I resume indexing, a commit happens every 5000 blocks: after 6234 blocks, 11234 blocks, 16234 blocks, etc. None of those are multiples of 10 either.
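The arithmetic above can be checked with a small sketch. This is a simplified model, not ord's code: `commit_points` and `modulo_rule_fires` are hypothetical helpers, and the constants are just the 10 and 5000 mentioned in this thread.

```rust
/// Savepoint interval in blocks (the "multiple of 10" from the description).
const SAVEPOINT_INTERVAL: u64 = 10;
/// Commit interval during initial sync (the "every 5000 blocks" from the description).
const COMMIT_INTERVAL: u64 = 5000;

/// Block counts at which the next `commits` commits happen after resuming
/// from an index that already holds `resumed_at` blocks.
fn commit_points(resumed_at: u64, commits: u64) -> Vec<u64> {
    (1..=commits)
        .map(|i| resumed_at + i * COMMIT_INTERVAL)
        .collect()
}

/// True if the modulo-based check described above would create a savepoint
/// at a commit made after `blocks_indexed` blocks.
fn modulo_rule_fires(blocks_indexed: u64) -> bool {
    blocks_indexed < SAVEPOINT_INTERVAL || blocks_indexed % SAVEPOINT_INTERVAL == 0
}
```

Because 5000 is itself a multiple of 10, every later commit keeps the remainder of the resume point: `commit_points(1234, 3)` yields 6234, 11234 and 16234, and `modulo_rule_fires` is false for all of them.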

I only get a savepoint if I am lucky enough to run the indexer when the block height ends with 9, which is 10% of the time.

Then, when a reorg happens I get a crash: "panicked at Option::unwrap()" when get_persistent_savepoint finds no savepoints.
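Independently of when savepoints get created, the panic itself could be avoided by treating "no savepoint" as a recoverable error. A minimal sketch of that idea follows; `ReorgError` and `oldest_savepoint` are hypothetical names, and the choice of the smallest id is an assumption for illustration, not a description of ord's or redb's actual API.

```rust
/// Hypothetical error type for the reorg-recovery path.
#[derive(Debug, PartialEq)]
enum ReorgError {
    /// No savepoint exists, so there is nothing to roll back to.
    NoSavepoint,
}

/// Pick the savepoint id to restore (assumed here to be the smallest id),
/// returning an error instead of panicking when the list is empty.
fn oldest_savepoint(ids: &[u64]) -> Result<u64, ReorgError> {
    ids.iter().copied().min().ok_or(ReorgError::NoSavepoint)
}
```

A caller could then report something like "no savepoint available, reindex required" rather than crashing on `Option::unwrap()`.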

It would be better if update_savepoints() created a savepoint if it has been 10 or more blocks since the last savepoint was made, whether or not the number of blocks indexed is a multiple of 10.
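A rough sketch of the distance-based rule suggested above. `SavepointTracker` and `on_commit` are hypothetical names and not part of ord. Treating "no savepoint yet" as due is an extra assumption made here.

```rust
/// Savepoint interval in blocks, as discussed in this issue.
const SAVEPOINT_INTERVAL: u64 = 10;

/// Hypothetical tracker that remembers the height of the last savepoint.
struct SavepointTracker {
    last_savepoint_height: Option<u64>,
}

impl SavepointTracker {
    fn new() -> Self {
        Self { last_savepoint_height: None }
    }

    /// Called at every commit. Returns true, and records the height, when
    /// SAVEPOINT_INTERVAL or more blocks have passed since the last
    /// savepoint, or when no savepoint exists yet (an assumption here).
    fn on_commit(&mut self, height: u64) -> bool {
        let due = match self.last_savepoint_height {
            None => true,
            Some(last) => height.saturating_sub(last) >= SAVEPOINT_INTERVAL,
        };
        if due {
            self.last_savepoint_height = Some(height);
        }
        due
    }
}
```

With this rule a commit at 1234 produces a savepoint, a commit at 1240 does not, and a commit at 1245 does again, regardless of whether any of those numbers is a multiple of 10.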

detect_reorg() assumes that savepoints happen every 10 blocks, but that isn't the case unless ord is running in server mode.


raphjaph · 2023-08-27T13:33:02Z

The less than 10 is a hack so that we create savepoints while running tests.

The commit-every-5000-blocks logic is effectively used only during initial indexing, while the indexed block height is more than 5000 blocks away from the chain tip. It is statistically extremely unlikely to get a reorged block that deep. Once the index has caught up with the chain tip, commit is called on every new block. We don't really see a scenario where the behavior you describe happens in practice. Reorgs are only really a problem when you keep ord running for a longer period of time, since reorgs are a short-lived, ephemeral phenomenon. We also only start creating savepoints once we are within 21 blocks of the tip, since we do not want to create savepoints during initial indexing for performance reasons. So the situation outlined above shouldn't occur in practice.

Did you observe the described behavior in practice?

gmart7t2 · 2023-11-07T13:15:24Z

> Did you observe the described behavior in practice?

Not until today...

https://discord.com/channels/987504378242007100/1069465367988142110/1171074766875131964

Did he get very unlucky, and finish his chain indexing just as a reorg was happening? He's not responding in the discord so I don't know any details, but line 66 of reorg.rs has an unwrap() call in 0.9.0 and 0.10.0.

I have seen a fully synced server skip making a savepoint occasionally. It happens when 2 blocks are found in short succession, and the indexer commits the 2 blocks in a single commit, not committing on the multiple of 10, but on the multiple of 10 plus 1. Then when the reorg happens it goes 10 blocks further back than it should have. But that's a relatively minor issue.

gmart7t2 · 2023-11-07T13:35:53Z

A reasonable fix might be to add:

|| savepoints.is_empty()

right after the %10 check, so that there's always at least one savepoint in the db once we're close to the chain tip.

We'd need to fetch the savepoint list earlier but that's hopefully cheap.
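To make the suggestion concrete, here is a simplified sketch of the check with and without the extra clause. The function names and the `&[u64]` savepoint list are hypothetical stand-ins, and the "close to the chain tip" gate is assumed to be handled by the caller.

```rust
/// Savepoint interval in blocks, as discussed in this issue.
const SAVEPOINT_INTERVAL: u64 = 10;

/// The modulo check as described in this thread (simplified).
fn savepoint_due_current(blocks_indexed: u64) -> bool {
    blocks_indexed < SAVEPOINT_INTERVAL || blocks_indexed % SAVEPOINT_INTERVAL == 0
}

/// The same check plus the proposed fallback: always create a savepoint
/// when none exists yet.
fn savepoint_due_proposed(blocks_indexed: u64, existing_savepoints: &[u64]) -> bool {
    savepoint_due_current(blocks_indexed) || existing_savepoints.is_empty()
}
```

For example, at 816001 blocks with an empty savepoint list the current check returns false while the proposed one returns true. Once any savepoint exists, the two checks agree again.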

gmart7t2 · 2023-11-08T00:37:57Z

Seems to me this isn't all that unlikely to happen if you never run the ord server. You only have a 10% chance of creating your first savepoint each time you run ord, so if you only run ord when you want to make an inscription it takes you 7 runs before you have a greater than 50% chance of having a savepoint made. If any of those 7 runs were while you were on the wrong side of a chain fork, you lose.

It's not very likely to happen to any individual, but with enough people inscribing it was bound to happen eventually.
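The 7-run estimate can be reproduced with a short calculation, assuming each run independently has a 10% chance of landing on a savepoint height. Both helper names are made up for this sketch.

```rust
/// Probability of at least one success in `runs` independent runs,
/// each succeeding with probability `p`.
fn p_at_least_one(p: f64, runs: u32) -> f64 {
    1.0 - (1.0 - p).powi(runs as i32)
}

/// Smallest number of runs for which the probability of at least one
/// success exceeds `target`.
fn runs_needed(p: f64, target: f64) -> u32 {
    // Guard against inputs for which the loop below would never terminate.
    assert!(p > 0.0 && p <= 1.0 && target < 1.0);
    let mut runs = 1;
    while p_at_least_one(p, runs) <= target {
        runs += 1;
    }
    runs
}
```

With `p = 0.1`, six runs give about 46.9% and seven runs give about 52.2%, so `runs_needed(0.1, 0.5)` is 7.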

gmart7t2 linked a pull request Aug 24, 2023 that will close this issue

Create savepoint if it has been SAVEPOINT_INTERVAL blocks since the… #2365

Open

casey added the bug label Aug 30, 2023

raphjaph mentioned this issue Nov 9, 2023

testnet fails to sync with v0.11.0 #2646

Closed

raphjaph added this to Tracker Nov 17, 2023

raphjaph moved this to Backlog in Tracker Nov 17, 2023

raphjaph moved this from Backlog to In Progress in Tracker Nov 24, 2023

raphjaph moved this from In Progress to Backlog in Tracker Nov 24, 2023

casey removed the status in Tracker Feb 12, 2024

casey removed this from Tracker Apr 15, 2024
