3.2: when ship node exits with error it usually doesn't start up again properly with snapshot #596

matthewdarwin · 2022-12-23T21:00:37Z

Normally when a ship node crashes, you can start it again by using a snapshot, and then it will start with log messages like:

info  2022-12-23T20:32:22.959 nodeos    controller.cpp:494            replay               ] existing block log, attempting to replay from 218085489 to 218258039 blocks
info  2022-12-23T20:32:24.373 nodeos    log.hpp:479                   truncate             ] fork or replay: removed 172884 blocks from trace_history.log
info  2022-12-23T20:32:24.629 nodeos    log.hpp:479                   truncate             ] fork or replay: removed 172884 blocks from chain_state_history.log

Then it runs replay and all is good.

With 3.2, the trace history index very often seems to be corrupted somehow, and nodeos rebuilds the entire state history from scratch, which takes an inordinate amount of time on a large blockchain (e.g. wax). It is better to restore from backup than to let this continue.

It has always been the case that the trace history index can be corrupted by an unclean shutdown, but something in 3.2 makes it happen more often than in previous versions. It seems like it is never valid?

stephenpdeos · 2023-01-05T19:00:33Z

It appears that, from 3.1 to 3.2, an unclean kill of the nodeos process became more likely to corrupt the SHiP logs. Further internal discussion is required to decide what level of resiliency we want these logs to have at this time. Because this is more likely to occur, possibly due to #592, we will revisit after spending some time with that issue.

stephenpdeos · 2023-02-09T18:27:35Z

Due to ongoing changes to SHiP targeted for this next release, we will continue to hold off on this issue for now.

spoonincode · 2023-02-23T18:24:35Z

fwiw there appears to be a difference in behavior between 2.0 and 3.x SHiP. 2.0 flushes both the index and the log per block, whereas 3.x only flushes the log. This means a crash (in addition to a power failure, etc.) leaves the index+log in a state from which nodeos will attempt a recovery on relaunch.

It's not clear if this is a regression or simply a change in behavior. It's not clear what data file consistency the ship log intends to guarantee.
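
To illustrate the flush difference described above, here is a minimal Python sketch. It is not nodeos code: the file names, the length-prefixed entry layout, and the helpers `append_block`, `on_disk_counts`, and `simulate` are all hypothetical, chosen only to show how an unflushed index can fall behind a flushed log if the process dies before its buffers are written out.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<Q")  # hypothetical 8-byte little-endian field


def append_block(log, index, payload, flush_index):
    """Append one entry to the log and record its offset in the index."""
    offset = log.tell()
    log.write(ENTRY.pack(len(payload)) + payload)
    log.flush()  # the log is flushed per block in both behaviours
    index.write(ENTRY.pack(offset))
    if flush_index:
        index.flush()  # per-block index flush (the step discussed in this thread)


def on_disk_counts(log_path, index_path):
    """Count the entries a fresh reader would see in each file."""
    index_entries = os.path.getsize(index_path) // ENTRY.size
    log_entries = 0
    with open(log_path, "rb") as f:
        while True:
            header = f.read(ENTRY.size)
            if len(header) < ENTRY.size:
                break
            (size,) = ENTRY.unpack(header)
            f.seek(size, os.SEEK_CUR)  # skip the payload
            log_entries += 1
    return log_entries, index_entries


def simulate(flush_index):
    """Write three entries, then inspect the files before they are closed."""
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "demo.log")
        index_path = os.path.join(d, "demo.index")
        # A large explicit buffer keeps unflushed writes in process memory.
        log = open(log_path, "wb", buffering=65536)
        index = open(index_path, "wb", buffering=65536)
        try:
            for n in range(3):
                append_block(log, index, b"block-%d" % n, flush_index)
            # Roughly what a killed process would leave behind on disk.
            return on_disk_counts(log_path, index_path)
        finally:
            log.close()
            index.close()
```

With `simulate(True)` both files show the same number of entries; with `simulate(False)` the log is complete but the index still looks empty to a reader, which is the kind of mismatch that would force a recovery. Note that `flush()` only hands data to the operating system, which covers a process crash; surviving a power failure would additionally need something like `os.fsync`.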

heifner · 2023-02-23T19:04:36Z

Seems like we might as well add the flush back in until the intended file consistency is determined.

spoonincode · 2023-02-23T19:56:18Z

yeah I think it's fine to add it back

greg7mdp · 2023-02-28T15:47:43Z

Is it this change that's missing?

heifner · 2023-07-24T19:04:18Z

Cat: History

enf-ci-bot added the triage label Dec 23, 2022

enf-ci-bot added this to Team Backlog Dec 23, 2022

enf-ci-bot moved this to Todo in Team Backlog Dec 23, 2022

heifner added bug Something isn't working actionable and removed triage labels Dec 28, 2022

stephenpdeos added 👍 lgtm discussion and removed 👍 lgtm actionable labels Jan 5, 2023

heifner mentioned this issue Jan 20, 2023

make it difficult to accidentally perform a large rollback of state history logs #659

Closed

stephenpdeos added the 👍 lgtm label Feb 23, 2023

heifner self-assigned this Mar 29, 2023

heifner added the OCI Work exclusive to OCI team label Mar 29, 2023

heifner moved this from Todo to In Progress in Team Backlog Mar 29, 2023

heifner added a commit that referenced this issue Mar 30, 2023

GH-596 flush log and index files to make it more likely that a kill-9 or crash leaves valid ship logs.

efd642b

heifner mentioned this issue Mar 30, 2023

[3.2] SHiP flush logs on write #928

Merged

heifner moved this from In Progress to Awaiting Review in Team Backlog Mar 30, 2023

heifner added a commit that referenced this issue Mar 30, 2023

GH-596 Test forkdb reset in replay

602b85a

heifner added a commit that referenced this issue Mar 31, 2023

Merge pull request #928 from AntelopeIO/GH-596-ship-flush-3.2

4359bc0

[3.2] SHiP flush logs on write

heifner added a commit that referenced this issue Mar 31, 2023

Merge remote-tracking branch 'origin/release/3.2' into GH-596-ship-flush-4.0

c97c4e3

heifner added a commit that referenced this issue Mar 31, 2023

GH-596 forkdb reset in replay since blocks are signaled

65d7130

heifner mentioned this issue Mar 31, 2023

[3.2 -> 4.0] SHiP flush logs on write #934

Closed

heifner added a commit that referenced this issue Mar 31, 2023

GH-596 Fix merge issue

4a43f82

heifner added a commit that referenced this issue Mar 31, 2023

Merge remote-tracking branch 'origin/release/3.2' into GH-596-ship-flush-4

46b2231

heifner mentioned this issue Mar 31, 2023

[3.2 -> 4.0] SHiP flush logs on write #935

Merged

heifner added a commit that referenced this issue Mar 31, 2023

GH-596 forkdb reset in replay since blocks are signaled

51c4bea

heifner mentioned this issue Mar 31, 2023

[3.2] forkdb reset in replay since blocks are signaled #936

Merged

heifner added a commit that referenced this issue Mar 31, 2023

Merge pull request #935 from AntelopeIO/GH-596-ship-flush-4

fc8cd5b

[3.2 -> 4.0] SHiP flush logs on write

heifner added a commit that referenced this issue Mar 31, 2023

Merge remote-tracking branch 'origin/release/4.0' into GH-596-ship-flush-main

752da16

heifner mentioned this issue Mar 31, 2023

[4.0 -> main] SHiP flush logs on write #937

Merged

heifner added a commit that referenced this issue Mar 31, 2023

Merge pull request #936 from AntelopeIO/GH-596-fork-db-3.2

d9a4772

[3.2] forkdb reset in replay since blocks are signaled

heifner added a commit that referenced this issue Mar 31, 2023

Merge remote-tracking branch 'origin/release/3.2' into GH-596-fork-db-4.0

f3fcfe9

heifner mentioned this issue Mar 31, 2023

[3.2 -> 4.0] forkdb reset in replay since blocks are signaled #938

Merged

heifner closed this as completed in #937 Mar 31, 2023

heifner added a commit that referenced this issue Mar 31, 2023

Merge pull request #937 from AntelopeIO/GH-596-ship-flush-main

5d91a76

[4.0 -> main] SHiP flush logs on write

github-project-automation bot moved this from Awaiting Review to Done in Team Backlog Mar 31, 2023

heifner added a commit that referenced this issue Mar 31, 2023

Merge pull request #938 from AntelopeIO/GH-596-fork-db-4.0

60ebedd

[3.2 -> 4.0] forkdb reset in replay since blocks are signaled

heifner added a commit that referenced this issue Mar 31, 2023

Merge remote-tracking branch 'origin/release/4.0' into GH-596-fork-db-main

618c805

heifner mentioned this issue Mar 31, 2023

[4.0 -> main] forkdb reset in replay since blocks are signaled #944

Merged

heifner added a commit that referenced this issue Apr 1, 2023

Merge pull request #944 from AntelopeIO/GH-596-fork-db-main

6f1e67b

[4.0 -> main] forkdb reset in replay since blocks are signaled

ericpassmore mentioned this issue Jul 21, 2023

Categorize Leap Bugs in Preparation up through Leap 4.x #1440

Closed
