Client fails to unpack some snapshot archives #1137

Closed
3 tasks done
jpraynaud opened this issue Aug 4, 2023 · 1 comment · Fixed by #1138
Labels: bug ⚠️ (Something isn't working), critical 🔥 (Critical bug)

Comments

@jpraynaud (Member) commented Aug 4, 2023

Issue

The client is unable to unpack some archives on mainnet (previous ones are restored without problem) and returns an error when running the command ./mithril-client snapshot download with the following digests:

  • a7a82ca3734fcfc8f233af20e6a1b78eddc09909ced8f70d293a4a26f832b812
  • 74a70bc7c6346e2dad6d1ee16d402e843dfadbf2f45edf558cd1006a33ada7a1

5/7 - Unpacking the snapshot…
Error: "An error occured: Could not unpack './snapshot-a7a82ca3734fcfc8f233af20e6a1b78eddc09909ced8f70d293a4a26f832b812.tar.gz' in directory './db'. Error: « failed to iterate over archive »."

Expanded error:

Error: "An error occured: Could not unpack './snapshot-74a70bc7c6346e2dad6d1ee16d402e843dfadbf2f45edf558cd1006a33ada7a1.tar.gz' in directory './db'. Error: « Custom { kind: Other, error: TarError { desc: \"failed to iterate over archive\", io: Custom { kind: Other, error: \"numeric field did not have utf-8 text: ,{��\\n)�3 when getting cksum for \\t�h@�\\u{15}��1�R\\u{6}L��XP\\u{13}/�Ĉ\\u{11}oF�\\nn\\u{f}�����k\\u{16}�`���^��z8��z�\\t�JȚ�%V�-m\\u{3}_�E)�p��\\u{14}�-\\u{4}�\\u{7f}><[��2�\\\\���\\u{6}V0�]\\u{b}\" } } } »."
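The expanded error is raised while the tar reader parses an entry header. The entry name it reports is itself binary data, which suggests the reader was positioned on file contents rather than on a real header, consistent with a file changing while it was being archived. As background, here is a minimal, std-only sketch of walking a tar stream header by header. It assumes an uncompressed POSIX ustar stream (gzip decoding is left out) and ignores GNU base-256 size encoding; count_entries and its helpers are hypothetical, not the tar crate's API, whose exact checks differ.

```rust
use std::io::{self, Read};
use std::ops::Range;

// Hypothetical illustration only; not part of Mithril or of the `tar` crate.

/// Size of a tar block; headers and padded entry data are multiples of it.
const BLOCK: usize = 512;
/// Byte ranges of the size and checksum fields in a ustar header.
const SIZE_FIELD: Range<usize> = 124..136;
const CHKSUM_FIELD: Range<usize> = 148..156;

/// Parses an octal numeric field terminated by NUL or a space.
/// Returns None if it contains any other byte (e.g. random binary data).
fn parse_octal(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for &b in field.iter().skip_while(|&&b| b == b' ') {
        match b {
            b'0'..=b'7' => {
                value = value.checked_mul(8)?.checked_add(u64::from(b - b'0'))?;
                seen_digit = true;
            }
            0 | b' ' => break,
            _ => return None,
        }
    }
    if seen_digit {
        Some(value)
    } else {
        None
    }
}

/// The stored checksum must equal the sum of all header bytes, with the
/// checksum field itself counted as eight ASCII spaces.
fn checksum_ok(header: &[u8; BLOCK]) -> bool {
    let stored = match parse_octal(&header[CHKSUM_FIELD]) {
        Some(v) => v,
        None => return false,
    };
    let computed: u64 = header
        .iter()
        .enumerate()
        .map(|(i, &b)| if CHKSUM_FIELD.contains(&i) { u64::from(b' ') } else { u64::from(b) })
        .sum();
    stored == computed
}

/// Walks every entry of an uncompressed tar stream and returns how many it
/// holds, failing on the first header that does not validate.
fn count_entries<R: Read>(mut reader: R) -> io::Result<usize> {
    let invalid = |msg: String| io::Error::new(io::ErrorKind::InvalidData, msg);
    let mut header = [0u8; BLOCK];
    let mut entries = 0;
    loop {
        reader.read_exact(&mut header)?;
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            return Ok(entries);
        }
        if !checksum_ok(&header) {
            return Err(invalid(format!("bad header checksum at entry {entries}")));
        }
        let size = parse_octal(&header[SIZE_FIELD])
            .ok_or_else(|| invalid(format!("bad size field at entry {entries}")))?;
        // Entry data is padded up to a whole number of blocks; skip over it.
        let padded = (size + BLOCK as u64 - 1) / BLOCK as u64 * BLOCK as u64;
        let skipped = io::copy(&mut reader.by_ref().take(padded), &mut io::sink())?;
        if skipped != padded {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated entry data"));
        }
        entries += 1;
    }
}
```

Listing entries this way is essentially the verification step discussed in the analysis: a corrupted archive fails on its first bad header instead of being discovered after download.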

To do

  • Identify why the produced archive is corrupted
  • Add a verification step in the aggregator after the archive is created and before it is uploaded
  • Assess the time needed to run this verification on a mainnet archive and its impact on snapshot production (~20-25 min of extra computation vs the ~2h30min already needed to compute the archive)
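The first two tasks could fit together roughly as follows. This is a minimal sketch, not the aggregator's code: produce_and_upload, PipelineError and the create/verify/upload hooks are hypothetical names. It builds an archive, times its verification so the cost can be compared to archive creation, uploads only an archive that passed verification, and gives up after max_attempts archives.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum PipelineError {
    /// Every attempt produced an archive that failed verification.
    VerificationFailed { attempts: u32 },
}

/// Hypothetical pipeline: `create`, `verify` and `upload` stand in for the
/// aggregator's real steps. Returns the total time spent verifying.
pub fn produce_and_upload<A>(
    max_attempts: u32,
    mut create: impl FnMut() -> A,
    mut verify: impl FnMut(&A) -> bool,
    mut upload: impl FnMut(A),
) -> Result<Duration, PipelineError> {
    let mut verification_time = Duration::ZERO;
    for _ in 0..max_attempts {
        let archive = create();
        // Measure verification separately from archive creation.
        let started = Instant::now();
        let valid = verify(&archive);
        verification_time += started.elapsed();
        if valid {
            upload(archive);
            return Ok(verification_time);
        }
        // A corrupted archive is dropped here and never uploaded.
    }
    Err(PipelineError::VerificationFailed { attempts: max_attempts })
}
```

Keeping the verification between creation and upload is the point of the design: a bad archive costs one more attempt on the aggregator rather than a failed restore on every client.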

Analysis

  • The created archive is corrupted: the checksum error we observe appears to be caused by some files changing while the archive is being created
  • A verification step can consist of listing the entries of the archive: this fails if the archive is corrupted
  • This prevents users from discovering that the archive is invalid only after downloading it
  • We ran tests: on the target aggregator VM, verifying an archive takes up to 20-25 min, compared to the ~2h30min needed to produce it
  • To avoid unwanted delays in snapshotting new immutable files, we have reduced the number of retries to produce an archive from 3 to 2
  • In the long run, we can limit the creation of corrupted archives by slightly modifying our snapshotting algorithm: the volatile files, the ledger state and the latest immutable files should be captured (copied to a temporary folder, then deleted once the snapshot is completed) so that the snapshot is valid
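The long-term idea of capturing the files that keep changing can be sketched as below, under stated assumptions: archive_from_staged_copy and its helpers are hypothetical, and the relative paths passed in (for example a volatile directory or a single immutable file) depend on the node's database layout and are illustrative only. The build_archive callback receives the staging folder; a real implementation would also pull the already-finalized immutable files from the original directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helpers for illustration; not part of Mithril.

/// Recursively copies the directory `src` to `dst`.
fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_recursive(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Copies each relative path (file or directory) from `db_dir` into `staging_dir`.
fn stage(db_dir: &Path, staging_dir: &Path, relative_paths: &[&str]) -> io::Result<()> {
    for rel in relative_paths {
        let (src, dst) = (db_dir.join(rel), staging_dir.join(rel));
        if src.is_dir() {
            copy_dir_recursive(&src, &dst)?;
        } else {
            if let Some(parent) = dst.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::copy(&src, &dst)?;
        }
    }
    Ok(())
}

/// Stages the changing files, builds the archive from that stable copy,
/// then deletes the staging folder whether or not archiving succeeded.
fn archive_from_staged_copy<T>(
    db_dir: &Path,
    staging_dir: &Path,
    relative_paths: &[&str],
    build_archive: impl FnOnce(&Path) -> io::Result<T>,
) -> io::Result<T> {
    let result = stage(db_dir, staging_dir, relative_paths).and_then(|()| build_archive(staging_dir));
    // Best-effort cleanup: the folder may not exist if staging failed early.
    let _ = fs::remove_dir_all(staging_dir);
    result
}
```

Copying first trades some extra disk I/O for a consistent view: the archiver never reads a file the node is still writing.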

Later

jpraynaud added the bug ⚠️ (Something isn't working) and dev 💪 labels Aug 4, 2023
@jpraynaud (Member, Author)

It looks like we had already seen this problem in the CI and fixed it in #1023 🤔
