
paritydb breaks on upgrade to v0.9.18 #5168

Closed
tugytur opened this issue Mar 21, 2022 · 22 comments
Labels: I3-bug Fails to follow expected behavior.

tugytur (Contributor) commented Mar 21, 2022

Both our paritydb-experimental sync nodes, for Kusama and Polkadot, failed on startup with release v0.9.18.

Running with RUST_BACKTRACE=full on both gave the same output:

WARNING: "paritydb-experimental" database setting is deprecated and will be removed in future releases. Please update your setup to use the new value: "paritydb".
2022-03-21 12:06:51 Parity Polkadot
2022-03-21 12:06:51 ✌️  version 0.9.18-99cd17ddb2-x86_64-linux-gnu
2022-03-21 12:06:51 ❤️  by Parity Technologies <admin@parity.io>, 2017-2022
2022-03-21 12:06:51 📋 Chain specification: Polkadot
2022-03-21 12:06:51 🏷  Node name: dot1paritydb
2022-03-21 12:06:51 👤 Role: FULL
2022-03-21 12:06:51 💾 Database: ParityDb at /var/blockchain/chains/polkadot/paritydb/full
2022-03-21 12:06:51 ⛓  Native runtime: polkadot-9180 (parity-polkadot-0.tx12.au0)
2022-03-21 12:06:54 Essential task `txpool-background` failed. Shutting down service.
2022-03-21 12:06:54 Essential task `transaction-pool-task-0` failed. Shutting down service.
2022-03-21 12:06:54 Essential task `transaction-pool-task-1` failed. Shutting down service.
Error:
   0: Error at calling runtime api: Runtime code error: `:code` hash not found
   1: Runtime code error: `:code` hash not found
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                 ⋮ 1 frame hidden ⋮                               
   2: polkadot::main::he63b6329d0694b27
      at <unknown source file>:<unknown line>
   3: std::sys_common::backtrace::__rust_begin_short_backtrace::h0b52984c10824fba
      at <unknown source file>:<unknown line>
   4: main<unknown>
      at <unknown source file>:<unknown line>
   5: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>
   6: _start<unknown>
      at <unknown source file>:<unknown line>
Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.

On both Kusama and Polkadot, the DBs were synced from scratch with release 0.9.17-rc4. The same goes for the RocksDB servers, but those had no issues updating. The sync servers run with the following relevant arguments: --unsafe-pruning --pruning 1000

The bricked DBs are available here:

SHA256 URL:

S3 URI:

  • s3://euw1.parasnaps.io/polkadot_paritydb-experimental_bricked.tar
  • s3://euw1.parasnaps.io/kusama_paritydb-experimental_bricked.tar

HTTP URL:

The working DBs that break during the upgrade are the ones currently available at https://parasnaps.io . These are the exact DBs that broke during the upgrade ~12 hours ago.

bkchr (Member) commented Mar 21, 2022

Cc @arkpar @cheme

cheme (Contributor) commented Mar 21, 2022

Yes, it is a software bug: the database version in the metadata is being upgraded when it should not be. I am looking at it.

@cheme cheme self-assigned this Mar 21, 2022
@cheme cheme added Z0-unconfirmed I3-bug Fails to follow expected behavior. labels Mar 21, 2022
cheme (Contributor) commented Mar 21, 2022

Re-syncing is safe. Also, manually setting the db_version in the metadata to 5 seems to fix the issue. (The fact that it upgraded to 6 probably comes from a parsing error causing the metadata to be rewritten; manually setting the version back to 5 in the rewritten metadata then works.)

cheme (Contributor) commented Mar 21, 2022

OK, that's it: https://github.com/paritytech/substrate/blob/1bd5d786fda99d5a0c27c339226789ac047ea4ae/client/db/src/utils.rs#L279 should not be allowed.
What happened here is that the column definition of the transaction storage changed, and there was also a new column, so validation failed, resulting in the metadata being rewritten with db_version=6. That breaks state access (the node tries to salt keys as is done with v6).

@tugytur, sorry for the bug.
You don't need to re-sync the snaps; editing the "metadata" file and setting db_version to 5 should fix this.
Re-syncing is not useless though (the db will then be version 6), but it is really not necessary (a migration could be written at some point to update the db).
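
For reference, a minimal sketch of that workaround as shell commands (a sketch, not official tooling): it assumes the node is stopped, the database path from the log at the top of this issue, and that the version is recorded on a version= line of the metadata file; the exact key spelling may vary between parity-db releases, so check the grep output before editing.

DB=/var/blockchain/chains/polkadot/paritydb/full
cp "$DB/metadata" "$DB/metadata.bak"       # keep a backup before touching anything
grep -n 'version' "$DB/metadata"           # should show 6 after the bad upgrade
sed -i 's/^version=6$/version=5/' "$DB/metadata"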

tugytur (Contributor, Author) commented Mar 21, 2022

@cheme thanks!

Can confirm that changing db_version to 5 fixes it.

I like to sync from scratch whenever there are any changes to the db.
We will provide version 6 snapshots as soon as they're synced.

PolkachuIntern commented

I confirm that changing the db version from 6 to 5 in the metadata file fixed the issue.

tugytur (Contributor, Author) commented Mar 21, 2022

@cheme
I've upgraded one of our Kusama validators and applied the db_version fix.
CPU usage was ~30-50% higher than usual. The heartbeat was sent every epoch.

But we were assigned as a para validator twice and earned 0 points.

Inbound and outbound request failures (timeouts) increased from ~0 to a couple of thousand requests per second.
libp2p_out_events_notifications_sizes was constantly growing as well.

The RPC call babe_epochAuthorship was hanging as well; I cancelled it after 5 minutes of waiting.
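
(For reference, such a call can be made against the node's HTTP RPC roughly like this; the port 9933 is taken from the launch parameters further down, and it assumes unsafe RPC methods such as babe_epochAuthorship are reachable locally. On a healthy node this returns within seconds.)

# babe_epochAuthorship takes no parameters
curl -sS -H 'Content-Type: application/json' \
     -d '{"jsonrpc":"2.0","id":1,"method":"babe_epochAuthorship","params":[]}' \
     http://127.0.0.1:9933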

In the meantime I finished syncing db_version 6.
I've migrated the affected validator to the new db_version snapshot and everything is back to normal. I'll make the snapshot publicly available tomorrow if the validator runs stably through the night.

cheme (Contributor) commented Mar 21, 2022

Thanks for the feedback.
My hypothesis is that the high usage is not from using paritydb version 5, but from the switch of the polkadot parachain db between 0.9.17 and 0.9.18 (which, IIUC, requires some additional syncing).
I know we ran this on a validator node before, but it may have been synced from scratch.
CC @ordian, could the described issue be related?

tugytur (Contributor, Author) commented Mar 21, 2022

@cheme, last update for tonight.
There seems to be an issue with db_version 6 as well. I'm not sure how much this affects the validator, but I'm switching back to RocksDB for the time being.

After roughly 20 minutes of running, mpsc_finality_notification_stream and mpsc_import_notification_stream suddenly keep growing (to over 500 in under 20 minutes), and during that time the validator doesn't respond to any RPC calls, though it keeps syncing. When stopping the service, it only goes down via SIGKILL after running into a timeout. I did one restart and waited, to be sure that this isn't a random bug.

Edit: Maybe I jumped to a conclusion here. After the second restart this did not happen again. I'll not switch to RocksDB for now and hope we have more data by the morning.

Edit 2: It happened again. I've switched to RocksDB. I've kept the paritydb in case you would like to have a look at it.

cheme (Contributor) commented Mar 22, 2022

@tugytur, if you have logs available, they could be interesting (even if I have little hope of seeing anything; I am currently running a node without much being logged).
I will push the version fix and then ask to rerun the tests we previously did on a validator. I will also prepare a branch that keeps using RocksDB as the parachain db: that could be a simple way to spot whether the issue is related to the switch of db at this level.

cheme (Contributor) commented Mar 22, 2022

I forgot to ask, @tugytur: what are your launch parameters?

tugytur (Contributor, Author) commented Mar 22, 2022

@cheme Logging is just the defaults enabled, and there are only the default log messages. Nothing indicating an issue.

Launch parameters:

ExecStart=/usr/local/bin/polkadot \
        --name "XXXX" \
        --validator \
        --unsafe-pruning \
        --pruning 1000 \
        --no-mdns \
        --public-addr=/ip4/X.X.X.X/tcp/30333 \
        --listen-addr=/ip4/X.X.X.X/tcp/30333 \
        --chain=kusama \
        --database=paritydb \
        --prometheus-port 9615 \
        --rpc-port 9933 \
        --ws-port 9944 \
        --telemetry-url 'wss://telemetry-backend.w3f.community/submit 1' \
        --base-path=/var/blockchain

paulormart (Contributor) commented

Hi @cheme, I faced the same issue yesterday and collected some logs, although the full trace was not active. There are some errors logged at the time the DB crashed.

You can find the logs here, hope it helps.

cheme (Contributor) commented Mar 22, 2022

Thanks a lot. I also ran into some issues running on our simulation test net.
By the end of the day I will probably propose reverting to RocksDB for the parachain db, so that using paritydb-experimental can be restored and the issue fixed in a correct way.

Edit: the issues observed with simnet are actually not that clear (need to run again).

gituser commented Apr 7, 2022

I have the same issue but on parity v0.9.18 + RocksDB:

RUST_BACKTRACE=full COLORBT_SHOW_HIDDEN=1 ./polkadot --name polkadot --chain=polkadot -d /home/polkadot/.polkadot --rpc-external --ws-external --rpc-cors all --pruning=100000 --disable-log-color --no-private-ipv4
2022-04-07 12:58:23 Parity Polkadot    
2022-04-07 12:58:23 ✌️  version 0.9.18-ef71ed8bae-x86_64-linux-gnu    
2022-04-07 12:58:23 ❤️  by Parity Technologies <admin@parity.io>, 2017-2022    
2022-04-07 12:58:23 📋 Chain specification: Polkadot    
2022-04-07 12:58:23 🏷  Node name: polkadot    
2022-04-07 12:58:23 👤 Role: FULL    
2022-04-07 12:58:23 💾 Database: RocksDb at /home/polkadot/.polkadot/chains/polkadot/db/full    
2022-04-07 12:58:23 ⛓  Native runtime: polkadot-9180 (parity-polkadot-0.tx12.au0)    
2022-04-07 12:58:43 Essential task `txpool-background` failed. Shutting down service.    
Error: 
   0: Error at calling runtime api: Runtime code error: `:code` hash not found
   1: Runtime code error: `:code` hash not found

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   1: color_eyre::config::EyreHook::into_eyre_hook::{{closure}}::h588582e1cf7d8b5b
      at <unknown source file>:<unknown line>
   2: eyre::capture_handler::h9f982628d969531f
      at <unknown source file>:<unknown line>
   3: polkadot::main::hdaa1adbaf9fcd406
      at <unknown source file>:<unknown line>
   4: std::sys_common::backtrace::__rust_begin_short_backtrace::h2902248ec0ab9496
      at <unknown source file>:<unknown line>
   5: std::rt::lang_start::{{closure}}::h57dcbb249ed8206e
      at <unknown source file>:<unknown line>
   6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h443f738a8e9f947a
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:259
   7: std::panicking::try::do_call::h1e21ba261ba489ec
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:406
   8: std::panicking::try::h6afd48af8b6c96ac
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:370
   9: std::panic::catch_unwind::h85dd95e0bab7fb60
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panic.rs:133
  10: std::rt::lang_start_internal::{{closure}}::h038455e697c8b03e
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/rt.rs:128
  11: std::panicking::try::do_call::h6b0ad65979f3077a
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:406
  12: std::panicking::try::h010108d314169ac6
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:370
  13: std::panic::catch_unwind::hff397f912b1535c2
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panic.rs:133
  14: std::rt::lang_start_internal::h52e73755f77c7dd9
      at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/rt.rs:128
  15: main<unknown>
      at <unknown source file>:<unknown line>
  16: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>
  17: _start<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.

There is no metadata file at all, but there is a db_version file which contains 4 (without a newline). I've tried setting it to 5, but then it tells me that the db version is in the future.
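
(In commands, the above looks roughly like this; the location of the db_version file is an assumption based on the database path in the log, so adjust it to wherever the file lives on your setup.)

DB=/home/polkadot/.polkadot/chains/polkadot/db/full
cat "$DB/db_version"; echo        # prints 4 -- the file has no trailing newline
printf '5' > "$DB/db_version"     # after this the node reports the db version as being in the future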

Any ideas how to fix this without a resync? @cheme

cheme (Contributor) commented Apr 7, 2022

About this issue:

  • There was a versioning bug in paritydb; this is fixed in master (@paulormart and @tugytur experienced it). The bug does not affect syncing from scratch, and there is a workaround to avoid re-syncing.
  • @tugytur experienced some other issue with paritydb. I tried to reproduce it and saw some things, but they were false positives (I did a warp sync and the peaks were background syncing). At this point we are in the process of switching our validators to paritydb to assess this issue, but so far I am not aware of negative feedback from devops.

@gituser, in your case this should be a different issue (you are using RocksDB): the error indicates that some state value (the runtime code) cannot be accessed. My first idea would be that the RocksDB got corrupted somehow; I don't have ideas for how to fix this without a resync. Maybe open a new issue; other devs may be able to relate to it.

gituser commented Apr 7, 2022

@cheme
I've pasted it here because I had a similar error.
The server froze (I think it's something to do with the Ryzen CPU getting stuck), and then after a reset the Polkadot DB was corrupted.

cheme (Contributor) commented Apr 7, 2022

> @cheme I've pasted it here because I had a similar error. The server froze (I think it's something to do with the Ryzen CPU getting stuck), and then after a reset the Polkadot DB was corrupted.

Sure, that totally makes sense; I am not saying this should be ignored. This is quite valuable information.
But it is not related to the initial issue (paritydb), so the db_version workaround cannot work.
Your information about experiencing server freezes with RocksDB on 0.9.18 is also good feedback.

alko89 commented Apr 7, 2022

@gituser this seems to have happened to me as well, about a week ago with RocksDB. I opened an issue here, #5210, but it got closed since I couldn't provide any relevant data.

Edit: I'm on an Intel CPU.

gituser commented Apr 9, 2022

@alko89 I ended up re-syncing from scratch. I'm on AMD though.

cheme (Contributor) commented May 18, 2022

paritytech/parity-db#68 fixes a deadlock that may explain this issue.
Today's 0.9.22 release includes this fix.
I will close the issue once we observe our Rococo validators working properly (not yet upgraded).

nuke-web3 (Contributor) commented

@cheme Assuming things are all good now? Please let me know if not!
