How to prevent and deal with unexpected epoch changes #4464
Comments
One invariant of BABE is that you have to have at least one block every |
Thanks for the quick reply Rob. Is a hard spoon the only current way to recover from a situation like that? |
Short follow up @rphmeier – is there any way to recover from a pricked chain as described above? Or is a hard spoon the only option? Thanks! |
Yeah, hard spooning is the best way to solve it. You could also hack something together where you pick up the chain in a sandbox where all the nodes think they're in the past, and time is running faster than normal. There's probably a way to do that that involves manually issuing 1 block every |
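To illustrate the "nodes think they're in the past, and time is running faster than normal" idea, here is a minimal sketch of a warped clock. It is not the code that ran on Kusama; `WarpedClock` and all of its fields and methods are hypothetical names made up for this example, and it assumes slots are derived from wall-clock time divided by the slot duration.

```rust
use std::time::Duration;

/// Hypothetical warped clock (illustration only, not a Substrate type).
/// All times are durations since the UNIX epoch.
struct WarpedClock {
    /// Real time at which the warp begins.
    warp_start: Duration,
    /// Virtual (past) time the nodes believe it is when the warp begins.
    virtual_start: Duration,
    /// How many virtual seconds pass per real second.
    factor: u32,
}

impl WarpedClock {
    /// Virtual time seen by the node at the given real time.
    fn now(&self, real_now: Duration) -> Duration {
        let elapsed = real_now.saturating_sub(self.warp_start);
        self.virtual_start + elapsed * self.factor
    }

    /// Slot number derived from the virtual time (assumed: time / slot duration).
    fn slot(&self, real_now: Duration, slot_duration: Duration) -> Option<u64> {
        let slot_ms = slot_duration.as_millis();
        if slot_ms == 0 {
            return None;
        }
        Some((self.now(real_now).as_millis() / slot_ms) as u64)
    }

    /// Real time at which the virtual clock catches up with the real clock.
    /// Returns None if the factor is too small to ever catch up.
    fn catch_up_at(&self) -> Option<Duration> {
        if self.factor <= 1 {
            return None;
        }
        let lag = self.warp_start.checked_sub(self.virtual_start)?;
        Some(self.warp_start + lag / (self.factor - 1))
    }
}
```

For example, with `warp_start` at 1000 s, `virtual_start` at 400 s and a factor of 3, the virtual clock is 600 s behind when the warp begins and catches up with real time at 1300 s, at which point the warp can be switched off.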
Cool, thanks for your reply. Is the code that did time warping on Kusama visible somewhere? |
cc @andresilva - I'd imagine there are some ugly hacks in there we might not want to propagate |
https://github.com/paritytech/substrate/compare/andre/polkadot-master Relevant changes for the time warp are in files:
|
BTW in the code above we do a warp factor of 6x with a block time of 6s, which means that during the warp you'll end up with 1s block times. For any chain deployed on the public internet this is probably too fast and will trigger some other issues (nodes will start forking). I'd recommend warping with a factor that keeps block times no lower than 2s. |
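The arithmetic behind that recommendation can be sketched as follows. Both function names are hypothetical helpers for this example, not part of Substrate, and the sketch assumes the warped block time is simply the normal block time divided by the warp factor.

```rust
/// Effective wall-clock block time (in ms) while warping.
/// Returns None for a warp factor of zero.
fn warped_block_time_ms(block_time_ms: u64, warp_factor: u64) -> Option<u64> {
    if warp_factor == 0 {
        None
    } else {
        Some(block_time_ms / warp_factor)
    }
}

/// Largest integer warp factor that keeps the warped block time
/// at or above `min_block_time_ms`. Returns None if no factor qualifies.
fn max_warp_factor(block_time_ms: u64, min_block_time_ms: u64) -> Option<u64> {
    if min_block_time_ms == 0 || min_block_time_ms > block_time_ms {
        None
    } else {
        Some(block_time_ms / min_block_time_ms)
    }
}
```

With a 6s block time, a 6x warp gives `warped_block_time_ms(6000, 6) == Some(1000)`, and a 2s floor gives `max_warp_factor(6000, 2000) == Some(3)`.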
That’s very useful, thanks @andresilva 🙏 |
Hi, @andresilva thank you! I made a patch like this, but I got I'm sure it was build in release and I try |
darwinia-network/darwinia@ce4ce7a#diff-d2bd7cb542d4338238e9e420d754e1cbR72 Maybe use 3x here? |
Still got this. Btw, how can I roll back to a specific block? I synced from the stuck node and pressed Ctrl-C at a previous block to get the db. |
https://github.com/paritytech/substrate/blob/master/client/consensus/slots/src/lib.rs#L247 Print this to figure out how much time is being given to block execution, maybe there's some calculation which is making this too little. You can revert the chain using the None of these operations are simple and require in-depth knowledge of how substrate works. If you don't have much luck with my instructions I'd suggest going for a "hard-spoon" instead. |
Thanks a million. |
Hi, @andresilva! The I was on an old substrate version. I found this: So I just ignore the
|
This code will make sure the slot is always "stuck" until your local clock reaches the |
Yes, I set it to 9:50 and I boot nodes at 9:20. Everything goes well. Thanks again! |
Great! 👍😃 |
I have this error every time I start my virtual machine after a long time of being off.
Nothing resolves this error except doing purge with |
@Muhammad-Altabba Use AuRa instead of BABE for your development chain, it's what's used in |
Thanks @andresilva, |
That is documentation for the parity-ethereum (https://github.com/paritytech/parity-ethereum) client, not Substrate. My suggestion is to look at the |
Issue
In our testnet at Centrifuge, we have observed 2 chain halts so far due to unexpected epoch changes. This error seems to happen whenever blocks are not produced for a whole epoch.
Context
We are using relatively short epochs for testing purposes (30 slots of 6 seconds each => 3 minutes), and had only 2 validators that were down before the errors showed up.
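As a rough illustration of the failure condition described above (no block produced for a whole epoch), here is a simplified sketch. It assumes fixed-length epochs counted from a genesis slot, which is a simplification and not how Substrate tracks epoch changes internally; `epoch_index` and `skipped_whole_epoch` are hypothetical names.

```rust
/// Epoch index of a slot, assuming fixed-length epochs starting at `genesis_slot`.
/// Returns None if `epoch_len` is zero or the slot precedes genesis.
fn epoch_index(slot: u64, genesis_slot: u64, epoch_len: u64) -> Option<u64> {
    if epoch_len == 0 {
        return None;
    }
    slot.checked_sub(genesis_slot).map(|offset| offset / epoch_len)
}

/// True if at least one whole epoch passed between the parent block's slot
/// and the new block's slot without any block being produced.
fn skipped_whole_epoch(
    parent_slot: u64,
    slot: u64,
    genesis_slot: u64,
    epoch_len: u64,
) -> Option<bool> {
    let parent_epoch = epoch_index(parent_slot, genesis_slot, epoch_len)?;
    let epoch = epoch_index(slot, genesis_slot, epoch_len)?;
    Some(epoch > parent_epoch.saturating_add(1))
}
```

With 30-slot epochs and genesis at slot 0, a parent at slot 10 followed by a block at slot 45 moves from epoch 0 to epoch 1, which is fine, while a block at slot 65 lands in epoch 2 and epoch 1 was skipped entirely.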
Questions
How to prevent unexpected epoch change errors?
How to deal with unexpected epoch change errors in case they happen on mainnet?
Reproducible example
36c7fcfa24f54b06bb4c7a32e7627be35bdd80ef:
cargo build --release
./target/release/substrate --dev
./target/release/substrate --dev
Full output when continuing the dev testnet: