Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bank's call to account::squash() causes assert on testnet #3425

Closed
pgarg66 opened this issue Mar 21, 2019 · 11 comments
Closed

Bank's call to account::squash() causes assert on testnet #3425

pgarg66 opened this issue Mar 21, 2019 · 11 comments
Assignees

Comments

@pgarg66
Copy link
Contributor

pgarg66 commented Mar 21, 2019

Problem

The fullnode is crashing with the following assert

thread 'solana-replay-stage' panicked at 'assertion failed: !self.account_locks.lock().unwrap().contains_key(&fork)', runtime/src/accounts.rs:995:9
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:70
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:58
             at src/libstd/panicking.rs:200
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:215
   4: solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}
   5: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:482
   6: std::panicking::begin_panic
   7: solana_runtime::accounts::Accounts::squash
   8: solana_runtime::bank::Bank::squash
   9: solana::bank_forks::BankForks::set_root

This has been observed on V0.12 but should apply to master branch as well.
This PR comments out the offending call in V0.12 for beacons release.
#3424

Proposed Solution

Reassess the crash on master branch. If it's a contained fix, pick it up to v0.12 and revert the change that's commenting out the call to squash.

@pgarg66 pgarg66 added this to the Grandview v0.13.0 milestone Mar 21, 2019
@pgarg66
Copy link
Contributor Author

pgarg66 commented Mar 21, 2019

@sakridge , FYI

@carllin
Copy link
Contributor

carllin commented Mar 21, 2019

hmmmm this assert implies that an account for fork wasn't unlocked...so there are still transactions running on an old frozen bank?

@sakridge
Copy link
Member

@carllin that seems to be the case.

@carllin
Copy link
Contributor

carllin commented Mar 21, 2019

So PohRecorder should clear its bank once you've voted on the next bank, which means no new transactions should be attempted to be played on that old bank because the reference changes here: https://github.com/solana-labs/solana/blob/master/core/src/banking_stage.rs#L368. It could be possible, however that the queue of transactions is so long that we're still attempting to process them by the time we try to freeze and squash this bank.

@carllin
Copy link
Contributor

carllin commented Mar 21, 2019

So this check here makes sure we don't process transactions after we detect the bank has been cleared from PohRecorder: https://github.com/solana-labs/solana/blob/master/core/src/banking_stage.rs#L363. So for transactions to still be running, that must mean there is one ongoing iteration of this loop here: https://github.com/solana-labs/solana/blob/master/core/src/banking_stage.rs#L362, that isn't completed even after 32 slots have gone by. Ouch.

@sambley sambley mentioned this issue Mar 22, 2019
5 tasks
@sambley
Copy link
Contributor

sambley commented Mar 22, 2019

it doesn't look like the transactions are pending...probably looks like the some accounts are not getting unlocked in certain condition, still looking to narrow down the issue

@aeyakovenko
Copy link
Member

#3447 i think covers this crash too

@pgarg66
Copy link
Contributor Author

pgarg66 commented Mar 22, 2019

I can cherry-pick it and try it on v12

@sambley
Copy link
Contributor

sambley commented Mar 23, 2019

still seeing the crash on tip which has #3447 merged

@aeyakovenko
Copy link
Member

@sambley, same here. It fixed another race for me though.

@sambley
Copy link
Contributor

sambley commented Mar 23, 2019

Uploaded fix in #3458

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants