Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btc/spv: Unable to synchronize wallet panic. #1690

Closed
JoeGruffins opened this issue Jun 30, 2022 · 11 comments · Fixed by #2396
Closed

btc/spv: Unable to synchronize wallet panic. #1690

JoeGruffins opened this issue Jun 30, 2022 · 11 comments · Fixed by #2396
Labels
bug bug or bugfix wallet Client wallet backends

Comments

@JoeGruffins
Copy link
Member

I think the circumstances are important here.

I have been testing coins on testnet. I have two clients set up with a few wallets each. At this time I just wanted to test on the dcr_eth market, so that's what I set up the server to have on it. The clients appeared to have started up fine, but after a short time I got a panic related to the btc wallet, which I didn't expect to use. For some reason this only happened on one client as well, although both have btc wallets.

The panic:

panic: unable to synchronize wallet to chain: unable to perform wallet recovery: database not open

goroutine 1910 [running]:
github.com/btcsuite/btcwallet/wallet.(*Wallet).handleChainNotifications(0xc000d1cb40)
        /home/joe/go/pkg/mod/github.com/btcsuite/btcwallet@v0.14.1-0.20220322205928-94c51d7bdbf9/wallet/chainntfns.go:117 +0x1096
created by github.com/btcsuite/btcwallet/wallet.(*Wallet).SynchronizeRPC
        /home/joe/go/pkg/mod/github.com/btcsuite/btcwallet@v0.14.1-0.20220322205928-94c51d7bdbf9/wallet/wallet.go:211 +0x24c

The logs for that client:
dexc1.txt

@JoeGruffins
Copy link
Member Author

JoeGruffins commented Jun 30, 2022

All built on current master c68b82f .

@LasTshaMAN
Copy link
Contributor

LasTshaMAN commented Jul 2, 2022

Just some things I've spotted,

BTC SPV wallet runs SynchronizeRPC func where btcwallet panics explicitely (which isn't ideal way to handle errors IMO exactly because of this contagion effect we are observing with this issue)

Its hard to tell where exactly DEX initiates wallet connection (panic stacktrace provided above is cut off too early), but looks like could be from many places,

couple solutions come to mind:

  • SynchronizeRPC call (or even (w *spvWallet) connect, to be future-proof) could be wrapped into recover
  • since that's not the most graceful way to hanle error, maybe upsteam fix could be proposed to btcwallet guys (they've stumbled upon similar panics couple of times already here and here but didn't seem to bother handling it better so far)

@chappjc
Copy link
Member

chappjc commented Jul 2, 2022

Thanks for researching that @LasTshaMAN! Seems like a historically common cause (other than the fundamental design of panicking, which I agree isn't great) is temporary compatibility issues between btcd and btcwallet. I wouldn't think it would matter much in our case since this isn't RPC stuff (not even running btcd), but we could try to repro with our btcsuite deps updated to the latest release, which was just a few weeks ago:

16a7760

We'd want to update these before we release v0.5 anyway, so might as well do it now.

@JoeGruffins
Copy link
Member Author

SynchronizeRPC call (or even (w *spvWallet) connect, to be future-proof) could be wrapped into recover

Probably should do this.

In this case I started dexc again and was fine afterwards. So, disconnect and reconnect if panic for btc spv maybe good?

@chappjc
Copy link
Member

chappjc commented Jul 4, 2022

SynchronizeRPC call (or even (w *spvWallet) connect, to be future-proof) could be wrapped into recover

Possibly. The handleChainNotifications is launched as a goroutine (async) and the panic is after SynchronizeRPC has returned, so I think we'd have to introduce yet another goroutine that just defers a recover, runs SynchronizeRPC, signals back to the caller like startWallet when SynchronizeRPC returns so startWallet can return too, and waits on btcw.WaitForShutdown() before returning.

disconnect and reconnect if panic for btc spv maybe good

Not sure. Could get into an ugly reconnect loop. Might consider simply putting a w.stop in the recover.

One thing I'm noticing from your dexc logs is a second call to /api/login that seemed to trigger unloading of the wallet before trying to load it again. I've noted before (cannot find my comments) that logging in twice is a little wonky and we should reexamine (*Core).Login from the perspective of an already logged in Core.

The second login with no prior logout:

2022-06-30 02:18:53.558 [INF] CORE[btc][SPV]: Unloading wallet
2022-06-30 02:18:53.558 [ERR] CORE[btc][SPV]: UnloadWallet error: wallet is not loaded
2022-06-30 02:18:53.558 [TRC] CORE[btc][SPV]: Stopping neutrino client chain interface
2022-06-30 02:18:53.558 [TRC] CORE[btc][SPV]: Stopping neutrino chain sync service
2022-06-30 02:18:53.661 [TRC] CORE[btc][SPV]: Stopping neutrino DB.
2022-06-30 02:18:53.675 [INF] CORE[btc][SPV]: SPV wallet closed
2022-06-30 02:18:53.675 [WRN] CORE: Bitcoin wallet shut down before sync completed.
2022-06-30 02:18:53.675 [ERR] CORE: Unable to connect to btc wallet (start and sync wallets BEFORE starting dex!): connectWallet: connect failure: couldn't load wallet: timeout
2022-06-30 02:18:53.676 [ERR] CORE: notify: |ERROR| (walletconfig) Wallet connection warning - connectWallet: connect failure: couldn't load wallet: timeout
2022-06-30 02:18:53.676 [INF] CORE: Connected to 2 of 3 wallets.
2022-06-30 02:18:53.676 [INF] CORE: Loaded 0 active orders.
2022-06-30 02:18:53.676 [INF] CORE: Loaded 3 active match orders
2022-06-30 02:18:53.676 [ERR] CORE: Active 15a4676b202a885543ca8e66fb30268f901c66af4537a383c707d81678d569bb order retrieved for unknown market dcr_btc
2022-06-30 02:18:53.676 [ERR] CORE: Active 6e199556ce218dc8579af8b6112225941f496cf293e49ed410785286db495d0a order retrieved for unknown market dcr_btc
2022-06-30 02:18:53.676 [ERR] CORE: Active 81a041a54be78391410c950d54f8dd0eb14d1c32a3277d41b6558d47939d047c order retrieved for unknown market dcr_btc
2022/06/30 11:18:53 "POST http://127.0.0.2:5758/api/login HTTP/1.1" from 127.0.0.1:44708 - 200 25720B in 1m0.174221648s

I think we can try to catch the explicit panics from SynchronizeRPC, but the root cause seems to have to do with repeat logins or possibly abrupt shutdown of the wallet.

@chappjc
Copy link
Member

chappjc commented Jul 4, 2022

Another interesting thing from your logs is that the two /api/login calls took an extremely long time to return (race build?) and looks like the second requests was issued before the first completed:

2022-06-30 02:17:29.445 [DBG] WEB[WS]: New websocket client 127.0.0.1:56624
2022-06-30 02:17:33.131 [INF] CORE: Connecting wallet for dcr
2022-06-30 02:17:33.131 [INF] CORE: Connecting wallet for eth
2022-06-30 02:17:33.131 [INF] CORE: Connecting wallet for btc
2022-06-30 02:17:33.132 [DBG] CORE[btc][SPV]: Starting native BTC wallet...
2022-06-30 02:17:33.137 [INF] CORE[dcr][RPC]: Connected to dcrwallet (JSON-RPC API v8.9.0) proxying dcrd (JSON-RPC API v7.0.0) on testnet3
2022-06-30 02:17:33.139 [TRC] CORE: New peer count for asset dcr: 8
2022-06-30 02:17:33.234 [INF] CORE[eth][ETH]: Connected to geth, at height 7015095
...
<login not done, but the /login page is loaded again here>
2022/06/30 11:17:50 "GET http://127.0.0.2:5758/login HTTP/1.1" from 127.0.0.1:44708 - 200 4125B in 171.231µs
<second /api/login POST request starts about here>
...
2022-06-30 02:17:53.599 [INF] CORE: Connecting wallet for btc *** NOTE: DUP!
2022-06-30 02:17:53.599 [DBG] CORE[btc][SPV]: Starting native BTC wallet...
...
2022-06-30 02:17:56.914 [DBG] CORE[btc][SPV]: Starting neutrino chain service...
2022-06-30 02:17:56.926 [INF] CORE[btc][SPV]: Synchronizing wallet with network...
2022-06-30 02:17:56.926 [INF] CORE[btc]: Connected wallet with current best block 000000000000004589d72ffaa5291c79f1c7d8b1ef5f157cf2eb72870b550ec1 (2254565)
...
2022-06-30 02:17:56.954 [INF] CORE: Connected to 3 of 3 wallets.
...
2022-06-30 02:17:56.964 [DBG] CORE: Authenticated connection to 127.0.0.1:17273, acct 1aa4aec190d355ead10b956fed232f8329a3234a2062e467b97e1b2727c291aa, 0 active orders, 0 active matches, score -35 (suspended = false)
...
<first /api/login completes after about 24 seconds>
2022/06/30 11:17:56 "POST http://127.0.0.2:5758/api/login HTTP/1.1" from 127.0.0.1:56622 - 200 25728B in 23.875703767s
...
2022-06-30 02:18:53.558 [INF] CORE[btc][SPV]: Unloading wallet
2022-06-30 02:18:53.558 [ERR] CORE[btc][SPV]: UnloadWallet error: wallet is not loaded
2022-06-30 02:18:53.558 [TRC] CORE[btc][SPV]: Stopping neutrino client chain interface
2022-06-30 02:18:53.558 [TRC] CORE[btc][SPV]: Stopping neutrino chain sync service
2022-06-30 02:18:53.661 [TRC] CORE[btc][SPV]: Stopping neutrino DB.
2022-06-30 02:18:53.675 [INF] CORE[btc][SPV]: SPV wallet closed
2022-06-30 02:18:53.675 [WRN] CORE: Bitcoin wallet shut down before sync completed.
2022-06-30 02:18:53.675 [ERR] CORE: Unable to connect to btc wallet (start and sync wallets BEFORE starting dex!): connectWallet: connect failure: couldn't load wallet: timeout
2022-06-30 02:18:53.676 [ERR] CORE: notify: |ERROR| (walletconfig) Wallet connection warning - connectWallet: connect failure: couldn't load wallet: timeout
2022-06-30 02:18:53.676 [INF] CORE: Connected to 2 of 3 wallets.
...
<second /api/login completes after 1 minute>
2022/06/30 11:18:53 "POST http://127.0.0.2:5758/api/login HTTP/1.1" from 127.0.0.1:44708 - 200 25720B in 1m0.174221648s

So, two concurrent login requests looks to be a big problem.

@JoeGruffins
Copy link
Member Author

race build?

No.

Just logging in with the web server can take a very long time for me.

@ukane-philemon
Copy link
Contributor

Just logging in with the web server can take a very long time for me.

related. #1636

@buck54321
Copy link
Member

I'm contemplating a solution here. Can we wrap SynchronizeRPC in a recover, and on panic, if 1) mainnet, then issue a special error through the tipChange channel. Core should disable the wallet and issue an error notification (and possibly shut down?). If 2) testnet or simnet, just log it and move along.

@buck54321 buck54321 added bug bug or bugfix client labels Sep 15, 2022
@chappjc
Copy link
Member

chappjc commented Sep 15, 2022

The issue there is that the panic is after SynchronizeRPC has returned.

The handleChainNotifications is launched as a goroutine (async) and the panic is after SynchronizeRPC has returned, so I think we'd have to introduce yet another goroutine that just defers a recover, runs SynchronizeRPC, signals back to the caller like startWallet when SynchronizeRPC returns so startWallet can return too, and waits on btcw.WaitForShutdown() before returning.

Although having tried the above, it doesn't work. We can't recover a panic in a wild goroutine.

I think this issue is less likely to happen now that we are blocking concurrent login requests. #1761

@buck54321
Copy link
Member

We can't recover a panic in a wild goroutine.

Ah. I see now. Damn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bug or bugfix wallet Client wallet backends
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants