Fulcrum 1.9.5 terminates with uncaught exception #214
Hmm. This should never happen, yet I believe you that it did. I will investigate. Thanks so much for the bug report. Question: do you think it was a one-off, or does it happen with any regularity on testnet? Do you have any more of the log preceding this? Did a block indeed arrive at that time? Anyway, this requires more investigation from me.
Ok, I managed to reproduce the error. Still not sure why inputs were missing on your testnet but this exposed a bug in the synch code (a race condition). I have a fix. I will push it. Are you getting this error often? Would you be able to try my fix on master?
Good news. I only recently updated to 1.9.5 (from 1.7.0) and was testing it out on Testnet before migrating it to Mainnet. It ran for about 5 days before the error, plus I didn't notice for a few days. Since I restarted (1 day ago) there have been no subsequent errors.
I'm sorry, I don't understand what you're asking, but I'm happy to help. What does master refer to?
There was a race condition in the SynchMempoolTask. See issue #214. It could lead to a situation where the prefetcher thread was joined by 2 threads at once (bad!). This has now been addressed, and failures to read inputs from mempool and/or DB are now handled much more robustly (they can occur due to block-only txns + a new block arriving as we are synching the mempool -- rare corner case!). Fixes issue #214.
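To illustrate the kind of hazard the commit message describes: calling `join()` on the same `std::thread` from two threads at once is undefined behavior in standard C++. The block below is a minimal sketch of one common guard, assuming a mutex plus a `joinable()` check. `JoinOnceWorker` and `count_successful_joins` are hypothetical names made up for this example; this is not Fulcrum's actual fix.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper (not Fulcrum's class): owns one worker thread and lets
// any number of callers request a join without racing each other.
class JoinOnceWorker {
public:
    template <typename Fn>
    explicit JoinOnceWorker(Fn &&fn) : worker_(std::forward<Fn>(fn)) {}

    ~JoinOnceWorker() { join(); }

    // The mutex serializes callers; joinable() turns false after the first
    // successful join(), so later callers simply return.
    // Returns true only for the caller that actually performed the join.
    bool join() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!worker_.joinable()) return false;
        worker_.join();
        return true;
    }

private:
    std::mutex mutex_;
    std::thread worker_;
};

// Demo: several threads race to join one worker. Returns how many of them
// actually performed the join.
int count_successful_joins(int n_joiners) {
    assert(n_joiners > 0);
    std::atomic<int> joins{0};
    JoinOnceWorker worker([] { /* stand-in for prefetch work */ });
    std::vector<std::thread> joiners;
    for (int i = 0; i < n_joiners; ++i)
        joiners.emplace_back([&] { if (worker.join()) ++joins; });
    for (auto &t : joiners) t.join();
    return joins.load();
}
```

However many threads call `join()`, only one of them joins the worker; the rest see `joinable() == false` and return. The worker itself must never call `join()` on its own wrapper, since that would deadlock.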
Well it's good to know it's rare but it definitely was a bug and I'm super-glad you contacted me! This was def. a race condition. Darn. I may need to do a release very soon.
Oh, I meant: can you build from the current latest GitHub code in this repo? If you like, I can test this here overnight on BTC mainnet (which is very active and typically hammers the mempool synch code) and do a release tomorrow. Or, if you know how to build (I presume you do since you are on Mac?), maybe you can pull the latest code from this repo and build and test it too?
IC, thanks for the explanation. Sure, I can definitely do this on Testnet, and if I have time I'll see if I can back up the Fulcrum data on mainnet and run the "head/master" there as well. Thanks for the quick investigation. Bugs that should never happen are always the most interesting...in the end.
Yeah this particular bug is more likely to happen on testnet -- it requires a "block only" txn to arrive in a block, and then a mempool txn to reference it, just at the precise time. On mainnet, "block-only" txns (txns that were never in mempool) are very rare... At least I think that's what happened. Anyway yeah testing on both would be appreciated if you have time! I'm already testing here on mainnet... (although I never managed to see this error since it's extremely rare and unlikely to happen..). You got lucky!
I think I will do a release ASAP tbh. It bugs me that this bug can happen, and the fixed commit is at least as good as or better than what is currently released. Closing this issue for now.
While compiling 1.9.7 on Ubuntu 20.04.6 LTS as follows:
I get:
Ambiguous overload. See: #214 (comment)
Grr why do compilers hate me so. I pushed a commit to address this: 6d4026a
That did it! Thanks a lot, it's running.
Update on 1.9.6+. Testnet has been running for 2 days w/o issue - no asserts. I have not seen the asserts mentioned in #141 now that I updated Mainnet. But I do see these variants of Tx dropped out of the mempool.
Are these normal and expected given the state of the mempool (that is, benign), or something else? Thanks
The first error (1) above is expected to happen occasionally. It's impossible to synch the mempool 100% correctly in an atomic fashion, since it changes fast on BTC when the mempool is full. Sometimes txns that were there a moment ago drop out, and sometimes live txns that you were just told about refer to other txns that are no longer there. So that error can happen when a txn gets expired or RBF'd while it has children you are busy examining. I see error (1) on my setup here every once in a while too, and it's perfectly normal. The other 2 errors are rarer. I have never seen them happen, and the fact that they happened for you in succession makes me curious as to WHY. In all cases the errors are recoverable and Fulcrum eventually settles. I will have to examine the other 2 errors in more detail, though, because something is fishy there. I'm going to re-open this issue and re-assign it. I hope to get time this weekend or next week to work on this.
I'm actually going to close this issue, and I've opened #216, which references your comment.
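The "recoverable" behavior described above can be pictured roughly like this: when a mempool txn refers to a parent that is in neither the current mempool view nor the confirmed set, the sync step reports "retry later" rather than treating it as fatal. This is only a sketch under that assumption. `Tx`, `SyncResult`, and `try_add_to_mempool` are hypothetical names for illustration and do not reflect Fulcrum's internals.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified transaction: an id plus the ids of the txns
// whose outputs it spends.
struct Tx {
    std::string txid;
    std::vector<std::string> parent_txids;
};

enum class SyncResult { Ok, RetryLater };

// Adds tx to the local mempool view only if every parent is known, either as
// another mempool txn or as an already-confirmed txn. A missing parent is
// reported as RetryLater instead of being raised as an error, because the
// parent may have been expired or RBF'd between two snapshots.
SyncResult try_add_to_mempool(const Tx &tx,
                              std::unordered_map<std::string, Tx> &mempool,
                              const std::unordered_set<std::string> &confirmed) {
    assert(!tx.txid.empty());
    for (const auto &parent : tx.parent_txids) {
        const bool known = mempool.count(parent) > 0 || confirmed.count(parent) > 0;
        if (!known) return SyncResult::RetryLater;  // leave mempool untouched
    }
    mempool.emplace(tx.txid, tx);
    return SyncResult::Ok;
}
```

A caller would typically log the `RetryLater` case and pick the txn up again on the next sync pass, which matches the observation that the errors are logged but the server eventually settles.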
I just updated to 1.9.5 and was running Fulcrum on BTC Testnet. After a "Failed to Find Previous Transaction" (same as issue #141??, which I still get a lot of on an older version of Fulcrum) an exception propagates to Qt which terminates Fulcrum.
Monterey 12.7, built with ZeroMQ, no jemalloc.
Terminal Output: