This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Local transactions are culled from the local memory pool #10228

Closed
peterbitfly opened this issue Jan 22, 2019 · 16 comments
Labels
A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust.
Comments

@peterbitfly

Before filing a new issue, please provide the following information.

  • Parity Ethereum version: 2.2.7-stable
  • Operating system: Linux
  • Installation: built from source
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

We are sending several thousand transactions per hour via parity. Sometimes, because of the specified gas price, the transactions do not get mined immediately. After a while parity rejects every new transaction with "Transaction culled". We run parity with the --tx-queue-per-sender 100000 flag, and all transactions are submitted via the standard eth_sendTransaction RPC call.

Ideally parity should never cull a local tx from the tx pool.
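For context, a submission flow like the one described above can be sketched as a plain JSON-RPC call. This is a minimal sketch, not the reporter's actual code; the endpoint URL and transaction fields are placeholders:

```python
import json
from urllib import request

# Hypothetical local node endpoint; adjust to your own setup.
RPC_URL = "http://127.0.0.1:8545"

def make_send_tx_payload(tx: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 payload for eth_sendTransaction."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_sendTransaction",
        "params": [tx],
        "id": request_id,
    }

def send_transaction(tx: dict) -> str:
    """Submit a tx and return its hash; blocks until the RPC call returns,
    which matches the one-at-a-time submission pattern described above."""
    body = json.dumps(make_send_tx_payload(tx)).encode()
    req = request.Request(RPC_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["result"]
```

When no explicit nonce is supplied, the node assigns one itself, which is where the nonce-calculation issue discussed below comes into play.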

@jam10o-new
Contributor

Chances are that this has something to do with faulty nonce calculation which might happen if you are transacting extremely rapidly. Should be resolved by #10051 which just got merged.

@peterbitfly
Author

Hm, we are submitting at most 1 tx every 400-500 ms, so not really fast, and always one transaction after another (we send the next transaction only after the eth_sendTransaction RPC call has returned).

@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. labels Jan 22, 2019
@jam10o-new
Contributor

The issue ends up being that if certain transactions completely fail to get mined, it currently messes up nonce calculation for future transactions. You can try master if you want to try the fix mentioned above, or wait for it to be backported to a release.

@peterbitfly
Author

Ok, thanks. We will wait till the fix is backported and then report back if the issue has been resolved.

@jam10o-new
Contributor

(short term, you can manually specify your nonce when calling sendTransaction)
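The suggested workaround, tracking the nonce client-side and passing it explicitly, might look like this. A sketch only: the helper names are made up, and in a real setup the starting nonce would come from eth_getTransactionCount(sender, "pending"):

```python
import itertools

def nonce_counter(start: int):
    """Yield consecutive nonces as hex strings, the encoding the RPC expects."""
    for n in itertools.count(start):
        yield hex(n)

def with_explicit_nonce(tx: dict, nonces) -> dict:
    """Return a copy of the tx with the next client-tracked nonce filled in,
    so the node never has to compute the nonce itself."""
    return {**tx, "nonce": next(nonces)}
```

With an explicit nonce, a culled or stuck transaction no longer perturbs the node's nonce calculation for subsequent submissions.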

@jam10o-new jam10o-new added the Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate. label Jan 22, 2019
@jam10o-new
Contributor

I suggest keeping track of this progress on #9040 - we can reopen if the issue isn't resolved after that one is.

@5chdn 5chdn added this to the 2.4 milestone Feb 7, 2019
@peterbitfly
Author

peterbitfly commented Feb 26, 2019

We still experience this issue with Parity-Ethereum v2.3.3, so #10051 did not resolve it for us. @joshua-mir could you please reopen the issue?

@peterbitfly
Author

@joshua-mir we are still experiencing this issue (even with version v2.4), could you please reopen it. Thanks!

@peterbitfly
Author

We are still experiencing this issue with v2.6.0

@dtran320

dtran320 commented Jul 25, 2019

We're also seeing this issue in version Parity-Ethereum/v2.5.5-stable-3ebc769-20190708/x86_64-linux-gnu/rustc1.36.0.

Specifically, we see 4 transactions, submitted at different times, that all got culled at the same second:

2019-07-24 16:28:58 UTC Transaction culled (hash 0x65fdc020148e71a511d6ace53cd03ddfea978b75f26551899543dfa746e14d25)
2019-07-24 16:28:58 UTC Transaction culled (hash 0xd67e6f72b362395cdeffce3e9e73a79aa79292cb60a7cd4dca1a012da23dbde0)
2019-07-24 16:28:58 UTC Transaction culled (hash 0x65158449814b36f3b522a59de7c03282181c7dffbbc0a1e8ad500b27064953c1)
2019-07-24 16:28:58 UTC Transaction culled (hash 0xc0311fe66494b8f505541dc53052a7238abc37f921b38c17f7aa27d10c15bfc9)

@jam10o-new jam10o-new removed the Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate. label Jul 25, 2019
@dtran320

@ppratscher Have you come up with any workarounds? Does changing any of --tx-queue-per-sender, --tx-queue-mem-limit, and --tx-queue-size reduce the number of "Transaction culled" occurrences? What's strange is that occasionally a tx hash that gets culled in the parity logs actually gets mined.

@dvdplm
Collaborator

dvdplm commented Aug 13, 2019

A few questions to @ppratscher and @dtran320:

  1. can you confirm that the txs that are culled are never actually mined? If there is a block that imported a tx with the same (sender, nonce) then that'd be a (legit) reason for culling, but if the tx is never mined then there's something odd going on (local tx should not be culled);
  2. approximately how much time passes between submitting the tx and seeing it culled? Unusually long?
  3. it'd be great to get the full txpool=trace logs for the culled txs, from the submission until they're culled – can you help out with that?
  4. if not, can you provide us with a tx hash and (sender, nonce) for one of the failed txs?
  5. can you confirm that no two nodes use the same local keys (because that would certainly cause weird things to happen because of txs propagating between nodes)
  6. can you confirm that you only ever see this behaviour on mainnet, never on any testnet?

Thanks!
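Question 1 can be answered mechanically by comparing each culled tx's nonce against the sender's on-chain transaction count (which equals the next unused nonce): if the count is above the nonce, some transaction, possibly a replacement, consumed it. A rough sketch, with made-up helper names, of that classification:

```python
def classify_culled_tx(found_onchain: bool, tx_nonce: int, tx_count: int) -> str:
    """Rough classification of a culled local tx.

    found_onchain: eth_getTransactionByHash returned a mined tx for this hash.
    tx_count:      eth_getTransactionCount(sender, "latest"), i.e. the next
                   unused nonce for the sender.
    """
    if found_onchain:
        return "mined"        # culled but actually included on-chain
    if tx_count > tx_nonce:
        return "replaced"     # another tx consumed the nonce: a legit cull
    return "dropped"          # nonce never used: the behaviour in question
```

Only the "dropped" case indicates a local tx that was culled without its (sender, nonce) ever being consumed.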

@dtran320

@dvdplm will get back to you shortly with as many details as possible. We don't store local keys on any of the nodes so we can rule out 5.

@dtran320

Parity Ethereum version: Parity-Ethereum/v2.5.5-stable-3ebc769-20190708/x86_64-linux-gnu/rustc1.36.0
Operating system: Linux
Installation: Binary
Fully synchronized: yes
Network: ethereum
Restarted: yes

@dvdplm

Across several mainnet full, non-archive parity (pruning=fast) nodes with the above specs, I tailed the last 50,000 lines of /mnt/data/parity.log on each (going back to ~August 9/10 on most of them) and saw 371 lines like:

2019-08-12 02:35:31 UTC Transaction culled (hash 0xc1fa923704b3e0b59a74f3544caa3ec11d411a2ff63c47e92a0ace023c0d78f9)

I ran a Python script over these to look up each transaction by calling eth_getTransactionByHash on that node.

{'params': ['0xf489ca620452effffcef1c2f198eb348e7152b2861a89e170b04f81073f53766'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 1}
Tx 0xf489ca620452effffcef1c2f198eb348e7152b2861a89e170b04f81073f53766 not found
{'params': ['0x09ef99ae33eec787dbc0341499659cc02bd413da17b2ab0c5e29b939d577f243'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 2}
Tx 0x09ef99ae33eec787dbc0341499659cc02bd413da17b2ab0c5e29b939d577f243 not found
{'params': ['0x07a01a667fea6d643f7b5b7b367252b20eb4e84a1c5ee54c2ec07d48bf0b7546'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 3}
Tx 0x07a01a667fea6d643f7b5b7b367252b20eb4e84a1c5ee54c2ec07d48bf0b7546 not found
{'params': ['0x43168ac839c5019f7b41dd8aadf81b98cd08e68b499b8a80df054de73d6f5f48'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 4}
Tx 0x43168ac839c5019f7b41dd8aadf81b98cd08e68b499b8a80df054de73d6f5f48 not found
{'params': ['0x65f2d9c62676dab55876453e274a22cbb474ba989683f0674144ac665849f1df'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 5}
Tx 0x65f2d9c62676dab55876453e274a22cbb474ba989683f0674144ac665849f1df not found
{'params': ['0x78a0070b716fe4abc5822e6a60f5c9782df481ca81eb7644a3cafdc5721d78be'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 6}
Tx 0x78a0070b716fe4abc5822e6a60f5c9782df481ca81eb7644a3cafdc5721d78be not found
{'params': ['0xf1c456ac1e811371fcf24ab827e28e8ec85aff3b7c4b413c7f458c24d3a4e56a'], 'jsonrpc': '2.0', 'method': 'eth_getTransactionByHash', 'id': 7}
Tx 0xf1c456ac1e811371fcf24ab827e28e8ec85aff3b7c4b413c7f458c24d3a4e56a not found
Out of 7 culled txs, 0 were mined, 0 are still pending, and 7 can't be found
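The scan described above can be sketched roughly like this. This is a reconstruction under assumptions, not the actual script; the log line format is taken from the lines quoted in this thread:

```python
import re

# Matches lines like:
#   2019-08-12 02:35:31 UTC Transaction culled (hash 0xc1fa...78f9)
CULLED_RE = re.compile(r"Transaction culled \(hash (0x[0-9a-fA-F]{64})\)")

def culled_hashes(log_lines):
    """Extract tx hashes from 'Transaction culled' log lines."""
    return [m.group(1) for line in log_lines
            if (m := CULLED_RE.search(line))]

def summarize(lookups):
    """lookups: mapping of tx hash -> eth_getTransactionByHash result
    (None if not found; for a found tx, blockNumber is None while pending)."""
    mined = pending = missing = 0
    for tx in lookups.values():
        if tx is None:
            missing += 1
        elif tx.get("blockNumber") is None:
            pending += 1
        else:
            mined += 1
    return mined, pending, missing
```

The summary line above ("Out of 7 culled txs, 0 were mined, 0 are still pending, and 7 can't be found") corresponds to the (mined, pending, missing) counts this kind of pass produces.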
  1. Can you confirm that the txs that are culled are never actually mined? If there is a block that imported a tx with the same (sender, nonce) then that'd be a (legit) reason for culling, but if the tx is never mined then there's something odd going on (local tx should not be culled);

Out of the 371 transaction hashes across all the nodes, only 1 was mined. Calling getTransactionByHash on the other 370 tx hashes returned null.

  2. approximately how much time passes between submitting the tx and seeing it culled? Unusually long?

The culling either happens fairly soon after the write is sent, or happens in batches quite a long time afterwards. All of these are from a single node:

2019-08-11 16:38:22 UTC Transaction culled (hash 0x303a4ee24d3ebf6788c55162ea2bc223899ee2820a834fd14e02819a0ddddf27)
2019-08-11 16:55:38 UTC Transaction culled (hash 0x0376165744ca89e187a450c911cafdc320b7edbdd5e5f505f9efc4db353a8b88)
2019-08-11 18:12:07 UTC Transaction culled (hash 0x69e2d12da59bc86363dde68970caa5f24d67a7fab3505c8d69b3c233c7d51885)
2019-08-11 19:07:17 UTC Transaction culled (hash 0x67fc3f9dc4d5bdf207f3bf65540aa9395e252a4e5625df130966a8e4f8ac74ea)
2019-08-11 19:38:14 UTC Transaction culled (hash 0x6bc91e930ee5f6e12c9e44f9a9be3666dea319d5f82571f4d7443a1deb100084)
2019-08-12 02:32:08 UTC Transaction culled (hash 0xd234f40737cb774148411303954da6f4ff00ab89230b296a15a2b6f0aecc2653)
2019-08-12 02:32:08 UTC Transaction culled (hash 0x03724f0f58b92ddbc7f5252720c3f6c72e4448ab00e4af622c880a538390153f)
2019-08-12 02:35:31 UTC Transaction culled (hash 0x2e460262c52815c4ef80ebe37b0440329f5be6bf879b9621044cb18bff5aed64)
2019-08-12 02:35:31 UTC Transaction culled (hash 0x259b36f96282219a73c8c8f537a6a80f1583dce4daf9588c487106ab57c9256c)
2019-08-12 02:35:31 UTC Transaction culled (hash 0x8530c3ea495e23533fac1e9cd55d1432f11fd1bae74ef5142f607102312864ce)
2019-08-12 02:35:31 UTC Transaction culled (hash 0xc1fa923704b3e0b59a74f3544caa3ec11d411a2ff63c47e92a0ace023c0d78f9)
| TX Hash | Time submitted | Time culled | Delta |
| --- | --- | --- | --- |
| 0x303a4ee24d3ebf6788c55162ea2bc223899ee2820a834fd14e02819a0ddddf27 | 2019-08-11 16:38:18 | 2019-08-11 16:38:22 | 4 seconds |
| 0x0376165744ca89e187a450c911cafdc320b7edbdd5e5f505f9efc4db353a8b88 | 2019-08-11 16:55:32 | 2019-08-11 16:55:38 | 6 seconds |
| 0x69e2d12da59bc86363dde68970caa5f24d67a7fab3505c8d69b3c233c7d51885 | 2019-08-11 18:11:38 | 2019-08-11 18:12:07 | 29 seconds |
| 0x67fc3f9dc4d5bdf207f3bf65540aa9395e252a4e5625df130966a8e4f8ac74ea | 2019-08-11 19:07:11 | 2019-08-11 19:07:17 | 6 seconds |
| 0x6bc91e930ee5f6e12c9e44f9a9be3666dea319d5f82571f4d7443a1deb100084 | 2019-08-11 19:38:11 | 2019-08-11 19:38:14 | 3 seconds |
| 0xd234f40737cb774148411303954da6f4ff00ab89230b296a15a2b6f0aecc2653 | 2019-08-05 10:54:12 | 2019-08-12 02:32:08 | 6 days, 15:37:56 |
| 0x03724f0f58b92ddbc7f5252720c3f6c72e4448ab00e4af622c880a538390153f | 2019-08-05 12:32:45 | 2019-08-12 02:32:08 | 6 days, 13:59:23 |
| 0x2e460262c52815c4ef80ebe37b0440329f5be6bf879b9621044cb18bff5aed64 | 2019-08-05 13:54:55 | 2019-08-12 02:35:31 | 6 days, 12:40:36 |
| 0x259b36f96282219a73c8c8f537a6a80f1583dce4daf9588c487106ab57c9256c | 2019-08-05 14:14:48 | 2019-08-12 02:35:31 | 6 days, 12:20:43 |
| 0x8530c3ea495e23533fac1e9cd55d1432f11fd1bae74ef5142f607102312864ce | 2019-08-05 14:24:02 | 2019-08-12 02:35:31 | 6 days, 12:11:29 |
| 0xc1fa923704b3e0b59a74f3544caa3ec11d411a2ff63c47e92a0ace023c0d78f9 | 2019-08-05 14:33:17 | 2019-08-12 02:35:31 | 6 days, 12:02:14 |
  3. it'd be great to get the full txpool=trace logs for the culled txs, from the submission until they're culled – can you help out with that?

We don't have this logging enabled, so we will turn it on on a couple of nodes and get back to you. Is there much performance impact with it enabled?

  4. if not, can you provide us with a tx hash and (sender, nonce) for one of the failed txs?

Adding some logging to get more of these, but here's one that was culled from July 24th:

2019-07-24 16:28:58 UTC Transaction culled (hash 0x65fdc020148e71a511d6ace53cd03ddfea978b75f26551899543dfa746e14d25)
Nonce: 18876
GasPrice: 10 gwei
Address: 0xc2be9e536EeE410a720C6440CBAc293f4358451c
  5. can you confirm that no two nodes use the same local keys (because that would certainly cause weird things to happen because of txs propagating between nodes)

We don't store local keys on any of the nodes so we can rule this out. All writes come from JSON-RPC calls to eth_sendRawTransaction.

  6. can you confirm that you only ever see this behaviour on mainnet, never on any testnet?

Found 0 instances of "Transaction culled" on Ropsten and Kovan in the parity logs for ~21,000 writes in the last 7 days. Note that all of our writes on Ropsten and Kovan go to nodes with pruning=archive. For mainnet we also have a separate set of pruning=archive nodes, but we don't send writes to them, because we've found that the full nodes with pruning=fast seem to return a response for eth_sendRawTransaction faster than full archive nodes do.

@bananapete

Parity Ethereum version: Parity-Ethereum/v2.5.5
Operating system: Linux
Installation: Binary
Fully synchronized: yes

@joshua-mir Hi Joshua

Any news on this bug? We are having an issue that manifests with the same symptoms. We are running a private PoA chain with ~35 transactions per minute using the Clique engine, but occasionally we see txs get culled rather than mined.

With relation to the previously asked questions:

  1. can you confirm that the txs that are culled are never actually mined? If there is a block that imported a tx with the same (sender, nonce) then that'd be a (legit) reason for culling, but if the tx is never mined then there's something odd going on (local tx should not be culled);
    Yes, they are not mined.
  2. Approximately how much time passes between submitting the tx and seeing it culled? Unusually long?
    No, quite the opposite, the logs I have show 2 transactions culled between 4 and 11 seconds after submission.
  3. it'd be great to get the full txpool=trace logs for the culled txs, from the submission until they're culled – can you help out with that?
    Yes - I have the trace logs from our authority nodes. They are large (~1.2GB) but I can get them to you if required.
  4. Can you confirm that no two nodes use the same local keys (because that would certainly cause weird things to happen because of txs propagating between nodes)
    yes - each authority node has a separate local key
  5. Can you confirm that you only ever see this behaviour on mainnet, never on any testnet?
    This is on a private PoA chain, clique engine.

Regards
Pete

@adria0 adria0 added the A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 27, 2020
@adria0 adria0 closed this as completed Jul 27, 2020
@aaroncolaco

I'm currently facing this issue on an Aura PoA Chain.

Submitting on average 10 largish txs (arrays, bytes) every second, and after a few hours the logs look like:

2020-10-01 12:38:04  Transaction culled (hash 0x5aa608f9037bfcba831a8787f8f199b3f7845d3b1516a5daa82d9182781606e6)
2020-10-01 12:38:04  Transaction culled (hash 0xc925f0875cd8e2d0c51f809170d8cae2297a161ab4466a4eefefae6173829ae6)
2020-10-01 12:38:04  Transaction culled (hash 0x0967bf740a8cb699da14679bb686e68f0dd35210d2273ac50176f79e20d7f672)
2020-10-01 12:38:04  Transaction culled (hash 0xd0a6ba1a78da918cf011dcf1074440ac19e9b455132efc2a141f604a41f509da)
2020-10-01 12:38:04  Transaction culled (hash 0x4d104e1cd1066de76d770226d0e686c3a5ec542e8e88e7c95d29c1244171e6aa)
2020-10-01 12:38:04  Transaction culled (hash 0xc516db70f83ada10c849b337703f55f9cba2c3f027203d1ff1450390267d5622)
2020-10-01 12:38:04  Transaction culled (hash 0x4003758fe6b94e483e4af0af08068c5b413569463500895a5dec954df4c21acf)
2020-10-01 12:38:04  Transaction culled (hash 0xbb23f4cbf32ee8f7a8c358a9b88537c36bcabda20a013308614220f4db2cc672)
