-
Notifications
You must be signed in to change notification settings - Fork 130
[PAN-2786] Stop Transaction Pool Queue from Growing Unbounded #1586
[PAN-2786] Stop Transaction Pool Queue from Growing Unbounded #1586
Conversation
- use a `ArrayBlockingQueue` with a fixed size to limit the transaction task queue - expose a method in `MonitoredExecutors` to create a working queue with a maximum capacity - update `EthScheduler` to use a limited working queue for the `txWorkerExecutor`
Wouldn’t this cause exceptions to be thrown or the vertx/netty thread to block once the executor queue fills up? We’d want to discard the oldest messages when the queue is full because they’re likely full off outdated transactions and we want to process the most recent. We also want to ensure there’s good logging and metrics to indemnify when this happens since losing transactions is a bad thing for a lot of cases. Finally the queue size of 128 is far too small. Take a look at the queue sizes and incoming message rate for the MainNet Fast i3.large box - it gets up to 500 incoming messages/sec and only hits trouble when the queue size is in the millions. We probably don’t want to let it get that big as the messages are likely out of date by then anyway but it needs to be a few thousand at least I’d say. Having an idea of how long messages were in that queue would help to set the size appropriately. |
…tx-pool-queue-bound
Yes indeed it will cause exceptions to be thrown once the executor queue fills up. |
I’d say use a bounded queue with a really big size (a million or two), include the time stamp of when the message was first received in the queues task to execute and then when the executor runs the task the first thing it does is check it it’s too old and bail out if so. The speed difference between checking a time stamp and actually validating a transaction is so great that this mechanism will allow catching up on backlogs very quickly without anything too complex. And if for some reason we’re getting so many incoming messages that we can’t even do a long comparison on each fast enough then the client has no hope, we’ll hit the queue size limit and throw loads of exceptions. The client is unlikely to cope at that point but at least we have a lot of error messages telling us why. Definitely worth checking it throws an exception if the queue is full instead of blocking. But the idea of using a custom queue is possibly a good one. If we did that, it would be easy for it to detect that the size limit is reached and remove from the front of the list when adding an item to the end. Logging and monitoring could all be inside the custom queue implementation then too. |
…tx-pool-queue-bound
- implement a custom bounded queue - use a time based policy with keep alive configuration - implement eviction process based on the policy - add metrics
…tx-pool-queue-bound
I have implemented a custom queue that use a time based policy when adding an element at full capacity. A keep alive parameter is used to evict transactions that have been in the working queue too much time. |
ethereum/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/MockExecutorService.java
Outdated
Show resolved
Hide resolved
...m/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/EthSchedulerShutdownTest.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/TimedTask.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/TimedTask.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/EthScheduler.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
.../eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedTimedQueue.java
Outdated
Show resolved
Hide resolved
…tx-pool-queue-bound
- use concrete class instead of interface - change metric name to comply with global policy - update unit test - wrap `Runnable` into `scheduleTxWorkerTask`
invert condition
ethereum/eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedQueue.java
Outdated
Show resolved
Hide resolved
...m/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedQueueTest.java
Show resolved
Hide resolved
ethereum/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/MockExecutorService.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/EthSchedulerTest.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/EthSchedulerTest.java
Outdated
Show resolved
Hide resolved
ethereum/eth/src/main/java/tech/pegasys/pantheon/ethereum/eth/manager/EthScheduler.java
Outdated
Show resolved
Hide resolved
- remove Mock class - make logic more thread safe, avoid race condition - remove element until the new one is accepted
…manager/EthScheduler.java Co-Authored-By: Adrian Sutton <adrian@symphonious.net>
….com/abdelhamidbakhta/pantheon into feature/pan-2786-tx-pool-queue-bound
…tx-pool-queue-bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple of little nits that would be good to tidy up. Thanks for your patience on this one, timezones have made it take a long time - sorry.
.../eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/DeterministicEthScheduler.java
Outdated
Show resolved
Hide resolved
...m/eth/src/test/java/tech/pegasys/pantheon/ethereum/eth/manager/bounded/BoundedQueueTest.java
Outdated
Show resolved
Hide resolved
No worries. Thank you for the time you spent on it. |
- use assertj assertions for better readability - improve unit test
PR description
The time based policy has been removed. A follow up PR will be created to include a timeout in
TransactionsMessageProcessor
and to skip message if the message has expired.use a time based policy with keep alive configurationimplement eviction process based on the policyMonitoredExecutors
to create a working queue with a maximum capacityEthScheduler
to use a limited working queue for thetxWorkerExecutor
Fixed Issue(s)