
[PAN-2786] Stop Transaction Pool Queue from Growing Unbounded #1586

Merged

Conversation

AbdelStark
Contributor

@AbdelStark AbdelStark commented Jun 20, 2019

PR description

The time-based policy has been removed. A follow-up PR will be created to include a timeout in TransactionsMessageProcessor and to skip messages that have expired.

  • implement a custom bounded queue
  • use a time-based policy with a keep-alive configuration
  • implement the eviction process based on the policy
  • add metrics
  • expose a method in MonitoredExecutors to create a working queue with a maximum capacity
  • update EthScheduler to use a limited working queue for the txWorkerExecutor

Fixed Issue(s)

- use an `ArrayBlockingQueue` with a fixed size to limit the transaction task queue (see the sketch below)
- expose a method in `MonitoredExecutors` to create a working queue with a maximum capacity
- update `EthScheduler` to use a limited working queue for the `txWorkerExecutor`
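
Not the PR's actual `MonitoredExecutors` code, but a minimal sketch of the fix described above using plain `java.util.concurrent` types: a fixed-capacity `ArrayBlockingQueue` backing a `ThreadPoolExecutor`, so the task backlog cannot grow unbounded. The capacity of 10,000 and the pool sizes are illustrative assumptions, not values from this PR.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutorSketch {

  public static void main(String[] args) {
    // Fixed-capacity queue: once it is full, new submissions are rejected
    // instead of letting the backlog grow without bound.
    BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(10_000);

    ThreadPoolExecutor txWorkerExecutor =
        new ThreadPoolExecutor(
            1,                     // core pool size (illustrative)
            4,                     // maximum pool size (illustrative)
            60L, TimeUnit.SECONDS, // idle-thread keep-alive
            workQueue,
            // AbortPolicy throws RejectedExecutionException when the queue is
            // full, rather than blocking the submitting (vertx/netty) thread.
            new ThreadPoolExecutor.AbortPolicy());

    txWorkerExecutor.execute(() -> System.out.println("processing transactions message"));
    txWorkerExecutor.shutdown();
  }
}
```
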
@ajsutton
Contributor

Wouldn’t this cause exceptions to be thrown, or the vertx/netty thread to block, once the executor queue fills up? We’d want to discard the oldest messages when the queue is full because they’re likely full of outdated transactions and we want to process the most recent.

We also want to ensure there’s good logging and metrics to identify when this happens, since losing transactions is a bad thing in a lot of cases.

Finally, the queue size of 128 is far too small. Take a look at the queue sizes and incoming message rate for the MainNet Fast i3.large box: it gets up to 500 incoming messages/sec and only hits trouble when the queue size is in the millions. We probably don’t want to let it get that big, as the messages are likely out of date by then anyway, but it needs to be a few thousand at least, I’d say. Having an idea of how long messages were in that queue would help to set the size appropriately.

@AbdelStark
Contributor Author

> Wouldn’t this cause exceptions to be thrown, or the vertx/netty thread to block, once the executor queue fills up? We’d want to discard the oldest messages when the queue is full because they’re likely full of outdated transactions and we want to process the most recent.
>
> We also want to ensure there’s good logging and metrics to identify when this happens, since losing transactions is a bad thing in a lot of cases.
>
> Finally, the queue size of 128 is far too small. Take a look at the queue sizes and incoming message rate for the MainNet Fast i3.large box: it gets up to 500 incoming messages/sec and only hits trouble when the queue size is in the millions. We probably don’t want to let it get that big, as the messages are likely out of date by then anyway, but it needs to be a few thousand at least, I’d say. Having an idea of how long messages were in that queue would help to set the size appropriately.

Yes, indeed it will cause exceptions to be thrown once the executor queue fills up.
OK on the default size being too small; I will increase it to a few thousand. A next step will be to make this number configurable in a follow-up PR.
Regarding discarding the oldest messages, where is the best place to put this logic: in `EthScheduler` or in `TransactionsMessageHandler`? If we do this in `TransactionsMessageHandler`, it will require exposing the list of pending tasks from `EthScheduler`, and we would probably have to use a custom `BlockingQueue` to implement the logic based on the timestamp.

@ajsutton
Contributor

I’d say use a bounded queue with a really big size (a million or two), include the timestamp of when the message was first received in the queued task, and then, when the executor runs the task, the first thing it does is check whether it’s too old and bail out if so.

The speed difference between checking a timestamp and actually validating a transaction is so great that this mechanism will allow catching up on backlogs very quickly without anything too complex. And if for some reason we’re getting so many incoming messages that we can’t even do a long comparison on each fast enough, then the client has no hope: we’ll hit the queue size limit and throw loads of exceptions. The client is unlikely to cope at that point, but at least we have a lot of error messages telling us why. Definitely worth checking that it throws an exception if the queue is full instead of blocking.
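
A minimal sketch of the timestamp-and-bail approach just described; the `TimestampedTask` wrapper and its `maxAge` parameter are hypothetical names introduced for illustration, not code from this PR.

```java
import java.time.Duration;
import java.time.Instant;

/** Wraps a task with the time its message was received; skips it once stale. */
public class TimestampedTask implements Runnable {

  private final Runnable delegate;
  private final Duration maxAge;
  private final Instant receivedAt = Instant.now(); // recorded at enqueue time

  public TimestampedTask(final Runnable delegate, final Duration maxAge) {
    this.delegate = delegate;
    this.maxAge = maxAge;
  }

  @Override
  public void run() {
    // A cheap time comparison before any expensive transaction validation,
    // so a backlog of stale messages can be drained very quickly.
    if (Duration.between(receivedAt, Instant.now()).compareTo(maxAge) > 0) {
      return; // too old: bail out (real code would log and record a metric)
    }
    delegate.run();
  }
}
```
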

But the idea of using a custom queue is possibly a good one. If we did that, it would be easy for it to detect that the size limit is reached and remove from the front of the list when adding an item to the end. Logging and monitoring could all be inside the custom queue implementation then too.
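
A minimal sketch of that drop-oldest custom queue, assuming a delegate `ArrayBlockingQueue`: when the size limit is reached, `add` evicts from the front until the new element is accepted (the loop mirrors the "remove element until the new one is accepted" commit further down). Class and method names are illustrative, not the PR's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded queue that discards the oldest entries to make room for new ones. */
public class DropOldestQueue<E> {

  private final BlockingQueue<E> delegate;

  public DropOldestQueue(final int capacity) {
    this.delegate = new ArrayBlockingQueue<>(capacity);
  }

  /** Adds an element, evicting the oldest entries until it is accepted. */
  public synchronized void add(final E element) {
    while (!delegate.offer(element)) {
      delegate.poll(); // discard the oldest entry; real code would log/count this
    }
  }

  public E take() throws InterruptedException {
    return delegate.take();
  }
}
```
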

AbdelStark and others added 5 commits June 21, 2019 09:03
- implement a custom bounded queue
- use a time based policy with keep alive configuration
- implement eviction process based on the policy
- add metrics
@AbdelStark
Contributor Author

> I’d say use a bounded queue with a really big size (a million or two), include the timestamp of when the message was first received in the queued task, and then, when the executor runs the task, the first thing it does is check whether it’s too old and bail out if so.
>
> The speed difference between checking a timestamp and actually validating a transaction is so great that this mechanism will allow catching up on backlogs very quickly without anything too complex. And if for some reason we’re getting so many incoming messages that we can’t even do a long comparison on each fast enough, then the client has no hope: we’ll hit the queue size limit and throw loads of exceptions. The client is unlikely to cope at that point, but at least we have a lot of error messages telling us why. Definitely worth checking that it throws an exception if the queue is full instead of blocking.
>
> But the idea of using a custom queue is possibly a good one. If we did that, it would be easy for it to detect that the size limit is reached and remove from the front of the list when adding an item to the end. Logging and monitoring could all be inside the custom queue implementation then too.

I have implemented a custom queue that uses a time-based policy when adding an element at full capacity. A keep-alive parameter is used to evict transactions that have been in the working queue for too long.
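
A minimal sketch of the keep-alive policy described here, with illustrative names rather than the PR's actual code: at full capacity, head entries older than the keep-alive window are evicted to make room. (As the PR description above notes, this time-based policy was later removed again.)

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded queue that, when full, evicts entries older than a keep-alive period. */
public class KeepAliveQueue<E> {

  /** Pairs an element with the time it entered the queue. */
  private static final class Timed<T> {
    final T value;
    final long enqueuedAtMillis = System.currentTimeMillis();

    Timed(final T value) {
      this.value = value;
    }
  }

  private final BlockingQueue<Timed<E>> delegate;
  private final long keepAliveMillis;

  public KeepAliveQueue(final int capacity, final long keepAliveMillis) {
    this.delegate = new ArrayBlockingQueue<>(capacity);
    this.keepAliveMillis = keepAliveMillis;
  }

  /** Offers an element, evicting stale head entries only at full capacity. */
  public synchronized boolean offer(final E element) {
    Timed<E> head;
    while (delegate.remainingCapacity() == 0
        && (head = delegate.peek()) != null
        && System.currentTimeMillis() - head.enqueuedAtMillis > keepAliveMillis) {
      delegate.poll(); // evict an entry that outlived the keep-alive window
    }
    // Still rejected if the queue is full of entries younger than keep-alive.
    return delegate.offer(new Timed<>(element));
  }

  public E take() throws InterruptedException {
    return delegate.take().value;
  }
}
```
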

- use concrete class instead of interface
- change metric name to comply with global policy
- update unit test
- wrap `Runnable` into `scheduleTxWorkerTask`
- remove time based policy
- use raw `Runnable`
- make room for a new element at full capacity
AbdelStark and others added 4 commits June 25, 2019 09:03
- remove Mock class
- make logic more thread safe, avoid race condition
- remove element until the new one is accepted
…manager/EthScheduler.java

Co-Authored-By: Adrian Sutton <adrian@symphonious.net>
@AbdelStark AbdelStark requested a review from ajsutton June 25, 2019 07:35
Contributor

@ajsutton ajsutton left a comment


LGTM. Just a couple of little nits that would be good to tidy up. Thanks for your patience on this one, timezones have made it take a long time - sorry.

@AbdelStark
Contributor Author

> LGTM. Just a couple of little nits that would be good to tidy up. Thanks for your patience on this one, timezones have made it take a long time - sorry.

No worries. Thank you for the time you spent on it.

- use assertj assertions for better readability
- improve unit test
@AbdelStark AbdelStark deleted the feature/pan-2786-tx-pool-queue-bound branch August 23, 2019 16:16