
Running the chain with IBFT PoA: the chain crashes while spamming many transactions #1850

Open
JDawg287 opened this issue Aug 24, 2023 · 8 comments


JDawg287 commented Aug 24, 2023


Description

While stress testing the Polygon Edge client v1.1.0, I came across some strange behaviour. For context, I want to run a private network using the IBFT PoA consensus protocol. To assess the stability of the Polygon Edge client under stress, I set up a network on AWS EC2 instances and wrote a script that spams EOA-to-EOA transactions to the network.

Environment

  • Ubuntu 22.04.2 LTS
  • Polygon Edge v1.1.0
  • Head (13e82b6)
  • AWS EC2 instances - t3.xlarge

Chain Specs

  • 4 Validators
  • Block time: 3s
  • Gas limit: 672000000
  • Max enqueued: 100000000
  • Max slots: 100000000

Setup

The setup is fairly simple. I am running 4 validator nodes, and each node has 1000 secrets generated for it. I premine an amount for each Ethereum address (one per secret) in genesis.json. This is to avoid having many transactions with different nonces from any single Ethereum address in the mempool at the same time.
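How many nonces each address uses per batch depends on BATCH_SIZE, because the spam script picks senders round-robin (`senderList[i % senderList.length]`). A small sketch of that arithmetic, assuming 1000 senders per node as described; the function names are illustrative only:

```javascript
// Count how many transactions each sender receives in one batch when
// senders are picked round-robin, as in `senderList[i % senderList.length]`.
function txsPerSender(batchSize, numSenders) {
  const counts = new Array(numSenders).fill(0);
  for (let i = 0; i < batchSize; i++) {
    counts[i % numSenders] += 1;
  }
  return counts;
}

// Largest number of consecutive nonces any single address uses in one batch.
function maxNoncesPerAddress(batchSize, numSenders) {
  return Math.max(...txsPerSender(batchSize, numSenders));
}

console.log(maxNoncesPerAddress(250, 1000));  // 1
console.log(maxNoncesPerAddress(3000, 1000)); // 3
```

In this model each address gets at most one nonce per batch only while BATCH_SIZE stays at or below the number of senders; at BATCH_SIZE=3000 every address already carries three consecutive nonces per batch, and more if earlier batches have not been mined yet.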
Here is what the setup looks like:
[image: setup overview]

The script to spam the transactions is as follows:

import { ethers } from 'ethers';
import dotenv from 'dotenv';

let totalTransactions = 0;

function makeWallet(privateKey, provider) {
    return new ethers.Wallet(privateKey, provider);
}

// Main function
async function main(args) {
    // Validate the arguments before parsing them: JSON.parse(undefined) throws,
    // so a missing argument would otherwise never reach the usage message.
    if (args.length < 4) {
        console.log('Usage: node send_batch.js <endpoint> <amount> <toAddresses> <senderList>');
        console.log('Example: node send_batch.js http://localhost:8545 0.1 \'["<to_addr_1>", "<to_addr_2>"]\' \'["<private_key_1>", "<private_key_2>"]\'');
        return;
    }
    const endpoint = args[0];
    const amount = args[1];                    // amount in ETH, e.g. "0.000000000000000001"
    const toAddresses = JSON.parse(args[2]);   // JSON array of recipient addresses
    const senderList = JSON.parse(args[3]);    // JSON array of sender private keys

    dotenv.config();
    const batchSize = parseInt(process.env.BATCH_SIZE);
    const timeout = parseInt(process.env.TIMEOUT_BATCH_SEND);

    // Validate the environment variables
    if (!batchSize || !timeout) {
        console.log('Please set the environment variables BATCH_SIZE, TIMEOUT_BATCH_SEND');
        return;
    }

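    // Intended to allow big batches. Note that in ethers v6 batchMaxSize is a
    // limit in bytes (default 1 MiB), so 1024 caps each JSON-RPC batch at only
    // a few raw transactions.
    let options = {
        polling: false,
        staticNetwork: null,
        batchStallTime: 1,      // ms to wait for more requests before sending a batch
        batchMaxSize: 1024,     // bytes per batch
        batchMaxCount: 1000000, // requests per batch
        cacheTimeout: 250
    };
    const batchProvider = new ethers.JsonRpcProvider(endpoint, undefined, options);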

    // keep a map of nonces for each sender
    let nonceMap = {};
    for (let i = 0; i < senderList.length; i++) {
        let sender = senderList[i];
        let wallet = makeWallet(sender, batchProvider);
        nonceMap[sender] = await batchProvider.getTransactionCount(wallet.address);
    }
    // console.log(nonceMap);

    while (true) {
        console.time('Batch creation time');
        const transactionsMap = [];
        for (let i = 0; i < batchSize; i++) {
            let arrayIndex = i % toAddresses.length;
            let to_address = toAddresses[arrayIndex];
            let senderIndex = i % senderList.length;
            let sender = senderList[senderIndex];
            let transaction = {
                to: to_address,
                value: ethers.parseEther(amount),
                gasLimit: 21000,
                data: "0x",
                gasPrice: ethers.parseUnits('0.0', 'gwei'),
                nonce: nonceMap[sender]
            };
            transactionsMap.push([makeWallet(sender, batchProvider), transaction]);
            nonceMap[sender] += 1;
        };
        console.timeEnd('Batch creation time');
        totalTransactions += batchSize;
        console.log(`Total transactions sent: ${totalTransactions}`);
        // console.log(transactionsMap);

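        // Fire-and-forget: the batch is not awaited, so 'Batch send time' only
        // measures the synchronous start of the sends, not the network round trips.
        console.time('Batch send time');
        Promise.all(transactionsMap.map(
            ([wallet, tx]) => wallet.sendTransaction(tx)
        )).then(responses => {
            // sendTransaction resolves to TransactionResponse objects, not receipts
            console.log(`Sent ${responses.length} transactions`);
        }).catch(err => {
            // Promise.all rejects on the first failed send only
            console.log(err);
        });
        console.timeEnd('Batch send time');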
        await new Promise(resolve => setTimeout(resolve, timeout));
    }
}

// Call the main function and handle any errors that occur
main(process.argv.slice(2)).catch(err => console.log(err));
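The send loop uses `Promise.all`, which rejects as soon as one send fails, so the log shows only the first error and hides how many transactions in the batch were rejected (for example with `already known`, as in the logs later in this thread). A sketch using the standard `Promise.allSettled` to count successes and group failures by message; `summarizeBatch` is a hypothetical helper, not part of ethers:

```javascript
// Summarize a batch of sendTransaction() promises without stopping at the
// first failure (Promise.all rejects as soon as one promise rejects).
async function summarizeBatch(promises) {
  const results = await Promise.allSettled(promises);
  const summary = { sent: 0, failed: 0, errors: new Map() };
  for (const r of results) {
    if (r.status === "fulfilled") {
      summary.sent += 1;
    } else {
      summary.failed += 1;
      // Group failures by their message text.
      const msg = r.reason && r.reason.message ? r.reason.message : String(r.reason);
      summary.errors.set(msg, (summary.errors.get(msg) || 0) + 1);
    }
  }
  return summary;
}
```

In send_batch.js it could replace the `Promise.all(...)` call, e.g. `summarizeBatch(transactionsMap.map(([w, tx]) => w.sendTransaction(tx))).then(console.log)`, still without awaiting it so batches keep their fixed cadence.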

Each validator node is spammed with transactions over a single HTTP connection, and the transactions are sent via JSON-RPC. This simulates a realistic case where the network has to handle many transactions at the same time. The idea is to keep spamming transactions at a steady rate. The script also manages the nonce for each address itself, since the eth_sendTransaction method is not yet supported by the Polygon Edge client.

I start the test by firing up the nodes and letting the network produce some blocks. (Note: if left running without load, the chain keeps growing indefinitely.)
Then I set the environment variables, which are stored in a .env file:

BATCH_SIZE=3000
TIMEOUT_BATCH_SEND=3000

After that, I run the send_batch.js script on each node using the following command:

node send_batch.js http://0.0.0.0:10002 0.000000000000000001 '["<val_addr_1>", "<val_addr_2>", "<val_addr_3>", "<val_addr_4>"]' $(cat secrets/secrets-*)

Here, secrets-* stands for the file containing the secrets used by each node (e.g. secrets-1.json for validator 1, secrets-2.json for validator 2, etc.). The send_batch.js script waits TIMEOUT_BATCH_SEND milliseconds between batches to keep the machine from hanging.

Expected behaviour

The script works fine, and the throughput (TPS) is predictable when BATCH_SIZE is kept low: from a given BATCH_SIZE, I can calculate how many transactions should land in each block. For example, with BATCH_SIZE=250 and TIMEOUT_BATCH_SEND=3000 (3 s, the same as the block time):

250 tx / 3 s (timeout) = 83.33 TPS per node
83.33 TPS * 4 (validators) = 333.33 TPS
333.33 TPS * 3 s (block time) = 999.99 ~ 1000 transactions per block
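The per-block estimate can be written as a small function. It only encodes the arithmetic from this issue: each node sends one batch every TIMEOUT_BATCH_SEND plus the batch creation time, and all nodes' transactions are assumed to land in blocks without loss (`expectedTxPerBlock` is an illustrative name):

```javascript
// Expected transactions per block for the spam setup in this issue.
// One batch per node every (timeout + batch creation time) seconds.
function expectedTxPerBlock({ batchSize, timeoutSec, creationSec = 0, nodes, blockTimeSec }) {
  const periodSec = timeoutSec + creationSec;
  const tpsPerNode = batchSize / periodSec;
  return tpsPerNode * nodes * blockTimeSec;
}

console.log(expectedTxPerBlock({ batchSize: 250, timeoutSec: 3, nodes: 4, blockTimeSec: 3 }).toFixed(0)); // 1000
console.log(expectedTxPerBlock({ batchSize: 3000, timeoutSec: 3, creationSec: 2.2, nodes: 4, blockTimeSec: 3 }).toFixed(0)); // 6923
```

With a 2.2 s creation time and BATCH_SIZE=3000 this gives about 6923, consistent with the ~6900 expectation described below; with a 6 s block time the expected count doubles, assuming the pool keeps up.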

This matches the data I collected:
[image: collected data]
Note: the batch creation time is ignored in this case since it is insignificant.

The problem arises when I increase BATCH_SIZE. With a bigger BATCH_SIZE, I also have to account for the batch creation time, since each batch takes longer to build. For this scenario, I increased BATCH_SIZE to 3000 and added 2.2 s to the timeout (the batch creation time reported by the script's 'Batch creation time' timer). I was expecting around 6900 transactions per block (which the Polygon Edge chain should be able to handle, judging by the old tests from your team here), but I never saw that number. I was also getting empty blocks for some reason. After running under load for a while, the chain ceases to produce new blocks entirely. I started noticing this behaviour from BATCH_SIZE 2500 and above, at seemingly random block heights.
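One possible contributor to the 2.2 s batch creation time (an assumption, not something measured in this issue) is that `makeWallet()` is called once per transaction inside the batch loop, so a 3000-transaction batch constructs 3000 `ethers.Wallet` objects, each deriving an address from a private key, even though there are at most 1000 distinct keys. A minimal, dependency-free sketch of building each object once per key and reusing it; `cachedFactory` and `getWallet` are hypothetical names:

```javascript
// Build each (possibly expensive) object once per key and reuse it afterwards.
function cachedFactory(create) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) {
      cache.set(key, create(key));
    }
    return cache.get(key);
  };
}

// Hypothetical usage inside send_batch.js:
//   const getWallet = cachedFactory((pk) => makeWallet(pk, batchProvider));
//   transactionsMap.push([getWallet(sender), transaction]);
```

Whether this meaningfully shortens batch creation would need to be confirmed with the script's own `console.time` output.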

[image]

Any clue why no new blocks are produced under load? I also tried changing the block time to 6 seconds, but the behaviour is more or less the same.

Logs

The logs from one of the validators can be found here (uploaded separately because the file is too large).

@RonTuretzky

@JDawg287 I wonder if you've tried this with PolyBFT?

@Vitomir2

Vitomir2 commented Jul 4, 2024

Hey, buddy :) Did you try the pandoras-box to test your nodes? It is a pretty decent tool and has EOA, ERC-20, and ERC-721 modes 💯

However, I have a custom polygon-edge node implementation, and the node is configured to have like 20 transactions per batch. When I test with 1000-1400 transactions, it works fine, but when I increase it to 2000 or more, then some start to get stuck in the TX pool and I receive the following errors:

2024-07-04T05:28:45.303Z [WARN]  hydra.server.dispatcher: failed to dispatch: method=eth_sendRawTransaction err="maximum number of enqueued transactions reached"
2024-07-04T05:28:45.215Z [ERROR] hydra.txpool: failed to add tx: err="already known"

It seems like I have a similar problem to yours. Did you find a way to handle the issue?

Another strange thing: I achieve 120 TPS, but the Polygon Edge docs report nearly 2.5k TPS.
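The `maximum number of enqueued transactions reached` error refers to transactions that are in the pool but cannot be executed yet. In Ethereum-style pools, a transaction whose nonce is ahead of the account's next expected nonce is held back ("enqueued") until the missing nonces arrive; only a contiguous run of nonces is promoted for inclusion in a block. The toy model here is a simplified sketch of that idea, not Polygon Edge's actual txpool code; the per-account limit, the `ToyAccountPool` name, and the duplicate check by nonce (real pools compare transaction hashes) are assumptions for illustration:

```javascript
// Toy model of one account's slot in an Ethereum-style transaction pool.
// NOT Polygon Edge's implementation; assumes a per-account cap on enqueued txs.
class ToyAccountPool {
  constructor(nextNonce, maxEnqueued) {
    this.nextNonce = nextNonce;     // nonce the next promoted tx must have
    this.maxEnqueued = maxEnqueued; // cap on future-nonce (gapped) txs
    this.promoted = [];             // contiguous nonces, ready for a block
    this.enqueued = new Map();      // nonce -> tx, waiting for the gap to fill
  }

  add(tx) {
    if (tx.nonce < this.nextNonce) {
      // Already promoted (duplicate) or already used.
      return this.promoted.some((t) => t.nonce === tx.nonce) ? "already known" : "nonce too low";
    }
    if (this.enqueued.has(tx.nonce)) return "already known";
    if (tx.nonce > this.nextNonce) {
      if (this.enqueued.size >= this.maxEnqueued) {
        return "maximum number of enqueued transactions reached";
      }
      this.enqueued.set(tx.nonce, tx);
      return "enqueued";
    }
    // tx.nonce === nextNonce: promote it, then every enqueued tx it unblocks.
    this.promoted.push(tx);
    this.nextNonce += 1;
    while (this.enqueued.has(this.nextNonce)) {
      this.promoted.push(this.enqueued.get(this.nextNonce));
      this.enqueued.delete(this.nextNonce);
      this.nextNonce += 1;
    }
    return "promoted";
  }
}

const pool = new ToyAccountPool(0, 2);
console.log(pool.add({ nonce: 1 })); // enqueued
console.log(pool.add({ nonce: 2 })); // enqueued
console.log(pool.add({ nonce: 3 })); // maximum number of enqueued transactions reached
console.log(pool.add({ nonce: 0 })); // promoted (nonces 1 and 2 follow)
console.log(pool.add({ nonce: 0 })); // already known
```

In this model, a single missing or delayed low nonce keeps every later nonce from that address in the enqueued set, which is one plausible way transactions end up stuck in the pool under heavy load.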

@RonTuretzky

What impl are you using?

Did you change the --max-enqueued-transactions on the server run params?

@Vitomir2

Vitomir2 commented Jul 4, 2024

Nah, it uses the default one. I suppose increasing it will do the trick, but what is the recommended number of max enqueued txs?

@RonTuretzky

No reference to that in the documentation, but we've been using it with pretty high values.

Again, what implementation are you using?

@Vitomir2

Vitomir2 commented Jul 4, 2024

It is a modified version of PolyBFT, if that is what you are asking.

@RonTuretzky

Is it publicly available?

@Vitomir2

Vitomir2 commented Jul 4, 2024

Yes, it is a public repo already - https://github.com/Hydra-Chain/hydragon-node

Btw, I just checked: the default max enqueued transactions is currently 128.
