fix: use 0.0.58 freeze admin account for freeze transaction #365

Closed
wants to merge 23 commits into from

Conversation

JeffreyDallas
Copy link
Contributor

Description

This pull request changes the following:

  • Changed to use the 0.0.58 freeze admin account to send freeze upgrade transactions

Related Issues

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
…init-account

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
@JeffreyDallas JeffreyDallas requested review from leninmehedy and a team as code owners June 7, 2024 16:57
@JeffreyDallas JeffreyDallas self-assigned this Jun 7, 2024
Copy link
Contributor

github-actions bot commented Jun 7, 2024

Unit Test Results - Linux

  1 files   19 suites   1m 27s ⏱️
123 tests 123 ✅ 0 💤 0 ❌
149 runs  149 ✅ 0 💤 0 ❌

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

Unit Test Results - Windows

  1 files   19 suites   1m 32s ⏱️
123 tests 123 ✅ 0 💤 0 ❌
149 runs  149 ✅ 0 💤 0 ❌

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Relay Tests Coverage Report

1 files  1 suites   3m 1s ⏱️
5 tests 5 ✅ 0 💤 0 ❌
6 runs  6 ✅ 0 💤 0 ❌

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Tests Coverage Report

 1 files  10 suites   5m 53s ⏱️
56 tests 55 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Node PEM Stop Add Tests Coverage Report

 1 files   1 suites   9m 41s ⏱️
18 tests 15 ✅ 2 💤 1 ❌
20 runs  17 ✅ 2 💤 1 ❌

For more details on these failures, see this check.

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Mirror Node Tests Coverage Report

 1 files   1 suites   3m 42s ⏱️
11 tests 11 ✅ 0 💤 0 ❌

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Node PFX Kill Add Tests Coverage Report

 1 files   1 suites   10m 34s ⏱️
18 tests 15 ✅ 2 💤 1 ❌
20 runs  17 ✅ 2 💤 1 ❌

For more details on these failures, see this check.

Results for commit a5ce03c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

E2E Node Local Build Tests Coverage Report

1 files  2 suites   5m 29s ⏱️
8 tests 8 ✅ 0 💤 0 ❌

Results for commit ebfa104.

♻️ This comment has been updated with latest results.

JeffreyDallas and others added 2 commits June 7, 2024 12:52
Co-authored-by: Jeromy Cannon <jeromy@swirldslabs.com>
Signed-off-by: JeffreyDallas <39912573+JeffreyDallas@users.noreply.github.com>
…init-account

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
jeromy-cannon
jeromy-cannon previously approved these changes Jun 7, 2024
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>

// set operator of freeze transaction as freeze admin account
const accountKeys = await this.accountManager.getAccountKeysFromSecret(FREEZE_ADMIN_ACCOUNT, config.namespace)
const freezeAdminPrivateKey = accountKeys.privateKey
Contributor Author

The E2E test for adding a node failed here because accountKeys is undefined;
it seems the 0.0.58 account does not exist yet.
If I add node init() first, an error occurs while updating the accounts:

failed to update account keys for accountId 0.0.300

solo.log
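
A minimal sketch of a guard that would turn the undefined accountKeys into an explicit error instead of a failure on the next line. The helper name requireAccountKeys and the getKeys parameter are hypothetical; getKeys stands in for accountManager.getAccountKeysFromSecret, and the sketch assumes that call resolves to undefined when no secret exists for the account.

```javascript
// Hypothetical helper, not part of solo. `getKeys` is any async function with the
// shape (accountId, namespace) => keys, e.g. a bound getAccountKeysFromSecret.
async function requireAccountKeys (getKeys, accountId, namespace) {
  const accountKeys = await getKeys(accountId, namespace)
  // Fail with a descriptive message instead of dereferencing undefined later.
  if (!accountKeys || !accountKeys.privateKey) {
    throw new Error(
      `no keys found for account ${accountId} in namespace ${namespace}; has account init been run?`
    )
  }
  return accountKeys
}
```

With a guard like this, the node add failure would report the missing 0.0.58 secret directly rather than surfacing as an undefined property access.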

Contributor

In your logs it looks like solo account init is failing. Does it ever pass? Originally, I had it running the full set of accounts in an E2E test, but Lenin reduced it to a subset to make it run faster, so we haven't been running the full set on a regular basis.

Contributor Author

If I run the following commands from a terminal, everything works:

  kind delete cluster -n "${SOLO_CLUSTER_NAME}" || true
  kind create cluster -n "${SOLO_CLUSTER_NAME}" || return
  solo init -d ${FST_CHARTS_DIR} --namespace "${SOLO_NAMESPACE}" -i node0,node1,node2 -t v0.49.0-alpha.2 -s "${SOLO_CLUSTER_SETUP_NAMESPACE}" || return
  solo node keys --gossip-keys --tls-keys --key-format pem || return
  solo cluster setup  || return
  solo network deploy || return
  solo node setup || return
  solo node start || return
  solo account init || return
  solo node add -i node3 || return

But the E2E node add test did not include the solo account init step, so I added:

      it('should succeed with init command', async () => {
        const status = await accountCmd.init(argv)
        expect(status).toBeTruthy()
      }, 180000)

Contributor

Originally, when I wrote this code, I found that the first time the nodes handled a write transaction they would create the system accounts and pause for a few hundred milliseconds. This caused an issue where the grpc calls, all being released at once, would fail.

Later, when Lenin was trying to fix the logic, he removed all of that and, instead of running the calls in parallel, made them run sequentially. He then found the bug was actually in @hashgraph/sdk, and put in a fix to correct the loggers they were creating and not closing. I asked him to re-enable the parallel execution, which he did. But I don't think he updated the logic to do the initial single account update call before releasing the rest. He also removed the throttling I had. The throttling was less than ideal; however, it helped me get past the issue created by the logger leak (which I did not know existed at that time) and prevented me from overwhelming the consensus nodes.

I'm guessing that this might be related to either:

  1. the consensus nodes pausing to run the genesis system account creation logic and rejecting the flood of grpc calls
  2. the haproxy and/or consensus node being overwhelmed with connections and not keeping up

In the case that it is #1, you can check the timestamps. I believe in the hgcaa.log, you can see when it is creating the system accounts and line that up with your grpc call timestamps.
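
As a rough sketch of that timestamp comparison: the snippet below parses GenesisRecordsConsensusHook lines in the format that appears in hgcaa.log and returns the client call times that fall inside the genesis window. The function names, the windowMs padding, and the assumption that the log timestamps are in UTC are all mine, not something the node or solo defines.

```javascript
// Matches lines such as:
// 2024-06-20 17:41:13.509 INFO  87   GenesisRecordsConsensusHook - Queued 100 system account records with consTime ...
const GENESIS_LINE = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3})\s+\w+\s+\d+\s+GenesisRecordsConsensusHook - Queued (\d+) (.+?) records/

// Returns { loggedAt, count, kind } or null when the line is not a genesis record line.
// Assumption: the leading log timestamp is UTC.
function parseGenesisLine (line) {
  const m = GENESIS_LINE.exec(line)
  if (!m) return null
  return { loggedAt: new Date(`${m[1]}T${m[2]}Z`), count: Number(m[3]), kind: m[4] }
}

// Returns the call times (Date objects) that land between the first genesis log line
// and `windowMs` after the last one.
function callsNearGenesis (logLines, callTimes, windowMs) {
  const events = logLines.map(parseGenesisLine).filter(Boolean)
  if (events.length === 0) return []
  const times = events.map(e => e.loggedAt.getTime())
  const start = Math.min(...times)
  const end = Math.max(...times) + windowMs
  return callTimes.filter(t => t.getTime() >= start && t.getTime() <= end)
}
```

If most of the failed grpc calls show up in the returned list, that would point toward case #1 rather than #2.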

If it is #2, you could try figuring out how to fix the consensus node or haproxy to handle the connections without error; or, you could figure out how to throttle the transactions to prevent them from releasing too many at once.

It has been a while since I looked at this code. I think that in one version Lenin released a certain number of calls before waiting and then releasing some more. If that code is still there, you could reduce that number.
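
A minimal sketch of the pattern described above, assuming each account update can be wrapped as a function that returns a promise: run one call alone first so the nodes can finish their genesis work, then release the rest in fixed-size batches. The name runWithWarmup and the batchSize parameter are hypothetical; this is not the code currently in solo.

```javascript
// `tasks` is an array of functions that each return a promise.
// The first task runs by itself; the remaining tasks run in batches of `batchSize`,
// and each batch must finish before the next one is released.
async function runWithWarmup (tasks, batchSize) {
  const results = []
  if (tasks.length === 0) return results
  // Initial single call before releasing the rest.
  results.push(await tasks[0]())
  for (let i = 1; i < tasks.length; i += batchSize) {
    const batch = tasks.slice(i, i + batchSize)
    // At most `batchSize` calls are in flight at any time.
    results.push(...await Promise.all(batch.map(task => task())))
  }
  return results
}
```

Lowering batchSize trades total run time for fewer simultaneous connections, which is the knob to turn if the haproxy or consensus node is the bottleneck.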

Contributor Author

Thanks for the clarification.
I just wonder why the same piece of code works fine
when solo account init is run from the command line,

but fails when called using accountCmd.init(argv).
The throttling and the parallel/sequential mechanism should be the same
in both scenarios.

Contributor

When you run from the command line, it dumps the configs into the log file. I think it also dumps them when you run accountCmd.init(argv). You could take those rows and compare them to see which parameters differ.
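
One way to do that comparison, sketched under the assumption that both dumped configs can be pasted out of the log as plain JSON objects. The function name diffConfigs and the cli/e2e labels are made up for illustration.

```javascript
// Returns an object keyed by every parameter whose value differs between the two configs.
// Values are compared by their JSON form, so nested objects and arrays are handled,
// and a key missing on one side shows up as undefined on that side.
function diffConfigs (cliConfig, e2eConfig) {
  const keys = new Set([...Object.keys(cliConfig), ...Object.keys(e2eConfig)])
  const diffs = {}
  for (const key of keys) {
    if (JSON.stringify(cliConfig[key]) !== JSON.stringify(e2eConfig[key])) {
      diffs[key] = { cli: cliConfig[key], e2e: e2eConfig[key] }
    }
  }
  return diffs
}
```

An empty result means the two runs were given the same parameters, so the difference would have to come from somewhere other than argv.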

Contributor Author

I am not seeing any difference in the dumped argv values.

JeffreyDallas and others added 5 commits June 14, 2024 12:54
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
@JeffreyDallas JeffreyDallas requested a review from a team as a code owner June 20, 2024 17:06
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
@JeffreyDallas
Copy link
Contributor Author

JeffreyDallas commented Jun 20, 2024

To make sure we compare apples to apples, I created a script, ./cmdline.sh, to set up the cluster, network, and nodes.

  SOLO_CLUSTER_NAME=solo-e2e
  SOLO_NAMESPACE=solo-e2e
  SOLO_CLUSTER_SETUP_NAMESPACE=solo-e2e-cluster

  kind delete cluster -n "${SOLO_CLUSTER_NAME}" || true
  kind create cluster -n "${SOLO_CLUSTER_NAME}" || return
  solo init --namespace "${SOLO_NAMESPACE}" -i node0,node1,node2 -t v0.49.0-alpha.2 -s "${SOLO_CLUSTER_SETUP_NAMESPACE}" || return
  # solo node keys --gossip-keys --tls-keys  || return
  solo cluster setup  || return
  solo network deploy  || return
  solo node setup  --gossip-keys --tls-keys --key-format pem || return
  solo node start  || return

I also lowered the total number of system accounts so the test can finish more quickly.


When the setup script is followed by solo account init,
each batch of account updates finishes in 2~3 seconds.

Each k8.createSecret() takes 200~300 ms and
each sendAccountKeyUpdate() takes 2200~2300 ms.

Steps to reproduce: ./cmdline.sh; solo account init

hgcaa log:

2024-06-20 17:41:13.509 INFO  87   GenesisRecordsConsensusHook - Queued 100 system account records with consTime 2024-06-20T17:41:12.986128424Z
2024-06-20 17:41:13.510 INFO  96   GenesisRecordsConsensusHook - Queued 2 staking account records with consTime 2024-06-20T17:41:12.986128424Z
2024-06-20 17:41:13.514 INFO  104  GenesisRecordsConsensusHook - Queued 101 misc account records with consTime 2024-06-20T17:41:12.986128424Z
2024-06-20 17:41:13.532 INFO  112  GenesisRecordsConsensusHook - Queued 501 treasury clone account records with consTime 2024-06-20T17:41:12.986128424Z
2024-06-20 17:41:13.532 INFO  120  GenesisRecordsConsensusHook - Queued 0 blocklist account records with consTime 2024-06-20T17:41:12.986128424Z


When the setup script is followed by a modified E2E test, e2e-account-init
(all setup removed, keeping only the call to accountCmd.init(argv)),
each batch of account updates finishes in 20~30 seconds.

Each k8.createSecret() takes 7000~8000 ms and
each sendAccountKeyUpdate() takes 4000~10000 ms.

Steps to reproduce: ./cmdline.sh; npm run e2e-account-init

hgcaa log:

2024-06-20 17:14:55.421 INFO  87   GenesisRecordsConsensusHook - Queued 100 system account records with consTime 2024-06-20T17:14:54.941991055Z
2024-06-20 17:14:55.422 INFO  96   GenesisRecordsConsensusHook - Queued 2 staking account records with consTime 2024-06-20T17:14:54.941991055Z
2024-06-20 17:14:55.428 INFO  104  GenesisRecordsConsensusHook - Queued 101 misc account records with consTime 2024-06-20T17:14:54.941991055Z
2024-06-20 17:14:55.447 INFO  112  GenesisRecordsConsensusHook - Queued 501 treasury clone account records with consTime 2024-06-20T17:14:54.941991055Z
2024-06-20 17:14:55.447 INFO  120  GenesisRecordsConsensusHook - Queued 0 blocklist account records with consTime 2024-06-20T17:14:54.941991055Z

Now I need to figure out why, when running the E2E test against the same pre-existing cluster/network
setup, it takes much longer to create the secrets and finish signing the transactions.
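
For reference, per-call timings like the ones above can be collected with a small wrapper along these lines. This is only a sketch: the name timed and the timings array are hypothetical, and the wrapped function is whatever async call is being measured, for example a call to k8.createSecret() or sendAccountKeyUpdate().

```javascript
// Runs `fn`, records how long it took under `label`, and returns (or rethrows) its outcome.
// `timings` is a plain array that collects { label, elapsedMs } entries.
async function timed (label, fn, timings) {
  const start = process.hrtime.bigint()
  try {
    return await fn()
  } finally {
    // Recorded on both success and failure; hrtime is in nanoseconds.
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6
    timings.push({ label, elapsedMs })
  }
}
```

Running the same wrapper in both the command line path and the E2E path keeps the measurement method identical, so any gap in the numbers comes from the environment and not from how the time was taken.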

Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
Signed-off-by: Jeffrey Tang <jeffrey@swirldslabs.com>
@JeffreyDallas
Copy link
Contributor Author

no longer needed

@JeffreyDallas JeffreyDallas deleted the 00308-D-init-account branch August 23, 2024 20:23
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node add stops working when you run account init before
2 participants