Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect account sequence errors in vald #321

Open
ggutoski opened this issue Jan 17, 2022 · 1 comment
Open

incorrect account sequence errors in vald #321

ggutoski opened this issue Jan 17, 2022 · 1 comment
Assignees

Comments

@ggutoski
Copy link
Contributor

My testnet validator posts many log messages of the form:

2022-01-17T19:34:58Z INF backing off (retry in 6.950446643s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1623, got 1622: incorrect account sequence: invalid request module=vald

ie. sequence mismatch off by 1 for a couple hundred attempts now for hours. This behaviour persists even after restarting the node. (This puts the validator in missed-too-many-blocks status.)

More logs:

2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:04:58Z INF operator axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak sending heartbeat acknowledging keys: [avax-master-1 avax-master-genesis avax-secondary-1 avax-secondary-genesis eth-master-1 eth-master-genesis eth-secondary-1 eth-secondary-genesis ftm-master-1 ftm-master-genesis ftm-secondary-1 ftm-secondary-genesis moonbeam-master-1 moonbeam-master-genesis moonbeam-secondary-1 moonbeam-secondary-genesis poly-master-1 poly-master-genesis poly-secondary-1 poly-secondary-genesis] listener=tss module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:08Z INF backing off (retry in 5s ): RPC error -32603 - Internal error: timed out waiting for tx to be included in a block module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:05:16Z INF backing off (retry in 6.903285946s ): rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald

...

2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald
2022-01-17T20:08:16Z ERR handler goroutine: failure to broadcast outgoing heartbeat msg: aborting after 10 retries: rpc error: code = InvalidArgument desc = account sequence mismatch, expected 1626, got 1625: incorrect account sequence: invalid request module=vald

A while later I also see

2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald
2022-01-17T21:16:03Z ERR tx response with error: : tx already in mempool module=vald
2022-01-17T21:16:03Z ERR job panicked:runtime error: invalid memory address or nil pointer dereference module=vald

@milapsheth suggests that perhaps there are two instances of vald running. My investigation supports this. A ps inside the vald container shows:

docker exec -it vald sh
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
   30 root      0:00 {vald-entrypoint} /bin/sh /home/axelard/shared/vald-entrypoint.sh startValdProc
   33 root      0:20 axelard vald-start --tofnd-host tofnd --node http://axelar-core:26657 --validator-addr axelarvaloper1ckg9aus8tv9xayyc9pzz8yxvrpty9mmnc09yak
   44 root      0:00 sh
   51 root      0:00 ps

suggesting that startValdProc is being called twice.

Look at lines 341 and 344 of the following snippet:

docker run \
-d \
--name vald \
--network "${docker_network}" \
--user 0:0 \
--restart unless-stopped \
--env TOFND_HOST=tofnd \
--env HOME=/home/axelard \
--env VALIDATOR_HOST=http://axelar-core:26657 \
--env PRESTART_SCRIPT=/home/axelard/shared/init-vald.sh \
--env CONFIG_PATH=/home/axelard/shared/ \
--env SLEEP_TIME=2s \
--env VALIDATOR_ADDR="${validator}" \
--env RECOVERY_FILE=/home/axelard/.axelar/recovery.json \
--env KEYRING_PASSWORD="${KEYRING_PASSWORD}" \
--env AXELAR_MNEMONIC_PATH=/home/axelard/shared/broadcaster.txt \
--env AXELARD_CHAIN_ID="${chain_id}" \
--entrypoint /home/axelard/shared/vald-entrypoint.sh \
-v "${vald_directory}/:/home/axelard/.axelar" \
-v "${shared_directory}/:/home/axelard/shared" \
"axelarnet/axelar-core:${axelar_core_version}" startValdProc

344: run startValdProc
341: call vald-entrypoint.sh which does:

@ggutoski
Copy link
Contributor Author

I think I was collecting logs incorrectly with docker logs -f vald 2&> vald.log. Perhaps the 2&> was causing the duplicate output?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants