
Balance field in valset validator-info is not populated for all validators #807

Closed
verabehr opened this issue Sep 25, 2023 · 6 comments

@verabehr

palomad q valset validator-info palomavaloper... doesn't show evm balances for all unjailed validators.

I think this value should be set, since it is used to confirm validators' balances on the EVM chains and jail them if they fall below the minimum - https://github.com/palomachain/paloma/blob/master/x/evm/keeper/attest_validator_balances.go

Here is an example of a validator whose balance field is not populated:

palomad q valset validator-info palomavaloper1c8uxkkz00qn97wsp9um8y3a8wlmrv3c546jw4q
chainInfos:
- address: 0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535
  balance: ""
  chainReferenceID: base-main
  chainType: evm
  pubkey: 7iEwGvHZVitcvt9SAHfqCpvJ1TU=
  traits: []
- address: 0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535
  balance: ""
  chainReferenceID: bnb-main
  chainType: evm
  pubkey: 7iEwGvHZVitcvt9SAHfqCpvJ1TU=
  traits: []
- address: 0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535
  balance: ""
  chainReferenceID: eth-main
  chainType: evm
  pubkey: 7iEwGvHZVitcvt9SAHfqCpvJ1TU=
  traits: []
- address: 0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535
  balance: ""
  chainReferenceID: matic-main
  chainType: evm
  pubkey: 7iEwGvHZVitcvt9SAHfqCpvJ1TU=
  traits: []
- address: 0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535
  balance: ""
  chainReferenceID: op-main
  chainType: evm
  pubkey: 7iEwGvHZVitcvt9SAHfqCpvJ1TU=
  traits: []

These were the last messages to add validator balances before our validator was jailed at 2023-09-25T16:41:59Z:

Sep 25 16:41:59 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/valset","val-addr":"palomavaloper1wm4y8yhppxud6j5wvwr7fyynhh09tmv5fy845g","reason":"not supporting these external chains: [evm, base-main], [evm, bnb-main], [evm, matic-main]","time":"2023-09-25T16:41:59Z","message":"jailing a validator"}
Sep 25 16:38:51 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/palomaconsensus","queue-type-name":"evm/base-main/validators-balances","message-id":108639,"time":"2023-09-25T16:38:51Z","message":"put message into consensus queue"}
Sep 25 16:38:51 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/palomaconsensus","queue-type-name":"evm/bnb-main/validators-balances","message-id":108640,"time":"2023-09-25T16:38:51Z","message":"put message into consensus queue"}
Sep 25 16:38:51 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/palomaconsensus","queue-type-name":"evm/eth-main/validators-balances","message-id":108641,"time":"2023-09-25T16:38:51Z","message":"put message into consensus queue"}
Sep 25 16:38:51 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/palomaconsensus","queue-type-name":"evm/matic-main/validators-balances","message-id":108642,"time":"2023-09-25T16:38:51Z","message":"put message into consensus queue"}
Sep 25 16:38:51 mainnet-validator palomad[902718]: {"level":"info","module":"server","module":"x/palomaconsensus","queue-type-name":"evm/op-main/validators-balances","message-id":108643,"time":"2023-09-25T16:38:51Z","message":"put message into consensus queue"}

Checking the pigeon logs for that time shows balances being recorded for some validators, but not ours, e.g.

Sep 25 16:39:16 pigeon[902972]: {"balance":95532145000000000,"evm-address":"0x378aa2d29b46b0c3497f5b2fb80919348bf54c3e","height":32055821,"level":"info","msg":"got balance","nearest-to-time":"2023....
@taariq taariq added paloma bug Something isn't working labels Sep 26, 2023
@byte-bandit

Observations:

This is a problem that's been around for a long time as well. What's happening is that Pigeons send their external chain infos every 30 seconds. Those records do NOT include the balances of the external addresses. There's a separate job for this, which queries external chain balances; those are attested by all pigeons before being added to the external chain information set.

Now, when Pigeon reports in every 30 seconds, it OVERWRITES the existing record. This might be by design, but either the balances aren't being attested properly or pigeon sends the chain info too often.

I'm not sure about the intentions behind this design, but I suggest we store the last ExternalChainInfo record sent to Paloma, and skip the send if the new value doesn't deviate from the last one. We save on fees, reduce message volume, and stop constantly overwriting this balance record.
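A rough sketch of what that could look like on the Pigeon side (types and names are made up for illustration, not the actual Pigeon API):

```go
package main

import (
	"fmt"
	"reflect"
)

// ExternalChainInfo is a hypothetical stand-in for the record Pigeon
// sends to Paloma every 30 seconds.
type ExternalChainInfo struct {
	ChainReferenceID string
	ChainType        string
	Address          string
}

// sender remembers the last set of records it pushed to Paloma.
type sender struct {
	lastSent []ExternalChainInfo
}

// maybeSend skips the send when nothing changed since the last push,
// saving fees and avoiding the overwrite of attested balances.
// It reports whether a send actually happened.
func (s *sender) maybeSend(infos []ExternalChainInfo) bool {
	if reflect.DeepEqual(infos, s.lastSent) {
		return false
	}
	s.lastSent = infos
	return true
}

func main() {
	s := &sender{}
	infos := []ExternalChainInfo{{
		ChainReferenceID: "eth-main",
		ChainType:        "evm",
		Address:          "0xEe21301aF1d9562B5cBEdf520077Ea0a9bC9d535",
	}}
	fmt.Println(s.maybeSend(infos)) // true: first send goes through
	fmt.Println(s.maybeSend(infos)) // false: identical payload is skipped
}
```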

I still need to check how often balances are updated and attested.

@verabehr
Author

@byte-bandit great find. Yes, pigeon sends this way too often, and we in fact have a ticket to be smarter about that here: #20.
We increased the frequency of chain info being sent because, if my memory serves me well, validators would initially get jailed whenever we added a new chain. Long story short, I agree with your suggestion; the current behavior was not the desired end state.

@byte-bandit

@verabehr I added a PR that will avoid making a new call for updating the external account infos unless they have changed. This should greatly reduce the amount of noise we get from this.

I also checked on what Paloma does. It asks to update the balances on external chains every 300 blocks, which at the moment is roughly every 500 seconds, just over 8 minutes. So it's no wonder almost all chain info records were not showing any balances: they constantly got overwritten.

Also, 300 blocks seems like a long interval to me for the balance updates. I would maybe cut it down to 100 blocks, although this will triple the RPC endpoint activity for all pigeons for the balance attestation.

@byte-bandit

More observations:

At the moment, external chain balances are queried every 8 minutes or so. Changes in the balance will NOT trigger a new valset update. That's PROBABLY okay, as I don't think the balance data in the active valset is used anywhere other than for jailing pigeons whose external chain token balance is too low.

More than that, I need to look at how Paloma selects Pigeons - where is the balance data coming from there? The valset or live data?

@byte-bandit

@verabehr @taariq

Here's a short summary of my findings on all of this:

  • Balances of external chain tokens are queried and verified every 8 minutes
  • This does not trigger a valset update
  • That's probably fine, as the balance doesn't seem to be used
  • Validators will get jailed when their balance (checked every 8 minutes) is below the chain minimum
  • When selecting a Pigeon for message relaying, Paloma inspects the current valset. The target chain balance is not considered at all in this decision; I suspect Paloma assumes that any validator with too low a balance would already have been jailed at this stage
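The last point can be sketched like this (types and names are illustrative, not the actual selection code):

```go
package main

import "fmt"

// valsetEntry is a hypothetical, stripped-down view of a validator in
// the active valset.
type valsetEntry struct {
	ValAddr string
	Jailed  bool
}

// pickRelayer mimics the observed behavior: choose any unjailed
// validator from the current valset, never consulting target chain
// balances, on the assumption that under-funded validators have
// already been jailed.
func pickRelayer(valset []valsetEntry) (string, bool) {
	for _, v := range valset {
		if !v.Jailed {
			return v.ValAddr, true
		}
	}
	return "", false
}

func main() {
	valset := []valsetEntry{
		{ValAddr: "palomavaloper1aaa", Jailed: true},
		{ValAddr: "palomavaloper1bbb", Jailed: false},
	}
	addr, ok := pickRelayer(valset)
	fmt.Println(addr, ok) // palomavaloper1bbb true
}
```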

The problem

Increasing this balance interval even further (e.g. to once per hour) bears the risk of running into issues. Let's assume we're sending a high volume of messages. Some validators' balances might drop below the minimum threshold needed for relaying messages. Paloma currently jails those validators every 8 minutes, so there's an 8 minute window where a message might go to a Pigeon that won't be able to relay it, which leads to a failed relay.

Upping that interval to, e.g., an hour means there is now a window of one hour where Pigeons with too little funds will still receive messages. More importantly, after getting jailed, they will stay jailed for up to one hour after having more funds injected, as we only check once per hour.

What's next?

I suggest we leave the balance query interval at 8 minutes. It's not ideal, but reactive enough to jail/unjail as needed. Once we get message relay retries where a different pigeon will pick up a failed attempt (discussed here: #802), a message would still get relayed properly even when assigned to a failing Pigeon first.

@verabehr
Author

As discussed, we will close this out and observe whether it becomes an issue. We will also increase the frequency of gas refunds.
