Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimising querying for validator delegations #15162

Closed
freak12techno opened this issue Feb 25, 2023 · 18 comments · Fixed by #15731
Closed

Optimising querying for validator delegations #15162

freak12techno opened this issue Feb 25, 2023 · 18 comments · Fixed by #15731
Labels
C:x/staking T: Performance Performance improvements

Comments

@freak12techno
Copy link
Contributor

When querying for validator's delegations, if I am correct, it takes all the delegations from the database, and then only returns those where validator address matches the one queried for (x/staking/keeper/delegation.go). This might make it awfully slow if, for example, there's 10k delegations overall and validator I'm querying for has only 5 delegations.

It concerns me because I have my tools that are periodically querying my delegators count (by asking for 1st delegation and the count of all delegators), and such queries are basically killing the fullnode I'm querying it from.

I can likely make a PR on optimising it, but I'm not sure if we store delegations by validator somewhere (likely not, if not we need to index it separately to work. If yes, we just need to take the validator delegations and use it instead of asking for all delegations).

Do you think it's the right approach (the one I described) or do we have other reasons to do it the way it's done now?

@github-actions github-actions bot added the needs-triage Issue that needs to be triaged label Feb 25, 2023
@freak12techno
Copy link
Contributor Author

Another point to be optimised: querying validators delegations with pagination and counting them. If I'm correct, for example if a validator has 1000 delegations and I am querying for first 5 while asking to count all of them, it'll take all of them into the database then return only the first 5, but will use all the entries to calculate the total delegations count. Maybe it's worth it to store delegators count somewhere as a separate key in store and use it for calculation?

@freak12techno
Copy link
Contributor Author

Okay I've researched a bit more, so I was partially right.
Here's how validator delegations are queried:

func (k Querier) ValidatorDelegations(c context.Context, req *types.QueryValidatorDelegationsRequest) (*types.QueryValidatorDelegationsResponse, error) {

It'll get the store and use GenericFilteredPaginate on that, which will query for all the records until it finds N records which are matching the predicate (N = limit).
In my case I have 1 delegation on my validator (validator itself), so it'll have to iterate through all the records to check there are no more records to return, as 1 is not enough for a full batch.
Also if there's a validator where there are more delegations than the limit in pagination, it'll have to iterate through all entries to count them.
It may be fixed by 1) storing all delegations not in the same store key, but by validator, so when you need to get validator delegations, you just take the right key and receive only those with needed validator, and 2) maybe storing delegations count per validator somewhere as a separate key, so no need to iterate over all validator entries to get the count, which would make requests with count_total=true faster.

@SmartStake
Copy link

thanks for raising this item. even otherwise, the delegations API has been gradually slowing down over the last two years. it has gone from near instant response times to high response times to the current stage where for Cosmos and Terra, it just dies and is impractical). The workarounds have somewhat helped but it is still an everyday operational work item for smart stake dashboard needs (even for syncing all delegations just once a day). Hoping to get some short term improvements as outlined on this ticket and more permanent improvements in longer term.

@atheeshp
Copy link
Contributor

atheeshp commented Feb 27, 2023

  1. maybe storing delegations count per validator somewhere as a separate key, so no need to iterate over all validator entries to get the count, which would make requests with count_total=true faster.

I don't think this make the query faster, assume there is only one delegation from a delegator_address and he is querying the state with page_limit 10 and count_total = false (still this query can iterate over all the delegations in state in the worst case, if it is at the end of the state).

The only solution that I can see here is add an extra index which can iterate over validator address (which is already discussed here).

for ex the key can look like:

0x81 | ValidatorAddrLen (1 byte) | ValidatorAddr | DelegatorAddrLen (1 byte) | DelegatorAddr -> (some non nil value)

with the above key we can iterate over validator_address and decode the complete key to get the delegator address, now we have both delegator_address and validator_address, we can simply form the key to get delegation entry from delegation key prefix using these keys.

and also can we close this issue and discuss on the older one, since there is already an issue present for this #10516.

@freak12techno
Copy link
Contributor Author

freak12techno commented Feb 27, 2023

@atheeshp

I don't think this make the query faster, assume there is only one delegation from a delegator_address and he is querying the state with page_limit 10 and count_total = false (still this query can iterate over all the delegations in state by querying my delegators list with limit=1 and count_total=true and just using the total count from the response).

It may improve the speed of queries with count_total=true though, and I am pretty sure this would be useful, as it can be a use case to query for delegators amount and not the delegators themselves (for example I am using such queries for my validator's website).

and also can we close this issue and discuss on the older one, since there is already an issue present for this #10516.

Fine by me, but is it possible to put this older issue somewhere higher in the team backlog or resume discussion? The older issue seems quite stale, and the issue is quite present, moreover, with the current indexing it may be used as an attack vector on public nodes, as if you do a lot of such requests (asking for all delegations of a validator with a default limit, while this validator has 1 delegation, which will cause the fullnode to iterate over all delegations) to a public node, it'll apparently run out of resources and will stop being in touch with the rest of the network, keeping staying on one block (I had this behaviour on multiple nodes when doing a lot of such requests constantly).

@atheeshp
Copy link
Contributor

You may be right, but if we rely on the FilteredPagination the query won't be faster, since it is unmarshalling the delegations and checking delegation info. Assume the delegation entry is at the end of the state.

@freak12techno
Copy link
Contributor Author

freak12techno commented Feb 27, 2023

@atheeshp

Assume the delegation entry is at the end of the state.

agree for the case where it's at the end, but still it'll be faster if it's in the beginning or in the middle, as then we won't need to iterate over all entries for getting total count

@atheeshp
Copy link
Contributor

Hmm, but I feel that's may not be best way to do it. Let's wait to grab more eyes on this. Thanks for the thoughts @freak12techno

cc: @tac0turtle @AmauryM @julienrbrt

@alexanderbez
Copy link
Contributor

alexanderbez commented Feb 27, 2023

Storing the count doesn't make sense IMO. However, I would agree that having a reverse index of validator address to delegator could make sense. The general idea that always comes up in these types of convos is that the gRPC query layer isn't meant to service all needs. It meant to serve basic representations of data from the state machine. It's the role of indexers and other 3rd party tools to do the heavy lifting. If your tool or service relies on this particular bit of info, you should ideally be using an service like Numia or something.

We're trying to remove complexity from the gRPC layer not add more to it 👍

@freak12techno
Copy link
Contributor Author

I'd love to work on that if possible (as it's something that I would love to be fixed asap, as it breaks my monitoring tools), but I'd need some guidance on that, as I never contributed to cosmos-sdk before so I may have some knowledge gaps. Can someone describe the steps to be done to get this working, if it's not difficult?

@tac0turtle
Copy link
Member

agree with bez, i think we should strive to provide ways to do this off chain instead of on chain. Something like versionDB from crypto.com which will be adopted in the sdk would help with your issue and wouldnt need to use the underlying node.

This is a good query to test with storage v2 once we have a working poc. I would guess that if youre not asking for proofs, the performance increase would magnitudes.

@alexanderbez
Copy link
Contributor

I'd love to work on that if possible (as it's something that I would love to be fixed asap, as it breaks my monitoring tools

What's being suggested is that you rely on an indexing service for your monitoring tool :)

@freak12techno
Copy link
Contributor Author

But having the index on validators for delegations isn't only affecting the api endpoints performance, but also the chain itself sometimes, doesn't it? I mean the fullnode needs to ask for validators delegations at least when slashing a validator. So this should be a good thing to add not only because the api is slow here.
Additionally, this can be used to make public fullnodes consume a lot of resources and to get them out of touch with the rest of the chain.

(If the delegators count part is better to be done with external services that's fine by me, but at least indexing delegations by validator would benefit a lot.)

@alexanderbez
Copy link
Contributor

alexanderbez commented Mar 1, 2023

But having the index on validators for delegations isn't only affecting the api endpoints performance, but also the chain itself sometimes, doesn't it?

Nope! It's solely used for this one query. The key is used in the state machine yes, but no where in the state machine do we use this key.

@freak12techno
Copy link
Contributor Author

Okay it's still not clear for me if this would be fixed and if yes, when can it be fixed. My concern if that even if I'll switch my monitoring tools to use something like indexers, there are a lot of tools (namely, restake is the biggest one I guess) that do such queries and it can cause the node such request is done against to get out of sync with the rest of the chain (I had it on my own node until I switched domains on my public node so there's no requests from cosmos.directory: once in a while my node would consider itself to be caught up, while staying on a single block for more than an hour, then it somehow fixes itself, making a node kinda unusable during such periods, as you can't broadcast anything through that node and some different queries to that node fail. I highly doubt it's the hardware, as it doesn't happen with my validator node, which has lower specs, and the issue disappeared once there were no requests from cosmos.directory to my node).
Additionally it's inconvenient imo that there's an endpoint that allows you to query data from it that can make a fullnode basically not working for some period while it seems to be something that is not difficult to fix.

@alexanderbez can you please clarify my concerns?

@alexanderbez
Copy link
Contributor

Okay it's still not clear for me if this would be fixed and if yes, when can it be fixed.

Since the state machine doesn't need it, it's solely a client UX concern. That being the case, in order to support it would require the state machine to track it and in addition, would require a migration. To me, it's simply not worth it IMO.

there are a lot of tools (namely, restake is the biggest one I guess) that do such queries and it can cause the node such request is done against to get out of sync with the rest of the chain ...

Those tools too should be using indexers.

@freak12techno
Copy link
Contributor Author

@tac0turtle I see this issue got closed due to merging PR, nice to hear that!
Can you clarify which release it'll be included to and when can we expect it on Cosmos Hub?

@tac0turtle
Copy link
Member

yes!! @atheeshp killed it on finding a solution!! He is owed all the thanks!!

It will be in the eden release, its unclear when this release will land on the hub. We are aiming for a end of june release in the sdk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:x/staking T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants