[FR] - Provide a query UTxO by address query that does not depend on the node #4541

Closed
dnadales opened this issue Oct 18, 2022 · 18 comments
Labels: area: utxohd, comp: cardano-cli, needs-grooming 🌳, status: needs more info, type: enhancement, user type: internal

Comments

@dnadales
Member

Querying the UTxO set by address should not be implemented by relying on a query that the node provides. The node and upstream components should provide enough information so that other functionality can be implemented on top of it. cardano-cli could provide support for querying the UTxO set by address by interfacing with a service that maintains an index of the UTxO set by address.

@Jimbo4350
Contributor

I disagree. This is incredibly useful for testing and having to run another tool to simply query the utxo will be a PITA.

@dnadales
Member Author

CC: @marshada

@ch1bo
Contributor

ch1bo commented Oct 20, 2022

If you remove this, expect quite a big backlash. There needs to be a clear way to re-achieve the feature you will be removing, and lots of communication (at least one version deprecation cycle?).

So what's the alternative? Ship the node with kupo?

@dnadales
Member Author

> There needs to be a clear way to re-achieve the feature you will be removing, and lots of communication (at least one version deprecation cycle?)

Absolutely. We should not break existing user-facing functionality.

@abailly-iohk
Contributor

abailly-iohk commented Oct 21, 2022

kupo has a benchmarks page that shows the performance of querying UTxO by address: https://github.com/CardanoSolutions/kupo/tree/master/benchmarks#query It seems to be well under a second, or around 2 seconds for a heavily used address (the jpeg.store smart contract, IIUC). AFAIK it does this by running a SQL query over an internal sqlite DB. My personal experiments a while ago for Mithril, using a sqlite DB populated with all of mainnet's UTxO, gave me similar results.
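
To make the sqlite approach concrete, here is a minimal sketch of a by-address lookup over a table of unspent outputs. It is not kupo's actual schema: the table, column, and index names are assumptions made up for illustration, and the addresses are placeholders rather than valid Cardano addresses.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a hypothetical UTxO table with one row per unspent output."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS utxo ("
        " tx_id TEXT NOT NULL,"
        " tx_ix INTEGER NOT NULL,"
        " address TEXT NOT NULL,"
        " lovelace INTEGER NOT NULL,"
        " PRIMARY KEY (tx_id, tx_ix))"
    )
    # The secondary index on address is what lets a by-address lookup
    # avoid scanning the whole table.
    db.execute("CREATE INDEX IF NOT EXISTS utxo_by_address ON utxo (address)")
    return db


def utxo_at(db, address):
    """Return (tx_id, tx_ix, lovelace) for every unspent output at the address."""
    rows = db.execute(
        "SELECT tx_id, tx_ix, lovelace FROM utxo"
        " WHERE address = ? ORDER BY tx_id, tx_ix",
        (address,),
    )
    return rows.fetchall()


db = open_index()
db.executemany(
    "INSERT INTO utxo VALUES (?, ?, ?, ?)",
    [
        ("aa", 0, "addr_placeholder_alice", 5_000_000),
        ("aa", 1, "addr_placeholder_bob", 2_000_000),
        ("bb", 0, "addr_placeholder_alice", 1_500_000),
    ],
)
print(utxo_at(db, "addr_placeholder_alice"))
```

Keeping such a table up to date as the chain advances is a separate concern, and it is the part an indexing client would be responsible for.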

So indeed as you note @ch1bo we could delegate this job to external tools, be it kupo or marconi or any other chain indexer out there.

What I fail to understand though, is why we can't provide this seemingly simple feature natively, given we DO store the UTxO on disk. But I wasn't involved in the decisions on UTxO HD so I am probably missing a lot of context. @dnadales could you point at some design documents or decision records on this feature?

@abailly-iohk
Contributor

Perhaps a way to settle the issue would be to involve the community through a CIP, possibly with a broader scope of "Delegate complex node queries to 3rd party tools"?

@CarlosLopezDeLara added the labels type: enhancement, user type: internal, status: needs more info, comp: cardano-cli, and area: utxohd, and removed the enhancement label on Oct 21, 2022
@dnadales
Member Author

dnadales commented Oct 21, 2022

> What I fail to understand though, is why we can't provide this seemingly simple feature natively, given we DO store the UTxO on disk.

As far as I understand, this would require adding additional complexity to consensus, and we'd rather limit the scope of its responsibilities given the already immense essential complexity this component has to deal with. I do not have a strong preference regarding where this feature should live, but I do lean towards having leaner components.

> @dnadales could you point at some design documents or decision records on this feature?

utxo-db.pdf
utxo-db-api.pdf
source

@dcoutts
Contributor

dcoutts commented Oct 21, 2022

@abailly-iohk yes, the only way to make it fast is to use an index. The original feature was a quick convenient hack that got totally out of hand. (My bad.) And yes, sqlite would be perfectly good as an implementation of such an index, and the right place to do that is in an indexing client, not within the node process itself.

One of our general architectural principles for the node and its external clients is that we make all chain information available to clients, but we do not store information in the node (and provide client access) that the node does not itself need. The node is part of the trusted base of the system. It is complex and has strong security and performance requirements. It is the wrong place to build database features. External clients are the perfect place to build database features.

The right solution is a client that maintains an index and provides query access. There are then multiple choices: provide a minimal one that just replicates what the CLI provided before (via the node), or use some existing more general purpose indexer, or some combo to satisfy different use cases.

@abailly-iohk
Contributor

@dcoutts Sure, I am in agreement with the general principles, but given that quite a few people out there rely on this feature, and have for quite a while, we need to have a clear migration path and buy-in before considering removing this feature, lest we generate more resentment.

@dnadales Thanks a lot for the links and the context.

What do people think of putting the discussion in the open through a CIP?

@dnadales
Member Author

> we need to have a clear migration path and buy-in before considering removing this feature

Absolutely, we're not advocating removing this and leaving users without an alternative.

> @dnadales Thanks a lot for the links and the context.

My pleasure. And thank you @dcoutts for providing the rationale for this change.

> What do people think of putting the discussion in the open through a CIP?

I have no objections :)

@CarlosLopezDeLara
Contributor

> What do people think of putting the discussion in the open through a CIP?

Makes sense!

@dcoutts
Contributor

dcoutts commented Oct 21, 2022

@abailly-iohk I agree a replacement (or combination) needs to be decided on. It's clear there are multiple choices; we just need to pick one or more, and if that includes new indexing components or CLI integration, then we need to find the time to do that work.

And there is no proposal to immediately remove the feature from the node, but it will become slower and slower.

@gitmachtl
Contributor

gitmachtl commented Oct 27, 2022

Many 3rd party applications and tools rely on the fact that you can do all queries via cardano-cli. Removing this functionality would cause a lot of trouble. A useful solution might be to add components so that the node can also write out an additional index-based database, like kupo does. Being able to do basic, essential queries and work with just the core components (node + cli) is critical, imo. There should not be a need for another 3rd party tool to do such fundamental things.

@dcoutts
Contributor

dcoutts commented Oct 29, 2022

@gitmachtl thanks for the feedback, indeed, one of the plausible options is that the CLI can still be used, either with built-in indexing, or the CLI querying another simple indexing process that is bundled with the node & cli.

Here's a couple of the plausible options:

  1. For use cases that only need to query a fixed set of addresses: the CLI could have simple built-in indexing where each time the CLI needs to query addresses, it loads the most recent index snapshot, and fast-forwards it via the node. Or if it's too far behind, it can start with a slow linear query to the node to scan for the set of addresses requested. The size of the on-disk snapshots would be small: proportional to the size of the fixed set of addresses. This would not be fast if the querying is used infrequently, but it would not require running another process.
  2. For use cases that need to query arbitrary addresses, or need reliably quick query times: run a dedicated indexing client alongside the node and cli, and have the cli query that indexer process as the backend to the cli's query command. Bundle the indexer with the node and cli. This would use something like sqlite3. The on-disk index would be proportional in size to the utxo.
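
As a rough illustration of option 1, the sketch below keeps a small on-disk snapshot for a fixed set of watched addresses and fast-forwards it on each query. Everything here is an assumption for illustration: the block shape, the JSON snapshot format, and the function names are hypothetical, the "node" is just an in-memory list of blocks, and neither rollbacks nor the slow linear fallback for a snapshot that is too far behind is handled.

```python
import json
import os
import tempfile

# Hypothetical block shape (not a real node API):
# {"slot": int, "txs": [{"id": str,
#                        "inputs": [[tx_id, ix], ...],
#                        "outputs": [[address, lovelace], ...]}]}


def apply_block(index, block, watched):
    """Update the index in place with one block, tracking only watched addresses."""
    for tx in block["txs"]:
        for tx_id, ix in tx["inputs"]:
            # Spent outputs leave the index (no-op if we never tracked them).
            index["utxo"].pop(f"{tx_id}#{ix}", None)
        for ix, (address, lovelace) in enumerate(tx["outputs"]):
            if address in watched:
                index["utxo"][f"{tx['id']}#{ix}"] = [address, lovelace]
    index["slot"] = block["slot"]


def load_snapshot(path):
    """Load the most recent snapshot, or start from an empty index."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"slot": -1, "utxo": {}}


def query(path, chain, watched, address):
    """Fast-forward the snapshot over blocks it has not seen, save it, then answer."""
    index = load_snapshot(path)
    for block in chain:
        if block["slot"] > index["slot"]:
            apply_block(index, block, watched)
    with open(path, "w") as f:
        json.dump(index, f)
    return sorted(
        (ref, lovelace)
        for ref, (addr, lovelace) in index["utxo"].items()
        if addr == address
    )


chain = [
    {"slot": 1, "txs": [{"id": "aa", "inputs": [],
                         "outputs": [["alice", 5], ["bob", 2]]}]},
    {"slot": 2, "txs": [{"id": "bb", "inputs": [["aa", 0]],
                         "outputs": [["alice", 4], ["carol", 1]]}]},
]
watched = {"alice"}
path = os.path.join(tempfile.mkdtemp(), "snapshot.json")

first = query(path, chain[:1], watched, "alice")   # builds the snapshot at slot 1
second = query(path, chain, watched, "alice")      # only applies the block at slot 2
print(first, second)
```

The snapshot only ever contains outputs at watched addresses, which is why its size is proportional to the fixed address set. It is also why the set has to stay fixed: adding an address later would require rescanning from the start.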

Of course it's easy to say let's do all these things, but there's always limited development time & resources available, so part of the question is what gives the best result for the most people with the least development effort.

@Jimbo4350
Contributor

Is it possible to keep the UTxO consensus query as is, with the caveat that it will become slower (not true for the majority of simple testnet cases), as well as implement a dedicated indexing client for when people care about performance?

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

@github-actions bot added the Stale label on Nov 29, 2022
@gitmachtl
Contributor

keep open - ping

@github-actions bot removed the Stale label on Nov 30, 2022
@disassembler
Contributor

closing in favor of #4678


8 participants