[FR] - Query utxo indexer service #4678

Open
disassembler opened this issue Nov 29, 2022 · 16 comments
Labels
type: enhancement An improvement on the existing functionality type: internal feature Non user-facing functionality user type: internal Created by an IOG employee

Comments

@disassembler
Contributor

disassembler commented Nov 29, 2022

Internal/External
Internal if an IOHK staff member.

Area
Other Any other topic (Delegation, Ranking, ...).

Describe the feature you'd like
Do not use ledger state for query utxo.

The utxo on-disk work will make querying utxo from ledger state really slow (it's already really slow and getting slower as utxo size increases). What we need is a separate service that talks to the node socket to keep track of all utxos (or maybe a subset of interesting ones that the user defines). What we want to do is provide a similar interface, e.g.

cardano-cli query utxo --testnet-magic 1 --address addr_test1vzpwq95z3xyum8vqndgdd9mdnmafh3djcxnc6jemlgdmswcve6tkw

But it should not use ledger state queries to get this information. Essentially, this will need to do something like what foldBlocks or marconi does: index state transitions on the chain from genesis and keep that state separate from the node itself.
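The indexing idea can be sketched as a fold over blocks: each transaction removes the outputs it spends and adds the outputs it creates. This is a minimal illustration under simplifying assumptions, not how foldBlocks or marconi are implemented. The `Tx`, `TxOut`, `apply_block`, `fold_blocks` and `utxos_at` names are hypothetical, and rollbacks, collateral and multi-asset values are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified model: a transaction input is (tx id, output index).
TxIn = Tuple[str, int]


@dataclass(frozen=True)
class TxOut:
    address: str
    lovelace: int


@dataclass(frozen=True)
class Tx:
    tx_id: str
    inputs: List[TxIn]
    outputs: List[TxOut]


def apply_block(utxo: Dict[TxIn, TxOut], block: List[Tx]) -> Dict[TxIn, TxOut]:
    """Return the UTxO set that results from applying one block."""
    new_utxo = dict(utxo)
    for tx in block:
        # Spent outputs leave the set.
        for tx_in in tx.inputs:
            new_utxo.pop(tx_in, None)
        # Newly created outputs enter the set, keyed by (tx id, index).
        for ix, out in enumerate(tx.outputs):
            new_utxo[(tx.tx_id, ix)] = out
    return new_utxo


def fold_blocks(blocks: Iterable[List[Tx]]) -> Dict[TxIn, TxOut]:
    """Fold over the chain from genesis, keeping only the current UTxO set."""
    utxo: Dict[TxIn, TxOut] = {}
    for block in blocks:
        utxo = apply_block(utxo, block)
    return utxo


def utxos_at(utxo: Dict[TxIn, TxOut], address: str) -> Dict[TxIn, TxOut]:
    """Answer the 'query utxo --address' question from the index alone."""
    return {k: v for k, v in utxo.items() if v.address == address}
```

A real indexer would also have to handle chain rollbacks, which this sketch leaves out.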

Describe alternatives you've considered

  • Slow queries against the on-disk UTxO
  • A boolean option in the node config to keep the UTxO in memory

Additional context / screenshots

From @dcoutts

@abailly-iohk yes, the only way to make it fast is to use an index. The original feature was a quick convenient hack that got totally out of hand. (My bad.) And yes, sqlite would be perfectly good as an implementation of such an index, and the right place to do that is in an indexing client, and not within the node process itself.

One of our general architectural principles for the node and its external clients is that we make all chain information available to clients, but we do not store information in the node (and provide client access) that the node does not itself need. The node is part of the trusted base of the system. It is complex and has strong security and performance requirements. It is the wrong place to build database features. External clients are the perfect place to build database features.

The right solution is a client that maintains an index and provides query access. There are then multiple choices: provide a minimal one that just replicates what the CLI provided before (via the node), or use some existing more general purpose indexer, or some combo to satisfy different use cases.
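As a rough illustration of what such an SQLite-backed index might look like, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the index name and the helper functions are assumptions made for this example. They are not taken from marconi, kupo or any other existing indexer.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical UTxO index database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS utxo (
               tx_id    TEXT    NOT NULL,
               tx_ix    INTEGER NOT NULL,
               address  TEXT    NOT NULL,
               lovelace INTEGER NOT NULL,
               PRIMARY KEY (tx_id, tx_ix)
           )"""
    )
    # The secondary index is what makes lookups by address fast.
    conn.execute("CREATE INDEX IF NOT EXISTS utxo_by_address ON utxo (address)")
    return conn


def add_output(conn, tx_id, tx_ix, address, lovelace):
    """Record a newly created output."""
    conn.execute(
        "INSERT OR REPLACE INTO utxo VALUES (?, ?, ?, ?)",
        (tx_id, tx_ix, address, lovelace),
    )


def spend_output(conn, tx_id, tx_ix):
    """Remove an output once a later transaction spends it."""
    conn.execute("DELETE FROM utxo WHERE tx_id = ? AND tx_ix = ?", (tx_id, tx_ix))


def query_utxo(conn, address):
    """Return (tx_id, tx_ix, lovelace) rows currently held at an address."""
    cur = conn.execute(
        "SELECT tx_id, tx_ix, lovelace FROM utxo "
        "WHERE address = ? ORDER BY tx_id, tx_ix",
        (address,),
    )
    return cur.fetchall()
```

The indexing client would call `add_output` and `spend_output` while following the chain, and serve `query_utxo` to users without touching the node's ledger state.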

@disassembler disassembler added the type: enhancement An improvement on the existing functionality label Nov 29, 2022
@disassembler disassembler changed the title from "[FR] -" to "[FR] - Query utxo indexer service" Nov 29, 2022
@abailly-iohk
Contributor

There already exist quite a few indexers out there. And I can recommend one from a former colleague of mine: https://github.com/CardanoSolutions/kupo
What would probably be the best route moving forward would be to define a common interface for this kind of query that both clients (cardano-cli) and servers (kupo, marconi, oura, blockfrost...) would agree on implementing. Then the cardano-cli could be pointed at various services, possibly with fallbacks.
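A common interface with fallbacks could look roughly like the sketch below. Everything here is hypothetical: the `UtxoSource` protocol, the `utxos_at` method name and the shape of the returned records are placeholders for whatever the clients and servers would actually agree on.

```python
from typing import Dict, List, Protocol, Sequence


class UtxoSource(Protocol):
    """Hypothetical interface that every indexing backend would implement."""

    def utxos_at(self, address: str) -> List[dict]: ...


class StaticSource:
    """Stand-in backend that answers from an in-memory dictionary."""

    def __init__(self, data: Dict[str, List[dict]]):
        self._data = data

    def utxos_at(self, address: str) -> List[dict]:
        return list(self._data.get(address, []))


class FailingSource:
    """Stand-in backend that simulates an unreachable service."""

    def utxos_at(self, address: str) -> List[dict]:
        raise ConnectionError("backend unavailable")


def query_with_fallback(sources: Sequence[UtxoSource], address: str) -> List[dict]:
    """Try each configured source in order and return the first answer."""
    errors = []
    for source in sources:
        try:
            return source.utxos_at(address)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} sources failed")
```

With an interface like this, the CLI would only depend on `UtxoSource`, and the choice between kupo, marconi or another service would become configuration.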

@xoriole

xoriole commented Dec 5, 2022

cardano-ledger-index service for Cardano

cardano-ledger-index service provides helpful queries for ledger state. It maintains extra indexes for faster querying of ledger state. Built on Haskell. Uses SQLite for persistence.

Reference: Marconi
https://github.com/input-output-hk/plutus-apps/tree/main/marconi

Support for queries:

  • query utxo by address
  • query utxo by stake-address

Future queries support:

  • query balance by address or stake address
  • query addresses by stake-address
  • query pool live stakes
  • query pool delegations.

How is it different from dbsync

  • cardano-db-sync indexes everything on the blockchain, and maintains all the past data. Finding current state requires multiple aggregations and is intensive.
  • cardano-ledger-index indexes and maintains only current state of blockchain. It will be faster to query current ledger state of address, utxos, delegations, pools etc.

@disassembler
Contributor Author

@abailly-iohk is our head of architecture. He'll be providing some feedback on how he wants this to be architected.

@Jimbo4350
Contributor

I'd like to petition to keep the UTxO query for the case of local testnets. We have tests in cardano-node that utilize the UTxO query, and in this instance performance is not an issue since the UTxO is small. The UTxO query is also useful for quick and dirty debugging, especially when integrating a new era. Not having to set up an additional indexer would save QA and me additional work.

@abailly-iohk
Contributor

The goal is to keep the UTxO query in the cardano-cli in order to not disrupt (too much) downstream users, but to move the work needed to execute the query outside of the node/consensus.
I think that for the simple (testnets) use case the role of an indexer could be fulfilled by some stateless query tool able to read the node's Chain DB directly, much like what the existing db-analyzer command-line tool does.

@abailly-iohk
Contributor

abailly-iohk commented Dec 5, 2022

@disassembler We would need some numbers on the current vs. forecasted performance of this (and possibly other) kind of queries, and a better understanding of what's the target/requirement from users' perspective. I think @dnadales or @jasagredo have already run or will run some benchmarks for the utxo-hd case.

@abailly-iohk
Contributor

Also, we would really want to have some kind of impact study, which implies talking to various node users to understand what would be an acceptable solution.

@abailly-iohk
Contributor

@disassembler Shouldn't we reclassify this issue as Spike or Experiment based on DQ's work?

@dQuadrantDev

While being stuck on ouroboros-network, we have explored the alternative approach of working on the rpc command integration to cardano-cli.

A non-production version of marconi-mamba with RPCs and an update of cardano-cli seems doable within this week. By RPC we are referring to JSON-RPC, similar to Bitcoin and Ethereum.

  • We'll update the marconi-mamba application to support rpc for balance and utxo query on indexed db.
  • We'll update cardano-cli with a new subcommand rpc, so commands like:
    cardano-cli rpc ada_getutxos or cardano-cli rpc ada_getbalance
    will return the response via the rpc.
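For readers unfamiliar with JSON-RPC, the sketch below shows the JSON-RPC 2.0 envelope such a call would use. The method names `ada_getutxos` and `ada_getbalance` come from the comment above. The `params` shape (an object with an `address` key) and the helper names are assumptions for illustration, since the actual parameters are not specified in this thread.

```python
import json


def make_request(method: str, params: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request object."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def parse_response(raw: str):
    """Return the 'result' member, or raise if the server sent an 'error'."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"RPC error {err.get('code')}: {err.get('message')}")
    return msg["result"]


# Hypothetical usage: the params layout is an assumption, not a documented API.
request = make_request(
    "ada_getutxos",
    {"address": "addr_test1vzpwq95z3xyum8vqndgdd9mdnmafh3djcxnc6jemlgdmswcve6tkw"},
)
```

The request string would then be sent to the indexer over HTTP, and the body of the reply passed to `parse_response`.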

Meanwhile, we can also explore the ouroboros-network for the proxying part where a meeting with ouroboros-network team would greatly help.

Marconi

@mesudip

mesudip commented Dec 21, 2022

We were looking at ways to intercept and interpret cli-to-node communication and return the query result using a different service.

But it seems that setting up an intercepting service was harder than we thought. cardano-api didn't have enough of an interface for this purpose, so we looked at the ouroboros-network repo.

Can somebody point out how to set up a node-to-client service with the ouroboros-network library but with different handlers for the protocol messages?

Also, all the query APIs in cardano-api seem to close the connection to the node socket after the query. Is there a way to implement the query service with a persistent connection?

@dQuadrantDev

@disassembler The non-production demo is available here.
https://github.com/dQuadrant/plutus-apps/blob/feature/rpc/Readme.md

@mesudip

mesudip commented Jan 27, 2023

I've created a PR. #4810

@gitmachtl
Contributor

gitmachtl commented Jan 30, 2023

Can we have an output, that replicates the

cardano-cli query utxo --testnet-magic 1 --address addr_test1vzpwq95z3xyum8vqndgdd9mdnmafh3djcxnc6jemlgdmswcve6tkw

one? By that I mean a plaintext output like the original one. Many CLI tools rely on this kind of output because of the still-existing bug in jq when working with really large numbers. For that reason koios, for example, also sends the amounts as strings. The current output from cardano-cli query utxo, if you send it to an out-file, is JSON, but tools like jq (the most common on the CLI) have a problem with those large numbers (lovelaces). So many tools just use the plaintext output that cardano-cli provides and read the values from there. It would be extremely awesome and helpful if there were a "cli compatible" output mode so that it can just replace the old command.
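The large-number problem can be reproduced with a short sketch. It assumes a JSON consumer that stores every number as an IEEE 754 double, which is the usual cause of this kind of precision loss. Python's `parse_int=float` is used here only to simulate such a consumer, and the amount is an arbitrary illustrative value, not real chain data.

```python
import json

# An illustrative lovelace amount above 2**53, where doubles stop being exact.
RAW = '{"lovelace": 45000000000000001}'


def read_exact(raw: str) -> int:
    """Parse with arbitrary-precision integers (Python's default)."""
    return json.loads(raw)["lovelace"]


def read_as_double(raw: str) -> int:
    """Simulate a consumer that stores all JSON numbers as doubles."""
    return int(json.loads(raw, parse_int=float)["lovelace"])


def encode_as_string(amount: int) -> str:
    """Send the amount as a JSON string so that no digits can be lost."""
    return json.dumps({"lovelace": str(amount)})


def decode_from_string(raw: str) -> int:
    """Recover the exact integer from the string-encoded amount."""
    return int(json.loads(raw)["lovelace"])


exact = read_exact(RAW)
lossy = read_as_double(RAW)
round_tripped = decode_from_string(encode_as_string(exact))
```

Here `lossy` differs from `exact` in the last digit, while `round_tripped` equals `exact`, which is why string-encoded amounts or a plaintext output mode sidestep the issue.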

@rdlrt

rdlrt commented Jan 31, 2023

The addition from the PR seems less than ideal to my eyes: essentially, the proposal uses cardano-cli as a curl substitute just for user-facing convenience. Calling out the small deficiencies that I see:

  • I find it strange to mention integration of generic RPC without any CIP standard to align methods and formats. Moreover, it seems this is intended to initially integrate via marconi (for instance, the previously mentioned Kupo provides very similar functionality). From history: work with different teams often ends up becoming a bottleneck across updates (supporting versions/forks/feature additions/modifications, etc). If teams are not aligned and working against a standard, these incompatibilities would become difficult to manage.
  • Solutions based on sqlite do not scale well (horizontally/vertically/clustering) and can be easily corrupted. While this is only intended to keep a 'live view', which is a substantially smaller subset of data, populating it could still take a while(?). Having said that, it could be a separate issue if going down the generic RPC lane.
  • There's often a lot of repetition in creating an HTTP layer (not just a working GET/POST interface) from scratch, which eventually ends up handing off basic security functionality to proxy layers (like nginx/haproxy/caddy). If this is not treated as standard practice, it would lead to insecure instances operating in public.
  • Likewise, it would also be essential for the client (CLI) to negotiate with these transport mechanisms in ways that are compatible with common security additions (at minimum negotiation over TLS, providing error mechanisms, etc).

@mesudip

mesudip commented Feb 6, 2023

@gitmachtl @rdlrt
Thank you for your feedback on the PR. I would like to clarify that this is not a final PR; it was a quick implementation based on what's already available, to show what can be done. I can understand your concerns regarding compatibility, scalability and security.

Based on the above feedback, I have listed the following points to be considered for the final implementation:

  • Prepare a CIP standard for the set of JSON-RPC methods that a cardano-indexing-solution can support. An implementation may support a subset of the methods.
  • Use marconi for connecting with cardano-node, but prepare an interface for backend storage that can be easily swapped. We can start with a backend that supports the SQL standard.
  • In our cardano-cli RPC command PR, we have already included an option to add basic-auth. We will add Basic-Auth configuration to the rpc-server. I think we can leave the SSL/TLS part to the proxy layer.
  • Regarding the output format, we can start with JSON, as it's the most recognized format. Using plaintext output for building wrapper services is one of the issues currently in cardano-cli and the ecosystem in general. I think we should try to standardize the output format of the CLI, maybe by adding --format for the output format as suggested in [FR] - CLI improvements for formatting and consistency #3213
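To make the Basic-Auth point concrete, here is a minimal sketch of how a client might attach HTTP Basic credentials to a JSON-RPC call. The function names and the params layout are hypothetical and do not describe the code in the PR. Note that Basic credentials are only base64-encoded, not encrypted, which is why the TLS layer at the proxy matters.

```python
import base64
import json
from typing import Dict, Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization: Basic ...' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_rpc_call(
    user: str, password: str, method: str, params: dict
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a JSON-RPC 2.0 POST; nothing is sent here."""
    headers = {
        "Authorization": basic_auth_header(user, password),
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    ).encode("utf-8")
    return headers, body
```

The returned headers and body could be passed to any HTTP client, with the request going to a TLS-terminating proxy in front of the rpc-server.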

Given the considerations outlined, do you believe that this is the right way to go forward with the proposed changes?

Regards,
- mesudip

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.


10 participants