-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove bech32 data columns (or atleast make them optional) #1835
Comments
This alone sounds like a good reason. |
It's not just about Bech32; it's about text address representation in general. So, this applies to both Bech32 for Shelley and Base58 for Byron. I don't think removal is a good idea, but making them optional would be the best solution.In fact, raw address data is sufficient for everything. It can be naturally indexed without hashing, and it takes up less memory for storage, even if we're talking about a separate table for addresses in the future. Cardano is about CBOR, not text. If you need to find any text address in the database, just convert it to raw data—whether on the client/backend side or even at the PostgreSQL level. This will be even faster than implicit text hashing. Moreover, as for Shelley, we don't use text view at all due to the nature of the addresses. We only need the spending part, which is already stored as binary data. If we need to build a text address, we do it manually by deriving stake keys ourselves to ensure they are correct, and then we build the full address. It might be a good idea to use the |
A bit unfortunate that recently we removed the raw bytea address, with the assumption that text address is used more frequently.
It doesn't really change for bytea, especially since the size of addresses can be quite big due to older bugs. |
The bytea fields can still be made |
btree indexed data must not exceed 2704 bytes, but we have
raw bytes are ofcourse slightly smaller, but still. |
Ugh, these 17 records on mainnet are from byron era :( and pointer addresses are now retired. We'd have to live with that indexing forever on cardano? Coz currently (5 years history) , that index creation is taking 8-12 hours... coming years it might take multiple days for one index alone |
There should be some workaround, I haven't put much thought into it. We could insert a hash of the address when it's out of limits and introduce a new table |
Ye - was my preference too (to split them out to seperate table or column - especially with address detail moving out). Move those anomolies out and be able to optimise tx_out better. |
Problem Report
Currently dbsync stores bech32 encoded strings for various columns across tables.
These text fields are indexed as
hash
type, which are sub-optimal , as they cannot be used in parallel. Thus, restoring from a snapshot - ends up waiting for 10-12 hours for index againsttx_out.address
- as it only uses a single CPU/worker for this process (atleast as of Postgres 16, as listed here ).Secondly, the encoding standards are often based on CIP, and a change in CIP (eg: CIP129) would mean dbsync can only use/implement it when changing previous data, which could be quite an unnecessary data operation. Larger the history grows, bigger the resistance would be to add any improvements to naming if it has a large data migration dependency.
The request is for dbsync not to store bech32 data at all, or if they need to be - atleast make them optional. Instead, consumers can use the decoding/encoding on where filter of queries, by making use of extensions (eg: pg_bech32 , pg_base58, etc.). While hopefully the address field itself is moving to it's own table soon as part of #1820 , it's still a bit of an overkill to store duplicate copy of similar data.
*** Positive Outcomes ***
*** Cons ***
The text was updated successfully, but these errors were encountered: