CIP-0067 | Asset Name Label Registry #298
thanks @alessandrokonrad and although it's a much different proposition I'll try to link this with the proposal #137 which has been stalled for about 3 months: not to compare them but to get some of the same community contributing to this review.
Keeping the token registry centralised (ugh) does address the security concerns raised by the earlier proposal, by leaving new inclusions up to manual verification by CIP editors (ugh) and requiring pull requests to this repo each time a new asset is added. I suppose that is the only alternative now to an agreed-upon, secure standard for keeping those records on chain... if so then 1) could your CIP please discuss the alternative possibility of an on-chain token registry & why your solution is better?
As you say in the Motivation "As more assets are minted" it's becoming harder for "third parties" to know what to do with them. Maybe all Cardano assets don't need to be included in this registry, and therefore the overhead of manual work on this centralised DB would be OK? If so then 2) perhaps you would explain more in the CIP text which assets the new registry would be applicable, and which would not? 🧐
The purpose of this CIP is to register token standards. This is not about registering specific assets, so it's not a central token registry.
Because of the labels, the 3rd party knows it has to look for the metadata in a datum, and it knows the exact structure of the metadata as it's defined in the standard.
thanks @alessandrokonrad then there would not be very much overhead & also not related very much to the other proposal discussion. I appreciate your clearing up my misunderstanding 😎
CIP-0067/README.md
For example:
UTF-8 encoded: `(123)TestToken`
Hmm, `(333)` ends up taking 5 bytes of ASCII, which is more than a binary specification would need. I guess the advantage is that this is human readable.
Yeah, that's true; I think the compromise is worth it. With 5-6 bytes you still have 26-27 bytes free in the asset name.
I also thought about encoding it in binary, but then the asset name as a whole could never be rendered as UTF-8 again. The only solution I have in mind is to split the asset name into two parts when a label is detected: decode the label separately from the remaining asset name, so that both parts are human readable where applicable.
This CIP needs a Rationale section that explains how it achieves its goals, and in particular whether it is secure. There is no way to know whether the creator of a token knew about this standard or intended to adhere to it (without some other metadata channel), so it is potentially dangerous to assume that these labels have these meanings. An oblivious token issuer might use the … One way to improve the security would be to use a more obscure encoding of the data, so that it is less likely to be used by accident; otherwise, I think this is a fundamental security problem with this proposal. It might be safe enough to use anyway, but this should be discussed explicitly in the CIP.
CIP-0067/README.md
## Specification
To classify assets, the `asset_name` needs to be prefixed with an opening and a closing parenthesis with the label in between: `({Label})`.
Alternatives to parentheses to make it more obscure:
- Single colon, e.g. `:222:` (5 bytes)
- Double colon, e.g. `::222::` (7 bytes)
Or, what about prefixing everything with `CIP67:{label}:`? A bit longer, but it leaves 20-24 bytes for the asset name. One could also find a more compact binary encoding which would take fewer bytes but lose the "readability" aspect of it.
For example, consider the prefix to be `0xC067`: that's only 2 bytes and reads well once hex-encoded.
@alessandrokonrad I read the rationale section, but "Asset name labels make it easy to classify assets." is kind of vague. If I understand correctly, you want this proposal to easily detect the metadata type in a Plutus validator?
I appreciate the security input @michaelpj but I think in practice these aren't huge issues? What might be more common is an attacker using the lack of verifiability to attack a smart contract, but that can be defended against.
Can we import the registry from CIP-10 to ensure backwards compatibility?
Would also appreciate an expansion in the rationale section.
Otherwise, LGTM! 👍
Regarding the encoding of the standard in the token name: since most 3rd parties rely on the metadata rather than on the actual token name for the displayed name, the visual impact of the encoding matters less. Besides the possible UTF-8 encoding …
Intro
I'll elaborate on my previous comment a bit (pun intended). For the classifier prefix that captures what “kind” a token is, we propose an encoding based on a binary representation. Instead of encoding it via … Given this fixed-length prefix, we introduce a starting and an ending delimiter … If we encode at the bit level, we can fit information into these 24 bits close to the entropic limit. The design should also obscure the use of the standard, to prevent it from being followed by accident: a user who wants to follow this encoding should have to put effort into it. Ultimately, this means adding some pseudo-randomness to the encoding that is unlikely to be copied unintentionally. It is only pseudo-random because third parties checking whether a token follows a given standard must be able to reproduce it; this is what I called the “checksum” above. The idea is to encode the standard used in binary (the label number), followed by this checksum. The checksum is not well defined yet and could depend on several inputs (we will get back to that).
Details
Thus, a general prefix will take the form (in binary): …
To check that a token name follows a standard, you perform these steps:
There are some choices to be made here; namely, how many labels do we need to cover the token standards that might arise in the future? This determines the number of bits needed (the … Moreover, the pseudo-random function is not defined yet. Again, its purpose is to prevent accidental use of this standard by making the prefix look random. Given that the delimiters occupy the first and last four bits of the prefix, the chance of copying this standard by accident then goes like ~…
There are
There are … Furthermore, note that given that … Lastly, we might want to consider which inputs our “checksum” function takes. This could be only the …
Examples
Now that all the considerations are on the table, we expand on three options (plus a bonus option) that we came up with; they vary in complexity but also in security. All examples below assume big-endian byte order.
This checksum function has the advantage of being easy to implement and lightweight. A disadvantage is that it maps … An implementation in TypeScript:
This function also has some collisions, though they are negligible if the domain is smaller than the codomain; when the two are equal, the collision rate is around 3%. On the plus side, this function maps its inputs pseudo-randomly.
Since this function is an injection that also looks pseudo-random, it minimizes the chance of accidentally using the wrong pair of label number and checksum. The downside is that it is more involved and computationally more demanding (it is based on exponentiation and modular arithmetic). A naive but correct implementation in Haskell for this function with parameters …
Conclusion
There are many ways to encode this classification of assets in their token name. The design depends on the level of security that we, together, would like this CIP to have, balanced against its complexity. @alessandrokonrad and I have thought this through thoroughly and would love your technical opinion on this matter. Since your opinion is highly valued, I am tagging you once again @KtorZ @SebastienGllmt and @michaelpj, but anyone is welcome to give their view.
CIP-0067/registry.schema.json
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://github.com/cardano-foundation/CIPs/blob/master/CIP-?",
https://cips.cardano.org/cips/cip67/registry.schema.json should (eventually) work.
e.g. https://cips.cardano.org/cips/cip10/registry.schema.json
CIP-0067/registry.schema.json
"examples": ["CIP-0025 - NFT Metadata Standard"]
}
},
"additionalProperties": true
Why allow additional properties?
CIP-0067/README.md
`asset_name_label` | description
---------------------------- | -----------------------
0 - 15 | reserved\*
What does the asterisk point to?
CIP-0067/README.md
65536 - 131071 | reserved - private use
For the registry itself, please see [registry.json](./registry.json) in the machine-readable format. Please open your pull request against this file.
One problem we had / have with CIP-0010 is that there's no particular "rule" that defines what can go in the registry. As editors, we try to do some basic sanity checks and ask people to briefly pitch / justify their project, but it would be preferable if these rules were stated in the specification itself. For example:
Adding an entry to the registry
To propose an addition to the registry, edit the registry.json with your details, open a pull request against the CIPs repository, and give a brief description of your project and how you intend to use metadata associated with the label entry.
Also, an additional suggestion: whatever format is chosen, perhaps give an ABNF syntax to describe it so that symbols aren't misinterpreted (someone reading `({label})` might think that for label 42, the prefix would be `({42})`).
asset-name = asset-name-label asset-name-body
asset-name-label = "(" 1*5DIGIT ")"
asset-name-body = *OCTET ; exact length depends on the asset-name-label's length
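The grammar above maps directly onto a small parser, which also performs the split suggested earlier in the thread: the label is decoded on its own and the remaining bytes are kept as the asset name body. This is a minimal sketch assuming the input is the raw asset-name bytes; `parseLabeledAssetName` and `LabeledAssetName` are hypothetical names, and the 32-byte asset name limit is not checked here.

```typescript
// Parser for the draft textual prefix: one to five ASCII digits between
// parentheses (1*5DIGIT in standard ABNF). Hypothetical helper, not a library API.

interface LabeledAssetName {
  label: number;    // numeric label found between the parentheses
  body: Uint8Array; // remaining asset-name bytes after ")"
}

const OPEN_PAREN = 0x28;  // "("
const CLOSE_PAREN = 0x29; // ")"
const MAX_LABEL_DIGITS = 5;

function isAsciiDigit(b: number): boolean {
  return b >= 0x30 && b <= 0x39;
}

// Returns null when the asset name does not start with a well-formed label.
function parseLabeledAssetName(name: Uint8Array): LabeledAssetName | null {
  if (name.length === 0 || name[0] !== OPEN_PAREN) return null;
  let label = 0;
  let i = 1;
  // Consume at most five ASCII digits, accumulating the decimal value.
  while (i < name.length && i <= MAX_LABEL_DIGITS && isAsciiDigit(name[i])) {
    label = label * 10 + (name[i] - 0x30);
    i++;
  }
  const digitCount = i - 1;
  if (digitCount === 0 || i >= name.length || name[i] !== CLOSE_PAREN) return null;
  // Split: the label is returned as a number, the body as untouched bytes.
  return { label, body: name.subarray(i + 1) };
}
```

Returning the body as raw bytes leaves it to the caller to decide whether it is valid UTF-8 and worth rendering as text.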
*I am only seeing / reading this comment now: #298 (comment)
CIP-0067/README.md
## Motivation
As more assets are minted and different standards emerge to query data for these assets, it's getting harder for 3rd parties to determine the asset type and how to proceed with it. This standard is similar to [CIP-0010](https://github.com/cardano-foundation/CIPs/tree/master/CIP-0010), but focuses on the asset name of an asset.
Suggested change:
As more assets are minted and different standards emerge to query data for these assets, it's getting harder for 3rd parties to determine the asset type and how to proceed with it. This standard is similar to [CIP-0010](../CIP-0010), but focuses on the asset name of an asset.
CIP-0067/README.md
CIP:
Suggested change: `CIP: 67`
CIP-0067/README.md
Status: Draft
Type: Informational
Created: 2022-07-13
Post-History:
Suggested change: `Post-History:`
CIP-0067/README.md
## Rationale
Asset name labels make it easy to classify assets. It's important to understand that a registered label standard itself doesn't provide any security, either off-chain or on-chain, as labels can be spoofed. Security can only be derived from the minting policy, in combination with the Policy ID.
Answers I'd like to see in the Rationale section:
- What are the 0-15 labels reserved for and how is this different from the 65536 - 131071 range?
- How big is the risk of collision for a given choice of prefix? That is, how likely is it that a "randomly generated" asset name, or a hash digest, gets misinterpreted as a CIP-0067 label? (The current rationale hints in that direction by recommending to always consider the asset label in conjunction with the policy id.)
- Is there any consideration regarding the size of the prefix? It seems to me that keeping the size of the prefix / asset name label at 4 bytes or less is preferable, because it still allows embedding 28-byte hash digests in the asset name.
Thanks for the detailed walkthrough of the checksum options that you've considered. I have mainly two questions from reading your comment:
(1) I understand the desire to avoid collisions but, since this registry exists off-chain anyway, wouldn't it make sense to also bundle the policy ids of the tokens that use a given label? So, if a certain project wants to use or define a new label, they can do so by adding an entry to this registry that includes their policy id. Downstream components that support the standard can then interpret only the policies included in the registry. Since an asset name is always to be seen within the context of a policy, the checksum becomes altogether redundant. Doesn't it?
(2) It seems to me that a checksum as large as the data payload is overkill. In this scenario, the role of the checksum is really to make it harder for people to unknowingly abide by the standard, so I think a best effort is sufficient, given that the specific prefix and suffix already make it less likely to happen. The prefix/suffix by themselves already make the chance of accidental use of the standard less than 0.5%. Adding even a 4-bit checksum brings that down to ~0.025%, if my calculations are correct. And that isn't even considering that the label itself must match an existing label. Say the standard becomes really popular and ends up with 1000s of labels: you'd still need a match on the 20 remaining bits (with 1000s of possible cases, that's a ~0.1% probability of collision), which brings the overall collision probability down to 0.000025% (2.5e-7). I think this is largely sufficient as a "best effort".
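The percentages in point (2) can be reproduced under one simplifying assumption (ours, not stated in the thread): the leading 32 bits of an unrelated asset name behave like uniformly random bits, so each fixed bit matches with probability 1/2. With 8 bracket bits, a 4-bit checksum, a 20-bit label field and about 1000 registered labels:

$$
\begin{aligned}
P_{\text{brackets}} &= 2^{-8} = \tfrac{1}{256} \approx 0.39\% \\
P_{\text{brackets + 4-bit checksum}} &= 2^{-12} = \tfrac{1}{4096} \approx 0.024\% \\
P_{\text{registered label}} &\approx 1000 \cdot 2^{-20} \approx 0.095\% \\
P_{\text{overall}} &\approx 2^{-12} \cdot 1000 \cdot 2^{-20} = 1000 \cdot 2^{-32} \approx 2.3 \times 10^{-7}
\end{aligned}
$$

The last figure sits slightly below the ~2.5e-7 quoted above only because that estimate multiplies the rounded 0.025% and 0.1%. Under the same model, a layout with two 4-bit brackets and an 8-bit checksum passes the structural check with probability $2^{-8} \cdot 2^{-8} = 2^{-16} \approx 1.5 \times 10^{-5}$, before even requiring that the label be registered.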
I'm not sure I understand why we're prefixing and suffixing with a bunch of 0s instead of just using a larger checksum. Is it meant for human readability when looking at the binary encoding? One thing that might be interesting: instead of coming up with a new checksum algorithm, use bech32 with … The problem with bech32 is that it operates on 5-bit chunks, which may be tricky, and also that it only has 32 distinct characters for display even though cip67 encodes a UTF-8 string (so these assets would have multiple representations: the bech32 representation and the name encoded inside. Although this is kind of true of your custom 0-padding approach as well).
@KtorZ The goal of this registry is to define token standards. Having to register your token/project in such a registry first slows things down a lot and also makes it very hard for 3rd parties to verify labels. It also centralizes things.
I agree. Overall, are you in favor of a binary-encoded or a UTF-8-encoded label? Let me try to summarize the idea behind CIP-0067 and the thought process that has gone into it since the creation of this CIP. Initially we wanted to go with an … Advantage:
Downside:
So we thought a binary encoding that also renders nicely in hex is probably a better approach. Advantage:
Downside:
And since a binary encoding is very space efficient, we thought to maybe include a checksum to make it even more obscure. In the end it doesn't matter if the checksum of one label collides with the checksum of another label, because all we want is to avoid someone following this standard by accident. The only question is whether we should have a checksum at all, or whether the binary encoding is sufficient on its own. Would this concept of a checksum make things unnecessarily difficult for tools and 3rd parties?
@SebastienGllmt the initial four 0s and the last four 0s are meant to be brackets. The idea is to have a bit length that converts easily to and from hex. So yes, it also makes the prefix easier to read when looking at the hex string. E.g. a label with 3 bytes (without checksum) in hex:
I thought about bech32 as well, although it comes with probabilistic error detection and, as you said, already operates over 5-bit groups. So it feels like overkill for this particular purpose. I do agree with the "not re-inventing the wheel" statement though, which is why I tend to favor a relatively simple checksum solution. I would also consider a CRC before bech32.
The binary-encoded label is a much more robust approach, especially because the asset name isn't meant to be a directly user-facing piece of information in principle. This is why we have metadata after all. Thus, for a standard which is ultimately about providing such metadata, I find it even ironic to make any effort to keep the label somewhat human-readable 😬! All in all, regardless of the solution chosen, the CIP will need to include test vectors and a reference implementation to ease the development of different implementations.
CIP-0067/README.md
## Rationale
Asset name labels make it easy to classify assets. It's important to understand that an oblivious token issuer might use the prefix X for all kinds of things, leading to misinterpretation by clients that follow this standard. We can minimize this attack vector by making the label format obscure. Brackets, a checksum and a fixed-size binary encoding make it unlikely that someone follows this standard by accident.
Given the numerous discussions that happened regarding the label's format, the choice of checksum, the brackets, etc., I'd have expected a slightly thicker Rationale section summarizing the various discussions, the choices considered and why they were rejected.
CIP-0067/README.md
## Reference Implementation(s)
Suggested change: `### Reference Implementation(s)`
CIP-0067/README.md
- [Lucid TypeScript implementation of toLabel/fromLabel](https://github.com/spacebudz/lucid/blob/39cd2129101bd11b03b624f80bb5fe3da2537fec/src/utils/utils.ts#L500-L522)
- [Lucid TypeScript implementation of CRC-8](https://github.com/spacebudz/lucid/blob/main/src/misc/crc8.ts)
## Test Vectors
Suggested change: `### Test Vectors`
CIP-0067/README.md
To classify assets, the `asset_name` needs to be prefixed with the following `4 bytes` binary encoding:
```
[ 0000 | 2 bytes label_num | 1 byte checksum | 0000 ]
```
Suggested change: `[ 0000 | 16 bits label_num | 8 bits checksum | 0000 ]`
Since the bracket is given in bits, I find it more logical to speak about bits for the inner parts as well.
CIP-0067/README.md
- The leading and ending four 0s are brackets
- `label_num` has a fixed size of 2 bytes (`Label range in decimal: [0, 65535]`).
  If `label_num` < 2 bytes the remaining bits need to be padded with 0s.
Suggested change: If `label_num` < 2 bytes the remaining bits need to be left-padded with 0s.
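The bit layout `[ 0000 | 16 bits label_num | 8 bits checksum | 0000 ]` and the left-padding rule can be expressed with plain shifts on a big-endian 32-bit word. In this sketch the checksum is supplied by the caller (computing it is a separate step), and `packPrefix`, `unpackPrefix` and `toHex` are hypothetical names rather than an existing API.

```typescript
// Layout (big-endian): [ 0000 | 16-bit label_num | 8-bit checksum | 0000 ]
function packPrefix(labelNum: number, checksum: number): Uint8Array {
  if (Math.floor(labelNum) !== labelNum || labelNum < 0 || labelNum > 0xffff) {
    throw new RangeError("label_num must be an integer in [0, 65535]");
  }
  if (Math.floor(checksum) !== checksum || checksum < 0 || checksum > 0xff) {
    throw new RangeError("checksum must be an integer in [0, 255]");
  }
  // A fixed 16-bit field left-pads small labels with 0 bits; the top and
  // bottom nibbles stay 0 as brackets. `>>> 0` keeps the word unsigned.
  const word = ((labelNum << 12) | (checksum << 4)) >>> 0;
  const out = new Uint8Array(4);
  new DataView(out.buffer).setUint32(0, word, false); // false = big-endian
  return out;
}

// Returns null unless the first 4 bytes carry both zero brackets.
function unpackPrefix(name: Uint8Array): { labelNum: number; checksum: number } | null {
  if (name.length < 4) return null;
  const word = new DataView(name.buffer, name.byteOffset, 4).getUint32(0, false);
  if (word >>> 28 !== 0 || (word & 0xf) !== 0) return null;
  return { labelNum: (word >>> 12) & 0xffff, checksum: (word >>> 4) & 0xff };
}

// Lowercase hex rendering of a byte array.
function toHex(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return s;
}
```

For example, `toHex(packPrefix(222, 0x14))` yields `000de140`, where `0x14` is the CRC-8 value of label 222 under the parameters discussed below. Note that `unpackPrefix` only checks the brackets; verifying the checksum still requires recomputing it from `labelNum`.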
CIP-0067/README.md
- Init: `0x00`
- RefIn: `false`
- RefOut: `false`
- XorOut: `0x00`
I am not sure what these refer to exactly (I have only seen this notation in one particular implementation of the CRC-8 algorithm in Go, after searching for those specific terms). It seems that the literature mostly refers to either the lookup table and/or the polynomial representation. So it's probably sufficient to simply mention:
- Polynomial representation (normal): `0x07`
- Lookup table: ...
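The Init / RefIn / RefOut / XorOut names come from the Rocksoft parametric CRC model (Ross Williams' "A Painless Guide to CRC Error Detection Algorithms"), the same convention used by the CRC RevEng catalogue. With polynomial `0x07` and the values listed above, the algorithm is the one catalogued as CRC-8/SMBUS, whose check value over the ASCII string "123456789" is `0xF4`. The sketch below is a bitwise, MSB-first version plus hex helpers for the 8-hex-digit prefix. It assumes the checksum is the CRC-8 of `label_num` encoded as two big-endian bytes (including padding), and it is written independently of the linked Lucid helpers; `crc8`, `toLabelHex` and `fromLabelHex` are hypothetical names.

```typescript
// Bitwise CRC-8, MSB-first: Poly 0x07, Init 0x00, RefIn/RefOut false, XorOut 0x00.
function crc8(data: Uint8Array): number {
  let crc = 0x00; // Init
  for (let i = 0; i < data.length; i++) {
    crc ^= data[i];
    for (let bit = 0; bit < 8; bit++) {
      // If the top bit is set, shift and fold in the polynomial; keep 8 bits.
      crc = (crc & 0x80) !== 0 ? ((crc << 1) ^ 0x07) & 0xff : (crc << 1) & 0xff;
    }
  }
  return crc ^ 0x00; // XorOut (a no-op, kept to mirror the parameter list)
}

// Assumption: checksum = CRC-8 over label_num as 2 big-endian bytes.
function labelChecksum(labelNum: number): number {
  return crc8(new Uint8Array([(labelNum >>> 8) & 0xff, labelNum & 0xff]));
}

// Left-pad a number's lowercase hex representation to `width` digits.
function hexPad(n: number, width: number): string {
  const s = n.toString(16);
  return s.length >= width ? s : new Array(width - s.length + 1).join("0") + s;
}

// "0" + 4 hex digits of label_num + 2 hex digits of checksum + "0".
function toLabelHex(labelNum: number): string {
  if (Math.floor(labelNum) !== labelNum || labelNum < 0 || labelNum > 0xffff) {
    throw new RangeError("label_num must be an integer in [0, 65535]");
  }
  return "0" + hexPad(labelNum, 4) + hexPad(labelChecksum(labelNum), 2) + "0";
}

// Returns the label number, or null if the brackets or checksum do not match.
// Expects lowercase hex.
function fromLabelHex(hex: string): number | null {
  if (!/^0[0-9a-f]{6}0$/.test(hex)) return null;
  const labelNum = parseInt(hex.slice(1, 5), 16);
  const checksum = parseInt(hex.slice(5, 7), 16);
  return labelChecksum(labelNum) === checksum ? labelNum : null;
}
```

For instance, `toLabelHex(222)` returns `000de140` and `toLabelHex(333)` returns `0014df10`. A table-driven variant would precompute the 256 per-byte results of the inner loop; the bitwise form is shown because it maps one-to-one onto the listed parameters.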
@rphair @SebastienGllmt @crptmppt
As discussed in editors meeting #54 (iirc), this proposal has now addressed the different points that were raised. I am approving, and happy to merge as `Proposed` while implementations are being worked on.
great stuff 🤩
oh no @KtorZ I forgot to change the status in the top-level README before merging... and this was not included in the housekeeping update. Will submit a PR for this after the housekeeping one is merged.
We're going too fast, that's a first for the CIP process 😄
This proposal defines a standard to classify Cardano native assets by the asset name.