CIP-0067 | Asset Name Label Registry #298

Merged
merged 13 commits into from
Oct 25, 2022

alessandrokonrad
Contributor

@alessandrokonrad alessandrokonrad commented Jul 14, 2022

This proposal defines a standard to classify Cardano native assets by the asset name.


see rendered Markdown

Collaborator

@rphair rphair left a comment

thanks @alessandrokonrad and although it's a much different proposition I'll try to link this with the proposal #137 which has been stalled for about 3 months: not to compare them but to get some of the same community contributing to this review.

Keeping the token registry centralised (ugh) does address the security concerns raised by the earlier proposal, by leaving new inclusions up to manual verification by CIP editors (ugh) and requiring pull requests to this repo each time a new asset is added. I suppose that is the only alternative now to an agreed-upon, secure standard for keeping those records on chain... if so then 1) could your CIP please discuss the alternative possibility of an on-chain token registry & why your solution is better?

As you say in the Motivation "As more assets are minted" it's becoming harder for "third parties" to know what to do with them. Maybe all Cardano assets don't need to be included in this registry, and therefore the overhead of manual work on this centralised DB would be OK? If so then 2) perhaps you would explain more in the CIP text which assets the new registry would be applicable, and which would not? 🧐

@alessandrokonrad
Contributor Author

alessandrokonrad commented Jul 15, 2022

The purpose of this CIP is to register token standards. This is not about registering specific assets, so it's not a central token registry.
Initially we only had CIP-0025 and the Cardano Foundation off-chain registry. For a 3rd party it was fine to work with that: check whether the asset has on-chain metadata following CIP-0025 and, if not, check the off-chain registry. But as new standards emerge, a 3rd party has to try every option until one fits, and even then it isn't clear whether the correct method was used.
So I thought wouldn't it be cool to have a way to determine the token standard and type (NFT, FT, etc.) just from the assetname. The 3rd party knows exactly what to do with the asset and what type it is.
For instance my other proposal Datum Metadata Standard makes use of this CIP. There are 3 registered asset_name_labels:

  • 100: Reference NFT
  • 222: NFT => Look up Reference NFT and retrieve specific metadata from output datum
  • 333: FT => Look up Reference NFT and retrieve specific metadata from output datum

Because of the labels the 3rd party knows it has to look for the metadata in a datum and it knows the exact structure of the metadata as it's defined in the standard.
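As a rough illustration of how a 3rd party might act on these three labels, here is a minimal sketch. The `LABEL_INFO` table and `describeLabel` function are hypothetical names made up for this example; they are not part of either CIP.

```typescript
// Hypothetical lookup table covering only the three labels listed above.
type LabelInfo = { kind: string; lookupReferenceNft: boolean };

const LABEL_INFO: { [label: number]: LabelInfo } = {
  100: { kind: "Reference NFT", lookupReferenceNft: false },
  222: { kind: "NFT", lookupReferenceNft: true },
  333: { kind: "FT", lookupReferenceNft: true },
};

// Returns the handling hint for a label, or null when the label is unknown here.
function describeLabel(label: number): LabelInfo | null {
  return Object.prototype.hasOwnProperty.call(LABEL_INFO, label)
    ? LABEL_INFO[label]
    : null;
}
```

A wallet or explorer following this idea would first extract the label from the asset name, then branch on the returned entry instead of probing every metadata standard in turn.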

@rphair
Collaborator

rphair commented Jul 15, 2022

thanks @alessandrokonrad then there would not be very much overhead & also not related very much to the other proposal discussion. I appreciate your clearing up my misunderstanding 😎


For example:

UTF-8 encoded: `(123)TestToken`\
Contributor

Hmm, (333) ends up taking 5 bytes of ascii which is more than coming up with a binary specification for this. I guess the advantage is that this is human readable.

Contributor Author

Yeah that's true, I think the compromise is worth it. With a 5-6 byte label you still have 26-27 bytes free in the 32-byte asset name.
I also thought about encoding it in binary, but you won't be able to ever utf-8 encode the asset name again. The only solution I have in mind here is to split the asset name in two parts if a label was detected in the asset name. Decode the label separately and the remaining asset name to make both parts human readable if applicable.

@michaelpj
Contributor

This CIP needs a Rationale section that explains how it achieves its goals, and in particular whether it is secure.

In particular, there is no way to know whether the creator of a token knew about this standard or intended to adhere to it (without some other metadata channel), so it is potentially dangerous to assume that these labels have these meanings. An oblivious token issuer might use the (X) prefix for all kinds of things, leading to misinterpretation by clients that follow this standard. For example, in #299 (comment) this manifests as metadata spoofing attacks.

Ways to improve the security would be to use a more obscure encoding of the data so it's less likely to be used by accident, but otherwise I think this is a fundamental security problem with this proposal. It might be safe enough to use anyway, but this should be discussed explicitly in the CIP.


## Specification

To classify assets the `asset_name` needs to be prefixed with an opening and closing parentheses and the label in between: `({Label})`.
Contributor Author

@alessandrokonrad alessandrokonrad Aug 1, 2022

Alternatives to parentheses to make it more obscure:

  • Single colon, e.g. :222: (5 bytes)
  • Double colon, e.g. ::222:: (7 bytes)

Member

Or, what about prefixing everything with CIP67:{label}: ? It's a bit longer, but it still leaves 20-24 bytes for the asset name. One could also find a more compact binary encoding that takes fewer bytes but loses the "readability" aspect.

For example, consider the prefix to be 0xC067, that's only 2 bytes and reads well once hex-encoded.

@BlakeBrown

@alessandrokonrad I read the rationale section, but "Asset name labels make it easy to classify assets." is kind of vague. If I understand correctly, you want this proposal to easily detect the metadata type in a Plutus validator?

I appreciate the security input @michaelpj but I think in practice these aren't huge issues? (XXX) or :XXX: are both fairly obscure patterns. The onus is on the token issuer (NMKR, Anvil, JPG) to ensure that minted assets are following the latest CIPs. It would be exceptionally rare that an individual would mint an asset on their own AND accidentally classify their metadata incorrectly against this standard.

What might be more common is an attacker using the lack of verifiability to attack a smart contract, but that can be defended against.

@BlakeBrown BlakeBrown left a comment

Can we import the registry from CIP-10 to ensure backwards compatibility?

Would also appreciate an expansion in the rationale section.

Otherwise, LGTM! 👍

@perturbing
Collaborator

Regarding the encoding of the standard in the token name: since most 3rd parties rely on the metadata rather than the actual token name for the displayed name, the visual impact of the encoding matters less. Besides the possible UTF-8 encoding «222» mentioned in CIP 68, we might choose to do it differently and encode it in binary to reach the entropic limit of the encoding. This also opens the door to implementing a checksum for extra obscurity. What are your thoughts? @michaelpj @SebastienGllmt @KtorZ

@perturbing
Collaborator

perturbing commented Aug 31, 2022

Intro

I'll elaborate on my previous comment a bit (pun intended). For encoding the classifier prefix that captures what “kind” a token is, we want to propose an encoding using a binary representation. Instead of encoding it via «xxx», we would like to use a fixed length prefix of 4 bytes (32 bits). We chose 4 bytes because that is in our opinion a low-impact size on the total amount available of the 32 byte token name, while still providing room to encode information.

Given this fixed-length prefix, we introduce a starting and ending delimiter 0000 in binary (hex 0x0) that wraps the other 24 bits. These 24 bits should encode the label number of the standard being followed (the XXX in the proposal). Before I introduce our possible designs, I would like to abstract the situation to give context and rationale for the designs that follow, which differ in complexity but also in security.

If we encode on the bit level, we can fit information in these 24 bits close to the entropic limit. In the design, we should consider that the encoding obscures the use of the standard to prevent accidental use of the standard. This means that the user who tries to follow this encoding should put effort in it. This ultimately results in adding some pseudo randomness to the encoding that is unlikely to be unintentionally copied. The pseudo-randomness is pseudo since it needs to be reproducible by third parties that check if the tokens follow a certain standard, I called this the above-mentioned “checksum”. Here the idea is to encode the standard used in binary (the label number) followed by some checksum. This checksum right now is ill-defined and could depend on multiple things (we will get back to that).

Details

Thus, a general prefix will take the form (in binary)

[0000 | n bit label number | (24 - n) bit checksum | 0000]

To check that a token name follows a standard, you perform these steps;

  1. Take the first four bytes from the token name, this might be a prefix of this standard
  2. Check that the first and last four bits of this prefix are 0000
  3. Extract the label number by taking the n bits after the initial four zero bits of this prefix
  4. Perform a checksum with this number and possibly with some extra entropy derived from the token name
  5. Compare the just calculated checksum to the (24-n) bits after the label number.
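The five steps above can be sketched as follows. This is only an illustration under stated assumptions: the checksum function is passed in as a parameter (`ChecksumFn` is a placeholder type, since the checksum is not yet defined at this point in the discussion), the extra-entropy variant of step 4 is left out, and the layout is read big-endian.

```typescript
// Prefix layout (32 bits): [0000 | n-bit label | (24 - n)-bit checksum | 0000].
// ChecksumFn is a placeholder; the thread has not fixed a checksum function yet.
type ChecksumFn = (label: number, checksumBits: number) => number;

function checkPrefix(
  tokenName: Uint8Array,
  n: number,
  checksum: ChecksumFn,
): number | null {
  // Step 1: take the first four bytes of the token name.
  if (tokenName.length < 4) return null;
  const b0 = tokenName[0];
  const b1 = tokenName[1];
  const b2 = tokenName[2];
  const b3 = tokenName[3];
  // Step 2: the first and last four bits must be 0000.
  if ((b0 >> 4) !== 0 || (b3 & 0x0f) !== 0) return null;
  // The 24 bits between the two delimiters.
  const middle = ((b0 & 0x0f) << 20) | (b1 << 12) | (b2 << 4) | (b3 >> 4);
  const checksumBits = 24 - n;
  const mask = (1 << checksumBits) - 1;
  // Step 3: the label is the n bits right after the opening delimiter.
  const label = middle >>> checksumBits;
  // Steps 4 and 5: recompute the checksum and compare it with the stored bits.
  const stored = middle & mask;
  return (checksum(label, checksumBits) & mask) === stored ? label : null;
}
```

With n = 16 and a toy checksum such as `(label) => label ^ 0xa5`, the bytes `00 06 4c 10` decode to label 100, while changing the last byte to `20` makes the check fail.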

There are some choices to be made here. First, how many labels do we need to cover the token standards that might arise in the future? This determines the number of bits necessary (the n above): for our proposed ~1000 asset labels you need at least 10 bits (which gives 1024 possible label numbers). We might reserve more space for more labels, which could spare us an IPv4-to-IPv6-style migration later. Second, which byte order do we choose for the bit representation of an integer (big or little endian)?

Moreover, the pseudo-randomness function is not yet defined. Again, its purpose is to prevent accidental use of this standard by making the prefix look random. If we associate with each label number a unique (24-n)-bit pseudo-random bit string, then, given that the delimiters already occupy the first and last four bits of the prefix, the chance of accidentally reproducing a valid prefix is ~1/(2^(24-n)). More explicitly, given some n, suppose someone accidentally had the prefix

[0000 | 101.....011 | ????? |0000]               --- here the 101.....011 is of length n.

There are 2^(24-n) possible bit strings of length 24-n that could fill the spot with the question marks, but only one makes the prefix valid. Note that the argument also holds for the converse. Given an accidental prefix of the form

[0000 | ????? | 110.....101 |0000]               --- here the 110.....101 is of length 24-n.

There are 2^n possibilities for the question marks (the label field has length n), but only some of them yield a valid prefix, i.e. a label whose checksum equals the given bits. I say “some” here, since the mapping does not need to be injective. The choice of this mapping determines the strength of the security of this standard.
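The two quantities discussed above (label bits needed, and the chance of an accidental match for a fixed label) are easy to tabulate. A small sketch, with `bitsForLabels` and `accidentalMatchChance` being names invented for this example:

```typescript
// Smallest number of bits that can represent `count` distinct label numbers.
function bitsForLabels(count: number): number {
  let bits = 0;
  while (Math.pow(2, bits) < count) bits++;
  return bits;
}

// Chance that a random (24 - n)-bit checksum field matches the one valid
// checksum of a given label, assuming both delimiters already match.
function accidentalMatchChance(n: number): number {
  return 1 / Math.pow(2, 24 - n);
}

const n = bitsForLabels(1000); // 10 bits, i.e. 1024 label numbers
const chance = accidentalMatchChance(n); // 1 / 2^14
```

For n = 16 (an 8-bit checksum) the same function gives 1/256, which shows the trade-off: every bit given to the label field is a bit taken from the checksum.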

Furthermore, note that for n>12 the pigeonhole principle implies that such a mapping can never be injective: a collision is guaranteed. If the number of collisions is low, this poses no significant extra security problem, but we need to consider it.

Lastly, we might want to consider what inputs we use in our “checksum” function. This could be only the n bit label number, or some extra entropy from the asset name. Note that this entropy cannot be derived from the policy ID, as this depends on the minting script that might need this prefix encoded in its logic.

Examples

Now that all the considerations are on the table, we expand on three options (plus a bonus option) that vary in complexity but also in security. All examples below assume big-endian encoding.

  1. A simple approach. Use a cross sum (digit sum) as the checksum. This function adds the digits of the hex representation of the integer and stores the result in 8 bits (so n=16). As an example, the label 100 is 0x64 in hex, so the cross sum is 0x6 + 0x4 = 0xA. Explicitly, the outputs range from 0x0 -> 0x00 to 0xFFFF -> 0x3C (60 in decimal).

The advantage of this checksum function is that it is easy to implement and lightweight. A disadvantage is that it maps 2^16 labels onto at most 2^8 checksum values, so there are many collisions. For example, both 0x64 and 0x46 map to 0xA (since addition is commutative and decomposing a number into sums is not unique). Besides, the correspondence between the domain and image does not look that random.

An implementation in TypeScript:

// Cross-sum checksum: add the hex digits of the label and format the sum as 2 hex characters.
const checksum = (l: number): string =>
  l.toString(16).split("").reduce((acc, curr) => acc + parseInt(curr, 16), 0x0)
    .toString(16).padStart(2, "0");

// Encode a label as 8 hex characters: "0" bracket, 4-digit label, 2-digit checksum, "0" bracket.
const toLabel = (l: number): string => {
  if (l < 0 || l > 65535) {
    throw new Error(`Label ${l} out of range. Min Label: 0 - Max Label: 65535`);
  }
  return "0" + l.toString(16).padStart(4, "0") + checksum(l) +
    "0";
};

// Decode an 8-character hex prefix; returns null if the brackets or the checksum do not match.
const fromLabel = (s: string): number | null => {
  if (s.length !== 8 || !(s[0] === "0" && s[7] === "0")) return null;
  const label = parseInt(s.slice(1, 5), 16);
  const check = s.slice(5, 7);
  return check === checksum(label) ? label : null;
};
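To make the collision behaviour of the cross sum concrete, the following sketch recomputes it numerically and counts how many distinct checksum values occur across all 16-bit labels. `crossSum` and `distinctCrossSums` are names chosen for this example only.

```typescript
// Cross sum (digit sum) of the hex digits of a non-negative integer label.
function crossSum(label: number): number {
  let sum = 0;
  for (let v = label; v > 0; v = Math.floor(v / 16)) sum += v % 16;
  return sum;
}

// Number of distinct cross sums over all labels 0 .. 65535.
function distinctCrossSums(): number {
  const seen: boolean[] = [];
  let count = 0;
  for (let label = 0; label <= 0xffff; label++) {
    const s = crossSum(label);
    if (!seen[s]) {
      seen[s] = true;
      count++;
    }
  }
  return count;
}

crossSum(0x64); // 10 (0xA)
crossSum(0x46); // 10 (0xA), same checksum as 0x64
crossSum(0xffff); // 60 (0x3C), the largest possible value
distinctCrossSums(); // 61, so 65536 labels share only 61 checksum values
```

So although 8 bits are reserved for the checksum, only 61 of the 256 possible values are ever used by this function.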
  2. An intermediate approach. Use a hash function as the checksum function. Here you can vary the size of the domain and co-domain arbitrarily. If we want n=10, you can truncate the digest of the hash function to fit the size 24-n=14.

This function also has some collisions, though this is negligible if the domain is smaller than the co-domain. In the case they are equal, the number of collisions is around 3%. A good thing about this function is that it maps the inputs pseudo randomly.
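A sketch of the truncated-hash idea, for illustration only. The thread does not name a hash function, so 32-bit FNV-1a is used here purely as a stand-in; `fnv1a32`, `hashChecksum` and `countCollisions` are hypothetical helper names, and hashing the label as two big-endian bytes is likewise an assumption.

```typescript
// 32-bit FNV-1a (stand-in hash; not specified anywhere in this thread).
function fnv1a32(bytes: number[]): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    h = (h ^ bytes[i]) >>> 0;
    // Multiply by the FNV prime 16777619 = 2^24 + 403 modulo 2^32,
    // split in two parts so no intermediate value exceeds 2^53.
    h = (((h << 24) >>> 0) + h * 403) >>> 0;
  }
  return h;
}

// Truncates the digest of the label to `bits` bits (e.g. 14 for n = 10).
function hashChecksum(label: number, bits: number): number {
  const digest = fnv1a32([(label >> 8) & 0xff, label & 0xff]);
  return digest % Math.pow(2, bits);
}

// Counts labels in 0 .. labelCount - 1 whose checksum was already produced
// by a smaller label.
function countCollisions(labelCount: number, bits: number): number {
  const seen: boolean[] = [];
  let collisions = 0;
  for (let label = 0; label < labelCount; label++) {
    const c = hashChecksum(label, bits);
    if (seen[c]) collisions++;
    seen[c] = true;
  }
  return collisions;
}
```

Calling `countCollisions(1024, 14)` gives a quick empirical feel for how often two labels share a checksum when the domain (2^10) is smaller than the co-domain (2^14); the exact number depends on the hash chosen.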

  3. A complex approach. Use an LCG (linear congruential generator) function with equal domain and co-domain size. This function can be made a bijection while preserving the pseudo-random mapping of the inputs. This can be done by tweaking the parameters described in the wiki. The function does not need to have an equally sized domain and co-domain: given that the domain is smaller, it is always possible to keep the function injective. Even further, if we see the domain as a subset of the co-domain, this can be made into a bijection again. In conclusion, the domain and co-domain can be of any size as long as the domain is not larger than the co-domain, and the function remains injective.

Since this function is an injection that also looks pseudo-random, it minimizes the chance of accidentally using the wrong pair of label number and checksum. The downside is that it is more involved and computationally more demanding (the function is based on taking powers and modular computation).

A naive but correct implementation in Haskell for this function with parameters c=1, m=4096 and a=5.

-- This function is a bijection between any [x..x+2^12-1] and [0..4095].
modLinearCG :: Integer -> Integer
modLinearCG n = let sum 0 = 1 
                    sum x = (5^x) `mod` (2^12) + (sum (x-1)) `mod` (2^12)
                in sum n `mod` (2^12)
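The bijection claim for these parameters can be checked by brute force. The sketch below is a TypeScript transcription of the same recurrence, x_0 = 1 and x_{k+1} = (5 * x_k + 1) mod 4096, which equals 1 + 5 + ... + 5^n modulo 4096; `distinctOutputs` is a helper name introduced here for the check.

```typescript
// Iterative form of the Haskell sketch: 1 + 5 + 5^2 + ... + 5^n (mod 4096).
function modLinearCG(n: number): number {
  let x = 1;
  for (let k = 0; k < n; k++) x = (5 * x + 1) % 4096;
  return x;
}

// Counts the distinct outputs for the 4096 inputs 0 .. 4095.
function distinctOutputs(): number {
  const seen: boolean[] = [];
  let count = 0;
  let x = 1; // modLinearCG(0)
  for (let k = 0; k < 4096; k++) {
    if (!seen[x]) {
      seen[x] = true;
      count++;
    }
    x = (5 * x + 1) % 4096;
  }
  return count;
}

modLinearCG(0); // 1
modLinearCG(1); // 6
modLinearCG(2); // 31
distinctOutputs(); // 4096, so every value in 0 .. 4095 is hit exactly once
```

The full period follows from the usual conditions for a power-of-two modulus: c = 1 is odd and a - 1 = 4 is divisible by 4.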
  4. All of the above mappings can extend their security by adding entropy from the rest of the token name, further decreasing the likelihood of accidentally following this standard. This can be done by combining the label number with some uniform but deterministic sample from the token name. The combination can also be done pseudo-randomly for an additional layer of complexity.

Conclusion

There are many ways we can encode this classification of assets in their token name. The design depends on the level of security that we together would like to have for this CIP, this needs to be balanced with its complexity. @alessandrokonrad and I have thought this through thoroughly and would love your technical opinion on this matter. Since your opinion is highly valued, I am tagging you once again @KtorZ @SebastienGllmt and @michaelpj, but anyone is welcome to give their view.

@@ -0,0 +1,62 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://github.com/cardano-foundation/CIPs/blob/master/CIP-?",
"examples": ["CIP-0025 - NFT Metadata Standard"]
}
},
"additionalProperties": true
Member

Why allow additional properties?


`asset_name_label` | description
---------------------------- | -----------------------
0 - 15 | reserved\*
Member

What does the asterisk point to?

65536 - 131071 | reserved - private use

For the registry itself, please see [registry.json](./registry.json) in the machine-readable format. Please open your pull request against
this file.
Member

One problem we had / have with CIP-0010 is that there's no particular "rule" that defines what can go in the registry. As editors we try to do some basic sanity check and ask people to pitch / justify a bit their project; but it would be nice / preferable if these rules were specified in the specification itself. For example:


Adding an entry to the registry

To propose an addition to the registry edit the registry.json with your details, open a pull request against the CIPs repository and give a brief description of your project and how you intend to use metadata associated with the label entry.


## Specification

To classify assets the `asset_name` needs to be prefixed with an opening and closing parentheses and the label in between: `({Label})`.
Member

Also, an additional suggestion: whatever the choice of format is, perhaps give an ABNF syntax to describe it so that symbols aren't misinterpreted (someone reading ({label}) might think that, for label 42, the prefix would be ({42})).

asset-name = asset-name-label asset-name-body

asset-name-label = "(" 1*5DIGIT ")"

asset-name-body = *OCTET ; exact length depends on the asset-name-label's length
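A parser for this grammar could look roughly like the sketch below. It assumes the asset name has already been decoded from bytes to a string, and `parseUtf8Label` is a name made up for this example.

```typescript
// Parses an asset name that starts with "(" followed by one to five decimal
// digits and ")". Returns null when no well-formed label prefix is present.
function parseUtf8Label(
  assetName: string,
): { label: number; body: string } | null {
  const match = /^\((\d{1,5})\)/.exec(assetName);
  if (match === null) return null;
  return {
    label: parseInt(match[1], 10),
    body: assetName.slice(match[0].length),
  };
}

parseUtf8Label("(123)TestToken"); // { label: 123, body: "TestToken" }
parseUtf8Label("({42})Foo"); // null, braces are not part of the syntax
```

Having the grammar spelled out means implementations agree on edge cases such as six-digit labels or empty parentheses, both of which are rejected here.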

Member

*I am only seeing / reading this comment now: #298 (comment)


## Motivation

As more assets are minted and different standards emerge to query data for these assets, it's getting harder for 3rd parties to determine the asset type and how to proceed with it. This standard is similar to [CIP-0010](https://github.com/cardano-foundation/CIPs/tree/master/CIP-0010), but focuses on the asset name of an asset.
Member

Suggested change
As more assets are minted and different standards emerge to query data for these assets, it's getting harder for 3rd parties to determine the asset type and how to proceed with it. This standard is similar to [CIP-0010](https://github.com/cardano-foundation/CIPs/tree/master/CIP-0010), but focuses on the asset name of an asset.
As more assets are minted and different standards emerge to query data for these assets, it's getting harder for 3rd parties to determine the asset type and how to proceed with it. This standard is similar to [CIP-0010](../CIP-0010), but focuses on the asset name of an asset.

@@ -0,0 +1,53 @@
---
CIP:
Member

Suggested change
CIP:
CIP: 67

Status: Draft
Type: Informational
Created: 2022-07-13
Post-History:
Member

Suggested change
Post-History:


## Rationale

Asset name labels make it easy to classify assets. It's important to understand that a registered label standard itself doesn't provide any security off-chain nor on-chain as they can be spoofed. Only in combination with the Policy ID security can be derived from the minting policy.
Member

Answers I'd like to see in the Rationale section:

  • What are the 0-15 labels reserved for and how is this different from the 65536 - 131071 range?
  • How big is the risk of collision for a given choice of prefix? That is, how likely is it that a "randomly generated" asset name (or a hash digest) is misinterpreted as a CIP-0067 label? (The current rationale hints in that direction by recommending to always consider the asset label in conjunction with the policy id.)
  • Is there any consideration regarding the size of the prefix? It seems to me that keeping the size of the prefix / asset name label under 4 bytes is preferable, because it allows to still embed 28-byte hash digests in the asset name.

@KtorZ
Member

KtorZ commented Sep 9, 2022

@perturbing

Thanks for the detailed walkthrough on checksums options that you've considered. I have mainly two questions from reading your comment:

(1) I understand the willingness to avoid collision but, since this registry exists off-chain anyway, wouldn't it make sense to also bundle the policy id that supports some of the identified tokens? So, if a certain project wants to use or define a new label, they can do so by adding an entry to this registry including their policy id. Downstream components that support the standard can interpret only policies that are included in the registry. Since an asset name is always to be seen within the context of a policy, the checksum altogether becomes even redundant. Doesn't it?

(2) It seems to me that a checksum as large as the data payload is overkill. In this scenario, the role of the checksum is really to make it harder for people to unknowingly abide by the standard, so I think that a best effort is sufficient, given that there are also a specific prefix and suffix that already make it less likely to happen. The prefix/suffix by themselves already make the chance of accidental use of the standard less than 0.5%. Adding a checksum of only 4 bits brings that down to ~0.025%, if my calculations are correct. And that isn't even considering that the label itself must match an existing label. Say the standard becomes really popular and ends up with 1000s of labels: you'd still need to get an equality on the 20 remaining bits. With ~1000 possible cases that's a ~0.1% probability of collision, which brings the overall collision probability to 0.000025% (2.5e-7). I think this is largely sufficient as a "best effort".
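The estimate in (2) can be reproduced with a few lines of arithmetic. The split assumed here (two 4-bit brackets, a 4-bit checksum, a 20-bit label field and roughly 1000 registered labels) is taken from the comment above; the variable names are only for illustration.

```typescript
// Both 4-bit delimiters happen to be zero: 1 / 2^8, about 0.39%.
const bracketChance = 1 / Math.pow(2, 8);
// A 4-bit checksum happens to match: 1 / 2^4.
const checksumChance = 1 / Math.pow(2, 4);
// A random 20-bit label field hits one of ~1000 registered labels: about 0.1%.
const labelChance = 1000 / Math.pow(2, 20);

// Brackets plus checksum: about 0.024%.
const withChecksum = bracketChance * checksumChance;
// All three together: about 2.3e-7, in line with the ~2.5e-7 quoted above.
const overall = withChecksum * labelChance;
```

The small difference from 2.5e-7 comes from rounding the intermediate percentages before multiplying.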

@SebastienGllmt
Contributor

I'm not sure I understand why we're prefixing and suffixing with a bunch of 0s instead of just using a larger checksum. Is it meant for human readability by looking at the binary encoding?

One thing that might be interesting is instead of coming up with a new checksum algorithm to instead use bech32 with CIP67: as the prefix. Note that in bech32, the prefix is included in the checksum mechanism so other people can use the same concept in the future to change CIP67 to something else if they want to

The problem with bech32 is that it operates on 5-bit chunks which may be tricky and also that it only has 32 separate characters for display despite the fact cip67 encodes a utf8 string (so it means these assets would have multiple representations -- the bech32 representation and the name encoded inside. Although this is also kind of true with your custom 0-padding approach as well)

@alessandrokonrad
Contributor Author

alessandrokonrad commented Sep 14, 2022

> So, if a certain project wants to use or define a new label, they can do so by adding an entry to this registry including their policy id. Downstream components that support the standard can interpret only policies that are included in the registry

@KtorZ The goal of this registry is to define token standards. Having to register your token/project first into such a registry slows down things a lot and also makes it very hard for 3rd parties to verify labels. And it also centralizes things.

> It seems to me that a checksum as large as the data payload is overkill.

I agree. Overall are you in favor of a binary encoded or utf-8 encoded label version?

I'll try to summarize the idea of CIP-0067 and the thought process that has gone into it since the creation of this CIP.
The idea is to classify/categorize assets by the asset name. Asset names are prefixed with a label (format yet to be determined). These labels are registered in the registry of this CIP. Each label in the registry points to a CIP defining the criteria for an asset with such a label. E.g. How should a 3rd party display it, what token type is it or where to get metadata from. Since Cardano relies heavily on tokens not only on the user side, but also on the plutus side (e.g. state thread tokens, reference NFTs from CIP-0068, etc.), labels could be used on every end. Giving explorers also a nice visual touch.

Initially we wanted to go with an UTF-8 encoded asset name label.

Advantage:

  • easy to use
  • directly human readable

Downside:

  • takes up more bytes of the asset name than necessary
  • higher chance someone follows this standard by accident
  • it feels hacky when using obscure ascii symbols to make following by accident impossible

So we thought a binary encoding that also renders nicely in hex is probably a better approach.

Advantage:

  • it's cleaner
  • more robust and chances are very low someone follows that by accident
  • far less bytes necessary
  • fixed size (e.g. with 2 bytes 65536 labels are possible)

Downside:

  • not directly human readable (but because of using a fixed size it's quite easy to slice off the label part from the asset name and decode it separately)
  • 3rd parties need to first of all adopt this standard so that things render nicely

And since a binary encoding is very space efficient, we thought we might include a checksum to make it even more obscure. In the end it doesn't matter if a checksum collides with the checksum of another label, because all we want is to avoid someone following this standard by accident. The only question is whether we should have a checksum at all, or whether the binary encoding is sufficient on its own. Would the concept of a checksum make things unnecessarily difficult for tools and 3rd parties?

> I'm not sure I understand why we're prefixing and suffixing with a bunch of 0s instead of just using a larger checksum. Is it meant for human readability by looking at the binary encoding?

@SebastienGllmt the initial four 0s and the last four are meant to be brackets. The idea is to have a bit length that can also be converted from and to hex easily. So yeah, it also makes it easier to read when looking at the hex string.
And since the two 4-bit brackets make up 1 byte, plus 2 bytes for the label and optionally 1 byte for a checksum, that works out well.

E.g. A label with 3 bytes (without checksum) in hex: 0ffff0 => Label number: 65535
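Because the size is fixed, slicing the label off a hex-encoded asset name is straightforward. A minimal sketch for the 3-byte variant without checksum, using the hypothetical helper names `decodeBracketLabel` and `splitAssetName`:

```typescript
// Decodes a 3-byte bracketed label given as 6 hex characters:
// "0" bracket, 4 hex digits of label, "0" bracket. No checksum in this variant.
function decodeBracketLabel(hex: string): number | null {
  if (!/^0[0-9a-fA-F]{4}0$/.test(hex)) return null;
  return parseInt(hex.slice(1, 5), 16);
}

// Splits a hex-encoded asset name into the label and the remaining name (hex).
function splitAssetName(
  hexName: string,
): { label: number; rest: string } | null {
  const label = decodeBracketLabel(hexName.slice(0, 6));
  return label === null ? null : { label: label, rest: hexName.slice(6) };
}

decodeBracketLabel("0ffff0"); // 65535
splitAssetName("00064054657374"); // { label: 100, rest: "54657374" } ("Test")
```

The remaining part can then be decoded separately (for example as UTF-8) so that both parts stay human readable where applicable.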

@KtorZ
Member

KtorZ commented Sep 27, 2022

@SebastienGllmt: One thing that might be interesting is instead of coming up with a new checksum algorithm to instead use bech32

I thought about bech32 as well although, it comes with probabilistic error-detection and as you said, operates over 5 bits already. So it feels like an overkill for this particular purpose.

I do agree with the "not re-inventing the wheel" statement though which is why I tend to be in favor of a relatively simple checksum solution. I would also consider a CRC before bech32.

@alessandrokonrad: Overall are you in favor of a binary encoded or utf-8 encoded label version?

The binary-encoded label is a much more robust approach. Especially because the asset name isn't meant to be a direct user-facing piece of information in principle. This is why we have metadata after all. Thus, for a standard which is ultimately about providing such metadata, I find it even ironic to make any effort to have the label somewhat human-readable 😬 !
I would also not overdo it, so as not to hinder adoption of the standard. As I explained above, a best-effort checksum can already drastically reduce the chance of collision. If anything though, I'd choose a prefix / suffix that relates to the CIP a bit more (e.g. 0067....0067).

All-in-all, regardless of the solution chosen, the CIP will need to include test vectors and a reference implementation to ease development of different solutions.


## Rationale

Asset name labels make it easy to classify assets. It's important to understand that an oblivious token issuer might use the prefix X for all kinds of things, leading to misinterpretation by clients that follow this standard. We can minimize this attack vector by making the label format obscure. Brackets, checksum and fixed size binary encoding make it unlikely someone follows this standard by accident.
Copy link
Member

Given the numerous discussions that happened regarding the label's format, the choice of checksum, the brackets, etc., I'd have expected a slightly thicker Rationale section summarizing the various discussions, the choices considered and why they were rejected.

CIP-0067/README.md

Asset name labels make it easy to classify assets. It's important to understand that an oblivious token issuer might use the prefix X for all kinds of things, leading to misinterpretation by clients that follow this standard. We can minimize this attack vector by making the label format obscure. Brackets, checksum and fixed size binary encoding make it unlikely someone follows this standard by accident.
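
As a rough illustration of "unlikely by accident", the following sketch counts valid prefixes under the 4-byte layout from the Specification (4 zero bits, 16-bit label, 8-bit checksum, 4 zero bits). It assumes the first 4 bytes of an unrelated asset name are uniformly random, which real names are not, so the figure is only an order-of-magnitude guide. The names used here are hypothetical.

```python
TOTAL_PREFIXES = 2 ** 32  # every possible value of the first 4 bytes
VALID_PREFIXES = 2 ** 16  # one per label_num; brackets and checksum are then fixed


def accidental_match_probability() -> float:
    """Chance that a uniformly random 4-byte prefix is a well-formed label."""
    return VALID_PREFIXES / TOTAL_PREFIXES


def can_match_brackets(asset_name: bytes) -> bool:
    """Necessary (not sufficient) check: both 4-bit brackets are zero."""
    return (
        len(asset_name) >= 4
        and asset_name[0] >> 4 == 0
        and asset_name[3] & 0x0F == 0
    )


print(accidental_match_probability())  # 2**-16, roughly 1.5e-05
```

Note that the leading bracket forces the first byte into the range 0x00 to 0x0f, so a name that starts with a printable ASCII character (0x20 or above) can never pass even the bracket check.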

## Reference Implementation(s)
Copy link
Member

Suggested change
## Reference Implementation(s)
### Reference Implementation(s)

- [Lucid TypeScript implementation of toLabel/fromLabel](https://github.com/spacebudz/lucid/blob/39cd2129101bd11b03b624f80bb5fe3da2537fec/src/utils/utils.ts#L500-L522)
- [Lucid TypeScript implementation of CRC-8](https://github.com/spacebudz/lucid/blob/main/src/misc/crc8.ts)

## Test Vectors
Copy link
Member

Suggested change
## Test Vectors
### Test Vectors


To classify assets the `asset_name` needs to be prefixed with the following `4 bytes` binary encoding:
```
[ 0000 | 2 bytes label_num | 1 byte checksum | 0000 ]
Copy link
Member

Suggested change
[ 0000 | 2 bytes label_num | 1 byte checksum | 0000 ]
[ 0000 | 16 bits label_num | 8 bits checksum | 0000 ]

Since the bracket is given in bits, I find it more logical to speak about bits for the inner parts as well.

```
- The leading and ending four 0s are brackets
- `label_num` has a fixed size of 2 bytes (`Label range in decimal: [0, 65535]`).
If `label_num` < 2 bytes the remaining bits need to be padded with 0s.
Copy link
Member

Suggested change
If `label_num` < 2 bytes the remaining bits need to be padded with 0s.
If `label_num` < 2 bytes the remaining bits need to be left-padded with 0s.
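
To illustrate why "left-padded" matters, here is a small sketch that renders a label number as a fixed 16-bit field. The helper names are hypothetical.

```python
def label_bits(label_num: int) -> str:
    """Render label_num as exactly 16 bits, left-padded with zeros."""
    if not 0 <= label_num <= 0xFFFF:
        raise ValueError("label_num must be in [0, 65535]")
    return format(label_num, "016b")


def label_bytes(label_num: int) -> bytes:
    """Same value as 2 big-endian bytes; the padding lands on the left."""
    return label_num.to_bytes(2, "big")


print(label_bits(222))         # 0000000011011110
print(label_bytes(222).hex())  # 00de
```

Padding on the right instead would turn 222 (`0xde`) into `0xde00`, which is label 56832, so the direction has to be stated explicitly.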

- Init: `0x00`
- RefIn: `false`
- RefOut: `false`
- XorOut: `0x00`
Copy link
Member

I am not sure what these refer to exactly (I mean, I have only seen this notation in one particular implementation of the CRC-8 algorithm in Go after searching for those specific terms). It seems that the literature mostly refers to either the lookup table and/or the polynomial representation. So it's probably sufficient to simply mention:

  • Polynomial representation (normal): 0x07
  • Lookup table: ...
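
For readers comparing the two notations, here is a minimal sketch of how the parameters in this thread (polynomial `0x07`, Init `0x00`, no input or output reflection, XorOut `0x00`) translate into a bitwise CRC-8, and how the result slots into the 4-byte prefix. It assumes the checksum is computed over the 2 label bytes, as the linked Lucid implementation does; `crc8`, `to_label` and `from_label` below are an independent illustration, not a copy of that code.

```python
from typing import Optional


def crc8(data: bytes) -> int:
    """Bitwise CRC-8: poly 0x07, Init 0x00, no reflection, XorOut 0x00."""
    crc = 0x00
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def to_label(label_num: int) -> str:
    """Hex prefix: 0 | 16-bit label | 8-bit checksum | 0."""
    if not 0 <= label_num <= 0xFFFF:
        raise ValueError("label_num must be in [0, 65535]")
    body = label_num.to_bytes(2, "big")
    return f"0{body.hex()}{crc8(body):02x}0"


def from_label(prefix_hex: str) -> Optional[int]:
    """Return the label number, or None if brackets or checksum do not match."""
    if len(prefix_hex) != 8 or prefix_hex[0] != "0" or prefix_hex[7] != "0":
        return None
    try:
        raw = bytes.fromhex(prefix_hex[1:7])
    except ValueError:
        return None
    if len(raw) != 3 or crc8(raw[:2]) != raw[2]:
        return None
    return int.from_bytes(raw[:2], "big")


print(to_label(222))  # 000de140
```

A lookup table is just the value of `crc8(bytes([i]))` precomputed for every `i` in 0..255, so stating the polynomial alone is enough to regenerate it.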

CIP-0067/README.md (outdated)
Copy link
Member

@KtorZ KtorZ left a comment

@rphair @SebastienGllmt @crptmppt

As discussed in editors meeting #54 (iirc), this proposal has now addressed the different points that were raised. I am approving, and happy to merge as Proposed while implementations are being worked on.

Copy link
Collaborator

@rphair rphair left a comment

great stuff 🤩

@rphair rphair merged commit 26e02ec into cardano-foundation:master Oct 25, 2022
@rphair
Copy link
Collaborator

rphair commented Oct 25, 2022

oh no @KtorZ I forgot to change status in top level README before merging... and this was not included in the housekeeping update. Will submit a PR for this after the housekeeping one is merged.

@KtorZ
Copy link
Member

KtorZ commented Oct 25, 2022

We're going too fast, that's a premiere for the CIP process 😄

@rphair rphair changed the title CIP-0067? | Asset Name Label Registry CIP-0067 | Asset Name Label Registry May 24, 2023
7 participants