Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow duplicated enum variant indices #628

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

pkhry
Copy link

@pkhry pkhry commented Sep 11, 2024

Description

  • Disallows usage of duplicated variant indices during derivation.
    e.g
enum Example {
    #[codec(index = 0)]
    A, 
    B = 0,  // Error here
    C,
}
  • Assigns unique indexes for each enum variant based on heuristic described below:
    • if codec::index is present -> use it and add the index to the set of duplicates. If there are duplicate indexes -> return an error.
    • if there is explicit discriminant(e.g. A = 3) -> use it and add it to the set of duplicates. If there are any duplicates in the set of used indexes or explicit indexes -> return an error.
    • no discriminant/explicit index -> use an incrementing number starting from 0, if number is in the used set it will get incremented until its no longer in set.
enum Example {
    #[codec(index = 0)]
    A, 
    B = 1, 
    C, // Will have index `2`
}

enum Example {
    #[codec(index = 1)]
    A, 
    B = 2, 
    C, // Will have index `0`
}

See also:

#507

@pkhry pkhry requested a review from koute September 11, 2024 08:38
assert_eq!(T::B.encode(), vec![0]);
}
#[test]
fn index_attr_vairant_duplicates_indices() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you testing here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments, but basically that indexes are correct in the presence of explicit disciminants/ codec(index =?)

derive/src/decode.rs Outdated Show resolved Hide resolved
.base10_parse::<u8>()
.expect("Internal error, index attribute must have been checked");
return Some(byte);
pub struct UsedIndexes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docs.

Also this struct should only store the used_set. The function from_iter is not required.

You only need variant_index, which should be a verbatim copy of the old variant_index function plus that you add the resulting index to used_set and check if it already exists.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added docs.
we need current to assign incremental indexes to variants in ascending order also we need to traverse the variants first and collect all the assigned indexes before we can process them to check for duplicates and have a set of indexes already used, so that when we encounter a variant without explicit discriminant/codec(index = ?) we could assign an index using UsedIndexes.current by incrementing it until the u8 is not contained inside the used set.

.expect("Internal error, index attribute must have been checked");
return Some(byte);
pub struct UsedIndexes {
used_set: HashSet<u8>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a HashMap<u8, Span> to show a proper error message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, not it's a hashMap. Now when we encounter a duplicate variant index we would also see span of the previous use site

@gui1117
Copy link
Contributor

gui1117 commented Oct 2, 2024

This looks like a difficult breaking change for users.

They would have to manually check all their usage to see if any index is changing.

Maybe it would be better to have a safe transition.

Also if we are to change the indices, the most idiomatic order could be to follow the rust index implementation maybe.

enum Foo
    A = 3,
    B, // is 4

This discussion can be relevant: #173

}
} else if let Some((
_,
expr @ syn::Expr::Lit(ExprLit { lit: syn::Lit::Int(lit_int), .. }),
Copy link
Contributor

@gui1117 gui1117 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't capture any discriminant, user can write this:

#[derive(Encode, Decode)]
enum A {
	A = 3 + 4,
}

I think it can be ok to constraint our implementation, but then we should compile-error if the expression is not a int literal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with to disallow this pattern

@gui1117
Copy link
Contributor

gui1117 commented Oct 2, 2024

I like the PR, but I think we could keep the order as it is currently, compile an error if the discriminant expression is not a int literal, and compile an error if there is some duplicated indices.

@pkhry
Copy link
Author

pkhry commented Oct 2, 2024

This looks like a difficult breaking change for users.

They would have to manually check all their usage to see if any index is changing.

Maybe it would be better to have a safe transition.

Also if we are to change the indices, the most idiomatic order could be to follow the rust index implementation maybe.

enum Foo
    A = 3,
    B, // is 4

This discussion can be relevant: #173

I feel like it's very hard to achieve safe transition, initial impl that i pushed tried to index each variant in a manner that was somewhat compatible with some of the things already existing.

I also pushed an update that indexes variants in a way that's closer to the rust spec

@pkhry pkhry requested a review from gui1117 October 2, 2024 14:03
@gui1117
Copy link
Contributor

gui1117 commented Oct 2, 2024

This looks like a difficult breaking change for users.
They would have to manually check all their usage to see if any index is changing.
Maybe it would be better to have a safe transition.
Also if we are to change the indices, the most idiomatic order could be to follow the rust index implementation maybe.

enum Foo
    A = 3,
    B, // is 4

This discussion can be relevant: #173

I feel like it's very hard to achieve safe transition, initial impl that i pushed tried to index each variant in a manner that was somewhat compatible with some of the things already existing.

I also pushed an update that indexes variants in a way that's closer to the rust spec

I see the commit for the compile error on complex expression, but not the other commit?

Thinking again the index variant closer to rust spec is not so important. I think the most important is to not break user code.

IMO we don't need to change the index, I know the index is not ideal, but any index will feel not so ideal (some people will prefer your initial implementation which is great, some other will prefer to follow rust discriminant order).

So maybe we should just not change the order and instead have a compile error when there is conflict? what do you think?

also revert indexing behaviour to the initial approach
@pkhry
Copy link
Author

pkhry commented Oct 15, 2024

This looks like a difficult breaking change for users.
They would have to manually check all their usage to see if any index is changing.
Maybe it would be better to have a safe transition.
Also if we are to change the indices, the most idiomatic order could be to follow the rust index implementation maybe.

enum Foo
    A = 3,
    B, // is 4

This discussion can be relevant: #173

I feel like it's very hard to achieve safe transition, initial impl that i pushed tried to index each variant in a manner that was somewhat compatible with some of the things already existing.
I also pushed an update that indexes variants in a way that's closer to the rust spec

I see the commit for the compile error on complex expression, but not the other commit?

Thinking again the index variant closer to rust spec is not so important. I think the most important is to not break user code.

IMO we don't need to change the index, I know the index is not ideal, but any index will feel not so ideal (some people will prefer your initial implementation which is great, some other will prefer to follow rust discriminant order).

So maybe we should just not change the order and instead have a compile error when there is conflict? what do you think?

  • reverted to the indexing strategy used initially, we now just give out a compile error after encountering duplicates.
  • added ui tests for failing cases

@pkhry pkhry requested a review from bkchr October 21, 2024 11:29
Copy link
Contributor

@gui1117 gui1117 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is good, some refactor and then I will approve.

@@ -19,4 +19,5 @@ fn scale_codec_ui_tests() {
let t = trybuild::TestCases::new();
t.compile_fail("tests/scale_codec_ui/*.rs");
t.pass("tests/scale_codec_ui/pass/*.rs");
t.compile_fail("tests/scale_codec_ui/fail/*.rs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not put those tests at "tests/scale_codec_ui/" ?
or move all failing test in the fail dir.

(anyway I don't mind)

@@ -0,0 +1,23 @@
error: Duplicate variant index. qed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QED means: quod erat demonstrandum = which was to be demonstrated.

I think it doesn't fit in an error message.

Maybe writing: scale codec error: Invalid variant index, the variant index is duplicated.

derive/src/utils.rs Show resolved Hide resolved
@gui1117
Copy link
Contributor

gui1117 commented Nov 4, 2024

Also I just found out we can assert that variant indices are unique using constant evaluation.
No need to enforce that discriminant is a literal (so no breaking change at all)

A draft implementation look like this:

diff --git a/derive/src/encode.rs b/derive/src/encode.rs
index a29eec7..798237f 100644
--- a/derive/src/encode.rs
+++ b/derive/src/encode.rs
@@ -345,7 +345,7 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                        Fields::Unnamed(ref fields) => {
                                                let fields = &fields.unnamed;
@@ -378,7 +378,7 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                        Fields::Unit => {
                                                let hinting = quote_spanned! { f.span() =>
@@ -394,13 +394,14 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                }
                        });
 
-                       let recurse_hinting = recurse.clone().map(|[hinting, _]| hinting);
-                       let recurse_encoding = recurse.clone().map(|[_, encoding]| encoding);
+                       let recurse_hinting = recurse.clone().map(|[hinting, _, _]| hinting);
+                       let recurse_encoding = recurse.clone().map(|[_, encoding, _]| encoding);
+                       let recurse_indices = recurse.clone().map(|[_, _, index]| index);
 
                        let hinting = quote! {
                                // The variant index uses 1 byte.
@@ -411,6 +412,23 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                        };
 
                        let encoding = quote! {
+                                       const _: () = {
+                                               let indices = [#( #recurse_indices ,)*];
+                                               let len = indices.len();
+
+                                               // Check each pair for uniqueness
+                                               let mut i = 0;
+                                               while i < len {
+                                                       let mut j = i + 1;
+                                                       while j < len {
+                                                               if indices[i] == indices[j] {
+                                                                       panic!("TODO: good error message with variant names and indices");
+                                                               }
+                                                               j += 1;
+                                                       }
+                                                       i += 1;
+                                               }
+                                       };
                                match *#self_ {
                                        #( #recurse_encoding )*,
                                        _ => (),

@pkhry
Copy link
Author

pkhry commented Nov 4, 2024

Also I just found out we can assert that variant indices are unique using constant evaluation. No need to enforce that discriminant is a literal (so no breaking change at all)

A draft implementation look like this:

diff --git a/derive/src/encode.rs b/derive/src/encode.rs
index a29eec7..798237f 100644
--- a/derive/src/encode.rs
+++ b/derive/src/encode.rs
@@ -345,7 +345,7 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                        Fields::Unnamed(ref fields) => {
                                                let fields = &fields.unnamed;
@@ -378,7 +378,7 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                        Fields::Unit => {
                                                let hinting = quote_spanned! { f.span() =>
@@ -394,13 +394,14 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                                                        }
                                                };
 
-                                               [hinting, encoding]
+                                               [hinting, encoding, index]
                                        },
                                }
                        });
 
-                       let recurse_hinting = recurse.clone().map(|[hinting, _]| hinting);
-                       let recurse_encoding = recurse.clone().map(|[_, encoding]| encoding);
+                       let recurse_hinting = recurse.clone().map(|[hinting, _, _]| hinting);
+                       let recurse_encoding = recurse.clone().map(|[_, encoding, _]| encoding);
+                       let recurse_indices = recurse.clone().map(|[_, _, index]| index);
 
                        let hinting = quote! {
                                // The variant index uses 1 byte.
@@ -411,6 +412,23 @@ fn impl_encode(data: &Data, type_name: &Ident, crate_path: &syn::Path) -> TokenS
                        };
 
                        let encoding = quote! {
+                                       const _: () = {
+                                               let indices = [#( #recurse_indices ,)*];
+                                               let len = indices.len();
+
+                                               // Check each pair for uniqueness
+                                               let mut i = 0;
+                                               while i < len {
+                                                       let mut j = i + 1;
+                                                       while j < len {
+                                                               if indices[i] == indices[j] {
+                                                                       panic!("TODO: good error message with variant names and indices");
+                                                               }
+                                                               j += 1;
+                                                       }
+                                                       i += 1;
+                                               }
+                                       };
                                match *#self_ {
                                        #( #recurse_encoding )*,
                                        _ => (),

it's possible, but should we do const eval in that place?

  • Decode would still be using the new variant indexing code.
  • adds additional need to keep the variant spans around in case when we need the error.

@gui1117
Copy link
Contributor

gui1117 commented Nov 4, 2024

I'm good with the current PR.
I just didn't know it was possible with const evaluation.

If we go the const evaluation way, we should do for both decode and encode. There won't be overhead as it is const evaluation. The error will be a bit less pretty but probably ok.

Anyway we can go with the PR as it is.

@pkhry pkhry requested a review from gui1117 November 5, 2024 12:13
@gui1117
Copy link
Contributor

gui1117 commented Nov 7, 2024

don't hate me, but I am conflicted, the current PR is good but it is a breaking change. We no longer support enums with discriminant being an expression rather than a literal.

And there to achieve the goal of this PR we can do it without a breaking change, by using constant evaluation.

People regularly complains that libraries are breaking too often, maybe we should avoid this breaking change and implement with the constant evaluation.

@pkhry
Copy link
Author

pkhry commented Nov 7, 2024

don't hate me, but I am conflicted, the current PR is good but it is a breaking change. We no longer support enums with discriminant being an expression rather than a literal.

And there to achieve the goal of this PR we can do it without a breaking change, by using constant evaluation.

People regularly complains that libraries are breaking too often, maybe we should avoid this breaking change and implement with the constant evaluation.

All good, I'll re-do it

@pkhry
Copy link
Author

pkhry commented Nov 8, 2024

don't hate me, but I am conflicted, the current PR is good but it is a breaking change. We no longer support enums with discriminant being an expression rather than a literal.

And there to achieve the goal of this PR we can do it without a breaking change, by using constant evaluation.

People regularly complains that libraries are breaking too often, maybe we should avoid this breaking change and implement with the constant evaluation.

don't hate me, but I am conflicted, the current PR is good but it is a breaking change. We no longer support enums with discriminant being an expression rather than a literal.

And there to achieve the goal of this PR we can do it without a breaking change, by using constant evaluation.

People regularly complains that libraries are breaking too often, maybe we should avoid this breaking change and implement with the constant evaluation.

Ok, it seems like string interpolation is not possible without external dependency inside constant blocks, see alternate PR's resulting error message. the only way to have string interpolation in constant blocks is to use const_format from what i've understood.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants