Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simpler macro versions of ULE/VarULE macros to reduce syn deps #5127

Open
Manishearth opened this issue Jun 26, 2024 · 22 comments
Open

Add simpler macro versions of ULE/VarULE macros to reduce syn deps #5127

Manishearth opened this issue Jun 26, 2024 · 22 comments

Comments

@Manishearth
Copy link
Member

See #2310

In #5124 we're trying to slim down ICU4X's deps. A lot of our crates don't need the full power of the ule/varule macros, and it would be nice to expose MBE macro versions of them.

@Manishearth
Copy link
Member Author

There are a bunch of similar but distinct use cases for simple transparent wrappers.

  • A wrapper ULE type that has a custom AsULE impl pointing to it but otherwise transparently wraps a u8 or something. e.g. SkeletonDataIndexULE. We need a way to generate passthrough ULE impls for this.
  • A wrapper VarULE type that uses ref-cast like impls, e.g. NormalizedPropertyNameStr. We need a way to generate passtrhoguh ULE impls and ref-cast-like methods for this.
  • A wrapper AsULE type that is transparent for unclear reasons: e.g. every property type. Do these actually need to be transparent? Can these just be AsULE? Either way, this can be handled with a non-wrapping macro that just does impl_passthrough_asule!(ULEType, AsULEType, AsULEWrappedType) or something.

Furthermore:

  • make_ule on an enum is doable with a macro
  • make_ule on a struct is super doable with a macro

@robertbastian
Copy link
Member

How feasible is it to write zerovec-derive without syn? It'd still be more powerful than macro_rules.

@sffc
Copy link
Member

sffc commented Jun 26, 2024

I proposed one solution in #5101.

What other solutions might be considered?

@Manishearth
Copy link
Member Author

Manishearth commented Jun 26, 2024

@sffc I'm confused, this issue is proposing the other solution. It's listing out the macros that could potentially be written. I know how to write these macros. I'm not writing them yet because i want to figure out what the full gamut is before I start designing them. Is there something else you're looking for here?

How feasible is it to write zerovec-derive without syn? It'd still be more powerful than macro_rules.

It's ... really annoying I'd say. You'd end up implementing a half-baked parser..

Now, you could do something like implementing a macro that gets called as impl_wrapper_ule!(// docs go here, CanonicalCombiningClass, pub u8) as a proc macro, but at that point you can just do a regular macro just as easily.

Honestly, for the thing this issue is about, MBEs are quite sufficient. For things not listed in my comment above, I don't think a non-syn impl will be easier than MBE since MBEs actually let you implement a half-decent parser without writing an actual parser.

@Manishearth
Copy link
Member Author

I'm also not prioritizing working on this: I don't perceive this issue as urgent compared to 2.0. Please let me know if you think this should be higher priority, happy to start looking at this properly, but so far I don't see any reason to rush this. It's good to document all of the fruit hanging at any height for #5124 so that we have an idea of potential wins, but I'm not yet convinced I should spend time on this vs 2.0 issues.

(and 2.0 diplomat still needs work)

@sffc
Copy link
Member

sffc commented Jun 26, 2024

@sffc I'm confused, this issue is proposing the other solution. It's listing out the macros that could potentially be written.

But I wrote these macros in the PR, and the way I wrote them was rejected.

Currently the issue is describing the use cases behind the macros I proposed without proposing a concrete alternative.

I know how to write these macros.

You've said this, but I don't know what's in your head, and what tradeoffs different ways of writing the macros could entail.

I'm also not prioritizing working on this: I don't perceive this issue as urgent compared to 2.0.

This issue is unlikely to ever be higher priority than milestone tasks, but we've (or at least I've) been wanting these macros for a long time and we keep incurring debt without them. I opened my PR when I did because I felt I could justify the work since I was expecting two more of these landing this week (DataKeyAttributes and PatternULE which is needed for units patterns). We also delayed the Pattern rewrite for a long time since it was never higher priority than milestone tasks.

@Manishearth
Copy link
Member Author

Currently the issue is describing the use cases behind the macros I proposed without proposing a concrete alternative.

I don't want to actually put in the effort of writing these macros until the full set of use cases is enumerated. Making macros flexible is a lot of work, and it is ideal to come in understanding the full desired extent of flexibility.

And no, this is not describing the use cases behind the macros you proposed, it describes multiple others as well: you covered one of these use cases with that PR.

You've said this, but I don't know what's in your head, and what tradeoffs different ways of writing the macros could entail.

I don't see the point of going deep down implementation land until we know what we're hoping to cover. For that I need to look at the actual potential callsites.

What's in my head is more or less an understanding of what is possible with the macro system, it seems like you think I have a very specific implementation in mind? I have multiple ideas, and could implement it one way or the other, but I don't think we're there yet.

The hope is to not have any external difference: no new traits, no new per-crate unsafe code, just replacing a safe derive or make_ule with a safe wrapping macro invocation.

I believe I already put forth a rough sketch earlier in that thread, but here is what one such macro invocation can look like:

impl_wrapper_varule!(
 [cast: (cast_ref, cast_as_ref, cast_box, ..), ... other options .... ],

/// Docs
#[derive(..)]
pub struct FooWrapper(str);

);

and basically the macro applies the transparent for you.

Implementing the "options" will get tricky but not too tricky as long as you pick a good syntax. Handling pub/priv stuff will be annoying.

Highlighting that this is an example, the exact syntax and mechanisms are not yet something I've decided upon.

Happy to talk more about the design but it's still not clear to me what you want to know about it. I know enough to know that this is possible, but I can't have a more concrete design until I better understand all the use cases, which is precisely what I'm trying to do with this issue.

@Manishearth
Copy link
Member Author

Manishearth commented Jun 27, 2024

We also delayed the Pattern rewrite for a long time since it was never higher priority than milestone tasks.

Sure, but for me actually work on this before 2.0, some other 2.0 blocker work item is getting pushed back. Diplomat stuff is taking more time than expected so I'm trying to be careful with what I take on.

I'd rather schedule this as 2.x priority: to be fixed basically immediately after 2.0.

@Manishearth
Copy link
Member Author

Unless there's a reason this will impact the 2.0 design in some way: which is not entirely out of the question! If we can break a bunch of dependency links, and maybe add some features, that's a valid reason to make a breaking change that enables that.

Which is another reason I want to understand the use cases first, so we can see which crates we can liberate here.

But also so far I don't see a reason for these macros to cause 2.0 breakages. If there are, do let me know.

(One potential one is if we discover that e.g. properties can be 100% no-proc-macro if we turn bidi/ccc/whatever into a feature flag. Though that particular one can still be done stably. There may be others like it)

@sffc
Copy link
Member

sffc commented Aug 20, 2024

In terms of what's needed: honestly just ref-cast functionality that works across all standard pointery types solves ~60% of the call sites (by personal recollection; the exact list can be compiled upon request).

The other 40% would be to also have support for a validate_with function that checks invariants and then implements ULE, AsULE, and fallible versions of ref-cast functions.

But even just the ref-cast functions would be great because that's a big source of unsafe code. ULE impls are also unsafe so it would be nice to toss those in for good measure.

@Manishearth
Copy link
Member Author

@sffc I'd like some more detail on ULE vs VarULE. The rough three-way split in #5127 (comment) came from browsing through impls and realizing it's not as simple as one macro that "just does ref-cast impls".

@sffc
Copy link
Member

sffc commented Aug 28, 2024

I wasn't distinguishing between ULE and VarULE.

I don't immediately see why that matters for the design of the solution: struct U8Wrap(u8) and struct StrWrap(str) both need ref-cast impls from &Inner to &Self. The impls should be private or allowed to have a validation function so that they can enforce their invariants. AsULE impls are reflexive.

The actual unsafe impl ULE and unsafe impl VarULE are slightly different codegen, but still the same general principle.

A fairly clean solution would be to say that it is expected for these types to have a function named try_from_raw or similar (function name could be an argument to the macro), and then the ULE and VarULE impls invoke that function.

I think we can largely depend on the signature of the function for enforcing ULE/VarULE invariants, maybe with a little extra work in the codegen. For example,

#[repr(transparent)]
pub struct LowercaseStr(str);

impl_stuff!(LowercaseStr, str);

impl LowercaseStr {
    pub fn try_from_raw(raw: &str) -> Result<&Self, ParseError> {
        if str.is_ascii_lowercase() {
            Ok(Self::from_raw_unchecked(raw))
        } else {
            Err(ParseError::NotLowercase)
        }
    }
}

And impl_stuff generates the following code

impl LowercaseStr {
    const fn from_raw_unchecked(raw: &str) -> &Self {
        unsafe { core::mem::transmute(raw) }  // or a pointer cast, etc.
    }
}

unsafe impl VarULE for LowercaseStr {
    fn validate_byte_slice(bytes: &[u8]) -> Result<(), UleError> {
        // Thought: we need to pass a fn to the macro that first validates [u8] -> str
        let raw = core::str::from_utf8(bytes)?;
        Self::try_from_raw(raw).map(|_| ())
    }
    unsafe fn from_byte_slice_unchecked(bytes: &[u8]) -> &Self {
        let raw = core::str::from_utf8_unchecked(bytes);
        Self::from_raw_unchecked(bytes)
    }
}

What is wrong with this?

@Manishearth
Copy link
Member Author

I don't immediately see why that matters for the design of the solution

You're listing two use cases: our codebase has more, in particular, the ULE use case you list is not what we do in icu_properties, the crate that spawned this discussion. See point 3 in my use case list in #5127 (comment)

I started looking at this issue with the assumption that all we needed was ref-casting and realized that properties was doing something different. Perhaps properties doesn't need to be doing that, I'm not sure if it does, but it is currently doing something that cannot be replicated with a simple ref-cast macro.

For wrapper ULE there's broadly four different kinds of struct impls I see in the wild, for situations without further validation code:

  1. struct Foo(Inner) where Inner is already ULE, and we just need an AsULE impl. No macro needed, fortunately.
  2. #[repr(transparent] struct Foo(Inner) where Inner is already ULE where we want Foo be ULE too, with Foo being reflexively AsULE.
  3. struct Foo(Inner) where Inner is already ULE where we want to generate a #[repr(transparent)] FooULE(Inner)
  4. struct Foo(Inner) where Inner is not ULE where we want to generate a #[repr(transparent)] FooULE(<Inner as AsULE>::ULE)

Not all of these cases may be strictly necessary, in particular it feels like 3 is not worth it when you can just do 2, reflexive ULE types are better. But 3 is what icu_properties does.

And that's just "ULE wrapper types". We also have "ULE structs", "ULE enums", and "VarULE wrapper types". These each probably have a single use case, and we can defer structs/enums till later, but that's still a fair number of things.

@sffc
Copy link
Member

sffc commented Aug 28, 2024

Some general principles for a wrapper type Foo(T) (ULE only, not VarULE):

T impls ULE (like u8) T does NOT impl ULE (like u16)
Has Invariants (private field) Foo::ULE = Foo Foo::ULE = FooULE
No Invariants (public field) Foo::ULE = T::ULE = T Foo::ULE = T::ULE

If we consistently applied these principles, would it make things cleaner/easier?

@sffc
Copy link
Member

sffc commented Aug 28, 2024

One thing that might be making properties special is that there are some properties that are u8 and others that are u16, so then we need to decide whether to be more consistent within icu_properties or more consistent with the principles above. For example:

#[repr(transparent)]
#[zerovec::make_ule(ScriptULE)]
pub struct Script(pub u16);

#[repr(transparent)]
#[zerovec::make_ule(HangulSyllableTypeULE)]
pub struct HangulSyllableType(pub u8);

I guess the simplest way to clean that up would be to make all of the structs use their inner integer's ULE for their AsULE. u8::ULE is reflexive, and u16::ULE = RawBytesULE<2>, so this seems fine and consistent with my table.

@Manishearth
Copy link
Member Author

If we consistently applied these principles, would it make things cleaner/easier?

Yes, I think!

However, there's a valid discussion to be had whether in the bottom left case Foo::ULE = Foo is preferred (with repr(transparent) on the Foo. This is more annoying to enforce by macro, but possible, and it has the upside of enabling some of the nice reflexive AsULE APIs that we have. I don't know if anyone actually ever needs those that often.

@Manishearth
Copy link
Member Author

Also worth noting that this doesn't even need ref_cast impls since there in your table there are no new repr(transparent) types being introduced. In the modification I list out above there are.

@Manishearth
Copy link
Member Author

Oh, also, note:

#[repr(transparent)]
pub struct LowercaseStr(str);

impl_stuff!(LowercaseStr, str);

This would have to be

impl_stuff!(pub struct LowercaseStr(str));

unfortunately since we cannot detect the presence of the repr(transparent).

@Manishearth
Copy link
Member Author

I guess the simplest way to clean that up would be to make all of the structs use their inner integer's ULE for their AsULE. u8::ULE is reflexive, and u16::ULE = RawBytesULE<2>, so this seems fine and consistent with my table.

In favor. Perhaps we should do this refactoring first, see what we end up with, and move on from there? I think we have other inconsistencies in our ULE impls.

And this refactoring is a breaking change so needs to happen pre 2.0.

@sffc
Copy link
Member

sffc commented Aug 28, 2024

However, there's a valid discussion to be had whether in the bottom left case Foo::ULE = Foo is preferred (with repr(transparent) on the Foo. This is more annoying to enforce by macro, but possible, and it has the upside of enabling some of the nice reflexive AsULE APIs that we have. I don't know if anyone actually ever needs those that often.

If the bottom left case was Foo::ULE = Foo, then HangulSyllableType::ULE = HangulSyllableType but Script::ULE = RawBytesULE<2> which seems not great.

In favor. Perhaps we should do this refactoring first, see what we end up with, and move on from there? I think we have other inconsistencies in our ULE impls.

And this refactoring is a breaking change so needs to happen pre 2.0.

SGTM

@Manishearth
Copy link
Member Author

If the bottom left case was Foo::ULE = Foo, then HangulSyllableType::ULE = HangulSyllableType but Script::ULE = RawBytesULE<2> which seems not great.

Ah, yeah. It's nicer for types that already contain reflexive ULEs. So I guess it's a choice between special casing those (and having nice reflexive ULE apis) or having cleaner and more consistent macros. I'm fine with either, I don't think those nice reflexive ULE APIs are that useful.

@sffc
Copy link
Member

sffc commented Aug 28, 2024

I realized that although we can/should reduce the number of impls of ULE when they are redundant (struct with public field), we need VarULE implemented in order for the type to show up in a VarZeroVec. I edited my previous post to clarify that it only applies to ULE. When VarULE is involved, we always need the VarULE impl (and we don't need an AsULE impl).

Manishearth added a commit that referenced this issue Sep 18, 2024
)

Some progress on #5127. Now
there are only two `make_ule`s in zerovec, for enums.

Followups from #5548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants