Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic zeroslice! macros #3454

Closed
wants to merge 6 commits into from

Conversation

skius
Copy link
Member

@skius skius commented May 25, 2023

Part of #1935.
Depends on #3453.

Potentially makes the zeroslice!/zerovec! macros a lot more user-friendly:

// old:
const C: &ZeroSlice<char> = zeroslice![char; <char as AsULE>::ULE::from_array; 'e']; // works
const I: &ZeroSlice<i32> = zeroslice![i32; <i32 as AsULE>::ULE::from_array; -1]; // doesn't work

// new:
const C: &ZeroSlice<char> = zeroslice![char; 'e']; // works
const I: &ZeroSlice<i32> = zeroslice![i32; -1]; // works

The implementation is a bit complex, mostly due to the fact that the AsULE <=> AsULE::ULE relationship is many-to-one, as seen, for example, with i32/u32 and RawBytesULE<4>. This PR works around that issue by adding a new public trait ConstAsULE that redirects the AsULE type (i.e., i32 or u32) to a singular concrete type that implements const conversion between AsULE and AsULE::ULE. In other words, ConstAsULE <=> ConstAsULE::ConstConvert is one-to-one. If it wasn't for the many-to-one relationship, ConstAsULE::ConstConvert could have just been the AsULE::ULE type always, but this doesn't work as mentioned before.

The zeroslice!/zerovec! macros have thus been extended to allow a shorter invocation that uses the default path directly to the type specified in ConstAsULE, which makes the short hand form at the beginning of this description possible.

The long hand form that specifies a conversion function explicitly still exists.

The whole issue of many-to-one relationship could be avoided by simply asserting that, e.g., u32 <=> RawBytesULE<4> is the canonical relationship. However, I think that's unfriendly to the user, as the following shows:

// EXAMPLE WITHOUT `ConstAsULE`

zeroslice![u32; 1]; // works
zeroslice![i32; 1]; // doesn't work, is unclear why it doesn't work, and requires a lot more work to be implemented:

const fn i32_arr_to_ule_arr<const N: usize>(...) -> ... {...}
zeroslice![i32; i32_arr_to_ule_arr; 1]; // works

@skius
Copy link
Member Author

skius commented May 25, 2023

As touched upon in #3455, using ConstAsULE::ConstConvert instead of AsULE::ULE for doing the short-hand version of the macros would also allow for ZeroSlice<u8> creation, because we cannot implement a <u8 as AsULE>::ULE::from_array function ourselves.

If we do not care about the short-hand version of the macro, this point is moot, as one can use zeroslice![u8; core::convert::identity; ...bytes...].

@robertbastian
Copy link
Member

Hmm, not a fan of the ConstAsULE that has certain methods by convention (not by type).

Another approach would be to special-case types that have colliding ULE types. #[make_ule] always generates a fresh ULE type, and for users we can just say that the macro only works with one-to-one ULE types.

    (u8; $($x:expr),+ $(,)?) => (
        $crate::ZeroSlice::<u8>::from_ule_slice(
            {const X: &[u8] = &[$($x),+]; X}
        )
    );
    (char; $($x:expr),+ $(,)?) => (
        ...
    );

@robertbastian
Copy link
Member

robertbastian commented May 25, 2023

Please rebase this on #3453, this seems to iterate on that.

@skius skius force-pushed the add-magic-zeroslice-macros branch from a8119fb to 6469df6 Compare May 25, 2023 16:11
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • utils/zerovec/src/zerovec/mod.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@skius
Copy link
Member Author

skius commented May 25, 2023

@robertbastian thanks, I had not thought of special-casing in the macro itself. I agree that we should avoid the ConstAsULE trait and the cost in this case is that users cannot implement custom many-to-one ULE types, which sounds reasonable.

If these "magic" macros are something that we decide we want, I will implement that change.

@skius
Copy link
Member Author

skius commented May 25, 2023

Discussion with @robertbastian:

Alternatives to the ConstAsULE trait are as follows:
i. As mentioned above, special-casing every AsULE type that does not have a 1:1 relationship with its ULE type in the zeroslice! macro
ii. We could remove problematic 1:many types, by providing newtypes for every time a ULE is used, e.g., I32ULE and U32ULE. This would allow for a cleaner zeroslice! implementation.

In both cases users would not be able to (nicely) use zeroslice! on user-defined 1:many ULE types, but if we document zeroslice! appropriately, I don't think that's a big deal.

Solution i would require less code changes in the short-term, and would not change any <T as AsULE>::ULE relationships. Solution ii would necessarily change e.g. <i32 as AsULE>::ULE from RawBytes<4> to I32ULE.

@sffc @Manishearth what do you think?

EDIT:

Summary of what needs to be decided:

First, do we want a version of the macros that works by convention (i.e., the name of an associated pub const fn and on which type it lives), or should we leave it at the basic, explicit converter zeroslice![<type>; <const_converter>; <elts>] version?

  1. Basic
  2. Magic, implemented as follows:
    1. Special-casing every AsULE type that does not have a 1:1 relationship with its ULE type in the zeroslice! macro
    2. Remove problematic 1:many types, by providing newtypes for every AsULE impl with the same ULE, e.g., I32ULE and U32ULE
    3. Use the ConstAsULE workaround/convention as proposed by the current state of this PR

Personally, I'm somewhat strongly for 2, but I don't care about which 2.x specifically. Leaning towards 2.i.

@robertbastian robertbastian added the discuss Discuss at a future ICU4X-SC meeting label May 25, 2023
@robertbastian
Copy link
Member

My vote is also for 2.i. There aren't that many types that need to be special cased, it's basically just the integers.

@robertbastian
Copy link
Member

2.i isn't compatible with removing the type annotations, but I don't think that's possible anyway. For that the code would need to use generics, so would have to be completely expressible as a const constructor. That's not possible because we're calling a function on the generic type, which we cannot do in const, but can do in a macro.

@dpulls
Copy link

dpulls bot commented May 26, 2023

🎉 All dependencies have been resolved !

@Manishearth
Copy link
Member

Very much not a fan of ConstAsULE. In the long run, its purpose is served by ~const AsULE, once that is stable and bug-free.

I think 2.i is an okay route for now. But I'm honestly not convinced we need this magic macro to be so magic.

@sffc
Copy link
Member

sffc commented May 26, 2023

I think you can make a type-safe "ConstAsULE" using associated constants:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8c462d9f99c0b0b47a6ce98e69486c1c

@skius
Copy link
Member Author

skius commented May 26, 2023

I didn't know that, that's cool! Unfortunately, it won't work for our use-case, as they cannot be called in a const context: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d7aec25104078f8bc017fa6bd842b6b9

@sffc
Copy link
Member

sffc commented May 26, 2023

ok, "error: an fn pointer type cannot be const". Almost!

In general, I'm still quite in favor of making const traits via protocols where we expect a function to exist, and if the function doesn't exist, it's just a compile error, which is fine. The ConstAsULE solution you've sketched out in this PR aligns well with this model. So I'd be in favor of 2.iii.

The problem with 2.i is that it is quite invasive. We've had the 1-to-many model for a long time. We can't really enforce 1-to-1 unless we change the traits to give trait ULE an AsULE associated type. Additionally, the types you're proposing to wrap in newtypes are the most common fundamental primitives. I'm not saying that I'm necessarily opposed with going in this direction; it just seems like a large change to fix a not-very-large problem.

2.ii seems like a fairly non-invasive solution, and a nice thing is that we can remove the special cases when we have a more general solution. However, the downside of 2.ii is that we still are establishing what is effectively a "canonical" AsULE for each ULE (the one that is the argument of from_array).

I'm also fine with option 1. The macro improves a bunch on the status quo, and we can wait for const traits and things to stabilize before trying to improve it further.

@sffc
Copy link
Member

sffc commented May 26, 2023

Here's the nightly solution with #![feature(const_trait_impl)]

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=85df44f8e92d12a7ecdc0f5e367da845

This solution can work on stable if we just say "implement a method named XYZ on the AsULE type", and in the medium term we can export the trait that we want people to const-implement. In cases where we can't implement the function on the AsULE type (impls on external types), we can have a special case in the macro (2.ii). I think this should cover everything, because the only place we can impl AsULE on an external type is in the zerovec crate itself, so we can likewise special-case all those same cases in zeroslice!.

@skius
Copy link
Member Author

skius commented May 26, 2023

@sffc - I think you swapped 2.i and 2.ii (that's on me - they're not the most visibly distinguishable). 2.ii is the solution wrapping ULEs in newtypes.

However, the downside of 2.ii is that we still are establishing what is effectively a "canonical" AsULE for each ULE (the one that is the argument of from_array).

I'm not sure I understand (assuming you're talking about 2.i). For cases where there is no "canonical" 1:1 mapping, we just special case the macro, as seen in the most recent wip commit, and use more explicitly named functions (i.e., from_unsigned_array and from_signed_array vs. from_array).

This solution can work on stable if we just say "implement a method named XYZ on the AsULE type", and in the medium term we can export the trait that we want people to const-implement. In cases where we can't implement the function on the AsULE type (impls on external types), we can have a special case in the macro (2.ii). I think this should cover everything, because the only place we can impl AsULE on an external type is in the zerovec crate itself, so we can likewise special-case all those same cases in zeroslice!.

Do I understand correctly that you're proposing exporting a trait, e.g., ConstAsULE, which has a non-const function ConstAsULE::to_unaligned_array (due to lack of missing const_trait), which we want clients to also implement in a const manner on the concrete type itself? Meaning the concrete::to_unaligned_array function would be the one we use in the macro because it's const, and the ConstAsULE::to_unaligned_array would just be there to make it clear to users what they have to implement?

The main benefit of that solution (I don't see many other differences to the others) being that once const_trait_impl is stable, we can simply switch to that without causing much (or any) headache to the user? I do like that idea!

Also, thank you for the sample implementation with the const_trait feature! Is there a reason (perhaps code size?) you put the const N: usize generic on the trait, instead of on the function? That would require one less feature: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=6683ea16722bc2101ec318e7be95d940 (because there is no way to turn the number of items passed to a repeating pattern into a literal number)

But I'm honestly not convinced we need this magic macro to be so magic.

I'm also fine with option 1. The macro improves a bunch on the status quo, and we can wait for const traits and things to stabilize before trying to improve it further.

I don't want to add ballast to the crate if the consensus is not necessarily clear. It does some add public functions/items that we would have to support once 1.0 happens (be that specialized functions required for the special-cased zeroslice! match arms (e.g., RawBytesULE::from_signed_array), or the ConstAsULE::ConstConvert shenanigans).

@sffc
Copy link
Member

sffc commented May 26, 2023

Oops, sorry, 2.i <-> 2.ii. My bad.

@sffc
Copy link
Member

sffc commented May 26, 2023

which we want clients to also implement in a const manner on the concrete type itself?

My proposal was that we don't actually export the trait and instead ask clients to implement a conforming function signature (duck typing); then, when able, we can make a breaking change and require the trait itself to be implemented, which won't require any changes to zeroslice! call sites. Note that we'll need a breaking change even if we export the trait, because clients will need to add a const to the impl.

Is there a reason (perhaps code size?) you put the const N: usize generic on the trait, instead of on the function?

I think I put the generic on the trait only because by default I avoid generics on trait functions because it means I can't take a dyn reference on the trait, but in this case I think it makes much more sense to put the generic on the function as you did.

@skius
Copy link
Member Author

skius commented May 26, 2023

Some more context: A part of the original motivation behind the "magic" version of this macro was to further improve human-readability of baked data, but after more discussion with @robertbastian, it's a not-so-trivial (if even possible in stable) problem to tackle, given we need to bake the type of T in ZeroVec<T> to be able to use the zeroslice![typ; ...] or also ZeroSlice::<typ>::from_ule_slice.

I still think it's a "cool" feature to have, and zerovec is a cool crate, but maybe there are not enough applications at the moment to justify the currently required (complexity, not implementation) cost. I have virtually no experience judging this.

@sffc
Copy link
Member

sffc commented May 26, 2023

I don't immediately see why it's infeasible to bake the T?

I'm happy landing this feature even if it isn't used in databake as a nice-to-have; the guarantee is that we make zeroslice! work for built-in types, and if you want to make it work for your own type, we tell you how to make it work, but it's not a core part of the trait system.

@skius
Copy link
Member Author

skius commented May 26, 2023

I don't immediately see why it's infeasible to bake the T?

We didn't spend a whole lot of time on this (or maybe @robertbastian spent some more alone), but the gist is that it seems like we'd have to fall back to std::any::type_name::<T>(), which is explicitly described to be for diagnostics use only. The reason we need the type at all is to refer to from_array (or whichever name it will have).

the guarantee is that we make zeroslice! work for built-in types, and if you want to make it work for your own type, we tell you how to make it work, but it's not a core part of the trait system.

Fair enough!

@robertbastian
Copy link
Member

bake_type would have to take &self to be dyn safe, but then you can't bake the type of empty arrays/slices/options. This might be fine for zeroslice, but not in general.

@sffc
Copy link
Member

sffc commented May 27, 2023

So is the issue that impl<T> Bake for &ZeroSlice<T> can't get the typename of T?

That seems like a solvable problem, and it's a general problem such that it will come to bite us again and again. A couple possible ways I can think of off the top of my head:

  1. Bake the first element of the list and have it put the typename into CrateEnv so that we can pull it out. If the list is empty, continue to use the ZeroSlice empty constructor such that we don't need the typename. Maybe CrateEnv itself is not the right place to put it, but there is most likely some place somewhere we can use to pass the token name up into the ZeroSlice bake impl.
  2. Add a trait like trait BakedName { const NAME: &'static str } and make it a required bound on impl<T: BakedName> Bake for &ZeroSlice<T>

Alternatively, if we go with option 1 on this PR, I think we don't need the typename for anything; you just need to explicitly pass the conversion function. But then you have the question of where do you get the name of the conversion function from.

@sffc
Copy link
Member

sffc commented May 27, 2023

Another thought: for the bake implementation, what if instead of using from_array you just called AsULE::to_unaligned on each element So, like, you expand to something closer to

ZeroSlice::from_ule_slice(&[
    (1i16).to_unaligned(),
    (2i16).to_unaligned(),
    (3i16).to_unaligned(),
]);

and of course to_unaligned would need to be the const function.

@skius
Copy link
Member Author

skius commented May 29, 2023

what if instead of using from_array you just called AsULE::to_unaligned on each element

It's certainly worth keeping in mind for when const traits land, but unless I'm missing something, your example also shows why it won't work in the interim? AFAIK we cannot currently implement const methods on foreign types (e.g. (1i16).to_unaligned()).

@sffc
Copy link
Member

sffc commented May 30, 2023

Well if we specialize for foreign types, we can avoid databaking T in the general case. We can sniff for the foreign type by checking core::any::type_id::<T>() == core::any::type_id::<u32>().

@@ -0,0 +1,10 @@
// expand me by cd'ing into properties, then `cargo expand expandme`

#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: eventually should be removed, yes?

utils::wrap_field_inits(&const_from_aligned_conversions, &struc.fields);
let ule_impl = quote!(
impl #ule_name {
#[doc = #ule_doc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Probably just directly include the doc comment here? There's no interpolation

($aligned:ty, $unaligned:ty, $default:expr, $single:path, $fn_name:ident) => {
#[doc = concat!("Convert an array of `", stringify!($aligned), "` to an array of `", stringify!($unaligned), "`.")]
pub const fn $fn_name<const N: usize>(arr: [$aligned; N]) -> [Self; N] {
let mut result = [$default; N];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: could we potentially use MaybeUninit to avoid the need for $default? arrays of MaybeUninit are transmutable to regular arrays

/// `crate::ule::constconvert::ConstConvert<T, T::ULE>`. Unfortunately, you will not be able
/// to reuse this type for your own `ConstAsULE` implementations, since you cannot provide
/// implementations for types outside of your crate.
pub trait ConstAsULE: AsULE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: doesn't look like we're using it anymore

(I am also in general against adding such a trait)

@skius
Copy link
Member Author

skius commented May 31, 2023

As it stands currently (with 2.i), I believe there would be some more limitations to the macro:

  1. We would have to define which syntaxes for types we support in the macro, e.g., zeroslice![UnvalidatedChar; ...] and zeroslice![zerovec::ule::UnvalidatedChar; ...] but not zeroslice![ule::UnvalidatedChar; ...] nor zeroslice![UCTypeAlias; ...]...
  2. Supporting types with generics (e.g., zeroslice![Option<char>; ...]) is difficult, because handling this in a general manner would require recursing in the macro (e.g., (Option<$inner:ty>; ...) => OptionULE(macro!($inner, ...)) in butchered syntax), since we need to special-case on the inner type. But recursing using the ty matcher is not possible without some external help, so we'd have to rely on the tt matcher, which would make the macro implementation more difficult for both generics and non-single-token type names (e.g., zerovec::ule::UnvalidatedChar).
  3. The generics limitation automatically infects #[make_ule] types that have a generic field, such as zerovec-derive/examples/make.rs StructULE, meaning they could not be used with the macro.
  4. Because not every type is const-convertible, we would likely have to give clients of #[make_ule] the option to specify if they want to have const conversion functions on their ULE type, e.g., #[make_ule(const MyStructULE), which would add some mental complexity for clients when trying to figure out if they can add the option (though I suppose a compile-error would make it quite clear).

Basically, the version of the macro implementable in what I would consider a reasonable time frame would be limited to types without generics and an overseeable number of different naming options for the same type, while at the same time requiring to document quite clearly to clients what types can be used in zeroslice! (and maybe why their make_ule type cannot be used).

I think these issues would mostly be resolved when const trait impls land, so I propose shelving this PR until we can implement this in a cleaner way.

What do you think? @robertbastian @sffc @Manishearth

@sffc
Copy link
Member

sffc commented May 31, 2023

@skius
Copy link
Member Author

skius commented May 31, 2023

@sffc How would you support (or decide not to support) cases like Y1, Y2 (for issue 1. from my comment) and Z1, Z2 (for issue 2. from my comment)?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ea3f118bb97571284437f53e4517924f

@Manishearth
Copy link
Member

I think these issues would mostly be resolved when const trait impls land, so I propose shelving this PR until we can implement this in a cleaner way.

I generally agree. I don't think this is super super urgent.

@robertbastian robertbastian removed the discuss Discuss at a future ICU4X-SC meeting label Jun 2, 2023
@skius
Copy link
Member Author

skius commented Jun 2, 2023

Closing because this is difficult without const traits. See #3454 (comment) for more info

@skius skius closed this Jun 2, 2023
@sffc
Copy link
Member

sffc commented Jun 2, 2023

(thought I posted this earlier but it was still a draft)

I see, yeah, all of those cases would need to provide an explicit conversion function. We can cover 90% of use cases now, or stick with the less-ergonomic macro and cover 100% of cases when const traits are stable. I still don't think the explicit conversion function is that bad, so I'm fine with waiting, although according to @Manishearth we may still be waiting for a while.

However, we could at least get rid of the T argument by doing what I proposed (client provides a function to convert a single item in the array as opposed to a function to convert an entire array).

@skius
Copy link
Member Author

skius commented Jun 2, 2023

@sffc #3455 switched to the single-item-conversion-function.

The reason that doesn't allow us to also remove the T argument is two-fold:

  1. We need to have an inner const, because we are referencing a temporary (the [T::ULE]). Calling the macro in a non-const context would drop that temporary (i.e., let my_slice: &ZeroSlice<u32> = zeroslice![1]; println!("{my_slice:?}") would not compile).
  2. The inner const needs the explicit T for its type annotation.

@sffc
Copy link
Member

sffc commented Jun 3, 2023

ok. 1 is true for locals but not for statics:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=27d85fcce6ace7512a0c0641488157ec

We don't care about locales in databake, and I think most use cases are in statics. If we take an approach where we have "named arguments" like type: i16 in the macro invocation, as suggested in #3478, then we could make it optional argument.

@robertbastian
Copy link
Member

Clients can always use a local const as well.

@skius
Copy link
Member Author

skius commented Jun 5, 2023

Okay, yeah, given that clients can just use a const, I suppose that's a good point. Moving away from the inner const also allows constructing "runtime-dependent" (i.e., non-const arguments) owned ZeroVecs, which is not possible in main currently: https://github.com/unicode-org/icu4x/pull/3486/files#diff-5a522689fb3ea9ba0d0162e16d2a3238f5355a006c59aa13df60398c6a68029aR993

The only case where we might want an optional type: i16 argument that I can think of, is if a client wanted to directly use a borrowed value from the macro in expression position (i.e., without assigning it to a const first), in which the resulting expression outlives the scope of the created temporary. So, maybe niche enough that we don't need the type: i16 argument at all?

// works without inner cost
let _: i16 = sum_values(zeroslice![...]);

// does not work without explicit const, and could benefit from `type: i16` argument that creates the inner const
let sub = subborrow_from(zeroslice![...]);
consume(sub);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants