Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KindaSortaDangling internal type to make Yoke safe under strong protection #3735

Merged
merged 11 commits into from
Jul 26, 2023

Conversation

Manishearth
Copy link
Member

Miri is going to be turning on -Zmiri-retag-fields which means that certain memory pointery things need to have the appropriate internal metadata.

rust-lang/rfcs#3336 proposes a MaybeDangling type that allows one to mark memory regions as not needing that metadata. It does not exist yet, in the meantime I have defined an internal, simpler variant of it that achieves the task adequately.

See #3696

@Manishearth
Copy link
Member Author

Annoyingly this increases the sizes of some DataPayloads, presumably because the inner data is no longer eligible for niche optimizations. Though I can't really see the opportunities for niche optimizations anyway so I'm not fully clear what's going on.

However these sizes are less than the size in nightly anyway, and the nightly sizes don't change with this; I'm inclined to assume that what happened here is that the MaybeUninit is triggering whatever layout optimization Rust is now applying globally, just earlier.

components/datetime/src/lib.rs Outdated Show resolved Hide resolved
check_size_of!(6792 | 5456, DateTimeFormatter);
check_size_of!(7904 | 6480, ZonedDateTimeFormatter);
check_size_of!(1496 | 1320, TimeFormatter);
check_size_of!(5800 | 4632, DateFormatter);
Copy link
Member

@sffc sffc Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet the compiler was previously able to "merge" the T with whatever other fields were present on the struct, but now with the MaybeUninit it is forced to keep the payload as its own chunk of memory. A bit sad.

EDIT: Nevermind, that doesn't make sense, because it's still possible to get a &T, so the compiler doesn't have the liberty to fiddle with the layout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it only makes sense if there's an enum wrapping it, and the only enum I see is the thing that swaps between yokes and staticrefs, and I'm not quite sure how that could have a size impact.

Copy link

@danielhenrymantilla danielhenrymantilla Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if the payload of two variants have a common part, and if one of the payloads separately has enough of a niche for Rust to be able to stuff the enum's discriminant inside, then it does so, effectively reducing the size of the whole enum. In which case the niche removal of MaybeUninit does effectively impact the overall size.

use ::core::mem::{MaybeUninit as MU, size_of};

enum Enum<T> {
    A(T, u8),
    B(   u8),
}

assert_eq!(2, size_of::<Enum<    bool  >>());
assert_eq!(3, size_of::<Enum< MU<bool> >>());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha! thanks for investigating!

Though the DataPayload enum here doesn't have variants with common parts before it gets to the yoke

@sffc
Copy link
Member

sffc commented Jul 25, 2023

I kinda prefer using the crate maybe_dangling because this is a very specific and tricky type to implement and I think shifting nontrivial unsafe code out of ICU4X is a very valid reason to take on a dependency.

@Manishearth
Copy link
Member Author

I really don't think this type is specific or tricky: It's just a wrapped MaybeUninit where we say "actually, this is init"; and MaybeUninit is a type we're using in trickier scenarios already. If you're not comfortable reviewing this I could ask Alyssa or someone else to take a look, to me this is a very small encapsulated unsafe abstraction, which is far better than any other unsafe abstraction in ICU4X.

The main tricky thing I see here is whether or not this type has the effect we want it to have, which definitely is based on some tricky stuff. However, we are acting on the direct advice of the unsafe code guidelines group; I'm fine doing that.

@sffc
Copy link
Member

sffc commented Jul 25, 2023

I really don't think this type is specific or tricky: It's just a wrapped MaybeUninit where we say "actually, this is init"; and MaybeUninit is a type we're using in trickier scenarios already. If you're not comfortable reviewing this I could ask Alyssa or someone else to take a look, to me this is a very small encapsulated unsafe abstraction, which is far better than any other unsafe abstraction in ICU4X.

The main tricky thing I see here is whether or not this type has the effect we want it to have, which definitely is based on some tricky stuff. However, we are acting on the direct advice of the unsafe code guidelines group; I'm fine doing that.

What I mean is that I'm cross-referencing this with the comments in #3696 and the implementation in the maybe_dangling crate recommended in that thread; that's how I caught the Drop issue, but there's a lot of other stuff going on there, too. It's not entirely clear which things we need and which things we don't.

Also, it seems MaubeUninit is not actually what we want long term; it just happens to be a short-term fix that works. If we add a dependency, we can get rid of that dependency once it lands in stable.

@Manishearth
Copy link
Member Author

Manishearth commented Jul 25, 2023

What I mean is that I'm cross-referencing this with the comments in #3696 and the implementation in the maybe_dangling crate recommended in that thread; that's how I caught the Drop issue, but there's a lot of other stuff going on there, too. It's not entirely clear which things we need and which things we don't.

There's a lot going in that crate for primarily two reasons:

Firstly, it also attempts to expose a retag-safe ManuallyDrop. Currently miri does not treat ManuallyDrop as sufficiently special here; since that specialness is also something that that RFC is proposing. The crate instead first builds the future ManuallyDrop (out of MaybeUninit) and then builds MaybeDangling out of that. Quite a reasonable choice for a crate that plans to expose both, not a necessary bit of complexity for us.

The crate is also planning to build.rs a compat strategy as the right types become available; personally I'm fine with upgrading once the RFC lands and MSRV lets us because this stuff is not currently exploited by the compiler, is unlikely to be, and "miri clean on older compiler" does not seem like a goal we currently plan to hit given that "miri clean on a modern compiler" is not something we're doing across ICU4X 😄.

Secondly, it attempts to handle dropck issues. These show up when your parameters have lifetimes because Rust does let you make things like self-referencing arenas where the self-references have lifetimes are not allowed to have destructors that can observe the deletion-in-progress data (here's an example that may be illustrative, happy to expand if you're interested).

We don't need to do this, our yokeables are always 'static. Besides, the dropck behavior defaults to safe, the stuff the crate is doing is using the unstable eyepatch feature to make it support more behaviors, which we don't need. (We did need this in ZeroVec and ended up with ... a fun hack)

There are also a bunch of impls we don't need because we're not exposing it to users and Yoke will want to carefully manage its own impls.

I was intending to write Drop and kinda forgot, but also I mostly made the PR with an eye towards getting the safe stuff right and rechecking later (and I was planning on looking at the maybe_dangle crate later; I didn't want to do it before writing the abstraction).

Also, omitting Drop wasn't a safety issue, it was just silly: and in general I think that's what I mean here, this abstraction is not very complicated from an unsafe perspective.

It is weird from a "what it does" perspective but that's not in the actual unsafe code in this abstraction, it's in what we're using it for; and that's going to be a question if we decide to use a third-party crate as well; we'll have to determine if the crate does in fact fix the problem when we use it the way we are using it.

Also, it seems MaubeUninit is not actually what we want long term; it just happens to be a short-term fix that works. If we add a dependency, we can get rid of that dependency once it lands in stable.

It is not possible for MaybeUninit to be any more strict than MaybeDangling, because something that is potentially uninitialized most definitely cannot be required to be valid or dereferenceable or have any aliasing properties.

maybe_dangling also uses MaybeUninit to achieve things.

The main difference between the two is that MaybeDangling does not actually allow the type to be uninitialized, which means the layout optimizations could work again. MaybeUninit can't participate in niche optimizations.

@Manishearth
Copy link
Member Author

Ironically, the maybe_dangling crate is by Alice Ryhl and I basically implicitly trust her work anyway; in one of my own projects I would definitely use the dependency, I just think that here this does not rise to the level of warranting a new dependency.

I am happy to try and document the situation with more detail if that helps make this code more maintainable.

Copy link

@danielhenrymantilla danielhenrymantilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! (I love the name 😄) I'd just be careful with the current non-in-place drop of T

utils/yoke/src/kinda_sorta_dangling.rs Outdated Show resolved Hide resolved
utils/yoke/src/kinda_sorta_dangling.rs Outdated Show resolved Hide resolved
utils/yoke/src/kinda_sorta_dangling.rs Outdated Show resolved Hide resolved
utils/yoke/src/kinda_sorta_dangling.rs Show resolved Hide resolved
check_size_of!(6792 | 5456, DateTimeFormatter);
check_size_of!(7904 | 6480, ZonedDateTimeFormatter);
check_size_of!(1496 | 1320, TimeFormatter);
check_size_of!(5800 | 4632, DateFormatter);
Copy link

@danielhenrymantilla danielhenrymantilla Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if the payload of two variants have a common part, and if one of the payloads separately has enough of a niche for Rust to be able to stuff the enum's discriminant inside, then it does so, effectively reducing the size of the whole enum. In which case the niche removal of MaybeUninit does effectively impact the overall size.

use ::core::mem::{MaybeUninit as MU, size_of};

enum Enum<T> {
    A(T, u8),
    B(   u8),
}

assert_eq!(2, size_of::<Enum<    bool  >>());
assert_eq!(3, size_of::<Enum< MU<bool> >>());

Manishearth and others added 3 commits July 25, 2023 15:38
Co-authored-by: Ralf Jung <post@ralfj.de>
@Manishearth Manishearth mentioned this pull request Jul 25, 2023
@Manishearth Manishearth requested a review from sffc July 25, 2023 17:25
@Manishearth Manishearth merged commit bc0fc6a into unicode-org:main Jul 26, 2023
23 checks passed
@Manishearth Manishearth deleted the kinda-yoke branch July 26, 2023 22:43
@Manishearth Manishearth linked an issue Jul 27, 2023 that may be closed by this pull request
@Manishearth Manishearth mentioned this pull request Sep 21, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Yoke vs strong protection
4 participants