Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shouldn't exhume take a NonNull<Self> rather than a &mut Self? #17

Open
HadrienG2 opened this issue Sep 6, 2019 · 11 comments · May be fixed by #22
Open

Shouldn't exhume take a NonNull<Self> rather than a &mut Self? #17

HadrienG2 opened this issue Sep 6, 2019 · 11 comments · May be fixed by #22

Comments

@HadrienG2
Copy link

Rust references allow the compiler to assume that the data behind them is valid. One way in which rustc currently uses this is to tag the associated pointers with LLVM's dereferencable attibute, which allows the latter to prefetch from them to its heart's content. This kind of smart optimization should not be allowed before objects are exhumed, as it can lead to undefined behavior like LLVM following dangling pointers and segfaulting the program.

Therefore, I think exhume should not take its target object as a Rust reference, but as a NonNull pointer, which provides no guarantee of target data validity to rustc and therefore doesn't allow the compiler to muck around with it.

@HadrienG2 HadrienG2 changed the title Shouldn't exhume take a NonNull<Self> rather than an &mut Self? Shouldn't exhume take a NonNull<Self> rather than a &mut Self? Sep 6, 2019
@frankmcsherry
Copy link
Member

This sounds very likely correct. For context, much of this work was done before the Rust team had as clear a sense of undefined behavior or could enumerate what Rust promised to LLVM. I'll check this out, but my guess is that you are correct.

@frankmcsherry
Copy link
Member

Investigation suggests that this will involve a bit of a rewrite, as the control flow currently descends correcting pointers and such as it goes, whereas it will instead need to make the assignments as it ascends the recursive calls, because the data do not have the appearance of validity until the call is done. It's still a good thing to do, but not a 15 minute fix afaict.

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 6, 2019

I can also tell you in advance another thing which will cause migration pain: it is very hard to get a field pointer from a struct pointer without going through a reference in current Rust...

@frankmcsherry
Copy link
Member

frankmcsherry commented Sep 6, 2019

That makes sense. I'm guessing that the unsafe code the Rust folks write is localized enough that, e.g. while building a Vec they can prep the known fields and not have to generalize over other patterns.

This is part of the reason I've been waiting for the unsafe team to sort things out; if they fail to produce something usable then I haven't sunk time into it. Is there an issue filed or a post on the unsafe zulip about the "difficulty of dereferencing" issue?

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 6, 2019

I can't find the specific threads right now, but there were severaI discussions on github and/or the internals forum recently about this field-from-pointer issue.

On example was someone trying to build movable self-referential structs using field offsets as an alternative to pointers, and needing to figure out said field offset without having an actual valid object to dereference. Another was &mut aliasing UB triggered by people trying to access a field of data behind an &mut pointer-to-struct.

"Raw references" are frequently invoked as a possible future solution to this kind of problem.

Pinging @RalfJung, maybe he'll remember more than I do.

@frankmcsherry
Copy link
Member

Sweet. I just dropped a comment in the WG: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Abomonation.20UB

I've not been able to keep up with the discussion there, or participate meaningfully, as it is essentially a full-time non-paying job to do so. Ideally I can just ping them occasionally and hope that they make informed decisions that don't close off aspects of performance-oriented computing.

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 6, 2019

Good news, it seems that the &raw RFC is being accepted! So it will soon be possible to do this:

let field_ptr = &raw mut (*struct_ptr).field;
field_ptr.write(new_value);

In meantime, the widely used workaround is this...

let field_ptr = &mut (*struct_ptr).field as *mut _;
field_ptr.write(new_value);

...which is UB because an &mut to invalid data is created, but no UB-induced miscompilation has been reported with current versions of rustc and LLVM, most likely because LLVM has no reason to follow the assumed-valid field pointer in this scenario...

@HadrienG2
Copy link
Author

On the other hand, integrating this is going to be a little bit more ugly than replacing &mut self with self: NonNull<Self> because the latter is currently not supported by Rust. But that's okay, we can just give up on the nice trait dispatch of object.method(...) for now and go for Type::method(object, ...) instead.

I'm taking a closer look at this, hopefully will submit a PR later on.

@frankmcsherry
Copy link
Member

Neat, thanks!

I just checked the Zulip and it seems that there may not be as much UB as we think. At least, take a read:

https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/Abomonation.20UB

especially about Ralf's comments about dereferenceable not being transitive. That is, apparently a &mut Vec<T> needs to point at a valid 24 bytes, but it doesn't mean that its pointer needs to point at allocated memory.

Let me know if that is your read too.

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 19, 2019

Yes, my understanding of Ralf's comment is that many unsafe-based constructs (Box, Vec...) are currently based on raw pointer varieties which are not subjected to LLVM's dereferenceable guarantee, unlike &[mut].

I'm not sure if I am comfortable with relying on this implementation detail too much, though. There is quite a big gap between the invariants of &mut T (non-null, well-aligned, unique, valid allocation, valid target data) and *mut T (almost nothing), and the Rust team have built several pointer types with intermediary guarantees before (NonNull, the unstable Unique, maybe others...). I wouldn't be surprised if it happened again with the dereferenceable guarantee in the future.

Further, I am not sure how safe abomonation currently is regarding &mut's other invariants, in particular the uniqueness one which even the Rust team got wrong with generators. I think it is common to end up with, for example, an &mut to a Box and an &mut to its target, without a clear reborrow that the compiler can identify as such inbetween.

So I think switching to a NonNull-based interface with much weaker pointer invariants attached would still be worthwhile for the sake of making Abomonation impls easier to write correctly and of localizing their (currently inconsequential) violation of the &mut data validity invariant in a smaller region of the code.

@frankmcsherry
Copy link
Member

That does sound worthwhile still, I just didn't want you to do a bit of work without seeing that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants