Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support external objects with references #600

Closed
wants to merge 4 commits into from

Conversation

jbdutton
Copy link
Contributor

@jbdutton jbdutton commented Aug 14, 2023

This is an experimental branch, meant to eventually fix #166.

It introduces a new trait, currently called AnyRef, which does not have the 'static lifetime constraints of the normal Any type. In order to access an AnyRef, it must be through the Projectable trait, which uses generic associated types (GATs) to constrain the returned lifetime.

crates/rune/src/any.rs Outdated Show resolved Hide resolved
@jbdutton jbdutton force-pushed the external-references branch 2 times, most recently from 0cb18af to 984bfe8 Compare August 14, 2023 21:34
@udoprog udoprog added the enhancement New feature or request label Aug 15, 2023
@udoprog
Copy link
Collaborator

udoprog commented Aug 15, 2023

Thanks for taking a stab at this!

Could you explain the need to introduce TypeId and the uuid field? This is probably the aspect I'm most reluctant towards since the Uuid part seems to serve the same purpose as a type hash already does (as returned by Any::type_hash).

The existing rule that will be part of the next release is that the hash of any external type must correspond to its path. The type hash is deterministic and can be used to statically address any type (including generic ones) and is checked during module installation to test for uniqueness.

There's another aspect which is more dire. Dropping the 'static constraint allows the caller to use a safe API exposed by Rune to break lifetime invariants:

use rune::Any;
use rune::runtime::{Shared, AnyObj};

#[derive(Any)]
#[rune(uuid = "ce4da313-5021-4ad3-a70f-844a6a70e2ae")]
struct External<'a> {
    field: &'a u32,
}

fn function() -> External<'static> {
    let field = 42u32;
    let shared = Shared::<AnyObj>::new(AnyObj::new(External { field: &field }));
    // This is allowed, since `External<'static>` and `External<'1>` are considered the same "type"
    shared.take_downcast().unwrap()
}

fn main() {
    let external = function();
    dbg!(external.field);
}

@jbdutton
Copy link
Contributor Author

jbdutton commented Aug 15, 2023

Could you explain the need to introduce TypeId and the uuid field? This is probably the aspect I'm most reluctant towards since the Uuid part seems to serve the same purpose as a type hash already does (as returned by Any::type_hash).

My bad! I thought maybe there was a distinction between the type hash and type id that was lost on me. If the hash is already meant to do this, then we can scrap the UUID attribute.

Dropping the 'static constraint allows the caller to use a safe API exposed by Rune to break lifetime invariants

Yeah, I think for this to work we'd need some extra protections. Probably a second, private type that is similar AnyObj (or some optional markers akin to !Send/!Sync), with some restrictions on how/whether it can be returned to Rust, and preventing it from being captured in an async context (I think this second part already panics?).

@udoprog
Copy link
Collaborator

udoprog commented Aug 15, 2023

You can avoid having the value be take by tweaking the AnyObjKind, which for the above is Owned (see this).

But the trickier part is the inner lifetime - something I'm genuinely not sure can be addressed with Rust. I don't think we have a way to generically construct a type which has a lifetime in its signature which also disallows 'static and other potentially problematic lifetime combinations.

At least that would be the first area to explore whether this can be solved. Ultimately what you want to ensure is that a value can only be coerced into one of:

fn takes_mut(external: &mut External<'_>)
fn takes_ref(external: &External<'_>)

And never something like 'static or possibly other dependent input lifetime combinations:

fn takes_ref(external: &External<'static>)
fn can_maybe_be_swapped<'a>(value: &mut &'a mut Value, external: &mut External<'a>)

And since AFAIU there's no way to constrain that a lifetime may only be the shortest possible temporary lifetime that might prove to be difficult.

@jbdutton
Copy link
Contributor Author

jbdutton commented Aug 15, 2023

To make sure I understand, you're saying something like this is fundamentally unsound:

let mut count = 0;
let external = External::new(&mut count);
let value = vm.call("main", (external,)).unwrap();
let external: External<'static> = rune::from_value(value);

Because we have no way to ensure the original lifetime is respected.

I guess I am wondering what the use case would be for such an API. In my mind, if you are passing a Rust struct containing references into a Rune script, it's unlikely that you would want to extract it back out as a return value.

One way to look at it: Rune could require that you pass the struct by reference. You retain ownership of External<'a>, and the VM either receives a &External or a &mut External. In either case, there's no need to then return the reference and reify it into a Rust type with an ambiguous lifetime using from_value.

Or, Rune can take ownership of an External<'a> but cannot return it to you. So we would take steps to make

let external: External = rune::from_value(value);

a compile-time failure, if possible (i.e. specifically when dealing with lifetimes).

If you need Rune to mutate External in such a way that is observable from Rust, then &mut External is suitable.

@udoprog
Copy link
Collaborator

udoprog commented Aug 17, 2023

I guess I am wondering what the use case would be for such an API. In my mind, if you are passing a Rust struct containing references into a Rune script, it's unlikely that you would want to extract it back out as a return value.

My feelings as well. It might be possible to deal with nested references by restricting what can be done with them.

Sorry about the long post, but I've had a few days now to think about this, and discovered both a few issues and maybe an opportunity.

In the process of thinking about your proposal I've discovered a subtle issue with the current implementation of references that needs to be dealt with. I would note that it also affects nested references and might be a deal breaker. At least to support them as regular arguments to closures and functions being registered in the virtual machine.

This is representative of what we currently do, but is wrong since we don't reject 'static references (or references with any lifetime):

trait Trait<A> {}
trait Arg {}

impl<T, A> Trait<A> for T where T: Fn(A) {}

impl Arg for &u32 {}
impl Arg for u32 {}

fn function1(_: u32) {}
fn function2(value: &u32) -> &u32 {}
fn function3(_: &'static u32) {}

fn receive<T, A>(_: T) where T: Trait<A> {}

fn main() {
    receive(function1);
    receive(function2);
    receive(function3);
}

What we need to do is this instead, use an HRTB:

trait Trait<A> { }
trait Arg {}

// Note the HRTB:
impl<T, A, O> Trait<A> for T where T: Fn(&A) -> O {}

impl Arg for u32 {}

fn function1(_: &u32) {}
fn function2(value: &u32) -> &u32 { value }
fn function3(_: &'static u32) {}

fn receive<T, A>(_: T) where T: Trait<A> {}

fn main() {
    receive(function1);
    // rejected:
    // receive(function2);
    // rejected:
    // receive(function3);
}

So this begs the question: where does this leave references nested in types? Rust currently doesn't have a way to project HRTB's generically into them or at least not to the extent that I am aware. To properly support this use case we'd need to somehow implement it generically (the first version) but as demonstrated that clearly leads to incorrect behavior so it also has to be constrained in a way which ensures that the reference doesn't escape the function call (using HRTB's).

What I did end up looking into is Bevy ECS, which has the ability to produce references to arbitrary complex types using GATs. This might be an option for nested references in Rune, but is not something I've explored deeply yet:

struct Reference<'a, T: ?Sized> {
    /* magic */
}

trait Projectable {
    type Item<'a>;
}

impl<'a, T: ?Sized> Reference<'a, T> where T: Projectable {
    /// This borrows from `&self`, which can only be borrowed locally in the function call.
    fn get(&self) -> T::Item<'_> {
        todo!()
    }
}

/// The `external` variable is constrained to the function through the anonymous `'_`, which can be used generically through an HRTB.
fn function(external: Reference<'_, External<'static>>) {
    let external = external.get();
}

Again, sorry for the dump. Hopefully I've demonstrated the problem, and maybe there's something useful to learn from Bevy ECS.

@jbdutton
Copy link
Contributor Author

Sorry, I haven't had a chance to dig into your comment yet! Just some quick notes before the weekend:

My intuition has been that, inside of a synchronous context, Rust lifetimes aren't an issue for Rune. As long as they outlive the vm.call(...), and you can't stash a reference somewhere inside the VM's state, then you're good. This assumes you cannot create a &T inside of a Rune script, only receive one as an argument.

Of course, stashing a non-static &T inside of an async context won't work. But I think the answer there is to wrap Rust references in a type that is the Rune equivalent of !Send+!Sync.

Maybe there is a subtlety here that I'm overlooking?

@jbdutton jbdutton force-pushed the external-references branch 3 times, most recently from f0e4856 to 1ad57c8 Compare August 26, 2023 20:08
@@ -657,7 +657,7 @@ where
impl #impl_generics #unsafe_to_ref for #ident #type_generics #where_clause {
type Guard = #raw_into_ref;

unsafe fn unsafe_to_ref<'a>(value: #value) -> #vm_result<(&'a Self, Self::Guard)> {
unsafe fn unsafe_to_ref<'__rune_g>(value: #value) -> #vm_result<(&'__rune_g Self, Self::Guard)> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't strictly necessary, but I noticed that the lifetime here conflicts with the lifetime specified in the struct. Since 'a is a very common lifetime, I renamed it to an "internal" lifetime to make it less likely to conflict.

@udoprog
Copy link
Collaborator

udoprog commented Aug 26, 2023

I pushed an example indicating an issue with the current implementation.

It's unfortunately the same problem I discussed earlier, in this instance the inner reference Factory<'a> would be unconstrained in its UnsafeToRef impl, so we can freely use any reference we want (including 'static which my example uses) when registering the method. This somehow needs to be constrained, otherwise it'll allow references to outlive the method calls.

You can run the xample through miri:

cargo miri run --example static_reference
error: Undefined Behavior: Data race detected between (1) Read on thread `<unnamed>` and (2) Deallocate on thread `main` at alloc3639689. (2) just happened here
  --> examples\examples\static_reference.rs:67:5
   |
67 |     }
   |     ^ Data race detected between (1) Read on thread `<unnamed>` and (2) Deallocate on thread `main` at alloc3639689. (2) just happened here
   |
help: and (1) occurred earlier here
  --> examples\examples\static_reference.rs:17:28
   |
17 |           std::thread::spawn(move || {
   |  ____________________________^
18 | |             std::thread::sleep(std::time::Duration::from_secs(1));
19 | |             dbg!(widgets);
20 | |         });
   | |_________^
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE (of the first span):
   = note: inside `main` at examples\examples\static_reference.rs:67:5: 67:6

To fix this we somehow need to either:

  • Reject illegal references when registering the method (I don't think this is possible in Rust today).
  • Somehow only produce local references. This could maybe be done like Bevy ECS does it (I discussed that above).

@jbdutton
Copy link
Contributor Author

It's unfortunately the same problem I discussed earlier

Yes, for sure! Give me a few days and I will push up some ideas on how to go about fixing this

@jbdutton
Copy link
Contributor Author

@udoprog there's still a lot I need to do before this branch is viable, but pushing up what I have now so you can get a preview. There's several things I want to clean up, and it needs more thorough testing. Also I have not looked into how to prevent a lifetime from getting stored in an async context just yet.

Comment on lines +203 to +204
// TODO: separate TypeBuilder into two types?
builder.expand_ref().into()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is probably a lot of duplicate code between the Any and AnyRef macros that could be shared.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine!

let projection = Projection::new(value);

let data =
unsafe { ptr::NonNull::new_unchecked(Box::into_raw(Box::new(projection)) as *mut ()) };
Copy link
Contributor Author

@jbdutton jbdutton Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Projection is itself a ptr::NonNull, which I think is unnecessary. It can probably stored as the T: AnyRef itself, along with a new AnyObjKind. And then the Projection only needs to be returned when accessing the object.

There's maybe a world where Projection doesn't exist and Ref/Mut use the GATs, but I haven't investigated that...

@@ -336,6 +378,67 @@ impl Context {
}
},
});
} else if meta.path == GET_REF {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dealing with nested lifetimes (i.e. an AnyRef with an &AnyRef field), I needed a way to differentiate from a regular Any. But this could possibly be cleaned up.

unsafe fn unsafe_to_ref<'__rune_g>(value: #value) -> #vm_result<(&'__rune_g Self, Self::Guard)> {
let (projection, guard) = #vm_try!(value.into_projection());
let projection = projection.as_ref();
#vm_result::Ok((#lifetime::get(projection), guard))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to work, although I have not tested it thoroughly yet. By invoking the trait here, we're able to keep the Projection<T> type out of closure arguments, and avoid having to include it in impls.

@@ -72,7 +70,16 @@ pub use rune_macros::Any;
/// name: String,
/// }
/// ```
pub trait Any: Named + any::Any {
pub trait Any: Named + TypeHash + 'static {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leftover from when I was still getting my bearings on how all this worked. There's no need to drop any::Any from the trait, so this can be reverted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, please do once you have the time!

Copy link
Collaborator

@udoprog udoprog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that you're investigating the Projectable approach.

I'll just be covering the issues I've found so far which are potential soundness issues so we can see if there's any way to address them. I'll go through a more thorough review once those have been addressed. I hope this is fine since you asked for a preview.

syn::Type::Reference(TypeReference{ mutability: None, ..}) => {
quote_spanned! { g.field.span() =>
module.field_function(#protocol, #field_name, |s: &Self| {
let s = Shared::new(AnyObj::new_projection(s.#field_ident));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems like these references are leaking?

To perform a projection like this, we need to access the Shared<AnyObj> which is invalidated at the end of the vm call. If we're constructing a completely new Shared here, it is no longer part of the same argument which is invalidated by GuardedArgs.

The reason passed in references is safe, is because the Shared<T> object that is produced from it is invalidated through GuardedArgs, so we must make sure any projections do the same.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple test you can try here is to return a reference (held by a Value) from the Vm::call. This should not be allowed, because it allows the caller to extend the lifetime beyond the object from where it was projected.

},
syn::Type::Reference(TypeReference{ mutability: Some(_), ..}) => {
quote_spanned! { g.field.span() =>
module.field_function(#protocol, #field_name, |s: &mut Self| {
Copy link
Collaborator

@udoprog udoprog Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks a bit sketchy to me. Since the return Shared<T> contains a mutable projection from &mut Self, we need to ensure that the same projection cannot legally be performed multiple time.

Consider what would happen if we mutably reference the same field many times.

What needs to be done is probably that we should construct a Mut<Self>, which prevents additional mutable references from being constructed and project from that using Mut::map. This is a bit unfortunate, since we can't hold onto one mutable projection at a time. I.e. if you try to run this Rune code the vm will panic:

let object = /* .. */;
let field1 = object.field1;
let field2 = object.field2; // Errors because we try to mutably access `object` which is being exclusively held through `field1`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all makes sense! I'm struggling to get it working, though.

We can't map a Mut<T> unless it's owned. There's a blanket implementation preventing us from using anything other than an Any to get an owned value for the field function:

impl<T> FromValue for Mut<T>
where
    T: Any,
{
    fn from_value(value: Value) -> VmResult<Self> {
        VmResult::Ok(vm_try!(vm_try!(value.into_any()).downcast_into_mut()))
    }
}

One option is to make a second kind of Mut<T> that works specifically with AnyRef. But, for FromValue to be satisfied, we'd still need to drop the 'static constraint.

Alternatively, there is UnsafeToRef and UnsafeToMut, but I don't think we're able to map from those.

@udoprog
Copy link
Collaborator

udoprog commented Aug 31, 2023

Also I have not looked into how to prevent a lifetime from getting stored in an async context just yet.

FTR, this is fine, as long as you store it inside of a Ref<T> or Mut<T> which would ensure that the object it was projected from stay alive and is locked from access. I think the projection already prevents storing normal references too.

@jbdutton jbdutton closed this Jan 3, 2024
@jbdutton jbdutton deleted the external-references branch January 3, 2024 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

request: using external objects containing references
2 participants