Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type safety: ffi pointer manipulation API #207

Merged
merged 5 commits into from
Dec 9, 2024

Conversation

muzarski
Copy link
Collaborator

@muzarski muzarski commented Nov 25, 2024

CassDataType interior mutability

CassDataType is the only type that is both shared, and mutable. We use of interior mutability pattern for this type. Even if the current code is sound in this matter, we should wrap the underlying data in UnsafeCell, to make (possibly)
invalid accesses to it more verbose and easier to detect by forcing the user to use an explicit unsafe block.

[Box/Arc/Ref]FFI API

The main motivation behind this PR is to:

  • reduce the lifetime of reference obtained from the pointer
    Current pointer-to-ref conversion API (ptr_to_ref_[mut]) returns a
    reference with a static lifetime. New API is parameterized by a lifetime,
    which is not 'static.

  • ensure that IF the memory associated with the pointer was allocated a certain way,
    the raw pointer will be converted to the corresponding Rust pointer primitive i.e. Box or Arc
    (or reference if the pointer was not obtained from an explicit allocation).

This is achieved, by (in addition, to the new API) disallowing the usage of [Box/Arc] API that operates on raw pointers.

However, this does not give us any guarantees about the origin of the pointer.
Consider following example:

// Rust

impl ArcFFI for Foo {}

fn extern "C" f1() -> *const Foo {
    // a pointer to stack variable
    // Also applies to some valid pointer obtained from the reference
    // to the field of some already heap-allocated object.
    // Decided to go with a stack variable to keep the example simple.
    let foo = Foo;
    &foo
}

fn extern "C" f2(foo: *const Foo) {
    let foo = ArcFFI::cloned_from_ptr(foo);
}

// C

Foo *foo = f1();
f2(foo);

// Segfault.
// Even if f1() returned a valid pointer, that points to some
// heap-allocated memory. The pointer was not obtained from an Arc allocation.
// I.e., it was not obtained via Arc::into_raw().

To guarantee this, we need to introduce a special type for pointer that
would represent the pointer's properties. This will be done in a follow-up PR.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have implemented Rust unit tests for the features/changes introduced.
  • [ ] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • [ ] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

@muzarski muzarski marked this pull request as draft November 25, 2024 19:25
@muzarski muzarski force-pushed the type-safety branch 4 times, most recently from 3d8c76f to 8d0c437 Compare November 26, 2024 15:44
@muzarski muzarski self-assigned this Nov 26, 2024
@muzarski muzarski marked this pull request as ready for review November 28, 2024 19:52
@muzarski muzarski requested review from wprzytula and Lorak-mmk and removed request for wprzytula November 28, 2024 19:52
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looks good.

scylla-rust-wrapper/src/cass_types.rs Show resolved Hide resolved
scylla-rust-wrapper/src/cass_types.rs Show resolved Hide resolved
scylla-rust-wrapper/src/cass_types.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
@muzarski
Copy link
Collaborator Author

muzarski commented Dec 2, 2024

Rebased on master

Lorak-mmk and others added 5 commits December 2, 2024 15:46
Previously `CassDataType` was just an enum, held inside `Arc`. User was
given a pointer to `CassDataType` using `Arc::as_ptr` or `Arc::into_ptr`.
There are however some functions that mutate the data - and they were
given the very same pointers.
Current code was most likely sound - but I'm not completely sure, Rust
reference is very confusing in this aspect.
It was however very confusing - when a programmer reads or writes
a function that that *mut CassDataType it is not obivious that this data
lies inside Arc and so has shared ownership.

To make this more explicit this commit puts `CassDataType` inside
UnsafeCell. Now each access needs to use `.get_unchecked()` and
`.get_mut_unchecked()` methods and an unsafe block / function, so it
will be easier to spot aliasing ^ mutability problems in the future.

In the future we can use `Arc::get_mut_unchecked()` for this purpose,
but it's not yet stabilised.
Implementation of `PartialEq` for `CassDataType` hides a
possibly unsafe operation. Let's make sure that we do not depend
on it in the code - only use it for test purposes.
The reasons behind the API (and this PR) are following:

- reduce the lifetime of reference obtained from the pointer
Current pointer-to-ref conversion API (ptr_to_ref_[mut]) returns a
reference with a static lifetime. New API is parameterized by a lifetime,
which is not 'static.

- Ensure that IF the memory associated with the pointer was allocated a certain way,
  the raw pointer will be converted to the corresponding Rust pointer primitive i.e. Box or Arc
  (or reference if the pointer was not obtained from an explicit allocation).
However, this does not give us any guarantees about the origin of the pointer.
Consider following example:

// Rust
impl ArcFFI for Foo {}

fn extern "C" f1() -> *const Foo {
    // a pointer to stack variable
    // Also applies to some valid pointer obtained from the reference
    // to the field of some already heap-allocated object.
    // Decided to go with a stack variable to keep the example simple.
    let foo = Foo;
    &foo
}

fn extern "C" f2(foo: *const Foo) {
    let foo = ArcFFI::cloned_from_ptr(foo);
}

// C
Foo *foo = f1();
f2(foo);

// Segfault.
// Even if f1() returned a valid pointer, that points to some
// heap-allocated memory. The pointer was not obtained from an Arc allocation.
// I.e., it was not obtained via Arc::into_raw().

To guarantee this, we need to introduce a special type for pointer that
would represent the pointer's properties. This will be done in a follow-up PR.
Implemented new traits for the types shared between C and Rust.
Adjusted all places where ptr-to-ref (and vice-versa) conversions
appear to use the new traits API.
@muzarski
Copy link
Collaborator Author

muzarski commented Dec 2, 2024

v2: addressed @wprzytula comments

@muzarski muzarski requested a review from wprzytula December 2, 2024 19:40
Comment on lines 642 to 653
pub unsafe extern "C" fn cass_data_type_set_type_name(
data_type: *mut CassDataType,
data_type: *const CassDataType,
type_name: *const c_char,
) -> CassError {
cass_data_type_set_type_name_n(data_type, type_name, strlen(type_name))
}

#[no_mangle]
pub unsafe extern "C" fn cass_data_type_set_type_name_n(
data_type_raw: *mut CassDataType,
data_type_raw: *const CassDataType,
type_name: *const c_char,
type_name_length: size_t,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you remind me / cite some source on what are the semantics of *mut vs *const? CassDataType* data_type is used on C side, which is not a const object, right? If that is the case, why do we use *const here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applied to other places where *mut was changed to *const.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of *const means that no one is going to modify the pointee through the pointer. Here, this is not the case.

Copy link
Collaborator

@Lorak-mmk Lorak-mmk Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does modify mean? Let's say we get a *const Mutex<something>. Are we allowed to lock the mutex and modify its contents? It only requires creating & from *const.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify on structs is defined by induction, by modifying one of its fields. And modyfing a value of a primitive type involves changing bits of their memory representation.

However, interior mutability is what makes an exception: if a type allows for internal mutability (in Rust, it basically means that it holds an UnsafeCell), then mutation inside that type are not considered mutation from outside. So answering your question, we are allowed to lock the Mutex through *const Mutex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that Wojciech answered your question. Notice, that all these *const and *mut will not matter after a follow-up PR, since raw pointers are going to be replaced by our new type representing a pointer.

@muzarski muzarski requested a review from Lorak-mmk December 4, 2024 18:16
@muzarski muzarski merged commit bfae66b into scylladb:master Dec 9, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants