From 81f9973234d667e5cc518826e77e08548d8cb6db Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 01:05:45 +0100 Subject: [PATCH 01/10] Fill in the rfc for matching operations on pointers --- text/0000-pointer-match.md | 227 +++++++++++++++++++++++++++++++++++++ 1 file changed, 227 insertions(+) create mode 100644 text/0000-pointer-match.md diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md new file mode 100644 index 00000000000..9d1fe974fb6 --- /dev/null +++ b/text/0000-pointer-match.md @@ -0,0 +1,227 @@ +- Feature Name: `pointer-match` +- Start Date: 2019-03-21 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Extend match syntax and patterns by support for a limited set of operations for +pointers, which involve only address calculation and not actually reading +through the pointer value. Make it possible to use these matches to calculate +addresses of fields even for `repr(packed)` structs and possibly unaligned +pointers where an intermediate reference must not be created. + +# Motivation +[motivation]: #motivation + +To create a pointer to a field of a struct, there is currently no way in Rust +that avoids creating a temporary reference. Since reference semantics are +stricter, this may lead to silent undefined behaviour where that reference +should not be valid. Depending on the resolution of reference semantics this +affects: + +* Creating a pointer to a field of a packed struct, where the reference may be + unaligned (depending on . +* Pointing to fields of an uninitialized type, where the reference points to + uninitialized data. This may be complicated by unions, where it could be + possible that not a single variant is currently completely initialized, yet + one wants to access some subfield. See + . 
+* Doing pointer offset calculations where the references does not refer to the + same, or any, allocation. This is because reference calculations are + performed with `getelementptr inbounds`. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Match expression are extended from support for a reference binding mode, to a +pointer binding mode. Furthermore, a new pattern binds to a pointer, and +identifiers are extended to allow a new mode similar to `ref` and `ref mut` +binding to a reference. These patterns are called pointer pattern and raw +identifier for the remainder of the document. + +``` +#[repr(packed)] +struct Foo { + a: u16, + b: u32, +} + +fn ptr_b(foo: &mut Foo) -> *mut u32 { + let Foo { raw mut b, .. } = foo; + b +} +``` + +Note that pointer binding mode and pointer pattern requires `unsafe`, even when +it will never dereference the pointer. But the arithmetic on the pointer may +implicitely overflow. Furthermore, not all patterns are (yet) allowed, to avoid +implicitely performing an unintended, unsafe read through the pointer. Pointer +binding mode will at first only permit ultimately binding with `raw` and `ref` +and not actually reading the contained memory. + +The raw identifier pattern does not require `unsafe` on its own. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The calculation of the value from a pointer pattern will not use an `inbounds` +qualifier when passed to llvm. + +There is no restriction on matching enum variants and slices, such that this is +possible: + +``` +#[repr(packed)] +Foo { + field: Enum, +} + +enum Enum { + A(usize), +} + +fn overwrite_packed_field(foo: &mut Foo ) { + // Actually safe! + let Foo { field: Enum::A(raw mut x), } = foo; + + // Write itself not safe, as we write to a pointer. 
+ unsafe { ptr::write_unaligned(x, 0) }; +} +``` + +The new pattern forms a new kind of field binding, and should be inserted into +the grammar as an option for identifier and StructPatternField, next to +`id: pattern`, `tuple_index: pattern`, `ref? mut? identifier`. Pointer pattern +uses the obvious `*pattern` and is only allowed in unsafe blocks. + +Allowed [patterns](https://doc.rust-lang.org/reference/patterns.html) within +pointer patterns (and thus in the sugar of pointer binding mode) are: wildcard +patttern, path patterns that don't refer to enum variants or constants, struct +patterns, tuple patterns, fixed size array patterns, where the last three are +only not allowed to bind their fields with the new pointer pattern and with +`..`, potentially also with `ref mut? identifier`, but not `mut? identifier`. +Some further notes on (dis-)allowed patterns: + +* The restrictions don't apply to matching the pointer value itself, as that + is not inside a pointer pattern. +* enum variants and constants obviously read their memory. +* literal, identifier, and reference patterns also constitute a read of the + pointed-to place, and implicitely assert their type's invariants. Better to + keep those operations separate. +* no pointer patterns within pointer patterns, must also actually read memory. +* slice patterns would require size information not present in the pointer. +* `ref mut? identifier` may be useful, but may be too tempting sometimes. + +# Drawbacks +[drawbacks]: #drawbacks + +Match syntax is 'more heavy' than a place based syntax in some or many cases. +On the other side of the coin, initializing a struct often involves grabbing +pointers to all fields, where matching is much terser than each indivdual +expression. + +The additional pointer binding mode for match expressions may be confusing due +to the non-explicit pointer nature of its argument. 
+ +The pointer retrieved from `raw mut` binding while matching a `&mut _` value +upholds more guarantees than aparent, as it is known to be writable with +`ptr::write_unaligned`. Some yet-to-be-proposed encapsulation could thus make +this completely safe to the programmer. This is a drawback because of the next +argument. + +Assigning semantics to the pattern matching of `*` and `raw` has the risk of +being too restricted for future operations but too constrained to allow +backwards compatible extension. Specifically, the type of `id` in a `raw id` +pattern may be hard to change but a pointer upholds almost no invariants on its +own. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +`&raw ` was also proposed to achieve getting a pointer to a field. The +pattern/match syntax has several advantages over place syntax: + +* Place expressions are overloaded with auto-deref, custom indexing + (`core::ops::Index`/`core::ops::IndexMut`), invoking arbitrary user code. A + solution with place syntax needs to explicitely forbid these forms of place + statements, both to disallow user code and avoid accidental reference + intermediates. The new statements thus resembles a very different other + statement. +* The initial dereferencing of the pointer necessary for a place expression + (`struct.field` is implicitely `(*struct).field` for a reference argument + `struct`) will not work with pointer arguments, which do no automatically + dereference even in unsafe code (and arguably should not, outside `&raw`). +* `raw` feels more natural when paralleling `ref` instead of appearing as yet + an *additional* qualifier on `&` that is not associated with pointers + in the first place and confusingly also requires `const` in spite of `&` + suggesting the opposite. +* It provides a clear pattern that extends to enum fields in packed structs, + which are not absolutely not expressible in place syntax. 
+ +In contrast, patterns fully follow the structural nature of algebraic data +types without customization points in the form of `core::ops`. This makes them +a perfect match when the possibilities should be restricted to exactly those +options. + +Not doing this would keep surface level code for creating pointers error prone +or impossible, independent of the underlying MIR changes. + +# Prior art +[prior-art]: #prior-art + +C++ state-of-the-art, to my best knowledge, also uses the usual lvalue +expression for a pointer to a field. This has several pitfalls: Classes may +overwrite the pointer dereference operator `->`, and the pointer creation +operator `&`. Actually conformant generic code thus requires additional +artificial constructs and a syntax that does not resemble lvalue syntax. +Additionally, most of the operator are not defined while their target object is +not life, making them unfit for initialization of uninitialized objects. + +C (and C++ to an extent) also have `offsetof`, a macro based solution to get +the byte offset of a field. This only works reliably for [a very restricted set +of types](https://en.cppreference.com/w/cpp/named_req/StandardLayoutType). This +essentially is the analogue of `#[repr(C)]` in Rust. A `static_assert` based +solution can help unwittingly triggering undefined behaviour on other types. + +No other algebraic language with the memory model of Rust is known to the +author, thus comparisons in this way are sparse. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +The exact syntax for pointer patterns, while `raw` as a contextual keyword has +already some association with pointer to place it need not be the final answer. + +The restrictions on pointer binding mode that are only based on not implicitely +reading memory (enum variants, constants, references, bindings) do not add real +safety, as the matching must occur within an `unsafe` block in any case. 
+However, they likely do protect against accidental usage similar to auto-deref +in a place expression. They may arguable be more a nuisance than a safety help +nonetheless. + +Address calculation will likely depend on not overflow the pointer, i.e. behave +like `pointer::add` but could also utilize `pointer::wrapping_add` instead. +That would make the code safer but provide fewer optimization opportunities. +Also, wrapping addition could promote use to get (specific) field offsets, +within the limits of layout guarantees offered by rust. Since it occurs in an +unsafe block, the burden of fulfilling necessary preconditions ultimately +relies on the programmer. + +`ref mut? identifier` within pointer patterns may be disallowed or not. `raw +identifier` pattern. For half-initialized structs where validity and alignment +of the underlying struct has been checked but `&mut` referencing the complete +struct is not safe due to uninitialized fields this is also useful. +Alternatively, this could be disallowed if not useful enough or it seems to +promote undefined behaviour. + +# Future possibilities +[future-possibilities]: #future-possibilities + +Some pointer binding matches may be safer than the required `unsafe` suggests: +For example the pointer retrieved from `MaybeUninit` guarantees that the memory +is actually backed by some allocation and thus the offset calculations can both +utilize `inbounds` and will never overflow. It could be possible to remove the +need for an `unsafe` block around such matches if they don't use any of the +memory-reading-patterns discussed in unresolved questions. 
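For comparison with the `MaybeUninit` case mentioned under future possibilities, here is a minimal sketch of field-by-field initialization of the packed `Foo` from the guide-level example. It is written in Rust that compiles today, so it assumes the standard library macro `std::ptr::addr_of_mut!` instead of the `raw mut` pattern proposed in this RFC. The helper names `init_foo` and `read_fields` are hypothetical and exist only for illustration.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[repr(packed)]
struct Foo {
    a: u16,
    b: u32,
}

/// Hypothetical helper: initialize a packed `Foo` field by field without
/// ever creating a reference to an unaligned or uninitialized field.
fn init_foo(a: u16, b: u32) -> Foo {
    let mut slot = MaybeUninit::<Foo>::uninit();
    let base: *mut Foo = slot.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` only computes the field address; nothing is read
        // through `base` and no intermediate `&mut` is formed.
        addr_of_mut!((*base).a).write_unaligned(a);
        addr_of_mut!((*base).b).write_unaligned(b);
        // Both fields are written, so the value is fully initialized.
        slot.assume_init()
    }
}

/// Hypothetical helper: copy the fields out by value. Copying a packed
/// field is fine, whereas taking `&foo.b` would be rejected.
fn read_fields(foo: &Foo) -> (u16, u32) {
    let a = foo.a;
    let b = foo.b;
    (a, b)
}
```

Under the proposal, the two `addr_of_mut!` lines would presumably become a single pattern that binds both field pointers at once. That rewrite is an assumption about the intended ergonomics, not something the patch spells out.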
From f97ff9fc8b08352f5c4b7592848c8eca41e69188 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 01:10:49 +0100 Subject: [PATCH 02/10] Add reference to MIR pr --- text/0000-pointer-match.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 9d1fe974fb6..700041b441f 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -188,6 +188,9 @@ solution can help unwittingly triggering undefined behaviour on other types. No other algebraic language with the memory model of Rust is known to the author, thus comparisons in this way are sparse. +The PR [#2582](https://github.com/rust-lang/rfcs/pull/2582) contains the +necessary MIR operations to perform the address calculations themselves. + # Unresolved questions [unresolved-questions]: #unresolved-questions From 2aaf3e30fd2374d09b244db25fbc10d4320cf5d9 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 01:35:03 +0100 Subject: [PATCH 03/10] More motivating examples for adoption of the unsafe parts --- text/0000-pointer-match.md | 63 +++++++++++++++++++++++++++++++++----- 1 file changed, 56 insertions(+), 7 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 700041b441f..9c6631f8093 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -61,7 +61,36 @@ implicitely performing an unintended, unsafe read through the pointer. Pointer binding mode will at first only permit ultimately binding with `raw` and `ref` and not actually reading the contained memory. -The raw identifier pattern does not require `unsafe` on its own. +The raw identifier pattern does not require `unsafe` on its own (as seen above, +where we safely match a `&mut` but bind to `*mut`). + +This is not only useful for packed fields, but also to access the fields of +*any* object that is only available via pointer because its state invariants +may not yet be fulfilled. 
Instead of manually doing pointer math: + +``` +/// Repr Rust, so no layout guarantees, no pointer operations to get to `a`. +struct Weird { + /// Always valid, no matter the memory content. + something_i_dont_care_about: u8, + + /// Must only be one of `true` and `false` for a &Weird. + a: bool, +} + +/// Unsafety invariants: `w` must +/// * point to some allocation of at least `std::mem::size_of::`. +/// * point to memory valid for the chosen lifetime `'a` +/// * be properly aligned. +unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> { + let Weird { raw ptr a, ..} = w; + match core::ptr::read(a) { + 0 | 1 => Some(std::mem::transmute(w)), + _ => None + } +} +``` + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -86,22 +115,23 @@ fn overwrite_packed_field(foo: &mut Foo ) { // Actually safe! let Foo { field: Enum::A(raw mut x), } = foo; - // Write itself not safe, as we write to a pointer. + // Write itself not safe, as we write to a pointer :/ unsafe { ptr::write_unaligned(x, 0) }; } ``` -The new pattern forms a new kind of field binding, and should be inserted into -the grammar as an option for identifier and StructPatternField, next to -`id: pattern`, `tuple_index: pattern`, `ref? mut? identifier`. Pointer pattern -uses the obvious `*pattern` and is only allowed in unsafe blocks. +The new pattern `raw (const|mut) identifier` forms a new kind of field binding, +and should be inserted into the grammar as an option for identifier and +StructPatternField, next to `id: pattern`, `tuple_index: pattern`, `ref? mut? +identifier`. Pointer pattern uses the obvious `*pattern` and is only allowed in +unsafe blocks. 
Allowed [patterns](https://doc.rust-lang.org/reference/patterns.html) within pointer patterns (and thus in the sugar of pointer binding mode) are: wildcard patttern, path patterns that don't refer to enum variants or constants, struct patterns, tuple patterns, fixed size array patterns, where the last three are only not allowed to bind their fields with the new pointer pattern and with -`..`, potentially also with `ref mut? identifier`, but not `mut? identifier`. +`..`, potentially also with `ref mut? identifier`, but not `mut? identifier`. Some further notes on (dis-)allowed patterns: * The restrictions don't apply to matching the pointer value itself, as that @@ -114,6 +144,25 @@ Some further notes on (dis-)allowed patterns: * slice patterns would require size information not present in the pointer. * `ref mut? identifier` may be useful, but may be too tempting sometimes. +Since pointer patterns are guaranteed to not rely on the pointed-to memory +invariants, it can also be used to match union fields in interesting ways. +Maybe this is interesting for custom enum-like-encapsulations? + +``` +union Mix { + f1: (bool, u8), + f2: (u8, bool), +} + +let mut m = Mix { f2: (3, true), }; +// f1.0 is not validly initialized, don't grab reference. +let Mix { f1: (raw f1_0_ptr, _), } = &m; +// Initialize f1.0 through valid f2.0 +m.f2.0 = 0; +// Now we can grab the reference. 
+let f1_0 = unsafe { &*f1_0_ptr }; +``` + # Drawbacks [drawbacks]: #drawbacks From 565487ec848cb590f5218846cbc7a1e54a60c13a Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 01:48:38 +0100 Subject: [PATCH 04/10] Fix the raw-pattern being applied inconsistently --- text/0000-pointer-match.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 9c6631f8093..022b9419067 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -83,7 +83,7 @@ struct Weird { /// * point to memory valid for the chosen lifetime `'a` /// * be properly aligned. unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> { - let Weird { raw ptr a, ..} = w; + let Weird { raw const a, ..} = w; match core::ptr::read(a) { 0 | 1 => Some(std::mem::transmute(w)), _ => None @@ -156,7 +156,7 @@ union Mix { let mut m = Mix { f2: (3, true), }; // f1.0 is not validly initialized, don't grab reference. -let Mix { f1: (raw f1_0_ptr, _), } = &m; +let Mix { f1: (raw const f1_0_ptr, _), } = &m; // Initialize f1.0 through valid f2.0 m.f2.0 = 0; // Now we can grab the reference. @@ -245,6 +245,9 @@ necessary MIR operations to perform the address calculations themselves. The exact syntax for pointer patterns, while `raw` as a contextual keyword has already some association with pointer to place it need not be the final answer. +An alternative is, of course, a contextual keyword `ptr` for that pattern. +However, `ptr` will be more ambiguous should a similar syntax be adopted +outside of patterns. 
The restrictions on pointer binding mode that are only based on not implicitely reading memory (enum variants, constants, references, bindings) do not add real From 85440c548868120621ddea40b3a578f553c24dc0 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 14:42:21 +0100 Subject: [PATCH 05/10] Add missing pointer cast to read u8 instead of bool --- text/0000-pointer-match.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 022b9419067..db84492e389 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -84,7 +84,7 @@ struct Weird { /// * be properly aligned. unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> { let Weird { raw const a, ..} = w; - match core::ptr::read(a) { + match core::ptr::read(a as *const u8) { 0 | 1 => Some(std::mem::transmute(w)), _ => None } From a226af12c867f7540e669301baa0b7b950fbd8f0 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 22 Mar 2019 15:14:59 +0100 Subject: [PATCH 06/10] Extend the description and explanatory examples in reference section --- text/0000-pointer-match.md | 67 ++++++++++++++++++++++++++++++++------ 1 file changed, 57 insertions(+), 10 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index db84492e389..38878fe5dd4 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -95,11 +95,39 @@ unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> { # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +The newly introduced patterns are: + +* `raw (const|mut) identifier`; allowed for field bindings and identifier bindings. + These are allowed in the grammar where `ref? mut? identifier` is allowed + currently. For this purpose `raw` is a contextual keyword. +* `* `; to match a pointer not by value but to additionally use + structural patterns to get pointers to the fields of its underlying type. 
+ Their use requires an `unsafe`-block around the expression in which they + appear, be it match or irrefutable bindings. However, `` does not + allow arbitrary content, this is subject to discussion and future options. + +In pointer binding mode, the top-level pattern is wrapped in `*` if it is a +non-reference and non-pointer pattern. This should be analogue to [reference +binding mode](https://doc.rust-lang.org/reference/patterns.html#binding-modes) +where the wrapping and existence of the pointer patterns serves as +disambiguation in fringe cases. + +``` +match (0 as *mut usize) { + // What's currently possible, this is a reference pattern and does no pointer-wrapping. + x => (), + // This reference pattern gets a pointer to the pointer. + raw const y => (), + // This explicit pointer-pattern gets a const pointer to the pointed-to place. + *raw const z => (), +} +``` + The calculation of the value from a pointer pattern will not use an `inbounds` qualifier when passed to llvm. -There is no restriction on matching enum variants and slices, such that this is -possible: +There is no restriction on raw-patterns appearing within matching of enum +variants and slices, such that this is possible: ``` #[repr(packed)] @@ -120,12 +148,6 @@ fn overwrite_packed_field(foo: &mut Foo ) { } ``` -The new pattern `raw (const|mut) identifier` forms a new kind of field binding, -and should be inserted into the grammar as an option for identifier and -StructPatternField, next to `id: pattern`, `tuple_index: pattern`, `ref? mut? -identifier`. Pointer pattern uses the obvious `*pattern` and is only allowed in -unsafe blocks. 
- Allowed [patterns](https://doc.rust-lang.org/reference/patterns.html) within pointer patterns (and thus in the sugar of pointer binding mode) are: wildcard patttern, path patterns that don't refer to enum variants or constants, struct @@ -141,8 +163,10 @@ Some further notes on (dis-)allowed patterns: pointed-to place, and implicitely assert their type's invariants. Better to keep those operations separate. * no pointer patterns within pointer patterns, must also actually read memory. -* slice patterns would require size information not present in the pointer. -* `ref mut? identifier` may be useful, but may be too tempting sometimes. +* `ref mut? identifier` may be useful, but may be too tempting sometimes. It + essentially performs a cast of pointer-to-reference and thus comes with the + same caveats: The programmer must ensure liveness and alignment. However, + cast with `transmute` or `as _` is much more explicit. Since pointer patterns are guaranteed to not rely on the pointed-to memory invariants, it can also be used to match union fields in interesting ways. @@ -163,6 +187,29 @@ m.f2.0 = 0; let f1_0 = unsafe { &*f1_0_ptr }; ``` +Match unsized values should also simply work, I don't see any complication over +matching those by reference as the pointer already includes the necessary +(length)-metadata. With regards to network protocols, this would become much, +much cooler with unsized unions but you can't have your cake and eat it, yet. + +``` +#[repr(C)] +struct net_pckt { + protocol_type: u8, + content: [u8], +}; + +unsafe { + // Works nicely even with changes to the packet structure. + let net_pckt { raw mut content, ..} = uninitialized_packet_ptr; + // Get a pointer two bytes into the content. The pointer has the necessary length-metadata. 
+ match content { + [_, _, raw ptr] => /* Packet large enough */ (), + _ => return Err(Error::Truncated), + } +} +``` + # Drawbacks [drawbacks]: #drawbacks From 963486197ff6a3a2e0e849444b3d37f2c502ab24 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Fri, 29 Mar 2019 15:13:16 +0100 Subject: [PATCH 07/10] Make pointer pattern const/mut qualification necessary It's more consistent with reference patterns that way and there is no inherent upside to not having to specify it that I know of. --- text/0000-pointer-match.md | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 38878fe5dd4..28a37e3bbd3 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -100,17 +100,19 @@ The newly introduced patterns are: * `raw (const|mut) identifier`; allowed for field bindings and identifier bindings. These are allowed in the grammar where `ref? mut? identifier` is allowed currently. For this purpose `raw` is a contextual keyword. -* `* `; to match a pointer not by value but to additionally use - structural patterns to get pointers to the fields of its underlying type. - Their use requires an `unsafe`-block around the expression in which they - appear, be it match or irrefutable bindings. However, `` does not - allow arbitrary content, this is subject to discussion and future options. - -In pointer binding mode, the top-level pattern is wrapped in `*` if it is a -non-reference and non-pointer pattern. This should be analogue to [reference -binding mode](https://doc.rust-lang.org/reference/patterns.html#binding-modes) -where the wrapping and existence of the pointer patterns serves as -disambiguation in fringe cases. +* `* (const|mut) `; to match a pointer not by value but to + additionally use structural patterns to get pointers to the fields of its + underlying type. 
Their use requires an `unsafe`-block around the expression + in which they appear, be it match or irrefutable bindings. However, + `` does not allow arbitrary content, this is subject to + discussion and future options. + +In pointer binding mode, the top-level pattern is wrapped in `* (const|mut)` if +it is a non-reference and non-pointer pattern. This should be analogue to +[reference binding +mode](https://doc.rust-lang.org/reference/patterns.html#binding-modes) where +the wrapping and existence of the pointer patterns serves as disambiguation in +fringe cases. ``` match (0 as *mut usize) { @@ -119,7 +121,7 @@ match (0 as *mut usize) { // This reference pattern gets a pointer to the pointer. raw const y => (), // This explicit pointer-pattern gets a const pointer to the pointed-to place. - *raw const z => (), + *mut raw const z => (), } ``` From 245370c37fdaa873677112d697f41169e4bd662d Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Sun, 31 Mar 2019 20:42:14 +0200 Subject: [PATCH 08/10] Ergonomic irrefutable considerations and adjusted notes on automatic pointer wrapping --- text/0000-pointer-match.md | 47 ++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 10 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 28a37e3bbd3..517196857dd 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -49,7 +49,7 @@ struct Foo { } fn ptr_b(foo: &mut Foo) -> *mut u32 { - let Foo { raw mut b, .. } = foo; + let raw mut b = foo.b; b } ``` @@ -83,6 +83,7 @@ struct Weird { /// * point to memory valid for the chosen lifetime `'a` /// * be properly aligned. unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> { + // No need to deref `w`. let Weird { raw const a, ..} = w; match core::ptr::read(a as *const u8) { 0 | 1 => Some(std::mem::transmute(w)), @@ -107,24 +108,36 @@ The newly introduced patterns are: `` does not allow arbitrary content, this is subject to discussion and future options. 
-In pointer binding mode, the top-level pattern is wrapped in `* (const|mut)` if -it is a non-reference and non-pointer pattern. This should be analogue to +In pointer binding mode, non-top-level standard named bindings default to `raw +_` bindings depending on the pointer type. This should be analogous to [reference binding -mode](https://doc.rust-lang.org/reference/patterns.html#binding-modes) where -the wrapping and existence of the pointer patterns serves as disambiguation in -fringe cases. +mode](https://doc.rust-lang.org/reference/patterns.html#binding-modes). +Top-level bindings do not default to raw bindings and also need other +consideration for backwards compatibility. ``` +const NULL: *mut usize = core::ptr::null_mut(); match (0 as *mut usize) { - // What's currently possible, this is a reference pattern and does no pointer-wrapping. + // What's currently possible, this is a binding pattern and does no pointer-wrapping. x => (), - // This reference pattern gets a pointer to the pointer. + // Also currently possible, top-level value pattern also does no pointer-wrapping. + NULL => (), + // This pointer binding pattern gets a pointer to the pointer. raw const y => (), - // This explicit pointer-pattern gets a const pointer to the pointed-to place. + // This explicit pointer-pattern gets a mut pointer to the pointed-to place. We are in + // mutable pointer binding mode and the binding is not top-level. This one is useless. + *mut z => (), + // This explicit pointer-pattern gets a const pointer to the pointed-to place, like cast. *mut raw const z => (), } ``` +In cases where the pattern is not a binding or a value pattern, pointer +patterns may be elided; the compiler then automatically wraps inner patterns as +required. This mechanism is currently used for references as well and there are +no expected collisions as the wrapper is added according to the requirements of +the matched type, which is either reference or pointer. 
+ The calculation of the value from a pointer pattern will not use an `inbounds` qualifier when passed to llvm. @@ -170,6 +183,19 @@ Some further notes on (dis-)allowed patterns: same caveats: The programmer must ensure liveness and alignment. However, cast with `transmute` or `as _` is much more explicit. +Pointer patterns and raw bindings are also irrefutable patterns and can thus be +used in `let`-bindings and similar. This was used in the example in the +guide-level explanation: + +``` +fn ptr_b(foo: &mut Foo) -> *mut u32 { + // rhs of this let-binding is a place expression to mutable field `b`. + // `raw mut` is irrefutable and gets a pointer to it without reference. + let raw mut b = foo.b; + b +} +``` + Since pointer patterns are guaranteed to not rely on the pointed-to memory invariants, it can also be used to match union fields in interesting ways. Maybe this is interesting for custom enum-like-encapsulations? @@ -182,7 +208,7 @@ union Mix { let mut m = Mix { f2: (3, true), }; // f1.0 is not validly initialized, don't grab reference. -let Mix { f1: (raw const f1_0_ptr, _), } = &m; +let raw const f1_0_ptr = m.f1.0; // Initialize f1.0 through valid f2.0 m.f2.0 = 0; // Now we can grab the reference. @@ -201,6 +227,7 @@ struct net_pckt { content: [u8], }; +let uninitialized_packet_ptr: *const net_pckt = unimplemented!(); unsafe { // Works nicely even with changes to the packet structure. 
 let net_pckt { raw mut content, ..} = uninitialized_packet_ptr; From 1be2a63a142444ec11b1e7ff524578861326d045 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Sun, 31 Mar 2019 20:57:27 +0200 Subject: [PATCH 09/10] Expand on the rationale of auto-deref in place expressions --- text/0000-pointer-match.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 517196857dd..237687c3272 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -242,10 +242,10 @@ unsafe { # Drawbacks [drawbacks]: #drawbacks -Match syntax is 'more heavy' than a place based syntax in some or many cases. -On the other side of the coin, initializing a struct often involves grabbing -pointers to all fields, where matching is much terser than each indivdual -expression. +Match syntax is heavier than a place-based expression syntax in some or +many cases. On the other side of the coin, initializing a struct often +involves grabbing pointers to all fields, where matching is much terser than +each individual expression. The additional pointer binding mode for match expressions may be confusing due to the non-explicit pointer nature of its argument. @@ -274,6 +274,14 @@ pattern/match syntax has several advantages over place syntax: statements, both to disallow user code and avoid accidental reference intermediates. The new statements thus resembles a very different other statement. + + Through a `raw` pattern, usable in irrefutable bindings, it is a choice of + the programmer to use auto-deref within the rhs place statement, as is + familiar, but also to explicitly avoid it when used in the lhs-pattern. With + the introduced pointer pattern it is also never required to rely on + auto-deref within a pointer deref in an `unsafe` block, where one could + accidentally invoke a `Deref::deref` implementation on an unintended + reference to get a mentioned field (e.g. after a refactoring). 
 * The initial dereferencing of the pointer necessary for a place expression (`struct.field` is implicitely `(*struct).field` for a reference argument `struct`) will not work with pointer arguments, which do no automatically From 0bd798a0119cdc87d481799bd511d87dc9760a3b Mon Sep 17 00:00:00 2001 From: HeroicKatora Date: Tue, 16 Apr 2019 09:46:26 +0200 Subject: [PATCH 10/10] Add examples for denied value matching in pointer patterns --- text/0000-pointer-match.md | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/text/0000-pointer-match.md b/text/0000-pointer-match.md index 237687c3272..3d8b54e8a24 100644 --- a/text/0000-pointer-match.md +++ b/text/0000-pointer-match.md @@ -183,6 +183,37 @@ Some further notes on (dis-)allowed patterns: same caveats: The programmer must ensure liveness and alignment. However, cast with `transmute` or `as _` is much more explicit. +Pattern structure is enforced *after* the automatic addition of reference and +pointer patterns. This ensures that even implicitly added pointer patterns +cannot result in a read through the pattern. The simplest of these accidental +mistakes are, however, already prevented by the logic of adding pointer patterns, as +the pointer itself could be matched by a value pattern. + +``` +fn no_match_on_value(b: *const bool) -> usize { + match b { + // Error: mismatching type, expected value of type `*const bool` ... + true => 0, + // ^^^^ ... for this pattern + _ => 1, + } +} + +struct Foo { + bar: usize, +} + +fn no_match_on_implicit_value(b: *const Foo) -> usize { + match b { + // Error: cannot match by value within pointer pattern. + Foo { bar: 0 } => 0, + // ^^^ ... referring to this value pattern + // Note: pointer pattern occurs in expansion to `*const Foo { bar: 0 }` + _ => 1, + } +} +``` + Pointer patterns and raw bindings are also irrefutable patterns and can thus be used in `let`-bindings and similar. This was used in the example in the guide-level explanation: