From c4ba967b166b3751770c6a38c9f41573bda5bb92 Mon Sep 17 00:00:00 2001 From: Ramon de C Valle Date: Wed, 13 Apr 2022 20:45:24 -0700 Subject: [PATCH 1/5] Add text/0000-improve-c-types-for-cross-language-cfi.md --- ...-improve-c-types-for-cross-language-cfi.md | 222 ++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 text/0000-improve-c-types-for-cross-language-cfi.md diff --git a/text/0000-improve-c-types-for-cross-language-cfi.md b/text/0000-improve-c-types-for-cross-language-cfi.md new file mode 100644 index 00000000000..b3d1e96ebed --- /dev/null +++ b/text/0000-improve-c-types-for-cross-language-cfi.md @@ -0,0 +1,222 @@ +- Feature Name: `improve-c-types-for-cross-language-cfi` +- Start Date: 2022-07-25 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Improve C types to be able to identify C char and integer type uses at the time +types are encoded for cross-language LLVM CFI support. + +# Motivation +[motivation]: #motivation + +As the industry continues to explore Rust adoption, the absence of support for +forward-edge control flow protection in the Rust compiler is a major security +concern when migrating to Rust by gradually replacing C or C++ with Rust, and C +or C++ and Rust -compiled code share the same virtual address space. Thus, +support for forward-edge control flow protection needs to be added to the Rust +compiler and is a requirement for large-scale secure Rust adoption. For more +information about LLVM CFI and cross-language LLVM CFI support, see the design +document in the tracking issue [#89653][1][[1]]. 
+ +## Type metadata +[type-metadata]: #type-metadata + +LLVM uses type metadata to allow IR modules to aggregate pointers by their +types.[[2]] This type metadata is used by LLVM Control Flow Integrity to test +whether a given pointer is associated with a type identifier (i.e., test type +membership). + +Clang uses the Itanium C++ ABI's[[3]] virtual tables and RTTI typeinfo +structure name[[4]] as type metadata identifiers for function pointers. The +typeinfo name encoding is a two-character code (i.e., “TS”) prefixed to the +type encoding for the function. + +For cross-language LLVM CFI support, a compatible encoding must be used by +either + + 1. Using a superset of types that encompasses types used by Clang (i.e., + Itanium C++ ABI's type encodings[[5]]), or at least types used at the FFI + boundary. + + 2. Reducing the types to the least common denominator between types used by + Clang (or at least types used at the FFI boundary) and the Rust compiler + (if even possible). + + 3. Creating a new encoding for cross-language CFI and using it for Clang and + Rust compilers (and possibly other compilers). + +Option (1) provides a more comprehensive protection than option (2) and (3) for +Rust-compiled only code and when interoperating with foreign code written in C +and possibly other languages. + +Option (2) may result in less comprehensive protection for Rust-compiled only +code, so it should be provided as an alternative to a Rust-specific encoding +for when mixing Rust and C and C++ -compiled code. + +Option (3) would require changes to Clang to use the new encoding and, +depending on its requirements, may result in less comprehensive protection for +Rust-compiled only code and when interoperating with foreign code written in C +and other languages, similarly to option (2), so it should also be provided as +an alternative to a Rust-specific encoding for when mixing Rust and other +languages -compiled code. 
+ +## Defined type metadata identifiers (using Itanium C++ ABI) +[defined-type-metadata-1]: #defined-type-metadata-1 + +Option (1) is satisfied by using the Itanium C++ ABI with vendor extended type +qualifiers and types for Rust types that are not used at the FFI boundary. +Table II in the design document in the tracking issue [#89653][1][[1]] defines +type metadata identifiers for cross-language LLVM CFI support using option (1). + +## Defined type metadata identifiers (using new encoding for cross-language CFI) +[defined-type-metadata-2]: #defined-type-metadata-2 + +Option (3) was also explored with the Clang CFI team by defining a new encoding +for cross-language CFI. This new encoding needed to be language agnostic and +ideally compatible with any other language. It also needed to support extended +types in case it was used as the main encoding to provide forward-edge control +flow protection. + +To satisfy these requirements, however, this new encoding neither distinguishes +between certain types (e.g., bool, char, integers, and enums) nor discriminates +between pointed element types (the latter mainly because of C’s void * abuse). +This results in less comprehensive protection for Rust-compiled only code and +when interoperating with foreign code written in C, so this encoding will be +implemented and provided as an alternative option for interoperating with +foreign code written in languages other than C. + +Option (3) is satisfied by using this new encoding with extended types for Rust +types that are not used at the FFI boundary. Table III in the design document +in the tracking issue [#89653][1][[1]] defines type metadata identifiers for +cross-language LLVM CFI support using option (3). + +## Rust vs C char and integer types + +Rust defines char as a Unicode scalar value, which is different from C’s char. +On most modern systems, C’s char is either an 8-bit signed or unsigned integer. 
+The Itanium C++ ABI specifies a distinct encoding for it (i.e., ‘c’). + +Rust also uses explicitly-sized integer types (e.g., `i8`, `i16`, `i32`, ...) +while C uses abstract integer types (e.g., `char`, `short`, `long`, ...), whose +actual sizes are implementation-defined and may vary across different systems. +The Itanium C++ ABI specifies encodings for the C integer types (e.g., `char`, +`short`, `long`, ...), not their defined representations/sizes (e.g., 8-bit +unsigned integer, 16-bit unsigned integer, 32-bit unsigned integer, ...). + +For convenience, some C-like type aliases are provided by libcore and libstd +(and also by the libc crate) for use when interoperating with foreign code +written in C. For instance, one of these type aliases is `c_char`, which is a +type alias to Rust’s `i8` or `u8`, depending on the target. + +To be able to encode these correctly, the Rust compiler must be able to +identify C char and integer type uses at the time types are encoded, and the C +type aliases may be used for disambiguation. However, at the time types are +encoded, all type aliases are already resolved to their respective `ty::Ty` +type representations[[6]] (i.e., their respective Rust aliased types), making +it currently not possible to identify C char and integer type uses from their +resolved types. + +The Rust compiler also assumes that C char and integer types and their +respective Rust aliased types can be used interchangeably. These assumptions +can not be maintained when forward-edge control flow protection is enabled, at +least not at the FFI boundary (i.e., for extern function types with the "C" +calling convention). + +To be able to use the defined type metadata identifiers defined using option +(1), the Rust compiler must be changed to: + + * be able to identify C char and integer type uses at the time types are + encoded. 
+ + * not assume that C char and integer types and their respective Rust aliased + types can be used interchangeably when forward-edge control flow protection + is enabled, at least not at the FFI boundary. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +TBD. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +To be able to use the defined type metadata identifiers defined using option +(1), the Rust compiler must be changed to: + + * be able to identify C char and integer type uses at the time types are + encoded. + + * not assume that C char and integer types and their respective Rust aliased + types can be used interchangeably when forward-edge control flow protection + is enabled, at least not at the FFI boundary. + +This may be done by either: + + 1. creating a new set of transitional C types in `core::ffi` as user-defined + types using `repr(transparent)` to be used at the FFI boundary (i.e., for + extern function types with the "C" calling convention) when cross-language + CFI support is needed (and taking the opportunity to consolidate all C + types in `core::ffi`). + + 2. changing the currently existing C types in `std::os::raw` to user-defined + types using `repr(transparent)`. + + 3. changing C types to `ty::Foreign` and changing `ty::Foreign` to be able to + represent them. + + 4. creating a new `ty::C` for representing C types. + +Option (1) is opt in for when cross-language CFI support is needed, and +requires the user to use the new set of transitional C types for extern +function types with the "C" calling convention. + +Option (2), (3), and (4) are backward-compatibility breaking changes and will +require changes to existing code that use C types. + +# Drawbacks +[drawbacks]: #drawbacks + +The Rust compiler assumes that C char and integer types and their respective +Rust aliased types can be used interchangeably. 
These assumptions can not be +maintained when forward-edge control flow protection is enabled, at least not +at the FFI boundary (i.e., for extern function types with the "C" calling +convention). + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +## Why not use the v0 mangling scheme? + +Unfortunately, the v0 mangling scheme can not be used as an encoding for +cross-language CFI support due to the lack of support by other compilers, +mainly Clang. + +# Prior art +[prior-art]: #prior-art + +The author is currently not aware of any cross-language CFI implementation and +support by any other compiler and language. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +See [Reference-level explanation][reference-level-explanation]. + +# Future possibilities +[future-possibilities]: #future-possibilities + +The defined type metadata identifiers using Itanium C++ ABI not only allow +cross-language CFI support, but also provide more comprehensive protection +than a new encoding for cross-language CFI, while also allowing further +improvements for both CFI and cross-language CFI support (e.g., increasing +granularity by adding information). + +[1]: "R. de C Valle. “Tracking Issue for LLVM Control Flow Integrity (CFI) Support for Rust #89653.” GitHub." +[2]: "\"Type Metadata.\" LLVM Documentation." +[3]: "\"Itanium C++ ABI\"." +[4]: "\"Virtual Tables and RTTI\". Itanium C++ ABI." +[5]: "\"Type Encodings\". Itanium C++ ABI." +[6]: "\"The ty module: representing types\". Guide to Rustc Development." 
From 0444aaa3ace67a6a5c674b552cef9b41f0f9157c Mon Sep 17 00:00:00 2001 From: Ramon de C Valle Date: Wed, 24 Aug 2022 12:12:16 -0700 Subject: [PATCH 2/5] Update text/0000-improve-c-types-for-cross-language-cfi.md --- ...-improve-c-types-for-cross-language-cfi.md | 462 +++++++++++++----- 1 file changed, 349 insertions(+), 113 deletions(-) diff --git a/text/0000-improve-c-types-for-cross-language-cfi.md b/text/0000-improve-c-types-for-cross-language-cfi.md index b3d1e96ebed..8b872b4a352 100644 --- a/text/0000-improve-c-types-for-cross-language-cfi.md +++ b/text/0000-improve-c-types-for-cross-language-cfi.md @@ -15,85 +15,171 @@ types are encoded for cross-language LLVM CFI support. As the industry continues to explore Rust adoption, the absence of support for forward-edge control flow protection in the Rust compiler is a major security concern when migrating to Rust by gradually replacing C or C++ with Rust, and C -or C++ and Rust -compiled code share the same virtual address space. Thus, -support for forward-edge control flow protection needs to be added to the Rust -compiler and is a requirement for large-scale secure Rust adoption. For more -information about LLVM CFI and cross-language LLVM CFI support, see the design -document in the tracking issue [#89653][1][[1]]. +or C++ and Rust -compiled code share the same virtual address space. + +A safe language -compiled code such as Rust, when sharing the same virtual +address space with an unsafe language -compiled code such as C or C++, may +degrade the security of a program because of different assumptions about +language properties and availability of security features such as exploit +mitigations. 
+ +The issue this RFC aims to solve is an example of this, where entirely safe +Rust-compiled code, when sharing the same virtual address space with C or C++ +-compiled code with forward-edge control flow protection, may degrade the +security of the program because the indirect branches in Rust-compiled code are +not validated, allowing forward-edge control flow protection to be trivially +bypassed. + +This has been extensively discussed[[1]][[2]][[3]][[4]][[5]], and just recently +formalized[[6]] as a new class of attack (i.e., cross-language attacks). It was +also one of the major reasons that initiatives such as Rust GCC--which this +author also fully supports--were funded[[5]]. + +Therefore, support for forward-edge control flow protection needs to be added +to the Rust compiler and is a requirement for large-scale secure Rust adoption. +For more information about this project, see the design document in the +tracking issue [#89653][7][[7]]. ## Type metadata [type-metadata]: #type-metadata LLVM uses type metadata to allow IR modules to aggregate pointers by their -types.[[2]] This type metadata is used by LLVM Control Flow Integrity to test +types.[[8]] This type metadata is used by LLVM Control Flow Integrity to test whether a given pointer is associated with a type identifier (i.e., test type membership). -Clang uses the Itanium C++ ABI's[[3]] virtual tables and RTTI typeinfo -structure name[[4]] as type metadata identifiers for function pointers. The +Clang uses the Itanium C++ ABI's[[9]] virtual tables and RTTI typeinfo +structure name[[10]] as type metadata identifiers for function pointers. The typeinfo name encoding is a two-character code (i.e., “TS”) prefixed to the type encoding for the function. For cross-language LLVM CFI support, a compatible encoding must be used by either - 1. Using a superset of types that encompasses types used by Clang (i.e., - Itanium C++ ABI's type encodings[[5]]), or at least types used at the FFI - boundary. - - 2. 
Reducing the types to the least common denominator between types used by - Clang (or at least types used at the FFI boundary) and the Rust compiler - (if even possible). - - 3. Creating a new encoding for cross-language CFI and using it for Clang and - Rust compilers (and possibly other compilers). - -Option (1) provides a more comprehensive protection than option (2) and (3) for -Rust-compiled only code and when interoperating with foreign code written in C -and possibly other languages. - -Option (2) may result in less comprehensive protection for Rust-compiled only -code, so it should be provided as an alternative to a Rust-specific encoding -for when mixing Rust and C and C++ -compiled code. - -Option (3) would require changes to Clang to use the new encoding and, -depending on its requirements, may result in less comprehensive protection for -Rust-compiled only code and when interoperating with foreign code written in C -and other languages, similarly to option (2), so it should also be provided as -an alternative to a Rust-specific encoding for when mixing Rust and other -languages -compiled code. - -## Defined type metadata identifiers (using Itanium C++ ABI) -[defined-type-metadata-1]: #defined-type-metadata-1 - -Option (1) is satisfied by using the Itanium C++ ABI with vendor extended type -qualifiers and types for Rust types that are not used at the FFI boundary. -Table II in the design document in the tracking issue [#89653][1][[1]] defines -type metadata identifiers for cross-language LLVM CFI support using option (1). - -## Defined type metadata identifiers (using new encoding for cross-language CFI) -[defined-type-metadata-2]: #defined-type-metadata-2 - -Option (3) was also explored with the Clang CFI team by defining a new encoding -for cross-language CFI. This new encoding needed to be language agnostic and -ideally compatible with any other language. 
It also needed to support extended -types in case it was used as the main encoding to provide forward-edge control -flow protection. - -To satisfy these requirements, however, this new encoding neither distinguishes -between certain types (e.g., bool, char, integers, and enums) nor discriminates -between pointed element types (the latter mainly because of C’s void * abuse). -This results in less comprehensive protection for Rust-compiled only code and -when interoperating with foreign code written in C, so this encoding will be -implemented and provided as an alternative option for interoperating with -foreign code written in languages other than C. - -Option (3) is satisfied by using this new encoding with extended types for Rust -types that are not used at the FFI boundary. Table III in the design document -in the tracking issue [#89653][1][[1]] defines type metadata identifiers for -cross-language LLVM CFI support using option (3). - -## Rust vs C char and integer types + 1. using Itanium C++ ABI mangling for encoding (which is currently used by + Clang). + + 2. creating a new encoding for cross-language CFI and using it for Clang and + the Rust compiler (and possibly other compilers). + +And + + * provide comprehensive protection for Rust-compiled only code if used as main + encoding (and not require an alternative Rust-specific encoding for + Rust-compiled only code). + + * provide comprehensive protection for C and C++ -compiled code when linking + foreign Rust-compiled code into a program written in C or C++. + + * provide comprehensive protection across the FFI boundary when linking + foreign Rust-compiled code into a program written in C or C++. + +### Providing comprehensive protection for Rust-compiled only code if used as main encoding +[protection-rust-compiled]: #protection-rust-compiled + +This item is satisfied by the encoding being able to comprehensively encode +Rust types. 
Both using Itanium C++ ABI mangling for encoding (1) and creating a +new encoding for cross-language CFI (2) may satisfy this item by providing +support for (language or vendor) extended types, by defining a comprehensive +encoding for Rust types using (language or vendor) extended types, and +implementing it in the Rust compiler. + +### Providing comprehensive protection for C and C++ -compiled code when linking foreign Rust-compiled code into a program written in C or C++ +[protection-c-cpp-compiled]: #protection-c-cpp-compiled + +This item is satisfied by the encoding being able to comprehensively encode C +and C++ types, and Clang being able to continue to use a comprehensive encoding +for C and C++ -compiled code when linking foreign Rust-compiled code into a +program written in C or C++. + +Both using Itanium C++ ABI mangling for encoding (1) and creating a new +encoding for cross-language CFI (2) may satisfy this item by providing support +for (language or vendor) extended types. However, a new encoding for +cross-language CFI (2) also requires defining a comprehensive encoding for C +and C++ types using (language or vendor) extended types, and implementing it in +Clang, so it is able to continue to use a comprehensive encoding for C and C++ +-compiled code when linking foreign Rust-compiled code into a program written +in C or C++. This introduces as much complexity and work as redefining Itanium +C++ ABI mangling and reimplementing it in Clang. + +Additionally, a new encoding for cross-language CFI (2), depending on its +requirements, may use a generalized encoding across the FFI boundary. This may +result in using a generalized encoding for all C and C++ -compiled code instead +of only across the FFI boundary, and may also require changes to Clang to use +the generalized encoding only across the FFI boundary (which may also require +new Clang extensions and changes to C and C++ code and libraries). 
+ +Using a generalized encoding, either for all C and C++ -compiled code or only +across the FFI boundary, does not satisfy this or the following item, and will +degrade the security of the program when linking foreign Rust-compiled code into a +program written in C or C++ because the program previously used a more +comprehensive encoding for all its compiled code. + +### Providing comprehensive protection across the FFI boundary when linking foreign Rust-compiled code into a program written in C or C++ +[protection-across-ffi-boundary]: #protection-across-ffi-boundary + +This item is satisfied by being able to encode uses of Rust or C types across +the FFI boundary by either + + * changing the Rust compiler to be able to identify and encode uses of C types + across the FFI boundary. + * changing Clang to be able to identify and encode uses of Rust types across + the FFI boundary. + +Both using Itanium C++ ABI mangling for encoding (1) and creating a new +encoding for cross-language CFI (2) require changing either the Rust compiler +or Clang to satisfy this item. + +Doing so may also require changes to Rust or C and C++ code and libraries. Improving +C types for the Rust compiler to be able to identify C char and integer type +uses at the time types are encoded for cross-language LLVM CFI support is what +this RFC proposes. + +However, as described in the previous item, a new encoding for cross-language +CFI (2), depending on its requirements, may use a generalized encoding across +the FFI boundary. While using a generalized encoding across the FFI +boundary does not require changing the Rust compiler or Clang to be able to +identify and encode uses of Rust or C types across the FFI boundary, it does +not satisfy this item either, and will degrade the security of the program when +linking foreign Rust-compiled code into a program written in C or C++ because +the program previously used a more comprehensive encoding for all its compiled +code. 
+ +### Using Itanium C++ ABI mangling for encoding (1) versus creating a new encoding for cross-language CFI (2) +[itanium-vs-new-for-encoding]: #itanium-vs-new-for-encoding + +Using Itanium C++ ABI mangling for encoding (1) provides cross-language LLVM +CFI support with C and C++ -compiled code as is, provides more comprehensive +protection by satisfying all previous items, does not require changes to Clang, +and does not require any new Clang extensions or changes to C and C++ code and +libraries. + +While using Itanium C++ ABI mangling for encoding (1) requires defining a +comprehensive encoding for Rust types using (language or vendor) extended types +and implementing it in the Rust compiler, creating a new encoding for +cross-language CFI (2) requires defining comprehensive encodings for both Rust +and C and C++ types using (language or vendor) extended types, and implementing +them in both the Rust compiler and Clang respectively. This introduces as much +complexity and work as redefining Itanium C++ ABI mangling and reimplementing +it in Clang. + +Additionally, a new encoding for cross-language CFI (2), depending on its +requirements, may provide less comprehensive protection by using a +generalized encoding either for all C and C++ -compiled code or across the FFI +boundary (thus not satisfying all previous items), requires changes to Clang, and may +require new Clang extensions and changes to C and C++ code and libraries. + +(See [Defined type metadata identifiers (creating a new encoding for +cross-language CFI)][defined-type-metadata-new].) + +## Defined type metadata identifiers (using Itanium C++ ABI mangling for encoding) +[defined-type-metadata-itanium]: #defined-type-metadata-itanium + +Table II in the design document in the tracking issue [#89653][7][[7]] defines +type metadata identifiers for cross-language LLVM CFI support using Itanium C++ +ABI mangling for encoding (1). 
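As a rough illustration of how Itanium-based identifiers for function pointers are formed (a sketch only, not the compiler's implementation), the following models the typeinfo name prefix (`_ZTS`) followed by the function type encoding for a handful of C builtin types. `CType`, `itanium_code`, and `function_type_id` are hypothetical names introduced for this example; pointers, qualifiers, and substitutions are deliberately left out.

```rust
/// A small subset of C builtin types (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CType {
    Void,
    Char,
    Short,
    Int,
    Long,
    LongLong,
}

/// Returns the Itanium C++ ABI builtin type code for a C type.
pub fn itanium_code(ty: CType) -> char {
    match ty {
        CType::Void => 'v',
        CType::Char => 'c',
        CType::Short => 's',
        CType::Int => 'i',
        CType::Long => 'l',
        CType::LongLong => 'x',
    }
}

/// Hypothetical helper: builds a type identifier for a C function type as
/// `_ZTS` (typeinfo name) + `F` <return type> <parameter types> `E`.
/// An empty parameter list is encoded as a single `v`.
pub fn function_type_id(ret: CType, params: &[CType]) -> String {
    let mut id = String::from("_ZTSF");
    id.push(itanium_code(ret));
    if params.is_empty() {
        id.push('v');
    } else {
        for &p in params {
            id.push(itanium_code(p));
        }
    }
    id.push('E');
    id
}
```

Under this model, `long (*)(long)` and `long long (*)(long long)` get the distinct identifiers `_ZTSFllE` and `_ZTSFxxE`, even though both C types are commonly 64 bits wide on LP64 targets. This is why a resolved Rust `i64` alone is not enough to choose the encoding.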
+ +### Rust vs C char and integer types Rust defines char as a Unicode scalar value, which is different from C’s char. On most modern systems, C’s char is either an 8-bit signed or unsigned integer. @@ -115,25 +201,54 @@ To be able to encode these correctly, the Rust compiler must be able to identify C char and integer type uses at the time types are encoded, and the C type aliases may be used for disambiguation. However, at the time types are encoded, all type aliases are already resolved to their respective `ty::Ty` -type representations[[6]] (i.e., their respective Rust aliased types), making +type representations[[11]] (i.e., their respective Rust aliased types), making it currently not possible to identify C char and integer type uses from their resolved types. The Rust compiler also assumes that C char and integer types and their respective Rust aliased types can be used interchangeably. These assumptions -can not be maintained when forward-edge control flow protection is enabled, at -least not at the FFI boundary (i.e., for extern function types with the "C" -calling convention). +can not be maintained across the FFI boundary (i.e., for extern function types +with the "C" calling convention passed as callbacks across the FFI boundary) +when forward-edge control flow protection is enabled. -To be able to use the defined type metadata identifiers defined using option -(1), the Rust compiler must be changed to: +To be able to use Itanium C++ ABI mangling for encoding (1) and provide +comprehensive protection across the FFI boundary when linking foreign +Rust-compiled code into a program written in C or C++, the Rust compiler must +be changed to * be able to identify C char and integer type uses at the time types are encoded. * not assume that C char and integer types and their respective Rust aliased - types can be used interchangeably when forward-edge control flow protection - is enabled, at least not at the FFI boundary. 
+ types can be used interchangeably across the FFI boundary when forward-edge + control flow protection is enabled. + +## Defined type metadata identifiers (creating a new encoding for cross-language CFI) +[defined-type-metadata-new]: #defined-type-metadata-new + +Creating a new encoding for cross-language CFI (2) was also explored with the +Clang CFI team. This new encoding needed to be language agnostic and ideally +compatible with any other language. It also needed to support extended types in +case it was used as the main encoding to provide forward-edge control flow +protection. + +However, to satisfy these requirements, this new encoding neither distinguishes +between certain types (e.g., bool, char, integers, and enums) nor discriminates +between pointed element types (the latter mainly because of C’s void * abuse). + +This results in less comprehensive protection by either using a generalized +encoding for all C and C++ -compiled code or across the FFI boundary, and will +degrade the security of the program when linking foreign Rust-compiled code +into a program written in C or C++ because the program previously used a more +comprehensive encoding for all its compiled code. + +This encoding will be provided as an alternative option for interoperating with +foreign code written in languages other than C and C++ or that can not use +Itanium C++ ABI mangling for encoding. + +Table III in the design document in the tracking issue [#89653][7][[7]] defines +type metadata identifiers for cross-language LLVM CFI support creating a new +encoding for cross-language CFI (2). # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -143,57 +258,168 @@ TBD. 
# Reference-level explanation [reference-level-explanation]: #reference-level-explanation -To be able to use the defined type metadata identifiers defined using option -(1), the Rust compiler must be changed to: +The Rust compiler also assumes that C char and integer types and their +respective Rust aliased types can be used interchangeably. These assumptions +can not be maintained across the FFI boundary (i.e., for extern function types +with the "C" calling convention passed as callbacks across the FFI boundary) +when forward-edge control flow protection is enabled. + +To be able to use Itanium C++ ABI mangling for encoding (1) and provide +comprehensive protection across the FFI boundary when linking foreign +Rust-compiled code into a program written in C or C++, the Rust compiler must +be changed to * be able to identify C char and integer type uses at the time types are - encoded. + encoded. * not assume that C char and integer types and their respective Rust aliased - types can be used interchangeably when forward-edge control flow protection - is enabled, at least not at the FFI boundary. - -This may be done by either: - - 1. creating a new set of transitional C types in `core::ffi` as user-defined - types using `repr(transparent)` to be used at the FFI boundary (i.e., for - extern function types with the "C" calling convention) when cross-language - CFI support is needed (and taking the opportunity to consolidate all C - types in `core::ffi`). - - 2. changing the currently existing C types in `std::os::raw` to user-defined + types can be used interchangeably across the FFI boundary when forward-edge + control flow protection is enabled. + +This may be done by either + + 1. 
creating a new set of C types in `core::ffi::cfi` as user-defined types + using `repr(transparent)` to be used across the FFI boundary (i.e., for + extern function types with the "C" calling convention passed as callbacks + across the FFI boundary) when cross-language CFI support is needed, and + keep the existing C-like type aliases. + + 2. adding a new set of parameter attributes to specify the corresponding C + types to be used for encoding across the FFI boundary (i.e., for extern + function types with the "C" calling convention passed as callbacks across + the FFI boundary) when cross-language CFI support is needed. + + 3. creating a new set of transitional C types in `core::ffi` as user-defined + types using `repr(transparent)` to be used across the FFI boundary (i.e., + for extern function types with the "C" calling convention passed as + callbacks across the FFI boundary) when cross-language CFI support is + needed (and taking the opportunity to consolidate all C types in + `core::ffi`). + + 4. waiting for the work in progress in rust-lang/rust#97974 for + rust-lang/compiler-team#504 and use type alias information for + disambiguation and to specify the corresponding C types to be used for + encoding across the FFI boundary (i.e., for extern function types with the + "C" calling convention passed as callbacks across the FFI boundary) when + cross-language CFI support is needed. + + 5. changing the currently existing C types in `std::os::raw` to user-defined types using `repr(transparent)`. - 3. changing C types to `ty::Foreign` and changing `ty::Foreign` to be able to + 6. changing C types to `ty::Foreign` and changing `ty::Foreign` to be able to represent them. - 4. creating a new `ty::C` for representing C types. + 7. creating a new `ty::C` for representing C types. 
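The `repr(transparent)` options in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed API: the `cfi_types` module, its type names, and the LP64 widths are all hypothetical stand-ins for whatever `core::ffi::cfi` (or `core::ffi`) would eventually contain.

```rust
/// Hypothetical module standing in for the proposed set of C types.
#[allow(non_camel_case_types)]
pub mod cfi_types {
    /// Stand-in for C's `long` on an LP64 target; same layout as `i64`.
    #[repr(transparent)]
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct c_long(pub i64);

    /// Stand-in for C's `long long`; also 64 bits wide on LP64, but a
    /// distinct Rust type, so it survives type alias resolution.
    #[repr(transparent)]
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct c_longlong(pub i64);
}

use cfi_types::{c_long, c_longlong};

/// Callback type that a C library might declare as `long (*)(long)`.
pub type LongCallback = extern "C" fn(c_long) -> c_long;

/// A Rust function that can be passed across the FFI boundary as a
/// `LongCallback`.
pub extern "C" fn double_long(x: c_long) -> c_long {
    c_long(x.0 * 2)
}

/// Same machine-level signature on LP64, but a different Rust type, so it
/// does not coerce to `LongCallback`.
pub extern "C" fn double_longlong(x: c_longlong) -> c_longlong {
    c_longlong(x.0 * 2)
}

/// Calls the callback through a function pointer, as foreign code would.
pub fn call_through_pointer(cb: LongCallback, x: i64) -> i64 {
    cb(c_long(x)).0
}
```

Because `c_long` and `c_longlong` are distinct user-defined types rather than aliases of `i64`, a type encoder could map each one to the matching C type, and passing `double_longlong` where a `LongCallback` is expected is rejected at compile time. The existing aliases cannot express that distinction once they are resolved.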
-Option (1) is opt in for when cross-language CFI support is needed, and -requires the user to use the new set of transitional C types for extern -function types with the "C" calling convention. +Options (1), (2), (3), and (4) are opt-in for when cross-language CFI support is +needed. These are not backward-compatibility breaking changes because the Rust +compiler currently does not support cross-language CFI (i.e., calls to extern +function types with the "C" calling convention passed as callbacks across the +FFI boundary). -Option (2), (3), and (4) are backward-compatibility breaking changes and will -require changes to existing code that use C types. +Options (5), (6), and (7) are backward-compatibility breaking changes because +they will require changes to existing code that uses C types. # Drawbacks [drawbacks]: #drawbacks -The Rust compiler assumes that C char and integer types and their respective -Rust aliased types can be used interchangeably. These assumptions can not be -maintained when forward-edge control flow protection is enabled, at least not -at the FFI boundary (i.e., for extern function types with the "C" calling -convention). +The Rust compiler assumes that C char and integer types and their +respective Rust aliased types can be used interchangeably. These assumptions +can not be maintained across the FFI boundary (i.e., for extern function types +with the "C" calling convention passed as callbacks across the FFI boundary) +when forward-edge control flow protection is enabled. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -## Why not use the v0 mangling scheme? +## Why not use the v0 mangling scheme for encoding? Unfortunately, the v0 mangling scheme can not be used as an encoding for cross-language CFI support due to the lack of support by other compilers, mainly Clang. +## Why not just create a new encoding for cross-language CFI? 
+ +(See [Defined type metadata identifiers [creating a new encoding for +cross-language CFI]](defined-type-metadata-new).) + +## Why not just use hardware-assisted forward-edge control flow protection? + +Newer processors provide hardware assistance for forward-edge control flow +protection, such as ARM Branch Target Identification (BTI), ARM Pointer +Authentication, and Intel Indirect Branch Tracking (IBT) as part of Intel +Control-flow Enforcement Technology (CET). However, ARM BTI and Intel IBT +-based implementations are less comprehensive than software-based +implementations such as [LLVM ControlFlowIntegrity +(CFI)](https://clang.llvm.org/docs/ControlFlowIntegrity.html), and the +commercially available [grsecurity/PaX Reuse Attack Protector +(RAP)](https://grsecurity.net/rap_faq). + +## What do you mean by less comprehensive protection? + +The less comprehensive the protection, the higher the likelihood it can be +bypassed. For example, Microsoft Windows Control Flow Guard (CFG) only tests +that the destination of an indirect branch is a valid function entry point, +which is the equivalent of grouping all function pointers in a single group, +and testing all destinations of indirect branches to be in this group. This is +also known as "coarse-grained CFI". + +(This is even less comprehensive than the initial support for LLVM CFI added to +the Rust compiler as part of this project, which aggregated function pointers +in groups identified by their number of parameters [i.e., +rust-lang/rust#89652], and provides protection only for the first example +listed in the partial results of this project in the design document in the +tracking issue [#89653][7][[7]]) + +It means that in an exploitation attempt, an attacker can change/hijack control +flow to any function, and the larger the program is, the higher the likelihood +an attacker can find a function they can benefit from (e.g., a small +command-line program vs a browser). 
+ +This is unfortunately the implementation on which hardware assistance (e.g., +ARM BTI and Intel IBT) was initially modeled for forward-edge control flow +protection, and as such they provide equivalent protection with the addition of +specialized instructions. Microsoft Windows eXtended Flow Guard (XFG), ARM +Pointer Authentication -based forward-edge control flow protection, and Intel +Fine Indirect Branch Tracking (FineIBT) aim to solve this by combining hardware +assistance with software-based function pointer type testing similarly to LLVM +CFI. This is also known as "fine-grained CFI". + +(This is equivalent to the current support for LLVM CFI added to the Rust +compiler as part of this project, which aggregates function pointers in groups +identified by their return and parameter types [i.e., rust-lang/rust#95548]. +See the partial results of this project in the design document in the tracking +issue [#89653][7][[7]].) + +## Why not just use a generalized encoding across the FFI boundary? + +This results in less comprehensive protection, may result in using a +generalized encoding for all C and C++ -compiled code instead of only across +the FFI boundary depending on whether Clang can be changed to use the generalized +encoding only across the FFI boundary (which may also require new Clang +extensions and changes to C and C++ code and libraries), and will degrade the +security of the program when linking foreign Rust-compiled code into a program +written in C or C++ because the program previously used a more comprehensive +encoding for all its compiled code. 
+ +Finally, it does not completely solve the issue this RFC aims to solve, which +is that entirely safe Rust-compiled code, when sharing the same virtual address +space with C or C++ -compiled code with forward-edge control flow protection, +may degrade the security of the program because the indirect branches in +Rust-compiled code are not validated, allowing forward-edge control flow +protection to be trivially bypassed. + +## Are the changes proposed in this RFC backward-compatibility breaking changes? + +Options (1), (2), (3), (4) are opt in for when cross-language CFI support is +needed. These are not backward-compatibility breaking changes because the Rust +compiler currently does not support cross-language CFI (i.e., calls to extern +function types with the "C" calling convention passed as callbacks across the +FFI boundary). + +Option (5), (6), and (7) are backward-compatibility breaking changes because +they will require changes to existing code that use C types. + # Prior art [prior-art]: #prior-art @@ -208,15 +434,25 @@ See [Reference-level explanation][reference-level-explanation]. # Future possibilities [future-possibilities]: #future-possibilities -The defined type metadata identifiers using Itanium C++ ABI not only allows -cross-language CFI support, but also provides a more comprehensive protection -than a new encoding for cross-language CFI, while also allowing further -improvements for both CFI and cross-language CFI support (e.g., increasing -granularity by adding information, etc.). - -[1]: "R. de C Valle. “Tracking Issue for LLVM Control Flow Integrity (CFI) Support for Rust #89653.” GitHub." -[2]: "\"Type Metadata.\" LLVM Documentation." -[3]: "\"Itanium C++ ABI\"." -[4]: "\"Virtual Tables and RTTI\". Itanium C++ ABI." -[5]: "\"Type Encodings\". Itanium C++ ABI." -[6]: "\"The ty module: representing types\". Guide to Rustc Development." 
+Using Itanium C++ ABI mangling for encoding (1) provides cross-language LLVM +CFI support with C and C++ -compiled code as is, provides more comprehensive +protection, does not require changes to Clang, and does not require any new +Clang extensions and changes to C and C++ code and libraries. + +It allows further improvements for both CFI and cross-language CFI support +(e.g., increasing granularity by adding information, etc.), and also provides +the foundation for future implementations of cross-language hardware-assisted +and software-based -combined forward-edge control flow protection, such as ARM +Pointer Authentication -based forward-edge control flow protection. + +[1]: https://stanford-cs242.github.io/f17/assets/projects/2017/songyang.pdf "Y. Song. \"On Control Flow Hijacks of unsafe Rust.\" GitHub." +[2]: https://www.cs.ucy.ac.cy/~elathan/papers/tops20.pdf "M. Papaevripides and E. Athanasopoulos. \"Exploiting Mixed Binaries.\" Elias Athanasopoulos Publications." +[3]: https://github.com/rust-lang/rust/files/4723836/Control.Flow.Guard.for.Rust.pdf "A. Paverd. \"Control Flow Guard for Rust.\" GitHub." +[4]: https://github.com/rust-lang/rust/files/4723840/Control.Flow.Guard.for.LLVM.pdf "A. Paverd. \"Control Flow Guard for LLVM.\" GitHub." +[5]: https://opensrcsec.com/open_source_security_announces_rust_gcc_funding "B. Spengler. \"Open Source Security, Inc. Announces Funding of GCC Front-End for Rust.\" Open Source Security." +[6]: https://www.ndss-symposium.org/wp-content/uploads/2022-78-paper.pdf "S. Mergendahl, N. Burow, H. Okhravi. \"Cross-Language Attacks.\" NDSS Symposium 2022." +[7]: "R. de C Valle. \"Tracking Issue for LLVM Control Flow Integrity (CFI) Support for Rust #89653.\" GitHub." +[8]: "\"Type Metadata.\" LLVM Documentation." +[9]: "\"Itanium C++ ABI\"." +[10]: "\"Virtual Tables and RTTI\". Itanium C++ ABI." +[11]: "\"The ty module: representing types\". Guide to Rustc Development." 
From faa08c7251a826d5ca849a74125c59c71cc1d984 Mon Sep 17 00:00:00 2001 From: Ramon de C Valle Date: Wed, 26 Oct 2022 20:03:05 -0700 Subject: [PATCH 3/5] Update text/0000-improve-c-types-for-cross-language-cfi.md --- ...-improve-c-types-for-cross-language-cfi.md | 806 ++++++++++-------- 1 file changed, 433 insertions(+), 373 deletions(-) diff --git a/text/0000-improve-c-types-for-cross-language-cfi.md b/text/0000-improve-c-types-for-cross-language-cfi.md index 8b872b4a352..829496032f5 100644 --- a/text/0000-improve-c-types-for-cross-language-cfi.md +++ b/text/0000-improve-c-types-for-cross-language-cfi.md @@ -6,302 +6,348 @@ # Summary [summary]: #summary -Improve C types to be able to identify C char and integer type uses at the time -types are encoded for cross-language LLVM CFI support. +Improve C types for cross-language LLVM CFI support. # Motivation [motivation]: #motivation -As the industry continues to explore Rust adoption, the absence of support for -forward-edge control flow protection in the Rust compiler is a major security -concern when migrating to Rust by gradually replacing C or C++ with Rust, and C -or C++ and Rust -compiled code share the same virtual address space. +This RFC is part of the LLVM Control Flow Integrity (CFI) Support for Rust, and +is a requirement for cross-language LLVM CFI support. -A safe language -compiled code such as Rust, when sharing the same virtual -address space with an unsafe language -compiled code such as C or C++, may -degrade the security of a program because of different assumptions about -language properties and availability of security features such as exploit -mitigations. +For cross-language LLVM CFI support, the Rust compiler must be able to identify +and correctly encode C types in extern "C" function types indirectly called +(i.e., function pointers) across the FFI boundary when cross-language CFI +support is needed. 
-The issue this RFC aims to solve is an example of this, where entirely safe -Rust-compiled code, when sharing the same virtual address space with C or C++ --compiled code with forward-edge control flow protection, may degrade the -security of the program because the indirect branches in Rust-compiled code are -not validated, allowing forward-edge control flow protection to be trivially -bypassed. +For convenience, Rust provides some C-like type aliases for use when +interoperating with foreign code written in C, and these C type aliases may be +used for identification. However, at the time types are encoded, all type +aliases are already resolved to their respective Rust aliased types, making it +currently not possible to identify C type aliases use from their resolved types. -This has been extensively discussed[[1]][[2]][[3]][[4]][[5]], and just recently -formalized[[6]] as a new class of attack (i.e., cross-language attacks). It was -also one of the major reasons that initiatives such as Rust GCC--which this -author also fully support--were funded[[5]]. +For example, the Rust compiler currently is not able to identify that an -Therefore, support for forward-edge control flow protection needs to be added -to the Rust compiler and is a requirement for large-scale secure Rust adoption. -For more information about this project, see the design document in the -tracking issue [#89653][7][[7]]. +```rust +extern "C" { + fn func(arg: c_long); +} +``` -## Type metadata -[type-metadata]: #type-metadata +used the `c_long` type alias and is not able to disambiguate between it and an +`extern "C" fn func(arg: c_longlong)` in an LP64 or equivalent data model at the +time types are encoded. -LLVM uses type metadata to allow IR modules to aggregate pointers by their -types.[[8]] This type metadata is used by LLVM Control Flow Integrity to test -whether a given pointer is associated with a type identifier (i.e., test type -membership). 
+This motivates creating a new set of C types that their use can be identified at +the time types are encoded to be used in extern "C" function types indirectly +called across the FFI boundary when cross-language CFI support is needed. -Clang uses the Itanium C++ ABI's[[9]] virtual tables and RTTI typeinfo -structure name[[10]] as type metadata identifiers for function pointers. The -typeinfo name encoding is a two-character code (i.e., “TS”) prefixed to the -type encoding for the function. - -For cross-language LLVM CFI support, a compatible encoding must be used by -either - - 1. using Itanium C++ ABI mangling for encoding (which is currently used by - Clang). - - 2. creating a new encoding for cross-language CFI and using it for Clang and - the Rust compiler (and possibly other compilers). - -And - - * provide comprehensive protection for Rust-compiled only code if used as main - encoding (and not require an alternative Rust-specific encoding for - Rust-compiled only code). - - * provide comprehensive protection for C and C++ -compiled code when linking - foreign Rust-compiled code into a program written in C or C++. - - * provide comprehensive protection across the FFI boundary when linking - foreign Rust-compiled code into a program written in C or C++. - -### Providing comprehensive protection for Rust-compiled only code if used as main encoding -[protection-rust-compiled]: #protection-rust-compiled - -This item is satisfied by the encoding being able to comprehensively encode -Rust types. Both using Itanium C++ ABI mangling for encoding (1) and creating a -new encoding for cross-language CFI (2) may satisfy this item by providing -support for (language or vendor) extended types, by defining a comprehensive -encoding for Rust types using (language or vendor) extended types, and -implementing it in the Rust compiler. 
- -### Providing comprehensive protection for C and C++ -compiled code when linking foreign Rust-compiled code into a program written in C or C++ -[protection-c-cpp-compiled]: #protection-c-cpp-compiled - -This item is satisfied by the encoding being able to comprehensively encode C -and C++ types, and Clang being able to continue to use a comprehensive encoding -for C and C++ -compiled code when linking foreign Rust-compiled code into a -program written in C or C++. - -Both using Itanium C++ ABI mangling for encoding (1) and creating a new -encoding for cross-language CFI (2) may satisfy this item by providing support -for (language or vendor) extended types. However, a new encoding for -cross-language CFI (2) also requires defining a comprehensive encoding for C -and C++ types using (language or vendor) extended types, and implementing it in -Clang, so it is able to continue to use a comprehensive encoding for C and C++ --compiled code when linking foreign Rust-compiled code into a program written -in C or C++. This introduces as much complexity and work as redefining Itanium -C++ ABI mangling and reimplementing it in Clang. - -Additionally, a new encoding for cross-language CFI (2), depending on its -requirements, may use a generalized encoding across the FFI boundary. This may -result in using a generalized encoding for all C and C++ -compiled code instead -of only across the FFI boundary, and may also require changes to Clang to use -the generalized encoding only across the FFI boundary (which may also require -new Clang extensions and changes to C and C++ code and libraries). - -Either using a generalized encoding for all C and C++ -compiled code or across -the FFI boundary do not satisfy this or the following item, and will degrade -the security of the program when linking foreign Rust-compiled code into a -program written in C or C++ because the program previously used a more -comprehensive encoding for all its compiled code. 
- -### Providing comprehensive protection across the FFI boundary when linking foreign Rust-compiled code into a program written in C or C++ -[protection-across-ffi-boundary]: #protection-across-ffi-boundary - -This item is satisfied by being able to encode uses of Rust or C types across -the FFI boundary by either - - * changing the Rust compiler to be able to identify and encode uses of C types - across the FFI boundary. - * changing Clang to be able to identify and encode uses of Rust types across - the FFI boundary. - -Both using Itanium C++ ABI mangling for encoding (1) and creating a new -encoding for cross-language CFI (2) require changing either the Rust compiler -or Clang to satisfy this item. - -It may also require changes to Rust or C and C++ code and libraries. Improving -C types for the Rust compiler to be able to identify C char and integer type -uses at the time types are encoded for cross-language LLVM CFI support is what -this RFC proposes. - -However, as described in the previous item, a new encoding for cross-language -CFI (2), depending on its requirements, may use a generalized encoding across -the FFI boundary, and while using a generalized encoding across the FFI -boundary does not require changing the Rust compiler or Clang to be able to -identify and encode uses of Rust or C types across the FFI boundary, it does -not satisfy this item either, and will degrade the security of the program when -linking foreign Rust-compiled code into a program written in C or C++ because -the program previously used a more comprehensive encoding for all its compiled -code. 
- -### Using Itanium C++ ABI mangling for encoding (1) versus creating a new encoding for cross-language CFI (2) -[itanium-vs-new-for-encoding]: #itanium-vs-new-for-encoding - -Using Itanium C++ ABI mangling for encoding (1) provides cross-language LLVM -CFI support with C and C++ -compiled code as is, provides more comprehensive -protection by satisfying all previous items, does not require changes to Clang, -and does not require any new Clang extensions and changes to C and C++ code and -libraries. - -While using Itanium C++ ABI mangling for encoding (1) requires the defining a -comprehensive encoding for Rust types using (language or vendor) extended types -and implementing it in the Rust compiler, creating a new encoding for -cross-language CFI (2) requires defining comprehensive encodings for both Rust -and C and C++ types using (language or vendor) extended types, and implementing -them in both the Rust compiler and Clang respectively. This introduces as much -complexity and work as redefining Itanium C++ ABI mangling and reimplementing -it in Clang. - -Additionally, a new encoding for cross-language CFI (2), depending on its -requirements, may provide less comprehensive protection by either using a -generalized encoding for all C and C++ -compiled code or across the FFI -boundary, not satisfying all previous items, requires changes to Clang, and may -require new Clang extensions and changes to C and C++ code and libraries. - -(See [Defined type metadata identifiers [creating a new encoding for -cross-language CFI]](defined-type-metadata-new).) - -## Defined type metadata identifiers (using Itanium C++ ABI mangling for encoding) -[defined-type-metadata-itanium]: #defined-type-metadata-itanium - -Table II in the design document in the tracking issue [#89653][7][[7]] defines -type metadata identifiers for cross-language LLVM CFI support using Itanium C++ -ABI mangling for encoding (1). 
- -### Rust vs C char and integer types - -Rust defines char as a Unicode scalar value, which is different from C’s char. -On most modern systems, C’s char is either an 8-bit signed or unsigned integer. -The Itanium C++ ABI specifies a distinct encoding for it (i.e., ‘c’). - -Rust also uses explicitly-sized integer types (i.e., `i8`, `i16`, `i32`, ...) -while C uses abstract integer types (i.e., `char`, `short`, `long`, ...), which -actual sizes are implementation defined and may vary across different systems. -The Itanium C++ ABI specifies encodings for the C integer types (i.e., `char`, -`short`, `long`, ...), not their defined representations/sizes (i.e., 8-bit -unsigned integer, 16-bit unsigned integer, 32-bit unsigned integer, ...). - -For convenience, some C-like type aliases are provided by libcore and libstd -(and also by the libc crate) for use when interoperating with foreign code -written in C. For instance, one of these type aliases is `c_char`, which is a -type alias to Rust’s `i8`. - -To be able to encode these correctly, the Rust compiler must be able to -identify C char and integer type uses at the time types are encoded, and the C -type aliases may be used for disambiguation. However, at the time types are -encoded, all type aliases are already resolved to their respective `ty::Ty` -type representations[[11]] (i.e., their respective Rust aliased types), making -it currently not possible to identify C char and integer type uses from their -resolved types. - -The Rust compiler also assumes that C char and integer types and their -respective Rust aliased types can be used interchangeably. These assumptions -can not be maintained across the FFI boundary (i.e., for extern function types -with the "C" calling convention passed as callbacks across the FFI boundary) -when forward-edge control flow protection is enabled. 
- -To be able to use Itanium C++ ABI mangling for encoding (1) and provide -comprehensive protection across the FFI boundary when linking foreign -Rust-compiled code into a program written in C or C++, the Rust compiler must -be changed to - - * be able to identify C char and integer type uses at the time types are - encoded. - - * not assume that C char and integer types and their respective Rust aliased - types can be used interchangeably across the FFI boundary when forward-edge - control flow protection is enabled. - -## Defined type metadata identifiers (creating a new encoding for cross-language CFI) -[defined-type-metadata-new]: #defined-type-metadata-new - -Creating a new encoding for cross-language CFI (2) was also explored with the -Clang CFI team. This new encoding needed to be language agnostic and ideally -compatible with any other language. It also needed to support extended types in -case it was used as the main encoding to provide forward-edge control flow -protection. - -However, to satisfy these requirements, this new encoding neither distinguishes -between certain types (e.g., bool, char, integers, and enums) nor discriminates -between pointed element types (the latter mainly because of C’s void * abuse). - -This results in less comprehensive protection by either using a generalized -encoding for all C and C++ -compiled code or across the FFI boundary, and will -degrade the security of the program when linking foreign Rust-compiled code -into a program written in C or C++ because the program previously used a more -comprehensive encoding for all its compiled code. - -This encoding will be provided as an alternative option for interoperating with -foreign code written in languages other than C and C++ or that can not use -Itanium C++ ABI mangling for encoding. 
- -Table III in the design document in the tracking issue [#89653][7][[7]] defines -type metadata identifiers for cross-language LLVM CFI support creating a new -encoding for cross-language CFI (2). +For more information about and the motivation for the project, see the design +document in the tracking issue [#89653][1][[1]] and the [Appendix][appendix]. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -TBD. +This RFC proposes creating a new set of C types in `core::ffi::cfi` as +user-defined types using `repr(transparent)` to be used in extern "C" function +types indirectly called across the FFI boundary when cross-language CFI support +is needed, and keeping the existing C-like type aliases. + +These are not backward-compatibility breaking changes because the Rust compiler +currently does not support cross-language CFI (i.e., extern "C" function types +indirectly called across the FFI boundary when CFI is enabled). + +This example explains the issue and the solution this RFC proposes: + +example/src/main.rs +```rust +// This RFC proposes changing +use std::ffi::c_long; +// To use std::ffi::cfi::c_long so both the Rust compiler and Clang use the same +// encoding and the all indirect calls in this example work when CFI is enabled. + +#[link(name = "foo")] +extern "C" { + fn hello_from_c(_: c_long); + fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long); +} + +unsafe extern "C" fn hello_from_rust(_: c_long) { + println!("Hello, world!"); +} + +unsafe extern "C" fn hello_from_rust_again(_: c_long) { + println!("Hello from Rust again!\n"); +} + +fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) { + unsafe { f(arg) } +} + +fn main() { + // This works + indirect_call(hello_from_rust, 1); + // This works when using rustc LTO, but does not work when using (proper) + // LTO because the Rust compiler and Clang use different encodings for + // hello_from_c and the test at the indirect call site at indirect_call. 
+ indirect_call(hello_from_c, 2); + // This does not work because the Rust compiler and Clang use different + // encodings for hello_from_rust_again and the test at the indirect call + // site at indirect_call_from_c. + unsafe { + indirect_call_from_c(hello_from_rust_again, 3); + } +} +``` + +example/src/foo.c +```c +#include <stdio.h> +#include <stdlib.h> + +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} + +void +indirect_call_from_c(void (*fn)(long), long arg) +{ + fn(arg); +} +``` # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -The Rust compiler also assumes that C char and integer types and their -respective Rust aliased types can be used interchangeably. These assumptions -can not be maintained across the FFI boundary (i.e., for extern function types -with the "C" calling convention passed as callbacks across the FFI boundary) -when forward-edge control flow protection is enabled. +## Type metadata +[type-metadata]: #type-metadata -To be able to use Itanium C++ ABI mangling for encoding (1) and provide -comprehensive protection across the FFI boundary when linking foreign -Rust-compiled code into a program written in C or C++, the Rust compiler must -be changed to +LLVM uses type metadata to allow IR modules to aggregate pointers by their +types.[[2]] This type metadata is used by LLVM Control Flow Integrity to test +whether a given pointer is associated with a type identifier (i.e., test type +membership). - * be able to identify C char and integer type uses at the time types are - encoded. +Clang uses the Itanium C++ ABI's[[3]] virtual tables and RTTI typeinfo structure +name[[4]] as type metadata identifiers for function pointers. + +For cross-language LLVM CFI support, a compatible encoding must be used. The +compatible encoding chosen for cross-language LLVM CFI support is the Itanium +C++ ABI mangling with vendor extended type qualifiers and types for Rust types +that are not used across the FFI boundary. 
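As a rough illustration of the type metadata identifiers described above, the following minimal sketch builds Itanium-style typeinfo names for functions returning `void`. The `ParamTy` enum and the `encode_param` and `encode_fn_type_id` helpers are hypothetical names introduced only for this illustration; they are not part of rustc or Clang, and the sketch ignores pointers, qualifiers, and substitution compression.

```rust
/// Parameter types covered by this sketch (hypothetical; not rustc's
/// internal type representation).
enum ParamTy {
    /// C `long`, encoded as `l` by the Itanium C++ ABI.
    CLong,
    /// C `long long`, encoded as `x` by the Itanium C++ ABI.
    CLongLong,
    /// A Rust integer type such as "i64", encoded here as a vendor
    /// extended type: `u`, the name length, then the name.
    RustInt(&'static str),
}

/// Encodes a single parameter type.
fn encode_param(ty: &ParamTy) -> String {
    match ty {
        ParamTy::CLong => "l".to_string(),
        ParamTy::CLongLong => "x".to_string(),
        ParamTy::RustInt(name) => format!("u{}{}", name.len(), name),
    }
}

/// Builds the typeinfo name of a function returning `void`:
/// `_ZTS` + `F` + return type + parameter types + `E`.
fn encode_fn_type_id(params: &[ParamTy]) -> String {
    let mut id = String::from("_ZTSF");
    id.push('v'); // return type: void
    if params.is_empty() {
        id.push('v'); // an empty parameter list is encoded as void
    } else {
        for param in params {
            id.push_str(&encode_param(param));
        }
    }
    id.push('E');
    id
}

fn main() {
    // What Clang would emit for `void f(long)`.
    println!("{}", encode_fn_type_id(&[ParamTy::CLong]));
    // What `void f(long long)` would look like.
    println!("{}", encode_fn_type_id(&[ParamTy::CLongLong]));
    // A Rust `i64` parameter encoded as a vendor extended type.
    println!("{}", encode_fn_type_id(&[ParamTy::RustInt("i64")]));
    // A function taking no parameters.
    println!("{}", encode_fn_type_id(&[]));
}
```

Under these assumptions, a `long` parameter and an `i64` parameter produce different identifiers even when both are 64 bits wide, which is the mismatch the rest of this RFC is concerned with.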
+ +## Encoding C integer types + +Rust defines `char` as a Unicode scalar value, while C defines `char` as an +integer type. Rust also defines explicitly-sized integer types (i.e., `i8`, +`i16`, `i32`, ...) while C defines abstract integer types (i.e., `char`, +`short`, `long`, ...), whose actual sizes are implementation-defined and may +vary across different data models. This causes ambiguity if Rust integer types +are used in extern "C" function types that represent C functions because the +Itanium C++ ABI specifies encodings for C integer types (e.g., `char`, `short`, +`long`, ...), not their defined representations (e.g., 8-bit signed integer, +16-bit signed integer, 32-bit signed integer, ...). + +For example, the Rust compiler currently is not able to identify if an + +```rust +extern "C" { + fn func(arg: i64); +} +``` + +represents a `void func(long arg)` or `void func(long long arg)` in an LP64 or +equivalent data model. + +For cross-language LLVM CFI support, the Rust compiler must be able to identify +and correctly encode C types in extern "C" function types indirectly called +across the FFI boundary when forward-edge control flow protection is enabled. + +For convenience, Rust provides some C-like type aliases for use when +interoperating with foreign code written in C, and these C type aliases may be +used for disambiguation. However, at the time types are encoded, all type +aliases are already resolved to their respective `ty::Ty` type +representations[[5]] (i.e., their respective Rust aliased types), making it +currently impossible to identify the use of C type aliases from their resolved +types. + +For example, the Rust compiler currently is also not able to identify that an + +```rust +extern "C" { + fn func(arg: c_long); +} +``` + +used the `c_long` type alias and is not able to disambiguate between it and an +`extern "C" fn func(arg: c_longlong)` in an LP64 or equivalent data model at the +time types are encoded. 
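The difference between a type alias and a user-defined `repr(transparent)` type can be sketched as follows. This is only an illustration under stated assumptions: the names `c_long_alias` and `c_long_newtype` and the `same_type` helper are hypothetical, an LP64 data model is assumed (so the underlying type is `i64`), and `TypeId` is used here merely as a stand-in for "what the compiler sees after alias resolution"; it is not how the type encoding itself is implemented.

```rust
use std::any::TypeId;
use std::mem::{align_of, size_of};

// Hypothetical stand-in for an existing C-like type alias, assuming an LP64
// data model where C `long` is 64 bits wide.
#[allow(non_camel_case_types)]
type c_long_alias = i64;

// Hypothetical stand-in for a user-defined C type: a distinct nominal type
// with the same layout as the integer it wraps.
#[allow(non_camel_case_types)]
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct c_long_newtype(pub i64);

/// Returns true if `A` and `B` are the same type after alias resolution.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // The alias is indistinguishable from `i64`.
    assert!(same_type::<c_long_alias, i64>());
    // The transparent wrapper remains a distinct type.
    assert!(!same_type::<c_long_newtype, i64>());
    // The wrapper still has the size and alignment of `i64`.
    assert_eq!(size_of::<c_long_newtype>(), size_of::<i64>());
    assert_eq!(align_of::<c_long_newtype>(), align_of::<i64>());
    println!("alias resolves to i64; newtype stays distinct");
}
```

Because the wrapper stays distinct while keeping the layout of its field, its use could in principle still be recognized at the time types are encoded, which an alias does not allow.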
+ +This RFC proposes creating a new set of C types in `core::ffi::cfi` as +user-defined types using `repr(transparent)` to be used in extern "C" function +types indirectly called across the FFI boundary when cross-language CFI support +is needed, and keeping the existing C-like type aliases. + +These are not backward-compatibility breaking changes because the Rust compiler +currently does not support cross-language CFI (i.e., extern "C" function types +indirectly called across the FFI boundary when CFI is enabled). + +This example explains the issue and the solution this RFC proposes in detail: + +example/src/main.rs +```rust +// This RFC proposes changing +use std::ffi::c_long; +// To use std::ffi::cfi::c_long so both the Rust compiler and Clang use the same +// encoding and all the indirect calls in this example work when CFI is enabled. + +#[link(name = "foo")] +extern "C" { + // This declaration would have the type id "_ZTSFvlE", but at the time types + // are encoded, all type aliases are already resolved to their respective + // Rust aliased types, so this is encoded as either "_ZTSFvu3i32E" or + // "_ZTSFvu3i64E", depending on what type the c_long type alias is resolved + // to, which currently uses the u vendor extended type + // encoding for the Rust integer types--this is the issue this RFC + // describes. + fn hello_from_c(_: c_long); + + // This declaration would have the type id "_ZTSFvPFvlElE", but is encoded + // as either "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E" + // (compressed), similarly to the hello_from_c declaration above--this can + // be ignored for the purposes of this example. + fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long); +} + +// This definition would have the type id "_ZTSFvlE", but is encoded as either +// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration +// above. 
+unsafe extern "C" fn hello_from_rust(_: c_long) { + println!("Hello, world!"); +} + +// This definition would have the type id "_ZTSFvlE", but is encoded as either +// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration +// above. +unsafe extern "C" fn hello_from_rust_again(_: c_long) { + println!("Hello from Rust again!\n"); +} + +// This definition would also have the type id "_ZTSFvPFvlElE", but is encoded +// as either "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E" +// (compressed), similarly to the hello_from_c declaration above--this can be +// ignored for the purposes of this example. +fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) { + // This indirect call site tests whether the destination pointer is a member + // of the group derived from the same type id of the f declaration, which + // would have the type id "_ZTSFvlE", but is encoded as either + // "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c + // declaration above. + // + // Notice that since the test is at the call site and generated by the Rust + // compiler, the type id used in the test is encoded by the Rust compiler. + unsafe { f(arg) } +} + +// This definition has the type id "_ZTSFvvE"--this can be ignored for the +// purposes of this example. +fn main() { + // This demonstrates an indirect call within Rust-only code using the same + // encoding for hello_from_rust and the test at the indirect call site at + // indirect_call (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E"). + indirect_call(hello_from_rust, 1); + + // This demonstrates an indirect call across the FFI boundary with the Rust + // compiler and Clang using different encodings for hello_from_c and the + // test at the indirect call site at indirect_call (i.e., "_ZTSFvu3i32E" or + // "_ZTSFvu3i64E" vs "_ZTSFvlE"). 
+ // + // When using rustc LTO (i.e., make using_rustc_lto), this works because the + // declaration used is the Rust-declared hello_from_c, which has the type id + // encoded by the Rust compiler (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E"). + // + // When using (proper) LTO (i.e., make), this does not work because the + // declaration used is the C-defined hello_from_c, which has the type id + // encoded by Clang (i.e., "_ZTSFvlE"). + indirect_call(hello_from_c, 2); + + // This demonstrates an indirect call to a function passed as a callback + // across the FFI boundary with the Rust compiler and Clang using different + // encodings for the passed-callback declaration and the test at the + // indirect call site at indirect_call_from_c (i.e., "_ZTSFvu3i32E" or + // "_ZTSFvu3i64E" vs "_ZTSFvlE"). + // + // When Rust functions are passed as callbacks across the FFI boundary to be + // called back from C code, the tests are also at the call site but + // generated by Clang instead, so the type ids used in the tests are encoded + // by Clang, which will currently not match the type ids of declarations + // encoded by the Rust compiler (e.g., hello_from_rust_again). (The same + // happens the other way around for C functions passed as callbacks across + // the FFI boundary to be called back from Rust code.) + unsafe { + indirect_call_from_c(hello_from_rust_again, 3); + } +} +``` + +example/src/foo.c +```c +#include <stdio.h> +#include <stdlib.h> + +// This definition has the type id "_ZTSFvlE". +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} + +// This definition has the type id "_ZTSFvPFvlElE"--this can be ignored for the +// purposes of this example. +void +indirect_call_from_c(void (*fn)(long), long arg) +{ + // This call site tests whether the destination pointer is a member of the + // group derived from the same type id of the fn declaration, which has the + // type id "_ZTSFvlE". 
+ // + // Notice that since the test is at the call site and generated by Clang, + // the type id used in the test is encoded by Clang. + fn(arg); +} +``` - * not assume that C char and integer types and their respective Rust aliased - types can be used interchangeably across the FFI boundary when forward-edge - control flow protection is enabled. +# Drawbacks +[drawbacks]: #drawbacks -This may be done by either +The Rust compiler assumes that C char and integer types and their respective +Rust aliased types can be used interchangeably. These assumptions can not be +maintained for extern "C" function types indirectly called across the FFI +boundary when CFI is enabled. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +The alternatives considered were: 1. creating a new set of C types in `core::ffi::cfi` as user-defined types - using `repr(transparent)` to be used across the FFI boundary (i.e., for - extern function types with the "C" calling convention passed as callbacks - across the FFI boundary) when cross-language CFI support is needed, and - keep the existing C-like type aliases. - - 2. adding a new set of parameter attributes to specify the corresponding C - types to be used for encoding across the FFI boundary (i.e., for extern - function types with the "C" calling convention passed as callbacks across - the FFI boundary) when cross-language CFI support is needed. - - 3. creating a new set of transitional C types in `core::ffi` as user-defined - types using `repr(transparent)` to be used across the FFI boundary (i.e., - for extern function types with the "C" calling convention passed as - callbacks across the FFI boundary) when cross-language CFI support is - needed (and taking the opportunity to consolidate all C types in - `core::ffi`). 
+ using `repr(transparent)` to be used in extern "C" function types + indirectly called across the FFI boundary when cross-language CFI support + is needed, and keeping the existing C-like type aliases. - 4. waiting for the work in progress in rust-lang/rust#97974 for + 2. waiting for the work in progress in rust-lang/rust#97974 for rust-lang/compiler-team#504 and use type alias information for - disambiguation and to specify the corresponding C types to be used for - encoding across the FFI boundary (i.e., for extern function types with the - "C" calling convention passed as callbacks across the FFI boundary) when - cross-language CFI support is needed. + disambiguation and to specify the corresponding C types in extern "C" + function types when cross-language CFI support is needed. + + 3. adding a new set of parameter attributes to specify the corresponding C + types to be used in extern "C" function types indirectly called across the + FFI boundary when cross-language CFI support is needed + + 4. creating a new set of transitional C types in `core::ffi` as user-defined + types using `repr(transparent)` to be used in extern "C" function types + indirectly called across the FFI boundary when cross-language CFI support + is needed (and taking the opportunity to consolidate all C types in + `core::ffi`). 5. changing the currently existing C types in `std::os::raw` to user-defined types using `repr(transparent)`. @@ -311,46 +357,114 @@ This may be done by either 7. creating a new `ty::C` for representing C types. -Options (1), (2), (3), (4) are opt in for when cross-language CFI support is -needed. These are not backward-compatibility breaking changes because the Rust -compiler currently does not support cross-language CFI (i.e., calls to extern -function types with the "C" calling convention passed as callbacks across the -FFI boundary). +Alternatives (1), (2), and (3) are opt in for when cross-language CFI support is +needed. 
These alternatives are not backward-compatibility breaking changes +because the Rust compiler currently does not support cross-language CFI (i.e., +extern "C" function types indirectly called across the FFI boundary when +forward-edge control flow protection is enabled). -Option (5), (6), and (7) are backward-compatibility breaking changes because -they will require changes to existing code that use C types. +Alternatives (4), (5), (6), and (7) are backward-compatibility breaking changes +because they will require changes to existing code that use C types. -# Drawbacks -[drawbacks]: #drawbacks +The solution this RFC proposes (1) is opt in, is not a backward-compatibility +breaking change, and is one of the less intrusive change to the language among +the alternatives listed. -The Rust compiler also assumes that C char and integer types and their -respective Rust aliased types can be used interchangeably. These assumptions -can not be maintained across the FFI boundary (i.e., for extern function types -with the "C" calling convention passed as callbacks across the FFI boundary) -when forward-edge control flow protection is enabled. +# Prior art +[prior-art]: #prior-art -# Rationale and alternatives -[rationale-and-alternatives]: #rationale-and-alternatives +The author is currently not aware of any cross-language CFI implementation and +support by any other compiler and language. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +None. + +# Future possibilities +[future-possibilities]: #future-possibilities + +The project this RFC is part of and solving the issue this RFC describes +provides the foundation for cross-language CFI support for the Linux kernel +(i.e., cross-language kCFI support) and Intel Fine Indirect Branch Tracking +(FineIBT), which use the same encoding and also depend on solving the issue this +RFC describes. 
+ +It also provides the foundation for future implementations of cross-language +hardware-assisted and software-based -combined forward-edge control flow +protection, such as Microsoft Windows eXtended Flow Guard (XFG) and ARM Pointer +Authentication -based forward-edge control flow protection, that also depend on +the Rust compiler being able to identify C char and integer type uses at the +time types are encoded. + +# Appendix +[appendix]: #appendix + +As the industry continues to explore Rust adoption, the absence of support for +forward-edge control flow protection in the Rust compiler is a major security +concern when migrating to Rust by gradually replacing C or C++ with Rust, and C +or C++ and Rust -compiled code share the same virtual address space. + +A safe language -compiled code such as Rust, when sharing the same virtual +address space with an unsafe language -compiled code such as C or C++, may +degrade the security of a program because of different assumptions about +language properties and availability of security features such as exploit +mitigations. + +The issue the project this RFC is part of aims to solve is an example of this, +where entirely safe Rust-compiled code, when sharing the same virtual address +space with C or C++ -compiled code with forward-edge control flow protection, +may degrade the security of the program because the indirect branches in +Rust-compiled code are not validated, allowing forward-edge control flow +protection to be trivially bypassed. + +This has been extensively discussed[[6]][[7]][[8]][[9]][[10]], and just recently +formalized[[11]] as a new class of attack (i.e., cross-language attacks). It was +also one of the major reasons that initiatives such as Rust GCC--which this +author also fully supports--were funded[[10]]. Therefore, support for +forward-edge control flow protection needs to be added to the Rust compiler and +is a requirement for large-scale secure Rust adoption.
+ +# Frequently asked questions (FAQ) +[faq]: #faq + +## Are the changes proposed in this RFC backward-compatibility breaking changes? + +These are not backward-compatibility breaking changes because the Rust compiler +currently does not support cross-language CFI (i.e., extern "C" function types +indirectly called across the FFI boundary when forward-edge control flow +protection is enabled). ## Why not use the v0 mangling scheme for encoding? -Unfortunately, the v0 mandling scheme can not be used as an encoding for -cross-language CFI support due to the lack of support by other compilers, -mainly Clang. +The v0 mangling scheme can not be used because it is not a compatible encoding +for cross-language LLVM CFI support. + +## Why not create a new encoding for cross-language CFI? -## Why not just creating a new encoding for cross-language CFI? +See Using Itanium C++ ABI mangling for encoding (1) versus creating a new +encoding for cross-language CFI (2) in the design document in the tracking issue +[#89653][1][[1]]. -(See [Defined type metadata identifiers [creating a new encoding for -cross-language CFI]](defined-type-metadata-new).) +## Why not use a generalized encoding across the FFI boundary? -## Why not just use hardware-assisted forward-edge control flow protection? +This results in less comprehensive protection, may result in using a generalized +encoding for all C and C++ -compiled code instead of only across the FFI +boundary depending on whether Clang can be changed to use the generalized encoding +only across the FFI boundary (which may also require new Clang extensions and +changes to C and C++ code and libraries), and will degrade the security of the +program when linking foreign Rust-compiled code into a program written in C or +C++ because the program previously used a more comprehensive encoding for all +its compiled code. + +## Why not use hardware-assisted forward-edge control flow protection?
Newer processors provide hardware assistance for forward-edge control flow protection, such as ARM Branch Target Identification (BTI), ARM Pointer Authentication, and Intel Indirect Branch Tracking (IBT) as part of Intel -Control-flow Enforcement Technology (CET). However, ARM BTI and Intel IBT --based implementations are less comprehensive than software-based -implementations such as [LLVM ControlFlowIntegrity +Control-flow Enforcement Technology (CET). However, ARM BTI and Intel IBT -based +implementations are less comprehensive than software-based implementations such +as [LLVM ControlFlowIntegrity (CFI)](https://clang.llvm.org/docs/ControlFlowIntegrity.html), and the commercially available [grsecurity/PaX Reuse Attack Protector (RAP)](https://grsecurity.net/rap_faq). @@ -360,16 +474,16 @@ commercially available [grsecurity/PaX Reuse Attack Protector The less comprehensive the protection, the higher the likelihood it can be bypassed. For example, Microsoft Windows Control Flow Guard (CFG) only tests that the destination of an indirect branch is a valid function entry point, -which is the equivalent of grouping all function pointers in a single group, -and testing all destinations of indirect branches to be in this group. This is -also known as "coarse-grained CFI". +which is the equivalent of grouping all function pointers in a single group, and +testing all destinations of indirect branches to be in this group. This is also +known as "coarse-grained CFI". 
(This is even less comprehensive than the initial support for LLVM CFI added to -the Rust compiler as part of this project, which aggregated function pointers -in groups identified by their number of parameters [i.e., -rust-lang/rust#89652], and provides protection only for the first example -listed in the partial results of this project in the design document in the -tracking issue [#89653][7][[7]]) +the Rust compiler as part of the project this RFC is also part of, which +aggregated function pointers in groups identified by their number of parameters +[i.e., rust-lang/rust#89652], and provides protection only for the first example +listed in the partial results in the design document in the tracking issue +[#89653][1][[1]]) It means that in an exploitation attempt, an attacker can change/hijack control flow to any function, and the larger the program is, the higher the likelihood @@ -386,73 +500,19 @@ assistance with software-based function pointer type testing similarly to LLVM CFI. This is also known as "fine-grained CFI". (This is equivalent to the current support for LLVM CFI added to the Rust -compiler as part of this project, which aggregates function pointers in groups -identified by their return and parameter types [i.e., rust-lang/rust#95548]. -(See the partial results of this project in the design document in the tracking -issue [#89653][7][[7]].) - -## Why not just a generalized encoding across the FFI boundary? 
- -This results in less comprehensive protection, may result in using a -generalized encoding for all C and C++ -compiled code instead of only across -the FFI boundary depending whether Clang can be changed to use the generalized -encoding only across the FFI boundary (which may also require new Clang -extensions and changes to C and C++ code and libraries), and will degrade the -security of the program when linking foreign Rust-compiled code into a program -written in C or C++ because the program previously used a more comprehensive -encoding for all its compiled code. - -Finally, it does not completely solve the issue this RFC aims to solve, which -is that entirely safe Rust-compiled code, when sharing the same virtual address -space with C or C++ -compiled code with forward-edge control flow protection, -may degrade the security of the program because the indirect branches in -Rust-compiled code are not validated, allowing forward-edge control flow -protection to be trivially bypassed. - -## Are the changes proposed in this RFC backward-compatibility breaking changes? - -Options (1), (2), (3), (4) are opt in for when cross-language CFI support is -needed. These are not backward-compatibility breaking changes because the Rust -compiler currently does not support cross-language CFI (i.e., calls to extern -function types with the "C" calling convention passed as callbacks across the -FFI boundary). - -Option (5), (6), and (7) are backward-compatibility breaking changes because -they will require changes to existing code that use C types. - -# Prior art -[prior-art]: #prior-art - -The author is currently not aware of any cross-language CFI implementation and -support by any other compiler and language. - -# Unresolved questions -[unresolved-questions]: #unresolved-questions - -See [Reference-level explanation][reference-level-explanation]. 
- -# Future possibilities -[future-possibilities]: #future-possibilities - -Using Itanium C++ ABI mangling for encoding (1) provides cross-language LLVM -CFI support with C and C++ -compiled code as is, provides more comprehensive -protection, does not require changes to Clang, and does not require any new -Clang extensions and changes to C and C++ code and libraries. - -It allows further improvements for both CFI and cross-language CFI support -(e.g., increasing granularity by adding information, etc.), and also provides -the foundation for future implementations of cross-language hardware-assisted -and software-based -combined forward-edge control flow protection, such as ARM -Pointer Authentication -based forward-edge control flow protection. - -[1]: https://stanford-cs242.github.io/f17/assets/projects/2017/songyang.pdf "Y. Song. \"On Control Flow Hijacks of unsafe Rust.\" GitHub." -[2]: https://www.cs.ucy.ac.cy/~elathan/papers/tops20.pdf "M. Papaevripides and E. Athanasopoulos. \"Exploiting Mixed Binaries.\" Elias Athanasopoulos Publications." -[3]: https://github.com/rust-lang/rust/files/4723836/Control.Flow.Guard.for.Rust.pdf "A. Paverd. \"Control Flow Guard for Rust.\" GitHub." -[4]: https://github.com/rust-lang/rust/files/4723840/Control.Flow.Guard.for.LLVM.pdf "A. Paverd. \"Control Flow Guard for LLVM.\" GitHub." -[5]: https://opensrcsec.com/open_source_security_announces_rust_gcc_funding "B. Spengler. \"Open Source Security, Inc. Announces Funding of GCC Front-End for Rust.\" Open Source Security." -[6]: https://www.ndss-symposium.org/wp-content/uploads/2022-78-paper.pdf "S. Mergendahl, N. Burow, H. Okhravi. \"Cross-Language Attacks.\" NDSS Symposium 2022." -[7]: "R. de C Valle. \"Tracking Issue for LLVM Control Flow Integrity (CFI) Support for Rust #89653.\" GitHub." -[8]: "\"Type Metadata.\" LLVM Documentation." -[9]: "\"Itanium C++ ABI\"." -[10]: "\"Virtual Tables and RTTI\". Itanium C++ ABI." -[11]: "\"The ty module: representing types\". 
Guide to Rustc Development." +compiler as part of the project this RFC is also part of, which aggregates +function pointers in groups identified by their return and parameter types +[i.e., rust-lang/rust#95548]. See the partial results in the design document in +the tracking issue [#89653][1][[1]].) + +[1]: "R. de C Valle. \"Tracking Issue for LLVM Control Flow Integrity (CFI) Support for Rust #89653.\" GitHub." +[2]: "\"Type Metadata.\" LLVM Documentation." +[3]: "\"Itanium C++ ABI\"." +[4]: "\"Virtual Tables and RTTI\". Itanium C++ ABI." +[5]: "\"The ty module: representing types\". Guide to Rustc Development." +[6]: https://stanford-cs242.github.io/f17/assets/projects/2017/songyang.pdf "Y. Song. \"On Control Flow Hijacks of unsafe Rust.\" GitHub." +[7]: https://www.cs.ucy.ac.cy/~elathan/papers/tops20.pdf "M. Papaevripides and E. Athanasopoulos. \"Exploiting Mixed Binaries.\" Elias Athanasopoulos Publications." +[8]: https://github.com/rust-lang/rust/files/4723836/Control.Flow.Guard.for.Rust.pdf "A. Paverd. \"Control Flow Guard for Rust.\" GitHub." +[9]: https://github.com/rust-lang/rust/files/4723840/Control.Flow.Guard.for.LLVM.pdf "A. Paverd. \"Control Flow Guard for LLVM.\" GitHub." +[10]: https://opensrcsec.com/open_source_security_announces_rust_gcc_funding "B. Spengler. \"Open Source Security, Inc. Announces Funding of GCC Front-End for Rust.\" Open Source Security." +[11]: https://www.ndss-symposium.org/wp-content/uploads/2022-78-paper.pdf "S. Mergendahl, N. Burow, H. Okhravi. \"Cross-Language Attacks.\" NDSS Symposium 2022." 
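
As a rough illustration of the `repr(transparent)` approach proposed in this RFC, the following self-contained sketch may help. It assumes a local `cfi` module standing in for the proposed `core::ffi::cfi` (which is not part of the standard library), and `callback` and `call_indirect` are hypothetical names used only for this example. It only demonstrates that such a wrapper keeps the layout of the C type alias while being a distinct type; it does not enable or exercise CFI itself.

```rust
use std::ffi::c_long;
use std::mem::{align_of, size_of};

// Local stand-in for the proposed `core::ffi::cfi` module (an assumption for
// illustration; this module is not provided by the standard library).
mod cfi {
    // A distinct user-defined type with the same layout as the C type alias.
    #[allow(non_camel_case_types)]
    #[repr(transparent)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct c_long(pub std::ffi::c_long);
}

// Hypothetical callback that uses the wrapper type in its signature.
extern "C" fn callback(arg: cfi::c_long) -> cfi::c_long {
    cfi::c_long(arg.0 + 1)
}

// Hypothetical indirect call site. With CFI enabled, the type membership test
// would be emitted here for the function pointer type of `f`.
fn call_indirect(f: extern "C" fn(cfi::c_long) -> cfi::c_long, arg: c_long) -> c_long {
    f(cfi::c_long(arg)).0
}

fn main() {
    // The wrapper has the same size and alignment as the alias it wraps.
    assert_eq!(size_of::<cfi::c_long>(), size_of::<c_long>());
    assert_eq!(align_of::<cfi::c_long>(), align_of::<c_long>());

    // Prints 42.
    println!("{}", call_indirect(callback, 41));
}
```

Because the wrapper is a separate nominal type rather than an alias, a compiler can tell `cfi::c_long` apart from `i32` or `i64` when encoding function types, which an alias alone does not allow.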
From 33aec025c3201f32cc65b1fb38377ae94830448e Mon Sep 17 00:00:00 2001 From: Ramon de C Valle Date: Wed, 2 Nov 2022 15:49:43 -0700 Subject: [PATCH 4/5] Update text/0000-improve-c-types-for-cross-language-cfi.md --- ...-improve-c-types-for-cross-language-cfi.md | 286 ++++++++++++++++-- 1 file changed, 266 insertions(+), 20 deletions(-) diff --git a/text/0000-improve-c-types-for-cross-language-cfi.md b/text/0000-improve-c-types-for-cross-language-cfi.md index 829496032f5..6e3d877dd2a 100644 --- a/text/0000-improve-c-types-for-cross-language-cfi.md +++ b/text/0000-improve-c-types-for-cross-language-cfi.md @@ -52,18 +52,19 @@ user-defined types using `repr(transparent)` to be used in extern "C" function types indirectly called across the FFI boundary when cross-language CFI support is needed, and keeping the existing C-like type aliases. +The new set of C types will make indirect calls to extern "C" function types +across the FFI boundary work when CFI is enabled. These indirect calls will +continue to not work when CFI is enabled unless the new set of C types are used. + These are not backward-compatibility breaking changes because the Rust compiler currently does not support cross-language CFI (i.e., extern "C" function types indirectly called across the FFI boundary when CFI is enabled). -This example explains the issue and the solution this RFC proposes: +For example: example/src/main.rs ```rust -// This RFC proposes changing use std::ffi::c_long; -// To use std::ffi::cfi::c_long so both the Rust compiler and Clang use the same -// encoding and the all indirect calls in this example work when CFI is enabled. 
#[link(name = "foo")] extern "C" { @@ -117,6 +118,102 @@ indirect_call_from_c(void (*fn)(long), long arg) } ``` +Will need to be changed to: + +example/src/main.rs +```rust +use std::ffi::c_long; +use std::ffi::cfi; + +#[link(name = "foo")] +extern "C" { + fn hello_from_c(_: cfi::c_long); + fn indirect_call_from_c(f: unsafe extern "C" fn(cfi::c_long), arg: c_long); +} + +unsafe extern "C" fn hello_from_rust(_: cfi::c_long) { + println!("Hello, world!"); +} + +unsafe extern "C" fn hello_from_rust_again(_: cfi::c_long) { + println!("Hello from Rust again!\n"); +} + +fn indirect_call(f: unsafe extern "C" fn(cfi::c_long), arg: c_long) { + unsafe { f(cfi::c_long(arg)) } +} + +fn main() { + // This will continue to work + indirect_call(hello_from_rust, 1); + // This will work both when using rustc LTO and when using (proper) LTO + // because the Rust compiler and Clang will use the same encoding for + // hello_from_c and the test at the indirect call site at indirect_call. + indirect_call(hello_from_c, 2); + // This will work because the Rust compiler and Clang will use the same + // encoding for hello_from_rust_again and the test at the indirect call site + // at indirect_call_from_c. + unsafe { + indirect_call_from_c(hello_from_rust_again, 3); + } +} +``` + +example/src/foo.c +```c +#include +#include + +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} + +void +indirect_call_from_c(void (*fn)(long), long arg) +{ + fn(arg); +} +``` + +Direct calls to extern "C" function types across the FFI boundary, whether CFI +is enabled or disabled, will continue to work whether Rust integer types or C +type aliases are used. 
+ +For example: + +example/src/main.rs +```rust +// Optionally, use std::ffi::c_long; + +#[link(name = "foo")] +extern "C" { + fn hello_from_c(_: i64); + // Or fn hello_from_c(_: c_long); +} + +fn main() { + unsafe { hello_from_c(1); } +} +``` + +example/src/foo.c +```c +#include +#include + +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} +``` + +Will continue to work when `fn hello_from_c(_: i64)` or `fn hello_from_c(_: +c_long)` represents a `void hello_from_c(long arg)` in an LP64 or equivalent +data model. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -161,7 +258,7 @@ equivalent data model. For cross-language LLVM CFI support, the Rust compiler must be able to identify and correctly encode C types in extern "C" function types indirectly called -across the FFI boundary when forward-edge control flow protection is enabled. +across the FFI boundary when CFI is enabled. For convenience, Rust provides some C-like type aliases for use when interoperating with foreign code written in C, and these C type aliases may be @@ -187,18 +284,19 @@ user-defined types using `repr(transparent)` to be used in extern "C" function types indirectly called across the FFI boundary when cross-language CFI support is needed, and keeping the existing C-like type aliases. +The new set of C types will make indirect calls to extern "C" function types +across the FFI boundary work when CFI is enabled. These indirect calls will +continue to not work when CFI is enabled unless the new set of C types are used. + These are not backward-compatibility breaking changes because the Rust compiler currently does not support cross-language CFI (i.e., extern "C" function types indirectly called across the FFI boundary when CFI is enabled). 
-This example explains the issue and the solution this RFC proposes in detail: +For example: example/src/main.rs ```rust -// This RFC proposes changing use std::ffi::c_long; -// To use std::ffi::cfi::c_long so both the Rust compiler and Clang use the same -// encoding and the all indirect calls in this example work when CFI is enabled. #[link(name = "foo")] extern "C" { @@ -213,7 +311,7 @@ extern "C" { // This declaration would have the type id "_ZTSFvPFvlElE", but is encoded // as either "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E" - // (compressed), similarly to the hello_from_c declaration above--this can + // (compressed), similarly to the hello_from_c declaration above--this may // be ignored for the purposes of this example. fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long); } @@ -234,7 +332,7 @@ unsafe extern "C" fn hello_from_rust_again(_: c_long) { // This definition would also have the type id "_ZTSFvPFvlElE", but is encoded // as either "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E" -// (compressed), similarly to the hello_from_c declaration above--this can be +// (compressed), similarly to the hello_from_c declaration above--this may be // ignored for the purposes of this example. fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) { // This indirect call site tests whether the destination pointer is a member @@ -248,7 +346,7 @@ fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) { unsafe { f(arg) } } -// This definition has the type id "_ZTSFvvE"--this can be ignored for the +// This definition has the type id "_ZTSFvvE"--this may be ignored for the // purposes of this example.
fn main() { // This demonstrates an indirect call within Rust-only code using the same @@ -301,7 +399,106 @@ hello_from_c(long arg) printf("Hello from C!\n"); } -// This definition has the type id "_ZTSFvPFvlElE"--this can be ignored for the +// This definition has the type id "_ZTSFvPFvlElE"--this may be ignored for the +// purposes of this example. +void +indirect_call_from_c(void (*fn)(long), long arg) +{ + // This call site tests whether the destination pointer is a member of the + // group derived from the same type id of the fn declaration, which has the + // type id "_ZTSFvlE". + // + // Notice that since the test is at the call site and generated by Clang, + // the type id used in the test is encoded by Clang. + fn(arg); +} +``` + +Will need to be changed to: + +example/src/main.rs +```rust +use std::ffi::c_long; +use std::ffi::cfi; + +// The new set of C types in `core::ffi::cfi` as user-defined types using +// `repr(transparent)` will be equivalent to (using c_long as an example): +// +// pub mod cfi { +// #[allow(non_camel_case_types)] +// #[repr(transparent)] +// pub struct c_long(pub std::ffi::c_long); +// } + +#[link(name = "foo")] +extern "C" { + // This declaration will have the type id "_ZTSFvlE". + fn hello_from_c(_: cfi::c_long); + + // This declaration will have either the type id "_ZTSFvPFvlEu3i32E" or + // "_ZTSFvPFvlEu3i64E"--this may be ignored for the purposes of this + // example. + fn indirect_call_from_c(f: unsafe extern "C" fn(cfi::c_long), arg: c_long); +} + +// This definition will have the type id "_ZTSFvlE". +unsafe extern "C" fn hello_from_rust(_: cfi::c_long) { + println!("Hello, world!"); +} + +// This definition will have the type id "_ZTSFvlE". +unsafe extern "C" fn hello_from_rust_again(_: cfi::c_long) { + println!("Hello from Rust again!\n"); +} + +// This definition will also have either the type id "_ZTSFvPFvlEu3i32E" or +// "_ZTSFvPFvlEu3i64E"--this may be ignored for the purposes of this example.
+fn indirect_call(f: unsafe extern "C" fn(cfi::c_long), arg: c_long) { + // This indirect call site tests whether the destination pointer is a member + // of the group derived from the same type id of the f declaration, which + // will have the type id "_ZTSFvlE". + // + // Notice that since the test is at the call site and generated by the Rust + // compiler, the type id used in the test is encoded by the Rust compiler. + unsafe { f(cfi::c_long(arg)) } +} + +// This definition has the type id "_ZTSFvvE"--this may be ignored for the +// purposes of this example. +fn main() { + // This demonstrates an indirect call within Rust-only code using the same + // encoding for hello_from_rust and the test at the indirect call site at + // indirect_call (i.e., "_ZTSFvlE"). + indirect_call(hello_from_rust, 1); + + // This demonstrates an indirect call across the FFI boundary with the Rust + // compiler and Clang using the same encoding for hello_from_c and the test + // at the indirect call site at indirect_call (i.e., "_ZTSFvlE"). + indirect_call(hello_from_c, 2); + + // This demonstrates an indirect call to a function passed as a callback + // across the FFI boundary with the Rust compiler and Clang using the same + // encoding for the passed-callback declaration and the test at the indirect + // call site at indirect_call_from_c (i.e., "_ZTSFvlE"). + unsafe { + indirect_call_from_c(hello_from_rust_again, 3); + } +} +``` + +example/src/foo.c +```c +#include +#include + +// This definition has the type id "_ZTSFvlE". +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} + +// This definition has the type id "_ZTSFvPFvlElE"--this may be ignored for the // purposes of this example.
void indirect_call_from_c(void (*fn)(long), long arg) @@ -316,13 +513,58 @@ indirect_call_from_c(void (*fn)(long), long arg) } ``` +Direct calls to extern "C" function types across the FFI boundary, whether CFI +is enabled or disabled, will continue to work whether Rust integer types or C +type aliases are used. + +For example: + +example/src/main.rs +```rust +// Optionally, use std::ffi::c_long; + +#[link(name = "foo")] +extern "C" { + // This declaration will have the type id "_ZTSFvu3i64E". + fn hello_from_c(_: i64); + // This declaration will have either the type id "_ZTSFvu3i32E" or + // "_ZTSFvu3i64E". + // Or fn hello_from_c(_: c_long); +} + +// This definition has the type id "_ZTSFvvE"--this may be ignored for the +// purposes of this example. +fn main() { + // This will continue to work because direct call sites do not test type + // membership. + unsafe { hello_from_c(1); } +} +``` + +example/src/foo.c +```c +#include +#include + +// This definition has the type id "_ZTSFvlE". +void +hello_from_c(long arg) +{ + printf("Hello from C!\n"); +} +``` + +Will continue to work when `fn hello_from_c(_: i64)` or `fn hello_from_c(_: +c_long)` represents a `void hello_from_c(long arg)` in an LP64 or equivalent +data model. + # Drawbacks [drawbacks]: #drawbacks The Rust compiler assumes that C char and integer types and their respective Rust aliased types can be used interchangeably. These assumptions can not be maintained for extern "C" function types indirectly called across the FFI -boundary when CFI is enabled. +boundary when CFI is enabled and the new set of C types are used. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -341,7 +583,7 @@ The alternatives considered were: 3. 
adding a new set of parameter attributes to specify the corresponding C types to be used in extern "C" function types indirectly called across the - FFI boundary when cross-language CFI support is needed + FFI boundary when cross-language CFI support is needed. 4. creating a new set of transitional C types in `core::ffi` as user-defined types using `repr(transparent)` to be used in extern "C" function types @@ -360,14 +602,14 @@ The alternatives considered were: Alternatives (1), (2), and (3) are opt in for when cross-language CFI support is needed. These alternatives are not backward-compatibility breaking changes because the Rust compiler currently does not support cross-language CFI (i.e., -extern "C" function types indirectly called across the FFI boundary when -forward-edge control flow protection is enabled). +extern "C" function types indirectly called across the FFI boundary when CFI is +enabled). Alternatives (4), (5), (6), and (7) are backward-compatibility breaking changes because they will require changes to existing code that use C types. The solution this RFC proposes (1) is opt in, is not a backward-compatibility -breaking change, and is one of the less intrusive change to the language among +breaking change, and is one of the less intrusive changes to the language among the alternatives listed. # Prior art @@ -397,6 +639,11 @@ Authentication -based forward-edge control flow protection, that also depend on the Rust compiler being able to identify C char and integer type uses at the time types are encoded. +# Acknowledgment + +Thanks to pnkfelix (Felix Klock) and the Rust community for all their help on +this RFC. + # Appendix [appendix]: #appendix @@ -432,8 +679,7 @@ is a requirement for large-scale secure Rust adoption. 
These are not backward-compatibility breaking changes because the Rust compiler currently does not support cross-language CFI (i.e., extern "C" function types -indirectly called across the FFI boundary when forward-edge control flow -protection is enabled). +indirectly called across the FFI boundary when CFI is enabled). ## Why not use the v0 mangling scheme for encoding? From 9df92decfdb2c3f34b48bd39b80c9e1b967a8ea4 Mon Sep 17 00:00:00 2001 From: Ramon de C Valle Date: Thu, 3 Nov 2022 19:15:11 -0700 Subject: [PATCH 5/5] Update text/0000-improve-c-types-for-cross-language-cfi.md --- text/0000-improve-c-types-for-cross-language-cfi.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/text/0000-improve-c-types-for-cross-language-cfi.md b/text/0000-improve-c-types-for-cross-language-cfi.md index 6e3d877dd2a..56144b7d825 100644 --- a/text/0000-improve-c-types-for-cross-language-cfi.md +++ b/text/0000-improve-c-types-for-cross-language-cfi.md @@ -185,12 +185,14 @@ For example: example/src/main.rs ```rust -// Optionally, use std::ffi::c_long; +// Optionally, use std::ffi::c_long. (Note this is the C type alias, not +// the new C type.) #[link(name = "foo")] extern "C" { fn hello_from_c(_: i64); - // Or fn hello_from_c(_: c_long); + // Or fn hello_from_c(_: c_long). (Note this is the C type alias, + // not the new C type.) } fn main() { @@ -521,7 +523,8 @@ For example: example/src/main.rs ```rust -// Optionally, use std::ffi::c_long; +// Optionally, use std::ffi::c_long. (Note this is the C type alias, not +// the new C type.) #[link(name = "foo")] extern "C" { @@ -529,7 +532,8 @@ extern "C" { fn hello_from_c(_: i64); // This declaration will have either the type id "_ZTSFvu3i32E" or // "_ZTSFvu3i64E". - // Or fn hello_from_c(_: c_long); + // Or fn hello_from_c(_: c_long). (Note this is the C type alias, + // not the new C type.) } // This definition has the type id "_ZTSFvvE"--this may be ignored for the