Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid value checks does not check char surrogate values #3241

Open
Tracked by #3203
celinval opened this issue Jun 7, 2024 · 2 comments
Open
Tracked by #3203

Valid value checks does not check char surrogate values #3241

celinval opened this issue Jun 7, 2024 · 2 comments
Assignees
Labels
[C] Bug This is a bug. Something isn't working. [F] Soundness Kani failed to detect an issue Z-UnstableFeature Issues that only occur if a unstable feature is enabled

Comments

@celinval
Copy link
Contributor

celinval commented Jun 7, 2024

I tried this code:

#[kani::proof]
fn transmute_surrogate_ub() {
    unsafe {
        let val: u32 = kani::any();
        kani::assume(val < char::MAX.into());
        let c: char = std::mem::transmute::<u32, char>(val) as char;
        match val {
            0..0xD800 | 0xE000..0x110000 => assert!(char::from_u32(val).is_some()),
            _ => unreachable!(),
        }
    }
}

using the following command line invocation:

kani -Zvalid-value-checks char.rs

with Kani version: 0.52.0

I expected to see this happen: A valid value check failure

Instead, this happened: An unreachable block being reached due to UB

Failed Checks: internal error: entered unreachable code: 
 File: "char.rs", line 31, in transmute_surrogate_ub

VERIFICATION:- FAILED
Verification Time: 0.12348909s
@celinval celinval added the [C] Bug This is a bug. Something isn't working. label Jun 7, 2024
@celinval celinval added [F] Soundness Kani failed to detect an issue Z-UnstableFeature Issues that only occur if a unstable feature is enabled labels Jul 4, 2024
@adpaco-aws
Copy link
Contributor

I've been investigating this issue and here's what I've found out.

First, the ValidValueReq we process for char specifies a valid_range that doesn't take into account the surrogate gap in the middle: [ValidValueReq { offset: 0, size: MachineSize { num_bits: 32 }, valid_range: 0..=1114111 }]

Note: 1114111 is 0x10FFFF (i.e., char::MAX)

I tried to special-case the ty_req closure in ty_validity_per_offset to return two valid ranges when a char is found:

diff --git a/kani-compiler/src/kani_middle/transform/check_values.rs b/kani-compiler/src/kani_middle/transform/check_values.rs
index a7d0f14d270..253963a4ef6 100644
--- a/kani-compiler/src/kani_middle/transform/check_values.rs
+++ b/kani-compiler/src/kani_middle/transform/check_values.rs
@@ -839,8 +839,21 @@ pub fn ty_validity_per_offset(
     ty: Ty,
     current_offset: usize,
 ) -> Result<Vec<ValidValueReq>, String> {
+    debug!(typ=?ty, kind=?ty.kind(), "ty_validity_per_offset");
     let layout = ty.layout().unwrap().shape();
     let ty_req = || {
+        if ty.kind() == TyKind::RigidTy(RigidTy::Char) {
+            let shape = ty.layout().unwrap().shape();
+            let value_size = match shape.abi {
+                ValueAbi::Scalar(Scalar::Initialized { value, .. })
+                | ValueAbi::ScalarPair(Scalar::Initialized { value, .. }, _) => {
+                    Some(value.size(machine_info))
+                }
+                _ => None,
+            };
+            return vec![ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0, end: 0xD7FF }},
+            ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0xE000, end: 0x10FFFF }}]
+        };
         if let Some(mut req) = ValidValueReq::try_from_ty(machine_info, ty)

However, that didn't have the intended effect as each range check is being encoded separately. In other words, this change will generate two separate checks as follows:

assert!((0..=0xD7FF).contains(value));
assert!((0xE000..0x10FFFF).contains(value));

which doesn't work well because it's requiring value to be in BOTH ranges.

It's not clear to me that the check generation should be done this way for multiple ranges. For example, in this code

                SourceOp::BytesValidity { ranges, target_ty, rvalue } => {
                    let value = body.new_assignment(rvalue, &mut source, InsertPosition::Before);
                    let rvalue_ptr = Rvalue::AddressOf(Mutability::Not, Place::from(value));
                    for range in ranges {
                        let result = build_limits(body, &range, rvalue_ptr.clone(), &mut source);
                        let msg =
                            format!("Undefined Behavior: Invalid value of type `{target_ty}`",);
                        body.add_check(
                            tcx,
                            &self.check_type,
                            &mut source,
                            InsertPosition::Before,
                            result,
                            &msg,
                        );
                    }
                }

in which cases does it make sense to generate more than one check?

Another option is to extend ValidValueReq to be able to work with multiple ranges. This seems to be nontracked pending work according to the comments in the source code. Maybe replacing/wrapping it with another structure like RangeSet could save us work here.

@adpaco-aws
Copy link
Contributor

Unfortunately, the option of extending ValidValueReq is more involved than I initially expected for a couple reasons:

  1. It's based on WrappingRange which comes from the ABI API of StableMIR.
  2. WrappingRange doesn't consider gaps within a range like the ones that we'd need for char surrogate values.

It's not clear to me such changes would pay off if char is the only type with a gap.

Would it be OK to change the range loop above so that build_limits constructs a single check that ensures the value falls within one of the ranges? I've just realized this wouldn't be OK if the ranges have different offsets, but it doesn't make sense to generate multiple checks for a particular offset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[C] Bug This is a bug. Something isn't working. [F] Soundness Kani failed to detect an issue Z-UnstableFeature Issues that only occur if a unstable feature is enabled
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants