Arithmetic conversions can result in LLVM undefined values #993

Closed
Praetonus opened this issue Jul 1, 2016 · 23 comments

@Praetonus
Member

Currently, integer to floating point and floating point to integer conversions compile down to the LLVM uitofp, sitofp, fptoui and fptosi instructions. On overflow of the destination type, these instructions produce undefined values. I don't think this is something we want in Pony, and we need to address it while impacting the existing API as little as possible.

In my opinion, the best solution would be to make the relevant functions partial and to error if over/underflow would occur. We'd also have to split the _ArithmeticConvertible trait into more specific versions so that functions are made partial only when necessary (e.g. F32.u16() should be partial but U8.u16() should not). Numeric unions (Number, Signed, etc.) should still work correctly, since a non-partial function is a subtype of a partial one and we could still have a supertype for all of this.

While the LLVM issue is with conversions to/from floating point, I feel this is more general at the language level and should also apply to conversions between integers: first, for consistency with the above fix for floating points, and second, because the only reason 1178 in a U16 maps to 154 in a U8 is that our hardware truncates it as a binary number (so the behaviour here doesn't come from the language semantics).

As a side note, if we do make the integer functions partial, the old truncating behaviour will still be available through modular arithmetic (which LLVM optimises to truncation or bitwise operations when the modulus corresponds to a machine word size).

To sum up the general idea: for every arithmetic conversion function from type A to type B, if some values of A don't fit in B, make the A.b() function partial and check for overflow.
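
To make the shape of the proposal concrete, here is a hypothetical sketch of a caller, assuming F32.u16() becomes partial while U8.u16() stays total (neither signature exists yet; this is just the proposed shape):

actor Main
  new create(env: Env) =>
    let f: F32 = 70000.0
    // Partial under the proposal: 70000 doesn't fit in a U16, so this errors.
    let a: U16 = try f.u16() else 0 end
    // Total under the proposal: every U8 fits in a U16, so no try is needed.
    let b: U8 = 42
    let c: U16 = b.u16()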

@jemc
Member

jemc commented Jul 1, 2016

Definitely agreed that having the result of these operations be formally undefined is a big problem. However, given that the effect many of us (myself included) have been depending on is truncation, I'm currently thinking it might make more sense for the language semantics to formally specify that the conversion truncates when the source type is wider than the target type.

Under your plan, how would it look if I had a U16 and wanted to truncate to a U8? You say that truncation can be done with %, but the result of a % operation on a U16 would still be a U16, so you'd still have to call u8() on it, which would still be partial - what would I put in the else that guards the partial?

@Praetonus
Member Author

I don't really like the idea of the default behaviour being errors for floating point conversions but base-2 truncation for integer conversions. I feel conversion and truncation should be separate functions, maybe a fun truncate(digits: USize, base: USize) for integers.
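
One possible shape for that function (a purely hypothetical sketch, written as a standalone helper since this is just an idea at this point):

primitive Trunc
  // Keep the low `digits` digits of `n` in the given `base`, so base 2
  // recovers today's bit truncation and base 10 truncates decimal digits.
  // Note that `m` can itself overflow for large arguments; a real
  // implementation would have to define that case.
  fun apply(n: U64, digits: U64, base: U64): U64 =>
    var m: U64 = 1
    var i: U64 = 0
    while i < digits do
      m = m * base
      i = i + 1
    end
    n % m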

After trying various things with LLVM and optimisations, it seems that % isn't the best candidate here. Bit masking with and always compiles down to a single trunc instruction and can even optimise out the overflow check (and the error handling code). For example:

primitive USize
  fun u8(): U8 ? =>
    compile_intrinsic // Check for overflow and use LLVM trunc instruction

actor Main
  new create(env: Env) =>
    let x: USize = env.args.size() and 0xFF
    let y: U8 = try
      x.u8()
    else
      0 // Unreachable
    end

In a program like this, y would be assigned directly from a truncation of env.args.size(). The masking with 0xFF and the conditional in USize.u8() would be removed. You'd still have to write the else for the try, but implementing the truncate function above would remove the need to do this trick manually.

This integer conversion/truncation overhaul requires an RFC; I'll write one.

Do we agree that the fix for floating point conversions is partial functions with an overflow check?

@jemc
Member

jemc commented Jul 2, 2016

Right now, I don't really think any of the conversion functions should be partial - they should just all have defined semantics.

@jemc
Member

jemc commented Jul 2, 2016

It would be good to hear @sylvanc's perspective on this, though.

@Praetonus
Member Author

The big issue is that the only alternative to partial functions to fix the floating point issue would be to return an arbitrary value if the input can't be represented by the integer type. That includes trying to convert NaN, and I don't see any sane default value for this (and it's undefined behaviour in C).

@sylvanc
Contributor

sylvanc commented Jul 5, 2016

Integer truncation and extension should remain as they are. Pony integers are not abstract numbers: they are real machine words of a fixed bit size that perform modular arithmetic :) As a result, U8(256) is 0, and by the same token, U16(256).u8() should be zero. Truncation is simply a fast way to perform the required modular arithmetic.

Think of it in pure functional terms: generate a U8 with an imaginary successor function succ, calling it 256 times with an initial value of 0. The result is of course 0.
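
A small illustration of that modular view (values worked out by hand against the current truncating behaviour):

actor Main
  new create(env: Env) =>
    // Truncation to U8 is just arithmetic modulo 2^8.
    env.out.print(U16(300).u8().string()) // 300 mod 256 = 44
    env.out.print(U16(256).u8().string()) // 256 mod 256 = 0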

That leaves the core problem that @Praetonus found, which I do think is important: fptoui and fptosi. I'm not concerned about uitofp or sitofp, as they produce reasonable results even when they are inexact; since most floating point operations are inexact, I think that's fine. But fptoui and fptosi can produce undef, and that's not fine :)

Using a partial function for float to int conversion is problematic, I think. While in one sense there is "no value in the codomain" (i.e. a partial function) for some float to int conversions, in another sense most values in the codomain are approximations anyway. Should F32(1.3).u32() be partial? No, it should return 1, and we're all comfortable with that.

So I lean towards saturation being the right way to go, from a "fully defined values" point of view. However, I'm concerned about performance. Float to int conversion is a real performance problem in games, video/audio codecs, etc, and C programmers spend a non-trivial amount of time optimising it to do what Pony currently does (i.e. use SSE registers and pipeline conversions, not worrying about overflow).

Interestingly, for any value that isn't a compile time constant, Pony won't generate an LLVM undef, and uses cvttsd2si on x64, which returns 0x80000000 on overflow:

http://www.felixcloutier.com/x86/CVTTSD2SI.html

However, this can't be relied on for a consistent result, because the optimiser can do unexpected things in intermediate code.

In summary: saturation seems correct, but introducing runtime bounds checking could make it necessary to provide alternate functions that do not perform runtime bounds checking.
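
For reference, the saturating semantics could be written out in plain Pony along these lines (a hypothetical sketch, not a proposed stdlib API; the branches are exactly the runtime cost in question):

primitive Saturate
  fun f64_to_u32(x: F64): U32 =>
    if x != x then
      0 // NaN: pick some defined result instead of undef
    elseif x <= 0 then
      0 // saturate at the lower bound
    elseif x >= U32.max_value().f64() then
      U32.max_value() // saturate at the upper bound
    else
      x.u32() // in range, so the hardware conversion is well-defined
    end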

@sylvanc
Contributor

sylvanc commented Sep 28, 2016

Discussed in a sync call. undef is dangerous to leave around, but runtime bounds checking is too expensive, so we'll special-case compile-time constant fptoui and fptosi to saturate, and leave the non-compile-time-constant case as it is now.

@jemc
Member

jemc commented Sep 28, 2016

Discussed on the sync call: @sylvanc proposed a plan in which we treat compile-time constants as a special case and make sure the value is never undef in LLVM, but we can't make the overflow semantics for floating point to integer conversion well-defined in a cross-platform way and still be fast.

So after setting up the special-case logic, we can document that the resulting value is undefined (it may vary across platforms), but is guaranteed not to leak information from a security standpoint (as it has the potential to do now).

We can also consider an RFC that adds a separate set of float-to-integer conversion methods that are slower, but have defined result semantics for the entire domain of input.

@Praetonus
Member Author

Special-casing compile-time constants won't be possible, because the optimiser can propagate and synthesize constants.

Also, it turns out there are some inconsistencies with unsafe maths.

  • Integer arithmetic doesn't use nuw/nsw flags and division by 0 is checked. Safe.
  • Bitwise shifting doesn't check the shift amount; shifting by at least the bit width is undefined behaviour in LLVM. Unsafe.
  • clz/ctz use safe LLVM intrinsics. Safe.
  • All floating point operations use fast math flags by default. Unsafe.
  • As discussed in this issue, floating point conversions are unsafe.

We're currently trading potentially undefined results for performance in most cases. To me, that doesn't align with Pony's "correctness first" philosophy. I'm tempted to recategorise the issue as a bug and to do the following.

  • Add runtime checks on every UB-inducing math operation and remove fast math flags on floating points.
  • Add a set of unsafe functions for performance. That would cover
    • Integer operations with nsw/nuw, unchecked division
    • Unchecked bitwise shifting
    • ctz/clz with unsafe LLVM intrinsics
    • Floating point operations with fast math flags
    • Unchecked floating point conversions

That way, users would get correctness by default, but could still open the unsafe escape hatch for performance-critical operations.

@jemc @sylvanc Thoughts?

@jemc
Member

jemc commented Oct 13, 2016

I'm tempted to recategorise the issue as a bug

Yes, I agree - recategorizing as a bug and marking with needs discussion during sync, though if we manage to iron out the details before the sync, we can remove the latter label.


I'm also inclined to agree with your suggestion of fixing all of the default methods for correctness and adding "fast"/"unsafe" variants as needed for performance-critical arithmetic. I think this will be a good approach, provided that we have clear documentation for the semantics of each operation, including what happens on overflow. We need to make it very clear what the user is "giving up" when choosing the "fast"/"unsafe" variants over the default ones, so that they can determine whether it is acceptable (or even relevant) for their use case.

As part of our discussion here, I think it would be useful to continue to hammer out and clarify these details (the precise semantics of overflow in the safe variants, the limits of safety in the unsafe variants, etc).


As an aside, I'm not wild about using unsafe_ as a name / prefix for these methods, as I think it implies a lack of memory safety or type safety. Using fast_ comes to mind as a possibility, though there may be other good options.

@SeanTAllen
Member

@jemc I generally agree with you on the unsafe_ naming.

@Praetonus
Member Author

Here's what I'm thinking, based on LLVM semantics.

Safe variants

  • Integer arithmetic is modular with two's complement. Division by 0 yields 0. No change
  • Bitwise shifting
    • Left shift and logical right shift would yield 0 when shifting by at least the number's bit width (see the sketch after this list)
    • Arithmetic right shift would behave the same way, but filling with the sign bit
  • Count leading/trailing zero yields the expected result for every possible value. No change
  • Floating point operations would conform strictly to IEEE 754. This is the current behaviour when compiling with --ieee-math.
  • Floating point to integer conversions
    • Converting NaN would yield 0
    • On overflow/underflow, do saturation
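
A minimal sketch of what the safe shift variants could mean, written as library code for U32 (hypothetical; a real implementation would be a compiler intrinsic, not stdlib code):

primitive SafeShift
  fun shl(x: U32, y: U32): U32 =>
    // Shifting by the bit width or more yields 0 instead of an undefined
    // LLVM result.
    if y >= 32 then 0 else x << y end

  fun shr(x: U32, y: U32): U32 =>
    // Same rule for the logical right shift.
    if y >= 32 then 0 else x >> y end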

Fast/unsafe variants

  • Integer arithmetic operations with nuw/nsw flags. If the operation overflows (respectively unsigned and signed overflow), the result is undefined. Division by 0 would be unchecked and the result would be undefined.
  • Bitwise shifting
    • Left shift with nuw/nsw. If the operation overflows, the result is undefined
    • Logical and arithmetic right shift with exact semantics. If non-zero bits are shifted out, the result is undefined
    • In addition, for every shift type, shifting by at least the number's bit width yields an undefined result
  • Count leading/trailing zero. If the operand is 0, the result is undefined.
  • Floating point operations with fast math flags. The optimiser can assume infinity/NaN never occurs and can perform algebraically equivalent transformations that may not make sense for floating point numbers. Current behaviour
  • Floating point to integer conversions. On overflow, the result is undefined. Current behaviour.

@Praetonus
Member Author

Also, I think we should incorporate the fast/unsafe variants into the language trust boundary and only allow them in safe packages.

@jemc
Member

jemc commented Oct 13, 2016

I think we should incorporate the fast/unsafe variants into the language trust boundary and only allow them in safe packages.

I'm not sure how I feel about that - can you explain your reasoning more? Are they unsafe in any sense other than possibly giving back a wrong answer to a calculation? Is it because the undefined values could be used on some platforms in an attack to return uninitialized memory that could contain sensitive data?

Also, another question to ask is - if I have a package that I want to allow to do "fast math", is it always the case that I also want to allow it to do FFI? Is it possibly better to think of it as a distinct and separate trust boundary?

@jemc
Member

jemc commented Oct 13, 2016

Also, thank you for providing the thorough and explicit writeup on the safe vs unsafe semantics in your previous comment!

@Praetonus
Member Author

I'm not sure how I feel about that - can you explain your reasoning more? Are they unsafe in any sense other than possibly giving back a wrong answer to a calculation? Is it because the undefined values could be used on some platforms in an attack to return uninitialized memory that could contain sensitive data?

My main concern is hardware trap values and exceptions. For example, the unsafe division by 0 would raise a hardware exception on both x86 and ARM, which is very likely to halt the program.

Also, another question to ask is - if I have a package that I want to allow to do "fast math", is it always the case that I also want to allow it to do FFI? Is it possibly better to think of it as a distinct and separate trust boundary?

That's a good point. Maybe that would be better.

@jemc
Member

jemc commented Oct 13, 2016

My main concern is hardware trap values and exceptions. For example, the unsafe division by 0 would raise a hardware exception on both x86 and ARM, which is very likely to halt the program.

Fair enough - I think I'm on board with calling these operations "unsafe" now 🔥

@Praetonus
Member Author

Praetonus commented Oct 15, 2016

On a somewhat related matter, should rotl and rotr be defined for signed numbers? These functions are implemented with bitwise shifts, which are arithmetic shifts for signed numbers, so the result is wrong in many cases. Also, the functions don't guarantee associativity.

@DemiMarie

Note that the rotate semantics you are proposing require a runtime check on x86 that could be costly. I don't know the behavior on ARM.

A potentially better-performing approach would be to define shift and rotation as masking off the excess high-order bits from the shift amount. That doesn't need any branches on any processor.

@Praetonus
Member Author

Is that check done in the hardware instruction itself or in the potential boilerplate required around it? For reference, here is the implementation of rotl in Pony (rotr is symmetrical).

fun rotl(y: A): A => (this << y) or (this >> (bitwidth() - y))

This implementation isn't currently safe, so rotl will need safe and unsafe variants if we keep this base implementation.
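
For comparison, a branch-free safe rotl along the lines @DemiMarie suggests could look like this (a hypothetical sketch, fixed at 32 bits):

primitive Rot
  fun rotl32(x: U32, y: U32): U32 =>
    // Masking both shift amounts to 0..31 keeps them inside the bit
    // width, and unsigned subtraction wraps, so every y (including 0
    // and values >= 32) is well-defined with no runtime check.
    (x << (y and 31)) or (x >> ((32 - y) and 31))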

@Praetonus
Member Author

Praetonus commented Oct 16, 2016

Also, I've been thinking about defining real Pony semantics for the "undefined results" here.

First, we need to define the transformations permitted by the compiler on the program. I'm going to mostly draw from the C++ as-if rule here.

The Pony compiler is permitted to transform the program in any way, as long as the following remains true for every behaviour in the program:

  • At the end of a behaviour, the associated actor is in exactly the same state it would be in if the program were executed as written.
  • If an object has a finaliser, that finaliser will eventually be called exactly once.
  • Messages are sent as if the behaviour was executed as written, in the same order and with the same contents.
  • FFI calls to well-defined C functions are executed as written, in the same order and with the same arguments. The runtime functions pony_ctx, pony_alloc, pony_alloc_small, pony_alloc_large, pony_alloc_final and pony_realloc are exempt from this rule and from the following one.
  • Message sends and FFI calls are not reordered with respect to each other.
  • Object and reference capability security are maintained.

Undefined results (placeholder name):

If an expression with undefined results is evaluated (see the example after this list):

  • Every subsequent expression depending on undefined results also has undefined results.
  • If the constructor of an object with a finaliser has undefined results, that finaliser isn't guaranteed to be called.
  • If a message send or an FFI call has an undefined result, it is free from the as-if rule.
  • The implementation is allowed to trap or to generate hardware exceptions (n.b. I'm not sure if this is adequate in a "formal specification". Maybe we should only refer to the "Pony abstract machine", but I don't know what the wording could be in that case.)
  • Capability security is always maintained.
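
For instance, using a hypothetical add_unsafe method as a stand-in (the actual spelling of the unsafe operations isn't settled yet):

actor Main
  new create(env: Env) =>
    let a: U32 = U32.max_value().add_unsafe(1) // overflows: undefined results
    let b: U32 = a * 2 // depends on a, so b also has undefined results
    // This send has an undefined argument, so it is freed from the as-if
    // rule: anything (or nothing) may be printed. Capability security is
    // still maintained.
    env.out.print(b.string())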

Trust boundary:

  • If any behaviour in a Pony program makes an FFI call to a C function that violates capability security, then the capability security guarantees are void for every behaviour in the program.
  • If any behaviour in a Pony program makes an FFI call to a C function with undefined behaviour (as defined by the ISO C standard), then every behaviour in the program is free from the as-if rule, including the capability security guarantees.

If we keep that, it will need some refinement to work with distributed Pony.

Thoughts?

@Praetonus
Member Author

We discussed this during the sync call. The above semantics are fine; we'll probably end up with a similar version somewhere in the documentation.

We also talked about adding new operators for the unsafe operations. I'll open an RFC with a few possibilities for the operator syntax.

@Praetonus
Member Author

This was resolved in #1395.
