Arithmetic conversions can result in LLVM undefined values #993
Definitely agreed that having the result of these operations be formally undefined is a big problem. However, given that the effect that many of us (myself included) have been depending on is truncation, I'm currently thinking it might make more sense to have the language semantics formally specify that the conversion performs a truncation when the source type is a larger width than the target type. Under your plan, how would it look if I had a …
I don't really like the idea of the default behaviour being errors for floating point conversions and truncation in base 2 for integer conversions. I feel like conversion and truncation should be separated into different functions, maybe a dedicated truncation function.

After trying various things with LLVM and optimisations, it seems that the overflow check can be optimised away when LLVM can prove it will never fail:

```pony
primitive U16
  fun u8(): U8 ? =>
    compile_intrinsic // Check for overflow and use LLVM trunc instruction

actor Main
  new create(env: Env) =>
    let x: USize = env.args.size() and 0xFF
    let y: U8 = try
      x.u8()
    else
      0 // Unreachable
    end
```

In a program like this, the else branch is unreachable and the overflow check is removed entirely.

This integer conversion/truncation overhaul requires an RFC; I'll do that. Do we agree on the fix for floating point conversions being partial functions with an overflow check?
Right now, I don't really think any of the conversion functions should be partial - they should just all have defined semantics.
It would be good to hear @sylvanc's perspective on this, though.
The big issue is that the only alternative to partial functions to fix the floating point issue would be to return an arbitrary value if the input can't be represented by the integer type. That includes trying to convert NaN, and I don't see any sane default value for this (and it's undefined behaviour in C).
Integer truncation and extension should remain as they are. Pony integers are not abstract numbers: they are real machine words with a bit size that perform modular arithmetic :) As a result, …

Think of it in pure functional terms: generate a …

That leaves the core problem that @Praetonus found, which I do think is important.

Using a partial function for float to int conversion is problematic, I think. While in one sense there is "no value in the codomain" (i.e. a partial function) for some float to int conversions, in another sense most values in the codomain are approximations anyway. Should …

So I lean towards saturation being the right way to go, from a "fully defined values" point of view.

However, I'm concerned about performance. Float to int conversion is a real performance problem in games, video/audio codecs, etc, and C programmers spend a non-trivial amount of time optimising it to do what Pony currently does (i.e. use SSE registers and pipeline conversions, not worrying about overflow).

Interestingly, for any value that isn't a compile time constant, Pony won't generate an LLVM undef: the conversion compiles to a hardware instruction with fully specified behaviour, e.g. CVTTSD2SI on x86:

http://www.felixcloutier.com/x86/CVTTSD2SI.html

However, this doesn't appear as a consistent result because the optimiser can do unexpected things in intermediate code.

In summary: saturation seems correct, but introducing runtime bounds checking could make it necessary to provide alternate functions that do not perform runtime bounds checking.
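(As a small illustration of the modular-arithmetic view, using the 1178 example from the issue body below; this is plain Pony as it behaves today:)

```pony
actor Main
  new create(env: Env) =>
    let wide: U16 = 1178
    // Truncation is just modular arithmetic on machine words:
    // 1178 mod 256 = 154, which is exactly what u8() yields today.
    env.out.print(wide.u8().string())          // 154
    env.out.print((wide % 256).u8().string())  // 154 as well
```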
Discussed in a sync call. |
Discussed on the sync call: @sylvanc proposed a plan in which we could treat compile time constants as a special case, and make sure that the value is not an LLVM undef.

So after setting up the special case logic, we can document that the resulting value is undefined (may vary on different platforms), but is guaranteed not to leak information from a security standpoint (as it does have the potential to do now). We can also consider an RFC that adds a separate set of float-to-integer conversion methods that are slower, but have defined result semantics for the entire domain of input.
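(A hypothetical illustration of the hazard under discussion; whether LLVM folds a given conversion is an assumption about the optimiser, not something this thread confirms:)

```pony
actor Main
  new create(env: Env) =>
    // Constant-folded: LLVM may fold F64(1e10).u32() to undef,
    // since 1e10 doesn't fit in a U32.
    let a: U32 = F64(1e10).u32()
    // Runtime value: compiles to a concrete conversion instruction
    // (e.g. CVTTSD2SI on x86), which produces a defined bit pattern.
    let b: U32 = (env.args.size().f64() * 1e10).u32()
    env.out.print(a.string() + " " + b.string())
```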
Doing a special case for compile-time constants won't be possible because the optimiser can propagate and synthesise constants. Also, it turns out there are some inconsistencies with unsafe maths.

We're currently choosing performance over defined results in most cases. To me, that doesn't align with Pony's "correctness first" philosophy. I'm tempted to recategorise the issue as a bug and to do the following: give every arithmetic operation and conversion fully defined default semantics, and add separate unsafe variants that keep the current performance but may have undefined results.

That way, users would get correctness by default, but could still pull the "unsafe hatch" for performance critical operations.
Yes, I agree - recategorising this as a bug and marking it accordingly makes sense.

I'm also inclined to agree with your suggestion of fixing all of the default methods for correctness, and adding "fast"/"unsafe" variants as needed for performance-critical arithmetic. I think this will be a good approach, provided that we have clear documentation for the semantics of each operation, including what happens in an overflow. We need to make it very clear what the user is "giving up" when choosing the "fast"/"unsafe" variants over the default ones, so that they can determine whether it is acceptable (or even relevant) for their use case.

As part of our discussion here, I think it would be useful to continue to hammer out and clarify these details (the precise semantics of overflow in the safe variants, the limits of safety in the unsafe variants, etc).

As an aside, I'm not wild about using "unsafe" as the name for these variants.
@jemc I generally agree with you on the naming concern.
Here's what I'm thinking, based on LLVM semantics.

Safe variants:

…

Fast/unsafe variants:

…
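(A hedged sketch of how the two families of methods might look to a caller; the saturating behaviour and the `u32_unsafe` name are illustrative assumptions, not something this thread has settled:)

```pony
actor Main
  new create(env: Env) =>
    let big: F64 = 1e10

    // Hypothetical safe variant: fully defined over the whole input
    // domain, e.g. saturating to U32.max_value() on overflow.
    let a: U32 = big.u32()

    // Hypothetical unsafe variant: compiles to a bare LLVM fptoui,
    // so the result is undefined when the value doesn't fit.
    let b: U32 = big.u32_unsafe()

    env.out.print(a.string() + " " + b.string())
```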
Also, I think we should incorporate the fast/unsafe variants into the language trust boundary and only allow them in safe packages. |
I'm not sure how I feel about that - can you explain your reasoning more? Are they unsafe in any sense other than possibly giving back a wrong answer to a calculation? Is it because the undefined values could be used on some platforms in an attack to return uninitialized memory that could contain sensitive data?

Also, another question to ask is: if I have a package that I want to allow to do "fast math", is it always the case that I also want to allow it to do FFI? Is it possibly better to think of it as a distinct and separate trust boundary?
Also, thank you for providing the thorough and explicit writeup on the safe vs unsafe semantics in your previous comment!
My main concern is hardware trap values and exceptions. For example, the unsafe division by 0 would raise a hardware exception on both x86 and ARM, which is very likely to halt the program.
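(A sketch of the trade-off, assuming Pony's existing defined-as-zero checked division:)

```pony
actor Main
  new create(env: Env) =>
    // A zero the optimiser can't see through.
    let zero = env.args.size().u32() - env.args.size().u32()
    // Pony's checked division defines x / 0 as 0, so there is no
    // hardware trap here, at the cost of a runtime branch. An
    // unchecked variant would compile to the bare div instruction,
    // which traps on x86 when the divisor is 0.
    env.out.print((U32(42) / zero).string()) // prints 0
```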
That's a good point. Maybe that would be better.
Fair enough - I think I'm on board with calling these operations "unsafe" now 🔥
On a somewhat related matter, should shifts and rotations by out-of-range amounts also get defined semantics? …
Note that the rotate semantics you are proposing require a runtime check on x86 that could be costly. I don't know the behavior on ARM. A potentially better-performing approach would be to define shift and rotation as masking off the excess high-order bits from the shift amount. That doesn't need any branches on any processor.
Is that check done in the hardware instruction itself or in the potential boilerplate required around it? For reference, here is the implementation of `rotl`:

```pony
fun rotl(y: A): A => (this << y) or (this >> (bitwidth() - y))
```

This implementation isn't currently safe, so …
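(For illustration, the masking approach suggested above might look like this; a sketch in the same trait context as the stdlib snippet, assuming `bitwidth()` returns an `A` as in that snippet:)

```pony
// Branch-free rotl via masking: the rotation amount is reduced into
// [0, bitwidth()) instead of being checked at runtime. Assumes
// bitwidth() is a power of two, which holds for all Pony machine words.
fun rotl(y: A): A =>
  let mask = bitwidth() - 1
  let r = y and mask
  (this << r) or (this >> ((bitwidth() - r) and mask))
```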
Also, I've been thinking about defining real Pony semantics for the "undefined results" here. First, we need to define the transformations permitted by the compiler on the program. I'm going to mostly draw from the C++ as-if rule here:

…

Undefined results (placeholder name):

…

Trust boundary:

…

If we keep that, it will need some refinement to work with distributed Pony. Thoughts?
We discussed this during the sync call. The above semantics are fine; we'll probably end up with a similar version somewhere in the documentation. We also talked about adding new operators for the unsafe operations. I'll open an RFC with a few possibilities for the operator syntax.
This was resolved in #1395. |
Currently, integer to floating point and floating point to integer conversions compile down to the LLVM `uitofp`, `sitofp`, `fptoui` and `fptosi` instructions. In case of overflow on the destination type, these instructions produce undefined values. I think this is not something we want in Pony, and we need to address this concern while impacting the existing API as little as possible.

In my opinion, the best solution would be to make the relevant functions partial and to error if over/underflow would occur. We'd also have to split the `_ArithmeticConvertible` trait into more specific versions so that functions are made partial only when necessary (e.g. `F32.u16()` should be partial but `U8.u16()` should not). Numeric unions (`Number`, `Signed`, etc.) should still work correctly, since a non-partial function is a subtype of a partial one and we could still have a supertype for all of this.

While the LLVM issue is with conversions to/from floating point, I feel like this is more general at the language level and should also apply to conversions between integers: first, for consistency with the above fix for floating points, and second, because the only reason 1178 in a `U16` space is mapped to 154 in a `U8` space is that our hardware truncates it as a binary number (so the behaviour here doesn't depend on the language semantics).

As a side note, if we do make the functions for integers partial, the old truncating behaviour will still be possible with modular arithmetic (which is optimised to truncation or bitwise operations by LLVM when corresponding to machine word sizes).

To sum up the general idea: for every arithmetic conversion function from type `A` to type `B`, if some values of `A` don't fit in `B`, make the `A.b()` function partial and check for overflow.