Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAN constants have different signs on compiler versus interpreter #12382

Closed
HertzDevil opened this issue Aug 13, 2022 · 10 comments · Fixed by #12400
Closed

NAN constants have different signs on compiler versus interpreter #12382

HertzDevil opened this issue Aug 13, 2022 · 10 comments · Fixed by #12400

Comments

@HertzDevil
Copy link
Contributor

In compiled mode, not-a-numbers have a positive sign from their binary representation:

def h(x)
  Slice.new(pointerof(x), 1).dup.to_unsafe_bytes.reverse!.hexstring
end

Float32::NAN                        # => NaN
(-Float32::NAN)                     # => NaN
Math.copysign(Float32::NAN, 1_f32)  # => NaN
Math.copysign(Float32::NAN, -1_f32) # => -NaN
Math.copysign(1_f32, Float32::NAN)  # => 1.0
Math.copysign(-1_f32, Float32::NAN) # => 1.0

h Float32::NAN                        # => "7fc00000"
h (-Float32::NAN)                     # => "7fc00000"
h Math.copysign(Float32::NAN, 1_f32)  # => "7fc00000"
h Math.copysign(Float32::NAN, -1_f32) # => "ffc00000"
h Math.copysign(1_f32, Float32::NAN)  # => "3f800000"
h Math.copysign(-1_f32, Float32::NAN) # => "3f800000"

The same code shows that Float32::NAN is negative on the interpreter:

Float32::NAN                        # => -NaN
(-Float32::NAN)                     # => -NaN
Math.copysign(Float32::NAN, 1_f32)  # => NaN
Math.copysign(Float32::NAN, -1_f32) # => -NaN
Math.copysign(1_f32, Float32::NAN)  # => -1.0
Math.copysign(-1_f32, Float32::NAN) # => -1.0

h Float32::NAN                        # => "ffc00000"
h (-Float32::NAN)                     # => "ffc00000"
h Math.copysign(Float32::NAN, 1_f32)  # => "7fc00000"
h Math.copysign(Float32::NAN, -1_f32) # => "ffc00000"
h Math.copysign(1_f32, Float32::NAN)  # => "bf800000"
h Math.copysign(-1_f32, Float32::NAN) # => "bf800000"

This affects #12242 and #12244.

@asterite
Copy link
Member

Interestingly, on my Mac the output is the same for both compiled and interpreted mode. Which makes things even stranger, as the interpreted should be platform agnostic in my mind...

@I3oris
Copy link
Contributor

I3oris commented Aug 15, 2022

This seems to come from:

in .f32?
put_i32 value.to_f32.unsafe_as(Int32), node: node
in .f64?
put_i64 value.to_f64.unsafe_as(Int64), node: node

Because in crystal compiled mode (on my machine):

pp! Float32::NAN.to_f32.unsafe_as(Int32)     # => 2143289344 (0x7fc00000)
pp! (-Float32::NAN).to_f32.unsafe_as(Int32) # => 2143289344 (0x7fc00000)

Which is very weird because NaN and -NaN representation cannot be the same!?..
ditto for:

# without to_f32
pp! Float32::NAN.unsafe_as(Int32)     # => 2143289344 (0x7fc00000)
pp! (-Float32::NAN).unsafe_as(Int32) # => 2143289344 (0x7fc00000)

# with UInt32
pp! Float32::NAN.unsafe_as(UInt32)     # => 2143289344 (0x7fc00000)
pp! (-Float32::NAN).unsafe_as(UInt32) # => 2143289344 (0x7fc00000)

So there a problem how crystal cast NaN to Int32 on certain platforms. I wonder if this works well on mac too.

however:

pp! 0x7fc00000.unsafe_as(Float32) # => NaN
pp! 0xffc00000.unsafe_as(Float32) # => -NaN

Works on both compiled and interpreter.

crystal -v

Crystal 1.5.0 (2022-07-21)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

@straight-shoota
Copy link
Member

straight-shoota commented Aug 15, 2022

@I3oris

Which is very weird because NaN and -NaN representation cannot be the same!?..

There is no -NaN here. They're all just NaN. You can never get -NaN anywhere in Crystal unless you artificially construct it (for example via copysign). That's similar in most languages. A distinction between positive and negative NaN just doesn't make much sense. When it's not a number, how could it be positive or negative? 🤷
However, the floating point format still assigns a sign to every value (even NaN), so there are technically positive and negative NaN values due to the encoding. In practice, only the positive ones are typically used. But if you end up with a negative one, it should work nevertheless.

@I3oris
Copy link
Contributor

I3oris commented Aug 15, 2022

Okay, thanks for explanations, that make more sense on what going on. The error stay mysterious however.

@I3oris
Copy link
Contributor

I3oris commented Aug 15, 2022

Float32::NAN is defined like this (0_f32 / 0_f32) which give NaN. However on interpreter 0_f32/0_f32 = -NaN. The result -NaN come from:

div_f32: {
pop_values: [a : Float32, b : Float32],
push: true,
code: a / b,

We could see that with:

code: begin
          pp! a, b
          pp! (a / b)
        end,

The thing is that in compiled mode:

  0f32 / 0f32 # => NaN
  
  a = 0f32
  a / a    # => -NaN
  a / 0f32 # => -NaN
  0f32 / a # => -NaN

and also:

0i32 / 0i32 # => -NaN

The sign differs if argument come from literal or from a variable.

When the code 0_f32/0_f32 is interpreted and passes by the (a / b) division, it gives -NaN instead of NaN because it passes by variable instead of literal.

On way to solve the issue is to define Float32::NAN = 0x7fc00000.unsafe_as(Float32), to avoid weird behavior of division, and ditto for Float64.

However a=0f32; a/0f32 != 0f32/0f32 is a new issue for me.

@straight-shoota
Copy link
Member

straight-shoota commented Aug 15, 2022

That's a great catch. Looking at the LLVM IR that these expressions produce, it shows that LLVM just hard codes 0 / 0 as 0x7fc00000. So apparently when an actual div operation is performed, it has a negative sign in the result.

literals: ; 0f32/0f32
        movss   .LCPI2209_0(%rip), %xmm0
        retq

variable: ; a = 0f32; a / a
        xorps   %xmm0, %xmm0
        movss   %xmm0, -4(%rsp)
        movss   -4(%rsp), %xmm0
        divss   -4(%rsp), %xmm0
        retq

I'm not sure what's the best way to deal about this, but at least it seems to be clear whats happening and why.

@HertzDevil HertzDevil changed the title Not-a-number has different signs on compiler versus interpreter NAN constants have different signs on compiler versus interpreter Aug 16, 2022
@HertzDevil
Copy link
Contributor Author

HertzDevil commented Aug 16, 2022

The thing is that in compiled mode:

  0f32 / 0f32 # => NaN
  
  a = 0f32
  a / a    # => -NaN
  a / 0f32 # => -NaN
  0f32 / a # => -NaN

and also:

0i32 / 0i32 # => -NaN

The sign differs if argument come from literal or from a variable.

The same snippet actually gives all positive NaNs in compiled release mode. Similar results happen in C: https://godbolt.org/z/d4v1hn5EG

The sign bit of NaNs does matter on a few IEEE 754 operations, because they "treat floating-point numbers and NaNs alike"; those are copy (trivial copy by value, #dup), negate (#-), abs (#abs), and copySign (Math.copysign). In other contexts the sign bit shouldn't matter. Since the standard library specs should cover the NAN constants in those methods' specs, the NANs should have a fixed sign bit, and while not mentioned yet, they should also be quiet and have no payload. So I think the unsafe_as approach is acceptable as it shows the constants' binary representations.

An alternative is defining NAN = Math.copysign(0.0 / 0.0, 1).

@straight-shoota
Copy link
Member

Not sure if hard coding a specific NaN binary representation is a good idea. Letting LLVM figure that out seems like a good move.

Perhaps we should just ignore the signedness of NaN, as well as any explicit value. There's no guarantee what you'll get at runtime. And it just doesn't matter.

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Aug 16, 2022

IEEE 754-2019 says:

5.5.1 Sign bit operations

For numeric x,

...

  • copySign(x, y) has the same representation as x, but with the sign of y if y is numeric and with unspecified sign if y is a NaN.

For x a NaN, the results of the operations have the same representation as x (qNaN or sNaN).

So #12242 and #12244 invoke unspecified behavior 🙄 This clause is not in IEEE 754-2008, but if we rely on LibM we don't really know which revision it conforms to.

@HertzDevil
Copy link
Contributor Author

At this rate I believe we should display all NaNs as positive, in both #to_s and #inspect. Anyone who needs to operate with NaNs at the binary representation level probably knows to do it through #unsafe_as, not #to_s. If there is further demand we could expose Float::Printer::IEEE or something similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants