Float64 to Float16 conversion is slow #41161

Open
JeffBezanson opened this issue Jun 9, 2021 · 6 comments

@JeffBezanson
Member

julia> using BenchmarkTools

julia> @btime Float16(rand())
  15.878 ns (0 allocations: 0 bytes)

julia> @btime Float16(Float32(rand()))
  5.928 ns (0 allocations: 0 bytes)

julia> @btime Float16(rand(Float32))
  4.282 ns (0 allocations: 0 bytes)

I believe we are calling compiler-rt for this. Of course this can't be implemented by converting via Float32 since that rounds twice, but it's frustrating that that method is so much faster. Would be nice to have a better implementation of this. See also #40315.

@oscardssmith
Member

Yeah. This is one of those functions that should be really easy to implement, but is surprisingly hard to get correct and fast. It's been on my list for a while.

@vchuravy
Member

> I believe we are calling compiler-rt for this.

julia> @code_native Float16(rand())
	.text
; ┌ @ float.jl:180 within `Float16'
	pushq	%rax
	movabsq	$__truncdfhf2, %rax
	callq	*%rax
	popq	%rcx
	retq
	nop
; └

That is the compiler-rt name for the routine, but the call should end up in Julia's own definition:

extern "C" JL_DLLEXPORT uint16_t __truncdfhf2(double param)

Looking at it closely, it internally converts to float and then uses our implementation of float_to_half, whereas doing the conversion at the Julia level uses the x86 intrinsic to go from Float32 to Float16. We might want to try compiler-rt, especially since OrcV2 nowadays lets you add static libraries to an ExecutionSession instead of having to turn the compiler-rt archive into a shared library, as was done in https://github.com/JuliaLang/julia/pull/17344/files#diff-c9f616510e5e877240287257026b05d8fb29270feead033f6a87ccf6213dd66bR566.
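
For illustration, here is one known way to go through Float32 without the double-rounding problem: round the Float64 to Float32 with "round to odd" (truncate toward zero, then set the lowest significand bit if anything was discarded), and only then round to Float16. This is a minimal sketch, not the code in Julia's runtime and not tuned for speed; the names `float32_round_to_odd` and `float16_via_round_to_odd` are hypothetical.

```julia
# Hypothetical helper (not part of Julia Base): convert Float64 to Float32
# using "round to odd" instead of round-to-nearest.
function float32_round_to_odd(x::Float64)
    y = Float32(x)                       # ordinary round-to-nearest conversion
    (isnan(y) || isinf(y)) && return y   # NaN, Inf, and overflow pass through
    yd = Float64(y)
    yd == x && return y                  # exactly representable: nothing to fix
    u = reinterpret(UInt32, y)
    # If rounding moved away from zero, step back one ULP toward zero so that
    # `u` holds the truncated value.
    if abs(yd) > abs(x)
        u -= 0x00000001
    end
    # The conversion was inexact, so force the lowest significand bit to 1.
    return reinterpret(Float32, u | 0x00000001)
end

# Hypothetical wrapper: the final Float32 -> Float16 step rounds to nearest.
float16_via_round_to_odd(x::Float64) = Float16(float32_round_to_odd(x))

# A value slightly above the midpoint between Float16(1) and its successor.
x = 1.0 + ldexp(1.0, -11) + ldexp(1.0, -30)
println(float16_via_round_to_odd(x) == nextfloat(Float16(1)))
println(Float16(Float32(x)) == Float16(1))   # plain double rounding lands on 1.0
```

The forced low bit acts as a sticky bit: it records that the Float64 was not exactly on a Float32 value, so the second rounding can no longer mistake the input for an exact tie. Whether this can be made as fast as the hardware Float32 path is a separate question that this sketch does not address.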

@fingolfin
Member

Interestingly, this is already fast on M1 Macs (so ARM) with Julia 1.9.1:

julia> @btime Float16(rand())
  3.083 ns (0 allocations: 0 bytes)
Float16(0.952)

julia> @btime Float16(Float32(rand()))
  3.083 ns (0 allocations: 0 bytes)
Float16(0.798)

julia> @btime Float16(rand(Float32))
  3.083 ns (0 allocations: 0 bytes)
Float16(2.164e-5)

It is still slow on an x86_64 machine (also using Julia 1.9.1):

julia> @btime Float16(rand())
  18.348 ns (0 allocations: 0 bytes)
Float16(0.906)

julia> @btime Float16(Float32(rand()))
  5.037 ns (0 allocations: 0 bytes)
Float16(0.1693)

julia> @btime Float16(rand(Float32))
  4.571 ns (0 allocations: 0 bytes)
Float16(0.6675)

@gbaraldi
Member

@oscardssmith should we do the double conversion, i.e. define Float16(x::Float64) as Float16(Float32(x)), or is the double rounding wrong?

@oscardssmith
Member

Double rounding is wrong.

@timholy
Member

timholy commented Jun 28, 2023

Demo: round 0.499 to 2 decimal places: you get 0.50. Now round that to the nearest integer: you get 1 (with "round half up"). But round 0.499 to the nearest integer directly: you get 0, even with round half up.
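
The same effect can be reproduced in binary. The sketch below picks a Float64 that sits just above the midpoint between two adjacent Float16 values; the specific value `x` is chosen for illustration and does not come from the thread. Rounding to Float32 first lands exactly on the midpoint, and the second rounding (ties to even) then goes the wrong way.

```julia
# x is exactly representable in Float64 and lies slightly above the midpoint
# between Float16(1) and the next Float16, 1 + 2^-10.
x  = 1.0 + ldexp(1.0, -11) + ldexp(1.0, -30)
lo = Float16(1)          # Float16 neighbor below x
hi = nextfloat(lo)       # Float16 neighbor above x

# Both differences are computed exactly in Float64.
d_lo = x - Float64(lo)
d_hi = Float64(hi) - x
println(d_hi < d_lo)     # true: `hi` is the correctly rounded result

# Double rounding: Float32 cannot hold the 2^-30 term, so step1 is the exact
# midpoint 1 + 2^-11, and the tie is then resolved to the even neighbor `lo`.
step1 = Float32(x)
step2 = Float16(step1)
println(step2 == lo)     # true: double rounding picks the farther neighbor
```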
