Use native Float16 #69

Merged: 3 commits merged on Feb 2, 2021

@maleadt (Member) commented Jan 27, 2021

@thomasfaingnaert:

I had to use some ugly workarounds to get around the fact that Float16 was mapped to i16 instead of half, but that shouldn't be a problem in Julia 1.6, so that code can be cleaned up significantly.

Which other workarounds are there?

@thomasfaingnaert (Member) commented:

Off the top of my head:

  • transform_shared_to_regs_c = Transform.Elementwise(x -> x * (beta / alpha)),
    transform_regs_to_shared_d = Transform.Elementwise(x -> x * alpha),
    : avoids FP16 multiplication of WMMA fragments by computing D = alpha * (A * B + (beta / alpha) * C) instead of D = alpha * A * B + beta * C. The scaling is moved into the elementwise transforms applied when C is loaded into registers and when D is stored back to shared memory. (ref: "Add workaround for FP16 multiplication" #27)
  • @inline @generated function vloada(::Type{Vec{N, T}}, ptr::Core.LLVMPtr{T, AS}, i::Integer = 1) where {N, T, AS}
        alignment = sizeof(T) * N
        vec_len = (sizeof(T) * N) ÷ sizeof(Float32)

        return quote
            vec_ptr = Base.bitcast(Core.LLVMPtr{NTuple{$vec_len, VecElement{Float32}}, AS}, ptr)
            return unsafe_load(vec_ptr, (i-1) ÷ N + 1, Val($alignment))
        end
    end

    @inline @generated function vstorea!(::Type{Vec{N, T}}, ptr::Core.LLVMPtr{T, AS}, x, i::Integer = 1) where {N, T, AS}
        alignment = sizeof(T) * N
        vec_len = (sizeof(T) * N) ÷ sizeof(Float32)

        return quote
            vec_ptr = Base.bitcast(Core.LLVMPtr{NTuple{$vec_len, VecElement{Float32}}, AS}, ptr)
            return unsafe_store!(vec_ptr, x, (i-1) ÷ N + 1, Val($alignment))
        end
    end
    : the explicit vectorisation functions need to load/store using an NTuple{4, VecElement{Float32}} instead of an NTuple{8, VecElement{Float16}}, because the latter was lowered to <8 x i16>, which NVPTX refuses to vectorise completely. Unfortunately, this meant that the "wrong" type propagated up the entire call hierarchy.
  • if VERSION < v"1.6.0-DEV.1236"
        @inline bitcast_helper(x::NTuple{8, VecElement{Float16}}) = Base.llvmcall(
            "
            %ret = bitcast <8 x i16> %0 to <4 x float>
            ret <4 x float> %ret
            ", NTuple{4, VecElement{Float32}}, Tuple{NTuple{8, VecElement{Float16}}}, x)
    else
        @inline bitcast_helper(x::NTuple{8, VecElement{Float16}}) = Base.llvmcall(
            "
            %ret = bitcast <8 x half> %0 to <4 x float>
            ret <4 x float> %ret
            ", NTuple{4, VecElement{Float32}}, Tuple{NTuple{8, VecElement{Float16}}}, x)
    end

    @inline function load(::Type{Diagonal{T}}, workspace, tile::Tile{size}) where {T, size}
        N = 16 ÷ sizeof(T)

        # The row index is given by t.index[1] + (k - 1), the column index is given by t.index[2] (0-based).
        # Only load on the diagonal, i.e. if row and column are equal.
        # Note that t.index[2] is 0-based, so we need to add 1 before loading from workspace.
        # TODO: Remove the <4 x float> everywhere, so we don't have to do this ugly casting all over the place.
        return bitcast_helper(ntuple(k -> VecElement{Float16}(tile.index[1] + k - 1 == tile.index[2] ? @inbounds(workspace[tile.index[2] + 1]) : 0), Val(8)))
    end
    : the bitcast_helper is not needed anymore, because the vectorisation functions no longer expect an NTuple{4, VecElement{Float32}}.
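The algebra behind the first workaround can be written out step by step. This is a sketch assuming that the WMMA step accumulates its C fragment as A·B + C (a plain multiply-accumulate with no fragment scaling), and that the two transforms act on C before the MMA and on the result after it, as their names suggest. The identity is exact in real arithmetic but requires alpha ≠ 0; in FP16 the two forms may round differently.

```latex
\begin{aligned}
C' &= \tfrac{\beta}{\alpha}\, C
   && \text{(transform\_shared\_to\_regs\_c, applied elementwise to } C\text{)} \\
D' &= A B + C'
   && \text{(WMMA multiply-accumulate, no fragment multiplication)} \\
D  &= \alpha\, D' = \alpha \left( A B + \tfrac{\beta}{\alpha}\, C \right) = \alpha A B + \beta C
   && \text{(transform\_regs\_to\_shared\_d, } \alpha \neq 0\text{)}
\end{aligned}
```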
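The vectorisation and bitcast workarounds rely on eight Float16 values occupying exactly the same 16 bytes as four Float32 values. The following CPU-only sketch is not code from the PR: it uses plain `reinterpret` on arrays instead of `Core.LLVMPtr`, and the helper `vec_index` is hypothetical. It checks that size relationship, the `vec_len`/`alignment` formulas, and the `(i-1) ÷ N + 1` index mapping used by `vloada`/`vstorea!`.

```julia
# CPU-only illustration (hypothetical, not from the PR): 8 x Float16 and 4 x Float32
# occupy the same 16 bytes, which is what makes <8 x half> <-> <4 x float> bitcasts valid.
N = 8                                                # Float16 elements per 16-byte vector
vec_len = (sizeof(Float16) * N) ÷ sizeof(Float32)    # same formula as in vloada/vstorea!
alignment = sizeof(Float16) * N                      # byte alignment of one vector
@assert vec_len == 4
@assert alignment == 16

halves = Float16[1, 2, 3, 4, 5, 6, 7, 8]
as_floats = reinterpret(Float32, halves)             # view the same 16 bytes as 4 Float32 values
@assert length(as_floats) * sizeof(Float32) == sizeof(halves) == 16

# Reinterpreting back recovers the original Float16 values bit for bit.
@assert reinterpret(Float16, collect(as_floats)) == halves

# Hypothetical helper mirroring vloada/vstorea!: map a 1-based element index i
# to the 1-based index of the N-element vector that contains it.
vec_index(i, N) = (i - 1) ÷ N + 1
@assert vec_index(1, N) == 1                         # elements 1..8  -> vector 1
@assert vec_index(9, N) == 2                         # elements 9..16 -> vector 2
```

Running this in a plain Julia session should complete without any assertion failures.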

@DilumAluthge (Contributor) left a comment:

You should also change the Julia compat entry in the Project.toml file to be julia = "1.6".

@DilumAluthge (Contributor) left a comment:

Also, one of the jobs should submit coverage. Now that the 1.5 job is gone, maybe the 1.6-nightly job should submit coverage.

@maleadt force-pushed the tb/half branch 2 times, most recently from e8089f2 to 630308e, on January 28, 2021 11:40
@maleadt marked this pull request as ready for review January 28, 2021 11:46
Review comment on src/layout.jl (outdated, resolved)
3 participants