
Half-Precision Intrinsics #871

Merged: 9 commits merged into JuliaGPU:master on Apr 28, 2021

Conversation

@iyaja (Contributor) commented Apr 27, 2021

This adds support for some half-precision intrinsics as mentioned in #391.

The implementations so far:

  • Alternative definitions for some functions, like expm1, which did not have a fallback implementation for Float16 arguments in Base. Some Base math functions also rely on libm, which does not work from CUDA device code. (A sketch of the fallback pattern follows this list.)
  • shfl_recurse for Float16 via reinterpret, as is done for Float64 (sketched below).
  • atomic_add! for Float16. PTX 6.3, which is the currently supported version, only has an f16 atomic add instruction. Other atomics could be implemented via atomic_cas!, but it seems like the CUDA headers (i.e. cuda_fp16.h) only provide a definition for __half atomicAdd(__half *address, __half val) and none of the other atomics. (A CAS-loop sketch follows below.)
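
For illustration, the fallback pattern for a function like expm1 can be as simple as promoting to Float32, computing there, and truncating back; the name cuda_expm1 here is hypothetical, and the PR's actual definitions may differ:

```julia
# Hedged sketch, not the PR's exact code: a Float16 fallback for expm1.
# Promote to Float32, evaluate (on-device this hits a CUDA intrinsic rather
# than libm), and truncate the result back to Float16.
cuda_expm1(x::Float16) = Float16(expm1(Float32(x)))

cuda_expm1(Float16(0.5))  # ≈ Float16(0.6487)
```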
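
The reinterpret trick for warp shuffles looks roughly like the following; this is a sketch of the pattern, assuming an `op` that already handles 16-bit integers (e.g. by widening to 32 bits), not necessarily the PR's exact method:

```julia
# Hedged sketch: shuffle the bit pattern of the Float16 as a UInt16, then
# reinterpret the shuffled bits back to Float16. This mirrors the existing
# Float64-via-UInt64 path.
shfl_recurse(op, x::Float16) = reinterpret(Float16, op(reinterpret(UInt16, x)))
```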
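
And to show what the atomic_cas! route would look like for the atomics the headers don't provide, here is a generic CAS loop for a hypothetical Float16 atomic max; the function name is made up, and atomic_cas! is assumed to return the value previously stored at ptr:

```julia
# Hedged sketch: emulating a Float16 atomic max with a compare-and-swap loop.
# The loop retries until no other thread modified the slot between our read
# and our CAS.
function atomic_max_f16!(ptr, val::Float16)
    old = unsafe_load(ptr)
    while true
        new = max(old, val)
        seen = atomic_cas!(ptr, old, new)  # returns the value that was at ptr
        seen === old && return old         # CAS succeeded; `old` was replaced
        old = seen                         # lost the race; retry with fresh value
    end
end
```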

The device/intrinsics test suite has also been updated to include the above definitions.

@iyaja marked this pull request as draft on April 27, 2021 12:17
@maleadt added the "cuda kernels" (Stuff about writing CUDA kernels.) and "enhancement" (New feature or request) labels on Apr 27, 2021
codecov bot commented Apr 27, 2021

Codecov Report

Merging #871 (92a661f) into master (57577a1) will increase coverage by 0.06%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #871      +/-   ##
==========================================
+ Coverage   77.06%   77.13%   +0.06%     
==========================================
  Files         121      121              
  Lines        7530     7517      -13     
==========================================
- Hits         5803     5798       -5     
+ Misses       1727     1719       -8     
Impacted Files          Coverage Δ
lib/cudadrv/memory.jl   85.87% <0.00%> (+0.48%) ⬆️
src/pointer.jl          77.94% <0.00%> (+5.57%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b871a0...92a661f.

@iyaja force-pushed the master branch 2 times, most recently from 51bc209 to 717744a, on April 28, 2021 08:46
@iyaja marked this pull request as ready for review on April 28, 2021 12:22
@maleadt (Member) commented Apr 28, 2021

All green :-) Let's merge this.

@maleadt merged commit 91db76e into JuliaGPU:master on Apr 28, 2021