
Half-Precision Intrinsics #871

Merged: 9 commits merged into JuliaGPU:master on Apr 28, 2021

Conversation

@iyaja (Contributor) commented Apr 27, 2021

This adds support for some half-precision intrinsics as mentioned in #391.

The implementations so far:

  • Alternative definitions for some functions, like expm1, which did not have a fallback implementation for Float16 arguments in Base. Some Base math functions also rely on libm, which does not work from CUDA device code. (A sketch of the fallback pattern follows this list.)
  • shfl_recurse for Float16 via reinterpret, as is done for Float64 (sketched below).
  • atomic_add! for Float16. PTX 6.3, which is the currently supported version, only has an f16 atomic add instruction. Other atomics could be implemented via atomic_cas!, but it seems like the CUDA headers (i.e. cuda_fp16.h) only provide a definition for __half atomicAdd(__half *address, __half val) and none of the other atomics. (A CAS-loop sketch follows below.)
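
For illustration, the fallback pattern for a function like expm1 can be as simple as promoting to Float32, computing there, and truncating back; the name cuda_expm1 here is hypothetical, and the PR's actual definitions may differ:

```julia
# Hedged sketch, not the PR's exact code: a Float16 fallback for expm1.
# Promote to Float32, evaluate (on-device this hits a CUDA intrinsic rather
# than libm), and truncate the result back to Float16.
cuda_expm1(x::Float16) = Float16(expm1(Float32(x)))

cuda_expm1(Float16(0.5))  # ≈ Float16(0.6487)
```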
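
The reinterpret trick for warp shuffles looks roughly like the following; this is a sketch of the pattern, assuming an `op` that already handles 16-bit integers (e.g. by widening to 32 bits), not necessarily the PR's exact method:

```julia
# Hedged sketch: shuffle the bit pattern of the Float16 as a UInt16, then
# reinterpret the shuffled bits back to Float16. This mirrors the existing
# Float64-via-UInt64 path.
shfl_recurse(op, x::Float16) = reinterpret(Float16, op(reinterpret(UInt16, x)))
```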
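
And to show what the atomic_cas! route would look like for the atomics the headers don't provide, here is a generic CAS loop for a hypothetical Float16 atomic max; the function name is made up, and atomic_cas! is assumed to return the value previously stored at ptr:

```julia
# Hedged sketch: emulating a Float16 atomic max with a compare-and-swap loop.
# The loop retries until no other thread modified the slot between our read
# and our CAS.
function atomic_max_f16!(ptr, val::Float16)
    old = unsafe_load(ptr)
    while true
        new = max(old, val)
        seen = atomic_cas!(ptr, old, new)  # returns the value that was at ptr
        seen === old && return old         # CAS succeeded; `old` was replaced
        old = seen                         # lost the race; retry with fresh value
    end
end
```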

The device/intrinsics test suite has also been updated to include the above definitions.

@iyaja marked this pull request as draft on April 27, 2021 12:17
@maleadt added the "cuda kernels" (Stuff about writing CUDA kernels.) and "enhancement" (New feature or request) labels on Apr 27, 2021
codecov bot commented Apr 27, 2021

Codecov Report

Merging #871 (92a661f) into master (57577a1) will increase coverage by 0.06%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #871      +/-   ##
==========================================
+ Coverage   77.06%   77.13%   +0.06%     
==========================================
  Files         121      121              
  Lines        7530     7517      -13     
==========================================
- Hits         5803     5798       -5     
+ Misses       1727     1719       -8     
Impacted Files          Coverage Δ
lib/cudadrv/memory.jl   85.87% <0.00%> (+0.48%) ⬆️
src/pointer.jl          77.94% <0.00%> (+5.57%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b871a0...92a661f.

@iyaja force-pushed the master branch 2 times, most recently from 51bc209 to 717744a, on April 28, 2021 08:46
@iyaja marked this pull request as ready for review on April 28, 2021 12:22
@maleadt (Member) commented Apr 28, 2021

All green :-) Let's merge this.

@maleadt merged commit 91db76e into JuliaGPU:master on Apr 28, 2021