Global math mode for easy use of lower-precision functionality #424
Conversation
The hunk under discussion:

```diff
+function math_type()
+    math_mode = CUDA.math_mode()
+    if math_mode == CUDA.PEDANTIC_MATH
+        CUDNN_DEFAULT_MATH
+    elseif math_mode == CUDA.DEFAULT_MATH
+        CUDNN_TENSOR_OP_MATH
+    elseif math_mode == CUDA.FAST_MATH
+        CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
+    end
+end
```

@denizyuret How is the CUDNN rework progressing? Do you have a PR somewhere? I'm holding off on changing the wrappers to avoid conflicts, but this PR would require some changes there (notably, using the latest v8 descriptor constructors and passing in a math mode).

That's for implicit use of tensor cores; for explicit use I had the CUBLAS changes in #417, and CUDNN probably needs to be adapted too in order to support explicit (B)Float16 inputs. Are you taking care of those as part of your rework?
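For context, here is a minimal sketch of how the global math mode this PR introduces could be driven from user code. It assumes the `CUDA.math_mode!` setter and the three mode constants shown in the hunk above; treat it as an illustration rather than the PR's final API.

```julia
using CUDA

# Opt in to the fastest math mode process-wide; the CUBLAS/CUDNN
# wrappers can then pick the matching library-level math type.
CUDA.math_mode!(CUDA.FAST_MATH)

A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.rand(Float32, 1024, 1024)
C = A * B  # may now use tensor cores with reduced-precision intermediates
```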
Force-pushed from dd1fb3e to 399be99.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #424      +/-   ##
==========================================
- Coverage   79.75%   79.59%   -0.16%
==========================================
  Files         170      170
  Lines        9051     9088      +37
==========================================
+ Hits         7219     7234      +15
- Misses       1832     1854      +22
```

Continue to review full report at Codecov.
Hi Tim,

You can see my work so far in a Knet PR: denizyuret/Knet.jl#614

It is about 50% done. I covered the basic operations and ended up spending a lot of time on MultiHeadedAttn (which is important for language models like BERT, GPT, etc., and which CUDNN has started to support natively). Still to go are conv, rnn, and batchnorm, but I have already implemented them elsewhere, so this should not be too much more work if I can find the time to sit down.

Again, this is a Knet PR and I still keep changing the high-level interface, so for now (1) only forw functions have high-level counterparts; I use the back functions as-is in the AutoGrad rules, (2) I use AutoGrad to define gradients, and (3) I try to define everything array-type agnostic so things work with both KnetArrays and CuArrays (see the sketch below). To turn this into a CUDA.jl PR (if/when we want to go that way):

0. I would need to complete all high-level methods.
1. I would need to possibly define high-level methods for back functions?
2. Either not define gradients, or define them elsewhere?
3. Get rid of KnetArrays and just define everything for CuArrays?

Any suggestions are welcome.
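To illustrate point (3), here is a hypothetical sketch of an array-type-agnostic high-level method; the union alias, the method name, and the wrapper signature are invented for the example and are not taken from the actual PR.

```julia
using CUDA
using Knet: KnetArray

# Hypothetical alias covering both backends (not from the PR).
const DevArray{T,N} = Union{KnetArray{T,N}, CuArray{T,N}}

# One method serves both array types; the low-level CUDNN call is
# reached through whatever high-level wrapper the package defines.
function relu_forw(x::DevArray{T}) where {T}
    cudnnActivationForward(x; mode=CUDNN_ACTIVATION_RELU)  # hypothetical signature
end
```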
On Mon, Sep 14, 2020 at 3:12 PM Tim Besard ***@***.***> wrote:

> @denizyuret <https://github.com/denizyuret> How is the CUDNN rework
> progressing? Do you have a PR somewhere? I'm holding off on changing the
> wrappers to avoid conflicts, but this PR would require some changes there
> (notably, using the latest v8 descriptor constructors and passing in a
> math mode).
I am good with whatever cudnn version(s) you want to support. For descriptors I found myself repeating a lot of code, so I wrote a macro:
https://github.com/denizyuret/Knet.jl/blob/f133587a35e94b776730ca25464d55d6ef6ce2a3/src/cudnn/common.jl#L61

That way I can define a descriptor in one line:

```julia
@cudnnDescriptor(Tensor, cudnnSetTensorNdDescriptorEx)
```

The second (optional) argument determines the setter function, and the arguments of the default constructor are set to be the arguments to the setter function. Also, all descriptors are memoized, which I found helps with performance.
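As a rough illustration of the memoization the macro generates, a simplified stand-in might look like the following; the cache layout and constructor name are assumptions, not the actual expansion of `@cudnnDescriptor`.

```julia
# Simplified sketch of a memoized tensor descriptor constructor.
# Descriptors stay alive in the cache so repeated calls with the
# same arguments reuse them instead of re-creating them.
const TD_CACHE = Dict{Tuple, cudnnTensorDescriptor_t}()

function TensorDesc(format, dtype, dims)
    get!(TD_CACHE, (format, dtype, dims)) do
        d = Ref{cudnnTensorDescriptor_t}()
        cudnnCreateTensorDescriptor(d)
        # The setter named in the macro call fills in the descriptor.
        cudnnSetTensorNdDescriptorEx(d[], format, dtype,
                                     length(dims), collect(Cint, dims))
        d[]
    end
end
```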
> That's for implicit use of tensor cores; for explicit use I had the CUBLAS
> changes in #417 <#417>, and CUDNN
> probably needs to be adapted too in order to support explicit (B)Float16
> inputs. Are you taking care of those as part of your rework?
I do want to make using tensor cores default/easy. So, for example, here I define the math type to be the fastest allowed by the cudnn v8 docs:
https://github.com/denizyuret/Knet.jl/blob/f133587a35e94b776730ca25464d55d6ef6ce2a3/src/cudnn/multiheadattn.jl#L256

I still haven't tested this approach on other cudnn versions and nvidia chips, so there is still some work to see how well it generalizes.
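For reference, the "fastest allowed" selection can be condensed to something like the sketch below; the per-type branches follow cudnn v8's documented math types, but the exact policy in the linked code may differ.

```julia
# Sketch: most permissive CUDNN math type for a given element type.
function fastest_math_type(::Type{T}) where {T}
    if T === Float16
        CUDNN_TENSOR_OP_MATH                    # native FP16 on tensor cores
    elseif T === Float32
        CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION   # allow FP32 -> FP16 conversion
    else
        CUDNN_DEFAULT_MATH                      # e.g. Float64: no tensor cores
    end
end
```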
Fixes #354.

Note the CUBLAS_COMPUTE_32F_FAST_16F compute type, which lets FP32 GEMMs use the FP16 tensor-core path internally.

TODO: the same treatment for CUDNN.
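On the CUBLAS side, the mapping from the global math mode to a compute type could look like the sketch below; the `cublasComputeType_t` constants are real, but the dispatch itself is illustrative rather than the PR's exact code.

```julia
# Sketch: choose a cublasComputeType_t for Float32 GEMM based on the
# global math mode (illustrative mapping).
function gemm_compute_type(math_mode)
    if math_mode == CUDA.PEDANTIC_MATH
        CUBLAS_COMPUTE_32F_PEDANTIC    # strict IEEE FP32 arithmetic
    elseif math_mode == CUDA.FAST_MATH
        CUBLAS_COMPUTE_32F_FAST_16F    # FP32 GEMM via the FP16 tensor-core path
    else
        CUBLAS_COMPUTE_32F             # default FP32 compute
    end
end
```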