gpu: nvidia: ip: adjust benchdnn error threshold #2479
base: main
```diff
@@ -278,7 +278,16 @@ void skip_invalid_prb(const prb_t *prb, res_t *res) {}

 void setup_cmp(compare::compare_t &cmp, const prb_t *prb, data_kind_t kind,
         const args_t &ref_args) {
-    cmp.set_threshold(0.f);
+    // The nvidia implementation has precision issues in some cases
+    // for large problems with post-op sum
+    if (is_nvidia_gpu()
+            && prb->attr.post_ops.find(attr_t::post_ops_t::kind_t::SUM) != -1
+            && prb->dst_dt() == dnnl_f16) {
+        const float trh = epsilon_dt(prb->dt[2]);
+        cmp.set_threshold(trh);
+    } else {
+        cmp.set_threshold(0.f);
+    }
 }

 std::vector<int> supported_exec_args(dir_t dir) {
```
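For context on what the relaxed threshold admits (an illustration, not part of the patch): `epsilon_dt` for f16 is the f16 machine epsilon, 2^-10 ≈ 9.77e-4, so the comparison now tolerates roughly a one-ulp relative difference in the f16 destination instead of requiring exact equality. A minimal standalone sketch of a check of this shape; `rel_err` is illustrative, not benchdnn's internal comparison, and the values are taken from the f16 range discussed in the thread below:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // f16 machine epsilon, 2^-10: what epsilon_dt evaluates to for f16.
    const float trh = 0.0009765625f;

    // 1052 and 1053 are adjacent f16 values (the f16 ulp is 1.0 in
    // [1024, 2048)), i.e. the two possible roundings of ~1052.5.
    const float expected = 1052.f; // reference rounds one way
    const float got = 1053.f;      // implementation rounds the other way

    // A one-ulp difference at this magnitude passes the relaxed threshold.
    const float rel_err = std::fabs(got - expected) / std::fabs(expected);
    std::printf("rel_err = %g, trh = %g -> %s\n", rel_err, trh,
            rel_err <= trh ? "accepted" : "rejected");
    return 0;
}
```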
Review thread on this hunk:

**Reviewer:** Do you know why this difference? Is the sum post-op applied over f32 intermediate values or over f16 values for the NV backend?

**Reviewer:** I'd say this change can fly only if the sum post-op is done through a native cuDNN fusion (a single call) with f16 accumulation internally; otherwise, the issue is likely inside the implementation, which doesn't convert the output to f32 and accumulate pieces in f32.

**Author (sgeor255):** The sum post-op is implemented through the […]. @dzarukin I investigated whether there are any issues with the implementation but couldn't find any. Also, I noticed that changing the input values makes the test pass, e.g. when using whole numbers as the input (still in the f16 data type). To me it seems to be some sort of precision/rounding issue. The expected values computed by oneDNN are rounded down, while in the cuDNN case they are rounded up, e.g. […]
The values in full precision in the above example are not representable as f16 (e.g. https://float.exposed/0x641c), which makes me think cublas is doing incorrect rounding? I also found a discussion where someone asks how the scaling parameters in cublas work, but there was no response.

**Reviewer:** @sgeor255, thanks for looking into the implementation details, that's a good start.

**Reviewer:** When changing the data fixes the issue, it always means that rounding/accumulation mechanics stand in the way. Smaller ranges usually lead to situations where the final numbers remain exact, so converting to f16/f32 and back doesn't change the number and the check passes. When the expected number is x.5, it can in reality be x.5002, which would be rounded towards […]
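To make the halfway-case argument concrete (an illustration, not code from this PR): the float.exposed link above is the f16 value 1052, and 1052.5 sits exactly between the adjacent f16 values 1052 and 1053, so a perturbation as small as the 0.0002 mentioned above flips the rounding. A minimal sketch, assuming a compiler with the `_Float16` extension (recent GCC/Clang), where conversions to f16 use IEEE round-to-nearest-even:

```cpp
#include <cstdio>

int main() {
    // 1052.5 is exactly between the f16 neighbors 1052 and 1053
    // (f16 spacing is 1.0 in [1024, 2048)). Ties round to even: 1052.
    _Float16 exact_tie = (_Float16)1052.5f;

    // An accumulated result that lands a hair past the midpoint
    // rounds the other way: 1053.
    _Float16 perturbed = (_Float16)1052.5002f;

    std::printf("tie -> %g, perturbed -> %g\n",
            (double)exact_tie, (double)perturbed);
    // Prints: tie -> 1052, perturbed -> 1053
    return 0;
}
```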
**Reviewer:** I wouldn't call it precision issues; wasn't our conclusion that this is a difference in rounding modes that we can't change in cuDNN?
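For reference on where that uncontrollable rounding would happen: a hypothetical sketch only, since the thread does not show the exact call the NV backend makes; `cublasGemmEx` and every parameter choice below are assumptions for illustration. When a sum post-op is lowered to the beta scale of a GEMM with an f16 C matrix and an f32 compute type, cuBLAS accumulates alpha*A*B + beta*C in f32 and rounds to f16 once on the final store, and the API exposes no knob to select that rounding mode:

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical sketch: a sum post-op expressed via GEMM's beta scale.
// d_A, d_B, d_C are device buffers; C holds the sum operand on input.
void gemm_with_sum_postop(cublasHandle_t handle, int m, int n, int k,
        const __half *d_A, const __half *d_B, __half *d_C) {
    const float alpha = 1.f; // GEMM result scale
    const float beta = 1.f;  // sum post-op scale: C = A*B + C

    // Accumulation happens in f32 (CUBLAS_COMPUTE_32F); the fused
    // alpha*A*B + beta*C result is rounded to f16 once, on the final
    // store to C, with a rounding mode the API does not let us choose.
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
            d_A, CUDA_R_16F, m, d_B, CUDA_R_16F, k, &beta,
            d_C, CUDA_R_16F, m, CUBLAS_COMPUTE_32F,
            CUBLAS_GEMM_DEFAULT);
}
```

If that single rounding step breaks ties differently from the f32 reference path, the outputs differ by exactly one f16 ulp, which is what the relaxed threshold in this patch tolerates.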