WIP debug test only: AMD OpenCL peak flops patch #3006

RichardHaselgrove · 2019-02-06T12:38:45Z

Fixes #2988 and #3001 (test only)

Description of the Change
A recent fault in AMD OpenCL driver releases results in a meaningless overflow value being generated for the device GFLOPS peak. This in turn causes tasks to fail (for new machines or new applications) with Error 197: EXIT_TIME_LIMIT_EXCEEDED

Alternate Designs
Research and apply alternative ATI driver metrics to replace the OpenCL 'Max clock frequency'

Release Notes
For testing only. GFLOPS Peak should be capped at 1.e12

…alo/boinc into JuhaSointusalo-client-gpu-detection

This takes the active ingredient from David's #3001 'fix possible overflow' and applies it in lib/coproc.cpp at COPROC_ATI::set_peak_flops() Posting so that tests can be made on the AppVeyor build: I don't have the necessary hardware.

RichardHaselgrove · 2019-02-06T14:39:22Z

SU user has confirmed that this build now reports

06-Feb-2019 14:29:16 (low) [] OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) Vega 8 Graphics (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 6567MB, 6567MB available, 1000 GFLOPS peak)

JuhaSointusalo · 2019-02-07T18:32:49Z

Easier to organize thoughts when writing so putting this here and now instead of in meeting. I would suggest that

the patch to be moved so that it covers all vendors. Just because AMD released buggy driver this time doesn't mean others won't ever do it.
a message added that tells the computed peak_flops was rejected and replaced by a default value. Look at what David did to see how the message is added. I don't think there is much point in listing which input value was bad because it could have been any of them but include what the rejected peak_flops was.
the default value lowered significantly. For example, my mobile Intel GPU has 192 GFLOPS peak. Having peak_flops lower than in reality should be safe I think.
the patch to be redone so that it includes only what's in master.

JuhaSointusalo · 2019-02-07T20:33:16Z

Look at what David did to see how the message is added.

My reply to that was a bit short. I had not noticed the patch was in library code. Getting useful messages out of library code is not something that is explained in one sentence, or a five minute job, or one line code change. Right now I couldn't tell how to do it and it's quite possible the changes necessary to do it would be too large to fit in to a hotfix.

RichardHaselgrove · 2019-02-07T21:11:17Z

The easiest way might be to pass back a 'value limited' flag to the calling routine, and let the client code respond with message code when the flag is present.

JuhaSointusalo · 2019-02-12T21:28:31Z

#3001 implemented a workaround for the driver bug in a different way. Closing this as unnecessary.

RichardHaselgrove added 5 commits September 23, 2018 13:23

Merge branch 'client-gpu-detection' of https://github.com/JuhaSointus…

ab94ce9

…alo/boinc into JuhaSointusalo-client-gpu-detection

Merge branch 'JuhaSointusalo-client-gpu-detection'

4c6a526

Merge branch 'master' of https://github.com/BOINC/boinc

1044b73

Merge branch 'master' of https://github.com/BOINC/boinc

f8ff353

WIP debug test only: AMD OpenCL peak flops patch

31eaa0d

This takes the active ingredient from David's #3001 'fix possible overflow' and applies it in lib/coproc.cpp at COPROC_ATI::set_peak_flops() Posting so that tests can be made on the AppVeyor build: I don't have the necessary hardware.

JuhaSointusalo closed this Feb 12, 2019

JuhaSointusalo deleted the RDH_AMD_flops_patch branch February 12, 2019 21:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP debug test only: AMD OpenCL peak flops patch #3006

WIP debug test only: AMD OpenCL peak flops patch #3006

RichardHaselgrove commented Feb 6, 2019

RichardHaselgrove commented Feb 6, 2019 •

edited

Loading

JuhaSointusalo commented Feb 7, 2019

JuhaSointusalo commented Feb 7, 2019

RichardHaselgrove commented Feb 7, 2019

JuhaSointusalo commented Feb 12, 2019

WIP debug test only: AMD OpenCL peak flops patch #3006

WIP debug test only: AMD OpenCL peak flops patch #3006

Conversation

RichardHaselgrove commented Feb 6, 2019

RichardHaselgrove commented Feb 6, 2019 • edited Loading

JuhaSointusalo commented Feb 7, 2019

JuhaSointusalo commented Feb 7, 2019

RichardHaselgrove commented Feb 7, 2019

JuhaSointusalo commented Feb 12, 2019

RichardHaselgrove commented Feb 6, 2019 •

edited

Loading