-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for running TensorFlow CPU and GPU tests separately and enhance test failure reporting #2312
Add support for running TensorFlow CPU and GPU tests separately and enhance test failure reporting #2312
Conversation
ed75407
to
2b599dd
Compare
@Flamefire #2293 merged, please rebase so we can get this in too before the next EasyBuild release, since it makes a couple of changes compared to #2263 (and #2292) |
2b599dd
to
07cbbf9
Compare
easybuild/easyblocks/t/tensorflow.py
Outdated
|
||
# TF can only run tests explicitely marked as 'gpu' on GPUs and those don't work on cpu | ||
# See https://github.com/tensorflow/tensorflow/issues/45664 | ||
if device == 'cpu': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be gpu
?
If not, please clarify the comment, since this doesn't make sense at first glance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's -1
... 🤦
OK, please mention what -1
does in the comment, it's easy to overlook that as 1
which strongly suggests "enable"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the comment to say "Disable GPUs..." (or so), is that enough?
Test report by @Flamefire Overview of tested easyconfigs (in order)
Build succeeded for 3 out of 3 (3 easyconfigs in total) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Flamefire Some minor tweaks done in Flamefire#5, please check/merge so we can (finally) get this show on the road :)
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 2 out of 2 (2 easyconfigs in total) |
As discussed at easybuilders/easybuild-easyconfigs#11637 (comment) I added 2 more changes to a) allow running on non-GPU nodes and b) propagate cuda cache variables, see also easybuilders/easybuild-framework#3569 |
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 20 out of 20 (1 easyconfigs in total) |
Test report by @Flamefire Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total) |
@Flamefire This needs non-trivial merge conflict fixing after the merging of #2314... |
… failure reporting
…leanup and enhanced comments
…sted on systems without GPUs
42aeda3
to
71582c9
Compare
Rebased. Merge conflict was minor ( |
Also found 2 small issues when double-checking the code. I hope that's all, otherwise I'll check back on Monday morning |
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 0 out of 1 (1 easyconfigs in total) edit: failed due to |
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total) |
This has been extensively tested, both with existing So let's get this merged, thanks a lot for your efforts here @Flamefire! |
(created using
eb --new-pr
)Based on
create less temporary directories for TensorFlow by (only) using --output_user_root #2293 (needs rebase to clean up changes)