
UAT test scenario to use GPU #128

Closed
nishant-dash opened this issue Oct 3, 2024 · 3 comments · Fixed by #139
Labels
enhancement (New feature or request)

Comments

@nishant-dash

Context

Can we have a UAT test (or set of tests) that runs a test workload utilizing one or more GPUs? Ideally this would cover both a notebook and a pipeline job.
It may be tricky to make it generic enough to run in any environment out of the box, but a test with minimal assumptions is better than no test at all.

This would be very helpful when running validation on various cloud deployments.

What needs to get done

  • A UAT test (or set of tests) that runs a test workload utilizing one or more GPUs.
  • Ideally, both a notebook and a pipeline job.

Definition of Done

Working UATs for notebooks and pipelines that successfully test GPU utilization (a minimal check is sketched below).
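
A minimal sketch, assuming TensorFlow with CUDA support is available in the notebook image, of the kind of check the notebook UAT could run; the tensor sizes and messages are illustrative only:

```python
import tensorflow as tf

# Fail fast if no GPU is visible to TensorFlow inside the notebook pod.
gpus = tf.config.list_physical_devices("GPU")
assert gpus, "UAT expects at least one visible GPU"

# Run a small matmul on the first GPU to confirm it is actually usable,
# not just advertised by the device plugin.
with tf.device("/GPU:0"):
    a = tf.random.uniform((1024, 1024))
    b = tf.random.uniform((1024, 1024))
    c = tf.matmul(a, b)

print("GPU matmul OK, result shape:", c.shape)
```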

nishant-dash added the enhancement (New feature or request) label on Oct 3, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6358.

This message was autogenerated

@orfeas-k
Contributor

orfeas-k commented Nov 1, 2024

Thank you for the proposal @nishant-dash. We will use this issue to implement the notebook itself. First, though, we will have to enable the driver so the notebook can run in an automated way, which means this will be worked on after #130, #131, and #132.

EDIT: The design should be linked to the KF113 epic.

@orfeas-k
Contributor

The proposed notebook in #139 uses the kfp SDK to create an experiment and run it (a rough sketch follows this list). The pipeline:

  • Schedules its runs on a GPU node. That means that if no NVIDIA GPU is available, the run's pod remains Pending, causing the test to time out and fail.
  • Runs code that uses a TensorFlow function to detect whether it can find a GPU on the node. If it doesn't find one, it raises an error, causing the run and therefore the test to fail.
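
A minimal sketch of what such a pipeline could look like, assuming the kfp v2 SDK, an in-cluster KFP endpoint reachable by Client(), and GPUs exposed via the NVIDIA device plugin as the nvidia.com/gpu resource; names such as gpu-check and gpu-uat are illustrative and not taken from #139:

```python
from kfp import dsl, Client


@dsl.component(base_image="tensorflow/tensorflow:2.15.0-gpu")
def check_gpu() -> str:
    """Raise if TensorFlow cannot see a GPU inside the pod."""
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        raise RuntimeError("No GPU visible to TensorFlow inside the pod")
    return gpus[0].name


@dsl.pipeline(name="gpu-check")
def gpu_check_pipeline():
    task = check_gpu()
    # Request one GPU so the pod is only scheduled on a GPU node; without
    # such a node the pod stays Pending and the UAT eventually times out.
    task.set_accelerator_type("nvidia.com/gpu")
    task.set_accelerator_limit(1)


if __name__ == "__main__":
    client = Client()  # assumes in-cluster connection details
    client.create_run_from_pipeline_func(
        gpu_check_pipeline, arguments={}, experiment_name="gpu-uat"
    )
```

The GPU request and the in-component check cover the two failure modes described above: an unschedulable pod times out, and a schedulable pod without a visible GPU fails the run outright.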
