Add workload services #732

AmartC · 2022-10-27T23:40:17Z

This PR adds the specs for the workload and pretrained DRAIN services as well as the specs for the CPU and GPU inferencing services and the training controller.

dbason

In general this looks fine.

To fix the compilation errors you will need to update the ConvertSpec function in /pkg/resources/opnicluster/util.go and replace

services.Drain = v1beta2.DrainServiceSpec(input.Services.Drain)

with

services.Drain = v1beta2.DrainServiceSpec{
    ImageSpec: input.Services.Drain.ImageSpec,
    Enabled: input.Services.Drain.Enabled,
    NodeSelector: input.Services.Drain.NodeSelector,
    Tolerations: input.Services.Drain.Tolerations,
    Replicas: input.Services.Drain.Replicas,
}

You will also need to remove the commented-out lines from /controllers/ai_opnicluster_controller_test.go. Also, please look into adding tests for the new services you have added.
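For context on the conversion fix above: Go only permits a direct struct conversion such as `T(x)` when both struct types have identical underlying field sets, so once one version of the spec gains a field the fields have to be copied explicitly. The sketch below is a minimal, self-contained illustration of that rule. `OldDrainSpec`, `NewDrainSpec`, the `Workload` field, and `convertDrain` are hypothetical stand-ins, not the real v1beta1/v1beta2 types, and `Tolerations` is left out to avoid a Kubernetes import.

```go
package main

import "fmt"

// ImageSpec is a hypothetical stand-in for the shared image settings.
type ImageSpec struct {
	Image string
}

// OldDrainSpec is a hypothetical stand-in for the older API version's spec.
type OldDrainSpec struct {
	ImageSpec
	Enabled      *bool
	NodeSelector map[string]string
	Replicas     *int32
}

// NewDrainSpec is a hypothetical stand-in for the newer spec. The extra
// Workload field is an assumption used only to make the two types differ.
type NewDrainSpec struct {
	ImageSpec
	Enabled      *bool
	NodeSelector map[string]string
	Replicas     *int32
	Workload     *bool
}

// convertDrain copies the shared fields one by one. Writing
// NewDrainSpec(in) here would not compile, because the two struct
// types no longer have identical fields.
func convertDrain(in OldDrainSpec) NewDrainSpec {
	return NewDrainSpec{
		ImageSpec:    in.ImageSpec,
		Enabled:      in.Enabled,
		NodeSelector: in.NodeSelector,
		Replicas:     in.Replicas,
	}
}

func main() {
	replicas := int32(2)
	out := convertDrain(OldDrainSpec{
		ImageSpec: ImageSpec{Image: "example/drain:latest"},
		Replicas:  &replicas,
	})
	fmt.Println(out.Image, *out.Replicas, out.Workload == nil)
}
```

Fields that exist only on the newer type are left at their zero value, so callers that rely on them need a nil check or an explicit default.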

pkg/resources/opnicluster/services.go

dbason

You also need to remove the section in /pkg/resources/opnicluster/opnicluster.go that logs about GPU learning not being supported. It starts on line 166.

kralicky

LGTM

AmartC requested review from dbason and kralicky October 27, 2022 23:40

AmartC force-pushed the add-workload-services branch 2 times, most recently from a7636ef to d6320bf Compare November 1, 2022 23:10

dbason suggested changes Nov 2, 2022

pkg/resources/opnicluster/services.go (outdated, resolved)

kralicky requested changes Nov 2, 2022

pkg/resources/opnicluster/services.go (outdated, resolved)

pkg/resources/opnicluster/services.go (resolved)

AmartC force-pushed the add-workload-services branch from 817d77b to d5c68cc Compare November 2, 2022 18:22

AmartC requested review from kralicky and dbason November 2, 2022 18:22

AmartC mentioned this pull request Nov 2, 2022

Move environment variables into a config file for AI services. #756

Closed

AmartC force-pushed the add-workload-services branch 3 times, most recently from e21c8df to eb7d092 Compare November 3, 2022 04:21

dbason suggested changes Nov 3, 2022

pkg/resources/opnicluster/util.go (resolved)

AmartC requested a review from dbason November 3, 2022 20:40

AmartC marked this pull request as ready for review November 3, 2022 21:00

kralicky previously approved these changes Nov 4, 2022

AmartC dismissed kralicky’s stale review via 2a2ab08 November 14, 2022 03:30

AmartC force-pushed the add-workload-services branch 10 times, most recently from bddba7f to b22fe46 Compare November 18, 2022 19:24

kralicky previously approved these changes Nov 18, 2022

dbason suggested changes Nov 21, 2022

pkg/resources/opnicluster/services.go (outdated, resolved)

dbason previously approved these changes Nov 22, 2022

AmartC dismissed dbason’s stale review via 1c798a4 November 24, 2022 00:25

AmartC requested a review from dbason November 28, 2022 19:24

AmartC force-pushed the add-workload-services branch from 55d5049 to f4a38c0 Compare November 28, 2022 19:25

AmartC and others added 21 commits November 28, 2022 12:31

initial commit for opnicluster

e41d006

Add workload and pretrained DRAIN services as well as training controller service for Opni log anomaly configuration

f0fb059

Update code in ai apis directory

7bd4364

Update services spec to include training controller

6c472cc

Update services.go file to have custom deployment for workload DRAIN service

48216cb

Add workload drain service to Opni AI services configuration

295c290

Remove setting of NODE_TLS_REJECT_UNAUTHORIZED environment variable within training controller deployment

7ba6fc8

Update suite tests to include training controller service

e02d07c

Update opnicluster.go and util.go to support training controller module

3b13fd6

Update crds for training controller service

f36d1dd

Minify crd yaml in mage crdgen

fa0b649

Update crds created

12903c4

Prepend document separators to minified crd yaml files

a0f103c

Update crds

505c161

Update training controller service spec bug

4bec8f3

Allow gpu controller runtimeclass to be nil

56b73f7

Update GPU controller to only deploy inferencing service with GPU configured

4fbef8a

Update GPU controller tests

9c53f74

Fix up test case

0636512

Update workload services to be disabled by default

8660527

Update services.go to address nil pointer bug and also do not enable inference deployment automatically

20d11f5

AmartC force-pushed the add-workload-services branch from f4a38c0 to 20d11f5 Compare November 28, 2022 20:31

dbason approved these changes Nov 28, 2022

kralicky approved these changes Nov 29, 2022

AmartC merged commit 6cf1d1f into main Nov 29, 2022

AmartC deleted the add-workload-services branch November 30, 2022 02:33
