Training gateway plugin (#718)
* Introduce gateway plugin for model training

* Update protobuf definitions for WorkloadResponse

* Update proto files to reflect workload log count

* Update model_training proto file as well as api, plugin and system files

* Update modifications to api.go, plugin.go and system.go files

* Update plugin code

* Update grpc proto file for model training

* Update plugin for model training to address bug

* Update plugin

* Update API for training a model

* Update proto files for training controller gateway

* Update proto files for training controller gateway

* Add endpoint for checking GPU

* Update model training api to search for GPU

* Add opensearch aggregation routine as part of training controller gateway plugin

* Update model training plugin to include service for aggregating data from Opensearch

* Rename directory from modelTraining camel case to modeltraining

* Update model training directory names

* Fix up modeltraining plugin imports

* Update main.go file to reflect latest naming style

* Undo changes to opensearch image in repository

* Update plugin to fetch Opensearch endpoint and credentials

* Update code to make sure Opensearch credentials are properly being fetched

* Remove logging debug statements

* Update code to address PR comments

* Remove unused structs

* Update code to follow gofmt

* Update proto to rename GPUInfo list to items

* Update proto files with PR comments

* Update aggregation service to use context from ModelTrainingPlugin

* Update proto file definitions to be more distinct for each endpoint

* Add error handling for api endpoints and update proto definitions

* Update plugin file to check if os-workload-aggregation already exists in Jetstream

* Update proto definitions to be more concise

* Update struct field formatting to make it consistent

* Increase composite aggregation size from 4 to 1000 records per scroll
AmartC authored Nov 14, 2022
1 parent 39de5a5 commit 4ab3c11
Showing 10 changed files with 2,073 additions and 1 deletion.
2 changes: 1 addition & 1 deletion pkg/resources/gateway/rbac.go
@@ -118,7 +118,7 @@ func (r *Reconciler) rbac() ([]resources.Resource, error) {
},
{
APIGroups: []string{""},
-Resources: []string{"endpoints"},
+Resources: []string{"endpoints", "nodes"},
Verbs: []string{
"get",
"list",
21 changes: 21 additions & 0 deletions plugins/modeltraining/main.go
@@ -0,0 +1,21 @@
package main

import (
	"context"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/rancher/opni/pkg/plugins"
	"github.com/rancher/opni/pkg/tracing"
	"github.com/rancher/opni/pkg/util/waitctx"
	modeltraining "github.com/rancher/opni/plugins/modeltraining/pkg/modeltraining"
)

func main() {
	tracing.Configure("plugin_modeltraining")
	gin.SetMode(gin.ReleaseMode)
	ctx, ca := context.WithCancel(waitctx.Background())
	plugins.Serve(modeltraining.Scheme(ctx))
	ca()
	waitctx.Wait(ctx, 5*time.Second)
}