Opni AIOps Gateway
Amartya Chakraborty edited this page Jan 25, 2023
Implements gateway endpoints for the Opni admin dashboard UI.
Programming language:
- Go
Functionality:
- Train a new Deep Learning model or, if workloads were only removed from the watchlist and no new workloads were added, update the watchlist parameters instead.
- Update model training progress, including estimated time remaining, percentage completed and total time elapsed.
- Update the breakdown of logs by workload within each cluster and namespace.
- Get the breakdown of logs by workload within each cluster and namespace.
- Get the status of whether a trained model currently exists within the cluster.
- Get the workload watchlist of the last training job submitted by the user.
- Get information on the GPU(s) within the cluster.
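The first functionality item chooses between retraining and a cheaper watchlist update. A minimal Go sketch of that decision follows, assuming the watchlist is a set of workloads keyed by cluster, namespace and deployment, and that any newly added workload requires a new model. The `Workload` type and `needsRetraining` function are hypothetical names for illustration, not Opni's actual code.

```go
package main

import "fmt"

// Workload identifies a watched deployment. Hypothetical type: the fields
// mirror the cluster/namespace/deployment breakdown used by the gateway.
type Workload struct {
	Cluster    string
	Namespace  string
	Deployment string
}

// needsRetraining reports whether the requested watchlist contains any
// workload absent from the previous watchlist. If workloads were only
// removed (or nothing changed), the existing model is kept and only the
// watchlist parameters need updating.
func needsRetraining(previous, requested []Workload) bool {
	prev := make(map[Workload]struct{}, len(previous))
	for _, w := range previous {
		prev[w] = struct{}{}
	}
	for _, w := range requested {
		if _, ok := prev[w]; !ok {
			return true // a newly added workload requires a new model
		}
	}
	return false
}

func main() {
	old := []Workload{
		{"c1", "default", "api"},
		{"c1", "default", "worker"},
	}
	onlyRemoved := []Workload{{"c1", "default", "api"}}
	withAdded := []Workload{{"c1", "default", "api"}, {"c1", "kube-system", "dns"}}

	fmt.Println(needsRetraining(old, onlyRemoved)) // false: update watchlist only
	fmt.Println(needsRetraining(old, withAdded))   // true: train a new model
}
```

Keying a map by the comparable struct keeps the check linear in the size of both lists.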
Component | Type | Description |
---|---|---|
Opni Admin Dashboard | User Interface | The Opni admin dashboard will interact with the gateway plugin. |
modelTrainingStatus | NATS JetStream key-value store | Tracks the progress of Deep Learning model training. |
aggregation | NATS JetStream key-value store | Tracks the count of log messages by cluster, namespace and deployment. |
Component | Type | Description |
---|---|---|
train_model | NATS request/reply subject | When the admin dashboard sends the workload parameters for which the user wants a trained model, the gateway plugin submits a training job to the train_model NATS subject. |
model_status | NATS request/reply subject | When the admin dashboard requests the status of the model, a request is sent to the model_status NATS subject. |
workload_parameters | NATS request/reply subject | When the admin dashboard requests the last set of workloads a Deep Learning model was trained on, a request is sent to the workload_parameters NATS subject. |
aggregation | NATS JetStream key-value store | Updates the aggregation key-value store in NATS JetStream every 30 seconds with the latest count of log messages by cluster, namespace and deployment. |
Limitations:
- Relies heavily on the NATS request/reply pattern.
Test plan:
- Unit tests
- Integration tests
- e2e tests
- Manual testing