Finalise work done for Seldon Metrics Discovery during Observability Workshop #68

Open · i-chvets opened this issue on Dec 7, 2022 · 5 comments
Labels: enhancement (New feature or request)

i-chvets (Contributor) commented on Dec 7, 2022:

Finalise work done for Seldon Metrics Discovery during Observability Workshop

Work items are tracked in https://warthogs.atlassian.net/browse/KF-829
Branch: https://github.com/canonical/seldon-core-operator/tree/kf-829-gh68-feat-metrics-discovery
Prometheus charm: https://github.com/canonical/prometheus-k8s-operator

Design

Failure alerts are implemented through integration with the Prometheus charm from the Canonical Observability Stack (COS). The metrics targets exposed by models can change from model to model and from deployment to deployment, so the Metrics Endpoint Observer provided by COS is integrated into the charm. The Metrics Endpoint Observer detects target updates, and the Seldon Core Operator charm relays them to Prometheus.
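To make "relayed to Prometheus" more concrete, here is a minimal sketch of the data involved, assuming the observer reports targets as `host:port` strings. It renders them into a scrape job in Prometheus' `scrape_config` shape (`job_name`, `metrics_path`, `static_configs`). The function `build_scrape_job` and the job name `seldon-model` are hypothetical; the charm presumably does this in Python through the COS charm libraries, so treat this only as an illustration of the payload, not of the charm's code.

```bash
# Hypothetical helper (not part of the charm): render discovered metrics targets
# as a Prometheus scrape job in JSON.
# Usage: build_scrape_job <job_name> <metrics_path> <host:port> [<host:port> ...]
build_scrape_job() {
  local job_name="$1" metrics_path="$2"
  shift 2
  local targets="" target
  for target in "$@"; do
    # Append each quoted target; ", " is only inserted after the first entry.
    targets+="${targets:+, }\"${target}\""
  done
  printf '{"job_name": "%s", "metrics_path": "%s", "static_configs": [{"targets": [%s]}]}\n' \
    "$job_name" "$metrics_path" "$targets"
}
```

For example, `build_scrape_job seldon-model /prometheus 10.1.59.82:6000` prints a job with a single target; when a model is redeployed and its pod IP changes, only the `targets` list changes, which is why the targets have to be discovered rather than configured statically.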

Testing

  • Set up the MicroK8s cluster and Juju controller:
microk8s enable dns storage metallb:"10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111"
juju bootstrap microk8s uk8s
juju add-model test
  • Deploy Prometheus and the Seldon Core Operator charm, then relate them:
juju deploy prometheus-k8s --trust
juju deploy ./seldon-core_ubuntu-20.04-amd64.charm seldon-controller-manager --trust --resource oci-image="docker.io/seldonio/seldon-core-operator:1.14.0"
juju relate prometheus-k8s seldon-controller-manager

The resulting deployment should look like this:

Model  Controller  Cloud/Region        Version  SLA          Timestamp
test   uk8s        microk8s/localhost  2.9.34   unsupported  15:28:26-05:00

App                        Version  Status  Scale  Charm           Channel      Rev  Address         Exposed  Message
prometheus-k8s             2.33.5   active      1  prometheus-k8s  stable        79  10.152.183.239  no       
seldon-controller-manager           active      1  seldon-core                    0  10.152.183.182  no       

Unit                          Workload  Agent  Address     Ports  Message
prometheus-k8s/0*             active    idle   10.1.59.80         
seldon-controller-manager/0*  active    idle   10.1.59.79         
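Rather than re-running `juju status` by hand until it looks like the output above, a small check can wait for it. This is a minimal sketch that parses the plain-text `Unit` table, assuming its column order (Unit, Workload, Agent, ...) matches the output shown; the helper name `all_units_ready` is hypothetical.

```bash
# Hypothetical helper: succeed only if every unit in the "Unit" section of
# `juju status` output (read from stdin) reports Workload=active and Agent=idle.
all_units_ready() {
  awk '
    /^Unit[[:space:]]/ { in_units = 1; next }   # the header line starts the Unit table
    in_units && NF == 0 { in_units = 0 }        # a blank line ends it
    in_units { total++; if ($2 == "active" && $3 == "idle") ready++ }
    END { exit !(total > 0 && ready == total) }
  '
}

# Example: poll until both units settle (the 10 s interval is arbitrary).
# until juju status | all_units_ready; do sleep 10; done
```

`juju status --format=json` is more robust against layout changes, but parsing it from shell needs an extra tool such as `jq`.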

Deploy a model with custom metrics:

microk8s.kubectl -n test apply -f examples/echo-metrics-v1.yaml

Get the cluster IP address of the model classifier service, which is used for the prediction requests:

microk8s.kubectl -n test get svc | grep echo-metrics-default-classifier
echo-metrics-default-classifier       ClusterIP      10.152.183.34    <none>         9000/TCP,9500/TCP                       25m
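To avoid copying the IP by hand, it can be captured into a variable. This is a minimal sketch assuming the default `kubectl get svc` column layout shown above (NAME, TYPE, CLUSTER-IP, ...); the helper `cluster_ip_of` and the variable `CLASSIFIER_IP` are hypothetical names.

```bash
# Hypothetical helper: print the CLUSTER-IP column (3rd field) for the named
# service from `kubectl get svc` output read on stdin.
cluster_ip_of() {
  awk -v svc="$1" '$1 == svc { print $3 }'
}

# Example:
# CLASSIFIER_IP="$(microk8s.kubectl -n test get svc | cluster_ip_of echo-metrics-default-classifier)"
# kubectl can also return the field directly with a JSONPath query:
# microk8s.kubectl -n test get svc echo-metrics-default-classifier -o jsonpath='{.spec.clusterIP}'
```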

Send prediction requests to the model classifier's IP address:

for i in $(seq 1 10); do
  sleep 0.1
  curl -v -s -H "Content-Type: application/json" \
    -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' \
    http://<echo-metrics-default-classifier-IP>:9000/predict > /dev/null
done
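The loop above discards the response bodies, so a failing request is easy to miss. A variant that prints one HTTP status code per request might look like the sketch below; it reuses the port `9000`, the `/predict` path and the payload from the loop, while the helper names `predict_url` and `send_predictions` are hypothetical.

```bash
# Hypothetical helper: build the prediction endpoint for a classifier IP.
# Port 9000 and the /predict path are the ones used in the loop above.
predict_url() {
  printf 'http://%s:%s/predict\n' "$1" "${2:-9000}"
}

# Send N prediction requests (default 10) and print the HTTP status code of
# each one; curl prints 000 when it cannot connect at all.
send_predictions() {
  local url count i
  url="$(predict_url "$1")"
  count="${2:-10}"
  for ((i = 0; i < count; i++)); do
    curl -s -o /dev/null -w '%{http_code}\n' \
      -H "Content-Type: application/json" \
      -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' \
      "$url"
    sleep 0.1
  done
}

# Example: send_predictions <echo-metrics-default-classifier-IP> 10
```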

Metrics are available at the pod's IP address, on the metrics port and endpoint reported in the pod description:

microk8s.kubectl -n test describe pod echo-metrics-default-0-classifier-5bf6cf86cd-r7c8l
IP:           10.1.59.82
      PREDICTIVE_UNIT_METRICS_SERVICE_PORT:  6000
      PREDICTIVE_UNIT_METRICS_ENDPOINT:      /prometheus

Then query the metrics endpoint:

curl http://10.1.59.82:6000/prometheus
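The metrics URL can also be assembled from the `describe` output instead of being read off by eye. This is a minimal sketch assuming the field labels shown above (`IP:`, `PREDICTIVE_UNIT_METRICS_SERVICE_PORT:`, `PREDICTIVE_UNIT_METRICS_ENDPOINT:`); the helper name `metrics_url_from_describe` is hypothetical, and if several containers set these variables, the last occurrence wins.

```bash
# Hypothetical helper: build the metrics URL from `kubectl describe pod` output
# read on stdin, using the first "IP:" field and the PREDICTIVE_UNIT_METRICS_*
# environment variables. Exits non-zero if any of the three parts is missing.
metrics_url_from_describe() {
  awk '
    $1 == "IP:" && ip == "" { ip = $2 }
    $1 == "PREDICTIVE_UNIT_METRICS_SERVICE_PORT:" { port = $2 }
    $1 == "PREDICTIVE_UNIT_METRICS_ENDPOINT:" { path = $2 }
    END {
      if (ip == "" || port == "" || path == "") exit 1
      printf "http://%s:%s%s\n", ip, port, path
    }
  '
}

# Example:
# url="$(microk8s.kubectl -n test describe pod <classifier-pod-name> | metrics_url_from_describe)"
# curl "$url"
```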
  • Open the Prometheus dashboard at http://<Prometheus-unit-IP>:9090 and select Status -> Targets.
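The same information is exposed by Prometheus' HTTP API at `/api/v1/targets`, which is handy when the dashboard is not reachable from the host. This is a minimal sketch that counts targets by health without extra tools; it assumes the response is compact JSON with no spaces around the colons (piping through `jq` is more robust if it is installed), and the helper name `count_targets` is hypothetical.

```bash
# Hypothetical helper: count scrape targets with a given health value
# ("up", "down" or "unknown") in the /api/v1/targets JSON read on stdin.
# Assumes compact JSON such as "health":"up" with no spaces around the colon.
count_targets() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d '[:space:]'
}

# Example (placeholder IP, as in the step above):
# curl -s http://<Prometheus-unit-IP>:9090/api/v1/targets | count_targets up
# curl -s http://<Prometheus-unit-IP>:9090/api/v1/targets | count_targets down
```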
i-chvets pushed a commit that referenced this issue Dec 7, 2022
#68

Summary of changes:
- Merged metrics-discovery branch created at Observability Workshop.
- Modified code to integrate Metrics Endpoints Observer.
- Updated libs.

Merge remote-tracking branch 'origin/metrics-discovery' into feat-metrics-discovery
misohu added the enhancement label on Dec 8, 2022
i-chvets pushed a commit that referenced this issue Dec 9, 2022
#68

Summary of changes:
- Initial integration test.
- Disable some of the integration tests to speed up development.
i-chvets (Contributor, Author) commented:

Currently blocked by COS: some changes to Prometheus are required.

i-chvets self-assigned this on Jan 13, 2023
i-chvets (Contributor, Author) commented:

Debug session notes from Dec 9, 2022:
I am hitting a Juju tools issue: juju-run is not found at the path the observer expects in my pod. Logs from /var/log/discovery.log:

FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/juju/tools/unit-seldon-controller-manager-0/juju-run'

My guess is that the observer is not even starting because of this.
This is what the pod shows:

# ls -la /var/lib/juju/tools/unit-seldon-controller-manager-0/juju-run
ls: cannot access '/var/lib/juju/tools/unit-seldon-controller-manager-0/juju-run': No such file or directory

It does exist on the pod, but in a different location:

# ls -la /usr/bin/juju-run 
lrwxrwxrwx 1 root root 25 Dec 13 19:44 /usr/bin/juju-run -> /charm/bin/containeragent
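When checking this inside the charm container, a small helper can report which candidate location actually holds a usable `juju-run`. This is a minimal sketch; the helper name `first_executable` is hypothetical, and the two candidate paths are the ones from the notes above.

```bash
# Hypothetical helper: print the first candidate that is an executable regular
# file and return 0, or return 1 if none is. -f and -x follow symlinks, so
# /usr/bin/juju-run (a link to /charm/bin/containeragent) counts when its
# target is an executable file.
first_executable() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example, inside the charm container:
# first_executable /var/lib/juju/tools/unit-seldon-controller-manager-0/juju-run /usr/bin/juju-run
```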

DnPlas (Contributor) commented on Feb 9, 2023:

@i-chvets I believe this is done, please confirm.

i-chvets (Contributor, Author) commented:

Merged #94

i-chvets (Contributor, Author) commented on Feb 17, 2023:

Testing work is still required; it is not complete yet.
Following the steps in the description, no metrics could be retrieved. This needs more debugging.
Currently, the above setup fills up 32 GB of RAM and the swap space, and the whole system becomes unresponsive.
