
Modified resources for the pelorus-operator #741

Merged: 1 commit into dora-metrics:master on Dec 7, 2022

Conversation

@mpryc (Collaborator) commented Dec 6, 2022

Pelorus operator v0.0.3 with updated resources for the pelorus operator manager pod.

  1. Ensure no other pelorus-operator is installed
  2. Run bundle with this PR: operator-sdk run bundle quay.io/migtools/pelorus-operator-bundle:v0.0.3 --namespace pelorus
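Step 1 can be verified before running the bundle. A hedged sketch of the pre-check, assuming access to the target cluster (the grep patterns are assumptions, adjust to what `oc get` actually reports):

```shell
# Check for a previously installed pelorus-operator before running the bundle.
# These commands require a logged-in `oc` session against the test cluster.
oc get csv --all-namespaces | grep -i pelorus-operator \
  || echo "no pelorus-operator CSV found"
oc get subscriptions --all-namespaces | grep -i pelorus \
  || echo "no pelorus subscription found"
```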

Signed-off-by: Michal Pryc mpryc@redhat.com

Please include any additional commands or pointers in addition to our standard PR testing process.

@redhat-cop/mdt

@mpryc (Collaborator, Author) commented Dec 6, 2022

The memory limits in this PR were set based on the pod memory usage metric from the cluster (screenshot attached).
[Screenshot: pod memory usage, 2022-12-06 13-59-33]

@mpryc (Collaborator, Author) commented Dec 6, 2022

Fixes #731

@mpryc (Collaborator, Author) commented Dec 7, 2022

/retest

@mpryc (Collaborator, Author) commented Dec 7, 2022

@KevinMGranger

Summarizing the changes that make me comfortable saying the OOMKilled problem is gone:

  • At first I suspected a memory leak: I raised the standard 128Mi limit to 256Mi and it still was not enough. With 128Mi the pod was killed every 30-40 minutes; bumping to 256Mi stretched that to every 1-2 hours. That led me to conclude there was a leak, since memory appeared to be consumed steadily over time.
  • However, after bumping the limit to 512Mi, the 24h graph showed the pod occupying 250-350Mi and never going beyond 350Mi, so we should not see OOMKilled anymore; I have not seen a single restart over a 24h period. The previous 256Mi limit sat right in the middle of what the pod actually required, which is why we saw occasional kills.
  • Based on the above, and after reading a few write-ups on memory limits, I decided to set the limit to 512Mi and to request the same amount of memory (see for example https://home.robusta.dev/blog/kubernetes-memory-limit).
  • This change also regenerated the pelorus-operator files, which now use a newer helm-operator version ("v1.25.2"); that could also have contributed to fixing the problem.
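Setting the memory request equal to the limit means the scheduler reserves exactly what the container is allowed to use, so the pod cannot land on a node that cannot actually satisfy its peak usage. A minimal sketch of the resulting resources stanza, assuming it sits on the operator manager container in the Deployment (the exact field placement is an assumption, not copied from this PR):

```yaml
# Hypothetical excerpt from the pelorus-operator manager container spec.
# Memory request == limit, per the approach in the linked write-up.
resources:
  limits:
    memory: 512Mi
  requests:
    memory: 512Mi
```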

@mpryc (Collaborator, Author) commented Dec 7, 2022

/retest

@mpryc (Collaborator, Author) commented Dec 7, 2022

/test 4.8-e2e-openshift

@mpryc (Collaborator, Author) commented Dec 7, 2022

[Screenshot: 2022-12-07 12-05-58]

@mpryc (Collaborator, Author) commented Dec 7, 2022

/test 4.8-e2e-openshift

@weshayutin (Contributor):

(.venv) [whayutin@thinkdoe pelorus]$ oc create namespace pelorus
namespace/pelorus created
(.venv) [whayutin@thinkdoe pelorus]$ operator-sdk run bundle quay.io/migtools/pelorus-operator-bundle:v0.0.3 --namespace pelorus
INFO[0004] Creating a File-Based Catalog of the bundle "quay.io/migtools/pelorus-operator-bundle:v0.0.3" 
INFO[0005] Generated a valid File-Based Catalog         
INFO[0010] Created registry pod: quay-io-migtools-pelorus-operator-bundle-v0-0-3 
INFO[0010] Created CatalogSource: pelorus-operator-catalog 
INFO[0010] OperatorGroup "operator-sdk-og" created      
INFO[0010] Created Subscription: pelorus-operator-v0-0-3-sub 
INFO[0015] Approved InstallPlan install-tnm8v for the Subscription: pelorus-operator-v0-0-3-sub 
INFO[0015] Waiting for ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" to reach 'Succeeded' phase 
INFO[0015]   Waiting for ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" to appear 
INFO[0044]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Pending 
INFO[0045]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: InstallReady 
INFO[0046]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Installing 
INFO[0067]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Succeeded 
INFO[0067] OLM has successfully installed "pelorus-operator.v0.0.3" 
(.venv) [whayutin@thinkdoe pelorus]$ oc get all -n pelorus
NAME                                                                  READY   STATUS      RESTARTS   AGE
pod/41584719616ef258fd5f31b49a607a6a1adfe7985dc2fbeef9c4968727zwz2v   0/1     Completed   0          3m6s
pod/grafana-operator-controller-manager-cf4ccb9dc-zs7fc               2/2     Running     0          2m37s
pod/pelorus-operator-controller-manager-68b95db459-d8prd              1/1     Running     0          2m36s
pod/prometheus-operator-7cc47f949d-stc7z                              1/1     Running     0          2m41s
pod/quay-io-migtools-pelorus-operator-bundle-v0-0-3                   1/1     Running     0          3m17s

NAME                                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/grafana-operator-controller-manager-metrics-service   ClusterIP   172.30.224.120   <none>        8443/TCP   2m43s
service/pelorus-operator-controller-manager-metrics-service   ClusterIP   172.30.247.142   <none>        8443/TCP   2m41s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana-operator-controller-manager   1/1     1            1           2m38s
deployment.apps/pelorus-operator-controller-manager   1/1     1            1           2m37s
deployment.apps/prometheus-operator                   1/1     1            1           2m44s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-operator-controller-manager-cf4ccb9dc    1         1         1       2m38s
replicaset.apps/pelorus-operator-controller-manager-68b95db459   1         1         1       2m37s
replicaset.apps/prometheus-operator-7cc47f949d                   1         1         1       2m43s

NAME                                                                        COMPLETIONS   DURATION   AGE
job.batch/41584719616ef258fd5f31b49a607a6a1adfe7985dc2fbeef9c4968727c2931   1/1           12s        3m7s
(.venv) [whayutin@thinkdoe pelorus]$ 
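To tear this test install back down before re-testing, operator-sdk provides a cleanup subcommand that removes the objects `run bundle` created; a sketch, with the package name taken from the ClusterServiceVersion in the log above:

```shell
# Remove the CatalogSource, Subscription, and CSV created by `run bundle`.
operator-sdk cleanup pelorus-operator --namespace pelorus
# Optionally remove the test namespace as well.
oc delete namespace pelorus
```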

@openshift-ci bot commented Dec 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2022
@weshayutin (Contributor):

Deployed two exporters by creating a Pelorus instance after the operator install, with the following config.

spec:
  exporters:
    global: {}
    instances:
      - app_name: deploytime-exporter
        exporter_type: deploytime
        extraEnv:
          - name: LOG_LEVEL
            value: DEBUG
          - name: NAMESPACES
            value: 'mongo-persistent,demoapp1,demoapp2,demoapp3'
      - app_name: committime-exporter
        exporter_type: committime
        extraEnv:
          - name: LOG_LEVEL
            value: DEBUG
          - name: NAMESPACES
            value: 'mongo-persistent,demoapp1,demoapp2,demoapp3'

Click Save...

The operator spun up the committime and deploytime exporter pods, and they are running.

(.venv) [whayutin@thinkdoe pelorus]$ oc get all -n pelorus | grep time
pod/committime-exporter-1-deploy                                      0/1     Completed   0               3m55s
pod/committime-exporter-1-ncmgc                                       1/1     Running     0               3m52s
pod/deploytime-exporter-1-bsjr4                                       1/1     Running     0               3m52s
pod/deploytime-exporter-1-deploy                                      0/1     Completed   0               3m55s
replicationcontroller/committime-exporter-1   1         1         1       3m55s
replicationcontroller/deploytime-exporter-1   1         1         1       3m55s
service/committime-exporter                                   ClusterIP   172.30.19.85     <none>        8080/TCP            3m56s
service/deploytime-exporter                                   ClusterIP   172.30.14.45     <none>        8080/TCP            3m56s
deploymentconfig.apps.openshift.io/committime-exporter   1          1         1         config,image(committime-exporter:stable)
deploymentconfig.apps.openshift.io/deploytime-exporter   1          1         1         config,image(deploytime-exporter:stable)
imagestream.image.openshift.io/committime-exporter   image-registry.openshift-image-registry.svc:5000/pelorus/committime-exporter   stable   3 minutes ago
imagestream.image.openshift.io/deploytime-exporter   image-registry.openshift-image-registry.svc:5000/pelorus/deploytime-exporter   stable   3 minutes ago
route.route.openshift.io/committime-exporter   committime-exporter-pelorus.apps.cluster-wdh11092022b.wdh11092022b.mg.dog8code.com          committime-exporter   http                                 None
route.route.openshift.io/deploytime-exporter   deploytime-exporter-pelorus.apps.cluster-wdh11092022b.wdh11092022b.mg.dog8code.com          deploytime-exporter   http                                 None

Deploytime has metrics \0/

# TYPE deploy_timestamp gauge
deploy_timestamp{app="todolist",image_sha="sha256:04559c4ebdde446ade9a67cd58124fdcb9e28f2ffc7fac923e0f3845c9f3d54a",namespace="mongo-persistent"} 1.668809883e+09 1668809883000
# HELP deployment_active Active deployments in cluster
# TYPE deployment_active gauge
deployment_active{app="todolist",image_sha="sha256:04559c4ebdde446ade9a67cd58124fdcb9e28f2ffc7fac923e0f3845c9f3d54a",namespace="mongo-persistent"} 1.668809883e+09
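The lines above are in the Prometheus text exposition format (`metric_name{labels} value [timestamp]`). A minimal local sketch extracting the sample value from such a line, using one line copied from the output above:

```shell
# Extract the sample value from a Prometheus exposition line.
# The line below is copied (shortened) from the exporter output above.
line='deploy_timestamp{app="todolist",namespace="mongo-persistent"} 1.668809883e+09 1668809883000'
# Drop the metric name and label set (everything up to "} "),
# then take the first remaining field: the sample value.
value=$(printf '%s\n' "$line" | sed 's/^[^}]*} //' | awk '{print $1}')
echo "$value"   # prints 1.668809883e+09
```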

@weshayutin weshayutin merged commit bb9ec1b into dora-metrics:master Dec 7, 2022
@weshayutin (Contributor):

[Screenshot: 2022-12-07 14-57-11]

Labels: approved, dco-signoff: yes