
Modified resources for the pelorus-operator #741

Merged: 1 commit into dora-metrics:master on Dec 7, 2022

Conversation

@mpryc (Collaborator) commented Dec 6, 2022

Pelorus operator v0.0.3 with updated resources for the pelorus operator manager pod.

  1. Ensure no other pelorus-operator is installed
  2. Run bundle with this PR: operator-sdk run bundle quay.io/migtools/pelorus-operator-bundle:v0.0.3 --namespace pelorus
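Step 1 can be verified before running the bundle. A hedged sketch of the pre-check, assuming access to the target cluster (the grep patterns are assumptions, adjust to what `oc get` actually reports):

```shell
# Check for a previously installed pelorus-operator before running the bundle.
# These commands require a logged-in `oc` session against the test cluster.
oc get csv --all-namespaces | grep -i pelorus-operator \
  || echo "no pelorus-operator CSV found"
oc get subscriptions --all-namespaces | grep -i pelorus \
  || echo "no pelorus subscription found"
```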

Signed-off-by: Michal Pryc mpryc@redhat.com

Please include any additional commands or pointers in addition to our standard PR testing process.

@redhat-cop/mdt

@mpryc (Collaborator, Author) commented Dec 6, 2022

The memory limits in this PR were set based on the pod memory usage metric from the cluster (screenshot attached).
[Screenshot: pod memory usage, 2022-12-06 13-59-33]

@mpryc (Collaborator, Author) commented Dec 6, 2022

Fixes #731

@mpryc (Collaborator, Author) commented Dec 7, 2022

/retest

@mpryc (Collaborator, Author) commented Dec 7, 2022

@KevinMGranger

Summarizing the changes that make me comfortable saying the OOMKilled problem is gone:

  • At first I suspected a memory leak: I raised the standard 128Mi limit to 256Mi and it still was not enough. With 128Mi the pod was killed every 30-40 minutes; bumping to 256Mi stretched that to every 1-2 hours. That led me to conclude there was a leak, since memory appeared to be consumed steadily over time.
  • However, after bumping the limit to 512Mi, the 24h graph showed the pod occupying 250-350Mi and never going beyond 350Mi, so we should not see OOMKilled anymore; I have not seen a single restart over a 24h period. The previous 256Mi limit sat right in the middle of what the pod actually required, which is why we saw occasional kills.
  • Based on the above, and after reading a few write-ups on memory limits, I decided to set the limit to 512Mi and to request the same amount of memory (see for example https://home.robusta.dev/blog/kubernetes-memory-limit).
  • This change also regenerated the pelorus-operator files, which now use a newer helm-operator version ("v1.25.2"); that could also have contributed to fixing the problem.
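Setting the memory request equal to the limit means the scheduler reserves exactly what the container is allowed to use, so the pod cannot land on a node that cannot actually satisfy its peak usage. A minimal sketch of the resulting resources stanza, assuming it sits on the operator manager container in the Deployment (the exact field placement is an assumption, not copied from this PR):

```yaml
# Hypothetical excerpt from the pelorus-operator manager container spec.
# Memory request == limit, per the approach in the linked write-up.
resources:
  limits:
    memory: 512Mi
  requests:
    memory: 512Mi
```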

@mpryc (Collaborator, Author) commented Dec 7, 2022

/retest

@mpryc (Collaborator, Author) commented Dec 7, 2022

/test 4.8-e2e-openshift

@mpryc (Collaborator, Author) commented Dec 7, 2022

[Screenshot: 2022-12-07 12-05-58]

@mpryc (Collaborator, Author) commented Dec 7, 2022

/test 4.8-e2e-openshift

@weshayutin (Contributor):

(.venv) [whayutin@thinkdoe pelorus]$ oc create namespace pelorus
namespace/pelorus created
(.venv) [whayutin@thinkdoe pelorus]$ operator-sdk run bundle quay.io/migtools/pelorus-operator-bundle:v0.0.3 --namespace pelorus
INFO[0004] Creating a File-Based Catalog of the bundle "quay.io/migtools/pelorus-operator-bundle:v0.0.3" 
INFO[0005] Generated a valid File-Based Catalog         
INFO[0010] Created registry pod: quay-io-migtools-pelorus-operator-bundle-v0-0-3 
INFO[0010] Created CatalogSource: pelorus-operator-catalog 
INFO[0010] OperatorGroup "operator-sdk-og" created      
INFO[0010] Created Subscription: pelorus-operator-v0-0-3-sub 
INFO[0015] Approved InstallPlan install-tnm8v for the Subscription: pelorus-operator-v0-0-3-sub 
INFO[0015] Waiting for ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" to reach 'Succeeded' phase 
INFO[0015]   Waiting for ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" to appear 
INFO[0044]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Pending 
INFO[0045]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: InstallReady 
INFO[0046]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Installing 
INFO[0067]   Found ClusterServiceVersion "pelorus/pelorus-operator.v0.0.3" phase: Succeeded 
INFO[0067] OLM has successfully installed "pelorus-operator.v0.0.3" 
(.venv) [whayutin@thinkdoe pelorus]$ oc get all -n pelorus
NAME                                                                  READY   STATUS      RESTARTS   AGE
pod/41584719616ef258fd5f31b49a607a6a1adfe7985dc2fbeef9c4968727zwz2v   0/1     Completed   0          3m6s
pod/grafana-operator-controller-manager-cf4ccb9dc-zs7fc               2/2     Running     0          2m37s
pod/pelorus-operator-controller-manager-68b95db459-d8prd              1/1     Running     0          2m36s
pod/prometheus-operator-7cc47f949d-stc7z                              1/1     Running     0          2m41s
pod/quay-io-migtools-pelorus-operator-bundle-v0-0-3                   1/1     Running     0          3m17s

NAME                                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/grafana-operator-controller-manager-metrics-service   ClusterIP   172.30.224.120   <none>        8443/TCP   2m43s
service/pelorus-operator-controller-manager-metrics-service   ClusterIP   172.30.247.142   <none>        8443/TCP   2m41s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana-operator-controller-manager   1/1     1            1           2m38s
deployment.apps/pelorus-operator-controller-manager   1/1     1            1           2m37s
deployment.apps/prometheus-operator                   1/1     1            1           2m44s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-operator-controller-manager-cf4ccb9dc    1         1         1       2m38s
replicaset.apps/pelorus-operator-controller-manager-68b95db459   1         1         1       2m37s
replicaset.apps/prometheus-operator-7cc47f949d                   1         1         1       2m43s

NAME                                                                        COMPLETIONS   DURATION   AGE
job.batch/41584719616ef258fd5f31b49a607a6a1adfe7985dc2fbeef9c4968727c2931   1/1           12s        3m7s
(.venv) [whayutin@thinkdoe pelorus]$ 
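To tear this test install back down before re-testing, operator-sdk provides a cleanup subcommand that removes the objects `run bundle` created; a sketch, with the package name taken from the ClusterServiceVersion in the log above:

```shell
# Remove the CatalogSource, Subscription, and CSV created by `run bundle`.
operator-sdk cleanup pelorus-operator --namespace pelorus
# Optionally remove the test namespace as well.
oc delete namespace pelorus
```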

@openshift-ci bot commented Dec 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2022
@weshayutin (Contributor):

Deployed two exporters by creating a Pelorus instance after the operator install, with the following config.

spec:
  exporters:
    global: {}
    instances:
      - app_name: deploytime-exporter
        exporter_type: deploytime
        extraEnv:
          - name: LOG_LEVEL
            value: DEBUG
          - name: NAMESPACES
            value: 'mongo-persistent,demoapp1,demoapp2,demoapp3'
      - app_name: committime-exporter
        exporter_type: committime
        extraEnv:
          - name: LOG_LEVEL
            value: DEBUG
          - name: NAMESPACES
            value: 'mongo-persistent,demoapp1,demoapp2,demoapp3'

Click Save...

The operator spun up the committime and deploytime exporter pods, and they are running.

(.venv) [whayutin@thinkdoe pelorus]$ oc get all -n pelorus | grep time
pod/committime-exporter-1-deploy                                      0/1     Completed   0               3m55s
pod/committime-exporter-1-ncmgc                                       1/1     Running     0               3m52s
pod/deploytime-exporter-1-bsjr4                                       1/1     Running     0               3m52s
pod/deploytime-exporter-1-deploy                                      0/1     Completed   0               3m55s
replicationcontroller/committime-exporter-1   1         1         1       3m55s
replicationcontroller/deploytime-exporter-1   1         1         1       3m55s
service/committime-exporter                                   ClusterIP   172.30.19.85     <none>        8080/TCP            3m56s
service/deploytime-exporter                                   ClusterIP   172.30.14.45     <none>        8080/TCP            3m56s
deploymentconfig.apps.openshift.io/committime-exporter   1          1         1         config,image(committime-exporter:stable)
deploymentconfig.apps.openshift.io/deploytime-exporter   1          1         1         config,image(deploytime-exporter:stable)
imagestream.image.openshift.io/committime-exporter   image-registry.openshift-image-registry.svc:5000/pelorus/committime-exporter   stable   3 minutes ago
imagestream.image.openshift.io/deploytime-exporter   image-registry.openshift-image-registry.svc:5000/pelorus/deploytime-exporter   stable   3 minutes ago
route.route.openshift.io/committime-exporter   committime-exporter-pelorus.apps.cluster-wdh11092022b.wdh11092022b.mg.dog8code.com          committime-exporter   http                                 None
route.route.openshift.io/deploytime-exporter   deploytime-exporter-pelorus.apps.cluster-wdh11092022b.wdh11092022b.mg.dog8code.com          deploytime-exporter   http                                 None

Deploytime has metrics \0/

# TYPE deploy_timestamp gauge
deploy_timestamp{app="todolist",image_sha="sha256:04559c4ebdde446ade9a67cd58124fdcb9e28f2ffc7fac923e0f3845c9f3d54a",namespace="mongo-persistent"} 1.668809883e+09 1668809883000
# HELP deployment_active Active deployments in cluster
# TYPE deployment_active gauge
deployment_active{app="todolist",image_sha="sha256:04559c4ebdde446ade9a67cd58124fdcb9e28f2ffc7fac923e0f3845c9f3d54a",namespace="mongo-persistent"} 1.668809883e+09
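The lines above are in the Prometheus text exposition format (`metric_name{labels} value [timestamp]`). A minimal local sketch extracting the sample value from such a line, using one line copied from the output above:

```shell
# Extract the sample value from a Prometheus exposition line.
# The line below is copied (shortened) from the exporter output above.
line='deploy_timestamp{app="todolist",namespace="mongo-persistent"} 1.668809883e+09 1668809883000'
# Drop the metric name and label set (everything up to "} "),
# then take the first remaining field: the sample value.
value=$(printf '%s\n' "$line" | sed 's/^[^}]*} //' | awk '{print $1}')
echo "$value"   # prints 1.668809883e+09
```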

@weshayutin weshayutin merged commit bb9ec1b into dora-metrics:master Dec 7, 2022
@weshayutin (Contributor):

[Screenshot: 2022-12-07 14-57-11]

Labels: approved, dco-signoff: yes