
quick create-delete leaves orphaned objects #551

Closed
sdudoladov opened this issue Apr 29, 2019 · 5 comments · Fixed by #654

sdudoladov commented Apr 29, 2019

Orphaned pods/endpoints are left in the cluster if an ADD event is followed by a DELETE event for the same cluster within a short period of time.

The error manifests as:

time="2019-04-29T12:17:52Z" level=info msg="\"ADD\" event has been queued" cluster-name=default/acid-minimal-cluster pkg=controller worker=0
...
time="2019-04-29T12:17:53Z" level=info msg="waiting for the cluster being ready" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:17:56Z" level=debug msg="Waiting for 2 pods to become ready" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:00Z" level=info msg="\"DELETE\" event has been queued" cluster-name=default/acid-minimal-cluster pkg=controller worker=0
...
time="2019-04-29T12:18:29Z" level=info msg="statefulset \"default/acid-minimal-cluster\" has been deleted" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:29Z" level=debug msg="deleting pods" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:29Z" level=debug msg="no pods to delete" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
...
time="2019-04-29T12:18:29Z" level=debug msg="deleting PVCs" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:29Z" level=debug msg="no PVCs to delete" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
...
time="2019-04-29T12:18:32Z" level=debug msg="removing leftover Patroni objects (endpoints or configmaps)" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:32Z" level=warning msg="could not remove leftover patroni objects; could not fetch Patroni Endpoint \"/\": an empty namespace may not be set when a resource name is provided" cluster-name=default/acid-minimal-cluster pkg=cluster worker=0
time="2019-04-29T12:18:32Z" level=info msg="cluster has been deleted" cluster-name=default/acid-minimal-cluster pkg=controller worker=0

Reproducible both with kind and an actual k8s cluster.

This issue also prevents creating a new cluster with the same name afterwards.
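
The warning near the end of the log ("could not fetch Patroni Endpoint \"/\"") hints that the delete path runs before the cluster's namespace and name are populated. Below is a minimal sketch of the kind of guard that would avoid the empty-namespace call; the Cluster struct and method name are illustrative, this is not necessarily the actual fix from #654, and the client-go signatures are the context-free ones current at the time of this issue:

```go
package cluster

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Cluster is a minimal stand-in for the operator's cluster struct.
type Cluster struct {
	Namespace string
	Name      string
}

// getPatroniEndpoint sketches a guard against the error seen in the log:
// calling Endpoints("").Get(name, ...) with an empty namespace fails with
// "an empty namespace may not be set when a resource name is provided".
func (c *Cluster) getPatroniEndpoint(client kubernetes.Interface) (*corev1.Endpoints, error) {
	if c.Namespace == "" || c.Name == "" {
		return nil, fmt.Errorf("cluster namespace/name not yet populated, skipping Patroni endpoint lookup")
	}
	return client.CoreV1().Endpoints(c.Namespace).Get(c.Name, metav1.GetOptions{})
}
```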


FxKu commented May 23, 2019

Aside from finalizers #450, ownerReference #498 might also help.
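
As a rough illustration of the ownerReference idea: child objects (services, endpoints, secrets, ...) could carry a reference to the postgresql custom resource, so that Kubernetes garbage collection removes them when the CR disappears, even if the operator never processes the DELETE event. The helper below and the group/version values are assumptions for the sketch, not the operator's code:

```go
package resources

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// withOwnerReference attaches an ownerReference pointing at the postgresql
// custom resource to a child Service, so garbage collection deletes the
// Service together with the CR.
func withOwnerReference(svc *corev1.Service, pgName string, pgUID types.UID) {
	controller := true
	svc.OwnerReferences = append(svc.OwnerReferences, metav1.OwnerReference{
		APIVersion: "acid.zalan.do/v1", // assumed group/version of the CR
		Kind:       "postgresql",
		Name:       pgName,
		UID:        pgUID,
		Controller: &controller,
	})
}
```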


Jan-M commented May 23, 2019

Both options need to be investigated with care; the only real delete we care about is the delete of the "postgresql" object. Other objects (e.g. the statefulset) can be deleted, and no interruption or impact is expected.
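
For the finalizer option (#450), the rough mechanics would be: the operator adds a finalizer to the postgresql object, the API server then only marks the object with a deletionTimestamp on delete, and the object stays visible until the operator has finished cleaning up its child resources and removes the finalizer. A minimal sketch, with a hypothetical finalizer name:

```go
package finalizers

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// postgresFinalizer is a hypothetical finalizer name for the sketch.
const postgresFinalizer = "postgres-operator.acid.zalan.do/cleanup"

// addFinalizer registers the finalizer on the CR's metadata if missing.
func addFinalizer(meta *metav1.ObjectMeta) {
	for _, f := range meta.Finalizers {
		if f == postgresFinalizer {
			return // already present
		}
	}
	meta.Finalizers = append(meta.Finalizers, postgresFinalizer)
}

// removeFinalizer is called after cleanup succeeds, letting the API server
// finally remove the postgresql object.
func removeFinalizer(meta *metav1.ObjectMeta) {
	kept := meta.Finalizers[:0]
	for _, f := range meta.Finalizers {
		if f != postgresFinalizer {
			kept = append(kept, f)
		}
	}
	meta.Finalizers = kept
}
```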


davisford commented Aug 2, 2019

> Both options need to be investigated with care; the only real delete we care about is the delete of the "postgresql" object. Other objects (e.g. the statefulset) can be deleted, and no interruption or impact is expected.

@Jan-M this is what I'm seeing. I have Terraform scripts that build and tear down the whole cluster, but the secondary read replica we spawn never gets removed even though the operator and the postgresql object are removed.

$ kc get postgresqls.acid.zalan.do -A
No resources found.
$ kc get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       foo-cluster-1                   1/1     Running   0          19h
kube-system   coredns-5c98db65d4-fm6c5           1/1     Running   0          19h
kube-system   coredns-5c98db65d4-lptdx           1/1     Running   0          19h
kube-system   etcd-minikube                      1/1     Running   0          19h
kube-system   kube-addon-manager-minikube        1/1     Running   0          19h
kube-system   kube-apiserver-minikube            1/1     Running   0          19h
kube-system   kube-controller-manager-minikube   1/1     Running   0          19h
kube-system   kube-proxy-xdg4c                   1/1     Running   0          19h
kube-system   kube-scheduler-minikube            1/1     Running   0          19h
kube-system   storage-provisioner                1/1     Running   0          19h

The foo-cluster-1 pod is left behind; it is a PG db node that was part of the original cluster.

Other resources are also left behind: secrets, services, endpoints, persistent volumes + claims.

What is the best way to handle this?

FYI, the logs for that orphaned PG cluster pod just show Patroni spewing this error over and over:

2019-08-02 14:40:12,125 ERROR: watch
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 428, in watch
    _request_timeout=(1, timeout + 1)):
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/watch/watch.py", line 115, in stream
    resp = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 50, in wrapper
    return getattr(self._api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 12528, in list_namespaced_endpoints
    (data) = self.list_namespaced_endpoints_with_http_info(namespace, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 12630, in list_namespaced_endpoints_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 335, in call_api
    _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 148, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 371, in request
    headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 250, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 240, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Fri, 02 Aug 2019 14:40:12 GMT', 'Content-Length': '129'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}\n

It looks like the service account secret it had been using was deleted, which causes it to go into a continuous failed state:

Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Warning  FailedMount  65s (x20 over 25m)  kubelet, minikube  MountVolume.SetUp failed for volume "postgres-operator-token-5v5x2" : secret "postgres-operator-token-5v5x2" not found

EDIT -- I understand why the PV/PVC aren't deleted; that's undesirable for a StatefulSet. Also noted while reading the k8s docs on StatefulSets:

> StatefulSets do not provide any guarantees on the termination of pods when a StatefulSet is deleted. To achieve ordered and graceful termination of the pods in the StatefulSet, it is possible to scale the StatefulSet down to 0 prior to deletion.

This may be why the -1 cluster pod is orphaned. Might it be possible for the operator to scale the set down to 0 prior to deletion?
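
For illustration, a scale-then-delete sequence with client-go might look roughly like the sketch below. The function name and polling loop are assumptions, using the context-free client-go signatures of that era; this is not the operator's implementation:

```go
package cleanup

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDownThenDelete scales the StatefulSet to zero replicas, waits for its
// pods to drain, and only then deletes the StatefulSet itself.
func scaleDownThenDelete(client kubernetes.Interface, namespace, name string) error {
	sts, err := client.AppsV1().StatefulSets(namespace).Get(name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	zero := int32(0)
	sts.Spec.Replicas = &zero
	if _, err := client.AppsV1().StatefulSets(namespace).Update(sts); err != nil {
		return err
	}
	// Poll until the controller reports no remaining replicas (a real
	// implementation would watch events instead of polling).
	for i := 0; i < 60; i++ {
		sts, err = client.AppsV1().StatefulSets(namespace).Get(name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if sts.Status.Replicas == 0 {
			return client.AppsV1().StatefulSets(namespace).Delete(name, &metav1.DeleteOptions{})
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("timed out waiting for %s/%s to scale down", namespace, name)
}
```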

@davisford

@Jan-M looking at the code, it appears the operator just attempts to delete the StatefulSet, as opposed to the recommended approach of scaling it down to zero prior to deletion.


Jan-M commented Aug 5, 2019

But it is followed up by deletePods, if I see this correctly.

But you are right, we will look into the relevant part of the docs on how to delete a StatefulSet. I did not know that the delete may not delete the pods.

Maybe I got mixed up here, where it is mentioned that kubectl scales down too:
https://kubernetes.io/docs/tasks/run-application/delete-stateful-set/
