Boskos server fails to claim PV after restart/reschedule from the node #4571

Closed
krzyzacy opened this issue Sep 14, 2017 · 22 comments

@krzyzacy
Member

krzyzacy commented Sep 14, 2017

/area boskos
/assign

Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  26m		26m		1	default-scheduler						Normal		Scheduled	Successfully assigned boskos-3859550540-jlgcx to gke-prow-default-pool-42819f20-28z7
  24m		23s		12	kubelet, gke-prow-default-pool-42819f20-28z7			Warning		FailedMount	Unable to mount volumes for pod "boskos-3859550540-jlgcx_test-pods(a9392393-9983-11e7-ad2e-42010a8000c4)": timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]
  24m		23s		12	kubelet, gke-prow-default-pool-42819f20-28z7			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]

Redeploying fixes it, but we still want to fix the actual issue.

@k8s-ci-robot added the area/boskos label Sep 14, 2017
@0xmichalis
Contributor

0xmichalis commented Sep 14, 2017

The event is cryptic. @kubernetes/sig-storage-misc how can we debug this kind of issue?

@msau42
Member

msau42 commented Sep 14, 2017

You need to look for the volume in kubelet.log

@krzyzacy
Member Author

(hmm, seems it just happened again; cc @jingxu97, who was investigating offline to try to find some clues)

@krzyzacy
Member Author

@msau42 I think it's something else - this time it took 6h and it's still not ready - let's get the StatefulSet conversion done asap

@jingxu97
Copy link
Contributor

jingxu97 commented Sep 15, 2017

From the log, I see the following sequence of events:

  1. Replica_set created a new pod boskos-3859550540-j5cgh
I0914 21:10:07.400347       6 replica_set.go:455] Too few "test-pods"/"boskos-3859550540" replicas, need 1, creating 1
I0914 21:10:07.405480       6 deployment_controller.go:485] Error syncing deployment test-pods/boskos: Operation cannot be fulfilled on deployments.extensions "boskos": the object has been modified; please apply your changes to the latest version and try again
I0914 21:10:07.410757       6 event.go:217] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"test-pods", Name:"boskos-3859550540", UID:"15d8dd6f-9991-11e7-ad2e-42010a8000c4", APIVersion:"extensions", ResourceVersion:"56666044", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: boskos-3859550540-j5cgh
I0914 21:10:07.439993       6 deployment_controller.go:485] Error syncing deployment test-pods/boskos: Operation cannot be fulfilled on deployments.extensions "boskos": the object has been modified; please apply your changes to the latest version and try again
  2. Volume is attached to node gke-prow-default-pool-42819f20-pbpz
I0914 21:10:37.339888       6 reconciler.go:208] Started AttachVolume for volume "kubernetes.io/gce-pd/gke-prow-f9e60086-dyna-pvc-371f91ee-6754-11e7-9a57-42010a800166" to node "gke-prow-default-pool-42819f20-pbpz"
I0914 21:11:08.206682       6 operation_generator.go:290] AttachVolume.Attach succeeded for volume "kubernetes.io/gce-pd/gke-prow-f9e60086-dyna-pvc-371f91ee-6754-11e7-9a57-42010a800166" (spec.Name: "pvc-371f91ee-6754-11e7-9a57-42010a800166") from node "gke-prow-default-pool-42819f20-pbpz".
  3. Some time later, another pod for the replica_set is created, but the previous one never gets deleted.
I0914 22:11:06.827792       6 replica_set.go:455] Too few "test-pods"/"boskos-3859550540" replicas, need 1, creating 1
I0914 22:11:06.835592       6 event.go:217] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"test-pods", Name:"boskos-3859550540", UID:"15d8dd6f-9991-11e7-ad2e-42010a8000c4", APIVersion:"extensions", ResourceVersion:"56666631", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: boskos-3859550540-gkkvd
  4. Attaching the volume failed because it was still attached to the first pod's node
I0914 22:11:06.917126       6 reconciler.go:208] Started AttachVolume for volume "kubernetes.io/gce-pd/gke-prow-f9e60086-dyna-pvc-371f91ee-6754-11e7-9a57-42010a800166" to node "gke-prow-default-pool-42819f20-96n1"
E0914 22:11:10.793582       6 gce.go:505] GCE operation failed: googleapi: Error 400: The disk resource 'projects/k8s-prow-builds/zones/us-central1-f/disks/gke-prow-f9e60086-dyna-pvc-371f91ee-6754-11e7-9a57-42010a800166' is already being used by 'projects/k8s-prow-builds/zones/us-central1-f/instances/gke-prow-default-pool-42819f20-pbpz'

Between the first pod and the second pod there are lots of node-failure error messages; I'm not sure what happened to the tests and whether this is normal.
Also, it seems like the replica_set controller didn't realize it already had a pod running, and started a second one at some point.

@krzyzacy
Member Author

@foxish ^^ more findings from @jingxu97 - any thoughts? Could this be a ReplicaSet problem, or is that expected when a node is under disk pressure?

  • For this issue itself, we are now living in the wonderful world of StatefulSet, thanks @msau42!

@foxish
Contributor

foxish commented Sep 15, 2017

Hmm, my guess for why this happened is - since 1.5 we changed the node controller to not automatically clean up pods on unhealthy nodes. As part of that, cleanup will happen only when the nodes are deleted from the API. So, when the node goes unready/fails, the pods go into an "unknown" state and stay stuck with the mounted volume. The controller knows to create replacements but they cannot mount the volume. The fix would be - detecting an unhealthy node and simply deleting it from the API (which GKE should do automagically).
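
For illustration, a minimal client-go sketch of that remediation (assuming the pre-context, 2017-era client-go signatures; the kubeconfig path is a placeholder and the node name is taken from the events above):

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from a local kubeconfig (placeholder path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Deleting the Node object from the API is what allows its pods to be
	// cleaned up, which in turn lets the attach/detach controller release
	// the GCE PD for the replacement pod.
	nodeName := "gke-prow-default-pool-42819f20-pbpz" // node from the events above
	if err := client.CoreV1().Nodes().Delete(nodeName, &metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	fmt.Printf("deleted node %s\n", nodeName)
}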

@saad-ali
Member

@foxish that makes sense, the detach code does not kick in unless the pod is in a failed or succeeded state, or is deleted. So if it goes into an unknown state the volume will remain attached (the safe behavior).

The fix would be - detecting an unhealthy node and simply deleting it from the API (which GKE should do automagically).

Why didn't that happen in @krzyzacy's case?

@jingxu97
Contributor

For the replica_set controller, will it not delete the unknown-state pod before creating a new pod? How does the controller count the number of pods in the replica_set - does it not count the unknown-status pod?

@krzyzacy
Member Author

we have GKE node-auto-repair on, and that should auto-fix any unhealthy nodes?

@foxish
Contributor

foxish commented Sep 15, 2017

For the replica_set controller, will it not delete the unknown-state pod before creating a new pod? How does the controller count the number of pods in the replica_set - does it not count the unknown-status pod?

Yeah, the pod will get the deletionTimeStamp set and all the workload controllers know to ignore such "to-be-deleted" pods when accounting for running pods.
Does the attach-detach controller react to that?
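
As a rough sketch of that accounting (a paraphrase of the IsPodActive-style check in the workload controllers, not the exact source):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isPodActive sketches how workload controllers decide whether a pod still
// counts toward the desired replica count: terminal pods and pods already
// marked for deletion are ignored, so a replacement is created right away.
func isPodActive(p *v1.Pod) bool {
	return p.Status.Phase != v1.PodSucceeded &&
		p.Status.Phase != v1.PodFailed &&
		p.DeletionTimestamp == nil
}

func main() {
	now := metav1.Now()
	doomed := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "boskos-3859550540-jlgcx", DeletionTimestamp: &now},
		Status:     v1.PodStatus{Phase: v1.PodRunning},
	}
	// false: the ReplicaSet no longer counts this pod and creates a replacement,
	// even though its volume may still be attached to the old node.
	fmt.Println(isPodActive(doomed))
}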

@jingxu97
Contributor

We don't detach the volume from a "to-be-deleted" pod. Only after the pod is deleted from the API server will the controller detach.

@jingxu97
Contributor

@saad-ali Based on @foxish's description, I think there is a mismatch between the replica_set controller and the attach_detach controller.

If a pod is in an "unknown" state, the replica_set controller will not count it as a running pod and will start a new one. But the attach_detach controller will not detach until the "unknown" pod is deleted from the API server. If no garbage collection happens for this "unknown" pod for a long time, the new pod will never get the volume attached to the new node.

But I assume that after the deletionTimestamp is set, the pod should soon be deleted from the API server. Am I missing anything here?

cc @yujuhong

@saad-ali
Member

Attach/detach controller does react to pod deletion events: if the pod's deletionTimeStamp is set and containers are terminated (see https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/volumehelper/volumehelper.go#L103) the attach/detach controller will trigger the detach operation.

However, it does not trigger detach if the pod state is "Unknown", because we don't want to detach and potentially corrupt user data when the pod state is unknown. We depend on some outside entity (node repair tool?) stepping in and deleting these pods. The question is: if @krzyzacy has node-auto-repair on, why didn't it delete the pods?
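
As a sketch of the condition described above (a paraphrase of the check saad-ali links, not the exact source):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// detachAllowed paraphrases the condition described above: detach is only
// triggered once the pod is marked for deletion and none of its containers
// are still running. A pod whose phase is merely "Unknown" never satisfies
// this, so its volume stays attached (the safe behavior).
func detachAllowed(pod *v1.Pod) bool {
	if pod.DeletionTimestamp == nil {
		return false
	}
	for _, s := range pod.Status.ContainerStatuses {
		if s.State.Running != nil {
			return false
		}
	}
	return true
}

func main() {
	unknown := &v1.Pod{Status: v1.PodStatus{Phase: v1.PodUnknown}}
	fmt.Println(detachAllowed(unknown)) // false: the volume remains attached
}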

@jingxu97
Contributor

I checked the log again. I think there are a number of cases where the node is not marked as NotReady, and there is no record of the pod status. But somehow the replica_set controller starts a new pod because the count of running pods is no longer 1. The new pod cannot attach the volume because it is still attached to a different node. Eventually the old pod is deleted, but that's because @krzyzacy manually deleted the deployment.

@0xmichalis
Contributor

Yeah, PVs and ReplicaSets/Deployments don't play well together. @krzyzacy can we close this issue now that boskos has moved to a StatefulSet?

@krzyzacy
Member Author

yeah I'll close the issue, but we can keep the discussion going :-)

@jingxu97
Contributor

@Kargakis But it seems like, for some reason, the replica_set controller tries to create a second pod even though the node status does not change to "NotReady". I am not sure about the pod status; might need to check the kubelet log.

@0xmichalis
Contributor

The ReplicaSet controller will create a replacement as soon as the previous pod is marked for deletion. RC/RS/D are optimized for availability.

@jingxu97
Contributor

I got more information from the kubelet log.

The events happened as follows:

  1. The replica_set creates pod1, and the volume is attached to node 1.
  2. Node 1 is under pressure and the eviction manager evicts pod1, but there is no pod-delete event from the API server, so the attach_detach controller does not know the pod has been evicted.
  3. The replica_set creates a second pod, pod2, but attaching the volume fails.

The following log is from the kubelet during eviction; note this log is in reverse order.

Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: E0915 16:41:22.410891    1387 kubelet_pods.go:920] Failed killing the pod "boskos-3859550540-413kp": failed to "KillContainer" for "boskos" with KillContainerError: "rpc error: code = 2 desc = Error: No such container: 9a6fe5d52189226a97507b6a11cb36034c44d7ad59f5b16051ec3d034e9d292a"
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: W0915 16:41:22.398442    1387 docker_sandbox.go:263] NetworkPlugin kubenet failed on the status hook for pod "boskos-3859550540-413kp_test-pods": Cannot find the network namespace, skipping pod network status for container {"docker" "4fa0cac96a22723c21c5ea0cbf7aed43034d1a8e17c98848a6c247c11f401839"}
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: E0915 16:41:22.394223    1387 remote_runtime.go:272] ContainerStatus "9a6fe5d52189226a97507b6a11cb36034c44d7ad59f5b16051ec3d034e9d292a" from runtime service failed: rpc error: code = 2 desc = Error: No such container: 9a6fe5d52189226a97507b6a11cb36034c44d7ad59f5b16051ec3d034e9d292a
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: I0915 16:41:22.392711    1387 kubelet_pods.go:917] Killing unwanted pod "boskos-3859550540-413kp"
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: W0915 16:41:22.215676    1387 pod_container_deletor.go:77] Container "4fa0cac96a22723c21c5ea0cbf7aed43034d1a8e17c98848a6c247c11f401839" not found in pod's containers
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: I0915 16:41:22.215620    1387 kubelet.go:1842] SyncLoop (PLEG): "boskos-3859550540-413kp_test-pods(8f070f7d-9a30-11e7-ad2e-42010a8000c4)", event: &pleg.PodLifecycleEvent{ID:"8f070f7d-9a30-11e7-ad2e-42010a8000c4", Type:"ContainerDied", Data:"4fa0cac96a22723c21c5ea0cbf7aed43034d1a8e17c98848a6c247c11f401839"}
Sep 15 16:41:22 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: W0915 16:41:22.214899    1387 docker_sandbox.go:263] NetworkPlugin kubenet failed on the status hook for pod "boskos-3859550540-413kp_test-pods": Cannot find the network namespace, skipping pod network status for container {"docker" "4fa0cac96a22723c21c5ea0cbf7aed43034d1a8e17c98848a6c247c11f401839"}
Sep 15 16:41:21 gke-prow-default-pool-42819f20-z1v4 kubelet[1387]: I0915 16:41:21.258784    1387 eviction_manager.go:362] eviction manager: pod boskos-3859550540-413kp_test-pods(8f070f7d-9a30-11e7-ad2e-42010a8000c4) evicted successfully

So the problem is that after the pod is evicted, the pod is not deleted from the API server (I think the pod is in an evicted state). The attach_detach controller does not receive a pod-delete event from the API server, so it will not detach.
@dashpole, should the pod be deleted from the API server or not when it is evicted?

@dashpole
Contributor

It should not be deleted from the API server when it is evicted. This allows introspection of the pod after it is evicted.

I believe the correct behavior here is for the attach_detach_controller to identify that Pod.Status.Phase == Failed or Succeeded. The only part I am not sure of is whether there is a requirement for the volume to be unmounted before it is detached. In that case it may be trickier for the attach_detach_controller to identify that the volume has been unmounted.
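
For illustration, a sketch of the phase-based check being suggested, applied to what the eviction manager leaves behind (the status values shown are representative, not copied from this cluster):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// An evicted pod stays in the API server (so it can be inspected) in a
	// terminal phase rather than being deleted.
	evicted := v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "boskos-3859550540-413kp", Namespace: "test-pods"},
		Status: v1.PodStatus{
			Phase:  v1.PodFailed,
			Reason: "Evicted",
		},
	}

	// Keying detach off the terminal phase, instead of waiting for a delete
	// event that never comes, is what would let the volume be released here.
	terminal := evicted.Status.Phase == v1.PodFailed || evicted.Status.Phase == v1.PodSucceeded
	fmt.Println(terminal) // true
}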

@jingxu97
Contributor

jingxu97 commented Sep 18, 2017

I just recalled that we have already changed the attach_detach controller to handle terminated pods; please see kubernetes/kubernetes#45286. The PR was merged on May 11 and is included in releases after 1.7.0.
This test infra runs Kubernetes 1.6.9, which does not include the fix. Because this is a breaking change, it was not backported, but new versions should not have this issue anymore.
