Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky Test] Kubernetes e2e suite: [sig-apps] Deployment iterative rollouts should eventually progress #100314

Closed
sladyn98 opened this issue Mar 16, 2021 · 11 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/apps Categorizes an issue or PR as relevant to SIG Apps. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@sladyn98
Copy link
Contributor

Which jobs are flaking:

ci-kubernetes-kind-ipv6-e2e-parallel #1371633997716656128
ci-kubernetes-kind-e2e-parallel #1371833810513039360

Which test(s) are flaking:

Kubernetes e2e suite: [sig-apps] Deployment iterative rollouts should eventually progress

Testgrid link:

https://testgrid.k8s.io/sig-release-master-blocking#kind-master-parallel&exclude-non-failed-tests=
https://testgrid.k8s.io/sig-release-master-blocking#kind-ipv6-master-parallel&exclude-non-failed-tests=

Reason for failure:

home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/apps/deployment.go:130
Mar 16 15:04:06.691: Unexpected error:
    <*errors.errorString | 0xc0005f7510>: {
        s: "error waiting for deployment \"webserver\" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:14, Replicas:10, UpdatedReplicas:10, ReadyReplicas:9, AvailableReplicas:9, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503573, loc:(*time.Location)(0x999d000)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503573, loc:(*time.Location)(0x999d000)}}, Reason:\"MinimumReplicasAvailable\", Message:\"Deployment has minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503823, loc:(*time.Location)(0x999d000)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503823, loc:(*time.Location)(0x999d000)}}, Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"webserver-78f95b7859\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}",
    }
    error waiting for deployment "webserver" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:14, Replicas:10, UpdatedReplicas:10, ReadyReplicas:9, AvailableReplicas:9, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503573, loc:(*time.Location)(0x999d000)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503573, loc:(*time.Location)(0x999d000)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503823, loc:(*time.Location)(0x999d000)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63751503823, loc:(*time.Location)(0x999d000)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"webserver-78f95b7859\" is progressing."}}, CollisionCount:(*int32)(nil)}
occurred
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/apps/deployment.go:978

Anything else we need to know:

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-kind-e2e-parallel/1371833810513039360
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-kind-ipv6-e2e-parallel/1371633997716656128

https://storage.googleapis.com/k8s-gubernator/triage/index.html?test=Deployment%20iterative%20rollouts%20should%20eventually%20progress

@sladyn98 sladyn98 added the kind/flake Categorizes issue or PR as related to a flaky test. label Mar 16, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 16, 2021
@sladyn98
Copy link
Contributor Author

Looks like /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/storage/csi_mock_volume.go:958 timed out waiting for condition

@neolit123
Copy link
Member

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2021
@thejoycekung
Copy link
Contributor

thejoycekung commented Mar 17, 2021

Thank you for logging this @sladyn98!

/cc @soltysh

@soltysh
Copy link
Contributor

soltysh commented May 31, 2021

It looks like the failures are coming from default ProgressDeadlineSeconds which is 10mins, which kicks in during one of the rollouts:

May 18 01:05:56.890: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:15, Replicas:10, UpdatedReplicas:10, ReadyReplicas:8, AvailableReplicas:8, UnavailableReplicas:2, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63756896724, loc:(*time.Location)(0x9e31080)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63756896724, loc:(*time.Location)(0x9e31080)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63756896755, loc:(*time.Location)(0x9e31080)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63756896755, loc:(*time.Location)(0x9e31080)}}, Reason:"ProgressDeadlineExceeded", Message:"ReplicaSet \"webserver-847dcfb7fb\" has timed out progressing."}}, CollisionCount:(*int32)(nil)}

@soltysh
Copy link
Contributor

soltysh commented May 31, 2021

@atiratree mind checking this one out
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 31, 2021
@atiratree
Copy link
Member

@soltysh ProgressDeadlineSeconds is actually 30s: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/apps/deployment.go#L883

ProgressDeadlineExceeded

Most of the tests fail because of this. Some pods from active Deployment/Replicaset get run successfully, but some of them fail to start and they CrashLoopBackOff and fail to start repeatedly. Then Deployment reaches ProgressDeadlineSeconds in 30s and the Progressing condition fails. Nevertheless, the status checking still continues with its own 5m timeout.

I cannot find what causes the failed containers. The number of ReadyReplicas vary between each run. I think this is due to the contended environment since most of the test fails happen in tandem with tens of other tests failing as well. Also I can reproduce this locally.

We can bump the ProgressDeadlineSeconds to 5m to at least give it a chance to recover before the checking status timeout runs out.

example cherry picked logs for the last container that triggers ReplicaSet update (from https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-network-proxy-grpc/1399579263820107776)

04:22:58.439302000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0601 04:22:58.439302       9 replica_set.go:375] Pod webserver-5f6fd478c7-p2tpc created: &v1.Pod{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"webserver-5f6fd478c7-p2tpc", GenerateName:"webserver-5f6fd47│8c7-", Namespace:"deployment-4339", SelfLink:"", UID:"90c911ce-654d-4930-bd90-63f0cfb9ca26", ResourceVersion:"6850", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63758118178, loc:(*time.Location)(0x72f0c00)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"name":"httpd", "pod-template-hash":"5f6fd478c7"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference{v1.OwnerReference{APIVersion:"apps/v1", Kind:"ReplicaSet", Name:"webserver-5f6fd478c7", UID:"9869d12b-1484-4db5-b0ea-030f7f73f5e5", Controller:(*bool)(0xc00332f5f0), BlockOwnerDeletion:(*bool)(0xc00332f5f1)}}, Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"kube-controller-manager", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc0012a1c50), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0012a1c68), Subresource:""}}}, Spec:v1.PodSpec{Volumes:[]v1.Volume{v1.Volume{Name:"kube-api-access-hqjkx", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(nil), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(0xc0027fe940), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil), CSI:(*v1.CSIVolumeSource)(nil), Ephemeral:(*v1.EphemeralVolumeSource)(nil)}}}, InitContainers:[]v1.Container(nil), Containers:[]v1.Container{v1.Container{Name:"httpd", Image:"k8s.gcr.io/e2e-test-images/httpd:2.4.38-1", Command:[]string(nil), Args:[]string(nil), WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"A11", Value:"11", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"A12", Value:"12", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"A16", Value:"16", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"A17", Value:"17", ValueFrom:(*v1.EnvVarSource)(nil)}}, Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"kube-api-access-hqjkx", ReadOnly:true, MountPath:"/var/run/secrets/kubernetes.io/serviceaccount", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), StartupProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*v1.SecurityContext)(0xc00338ade0), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]v1.EphemeralContainer(nil), RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc00332f688), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"default", DeprecatedServiceAccount:"default", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:false, HostPID:false, HostIPC:false, ShareProcessNamespace:(*bool)(nil), SecurityContext:(*v1.PodSecurityContext)(0xc002d94000), ImagePullSecrets:[]v1.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]v1.Toleration{v1.Toleration{Key:"node.kubernetes.io/not-ready", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc00332f6c0)}, v1.Toleration{Key:"node.kubernetes.io/unreachable", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(0xc00332f6e0)}}, HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(0xc00332f6e8), DNSConfig:(*v1.PodDNSConfig)(nil), ReadinessGates:[]v1.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), EnableServiceLinks:(*bool)(0xc00332f6ec), PreemptionPolicy:(*v1.PreemptionPolicy)(0xc002c46d70), Overhead:v1.ResourceList(nil), TopologySpreadConstraints:[]v1.TopologySpreadConstraint(nil), SetHostnameAsFQDN:(*bool)(nil)}, Status:v1.PodStatus{Phase:"Pending", Conditions:[]v1.PodCondition(nil), Message:"", Reason:"", NominatedNodeName:"", HostIP:"", PodIP:"", PodIPs:[]v1.PodIP(nil), StartTime:(*v1.Time)(nil), InitContainerStatuses:[]v1.ContainerStatus(nil), ContainerStatuses:[]v1.ContainerStatus(nil), QOSClass:"BestEffort", EphemeralContainerStatuses:[]v1.ContainerStatus(nil)}}.
04:23:00.641774000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:00.641774    1713 kuberuntime_manager.go:996] "getSandboxIDByPodUID got sandbox IDs for pod" podSandboxID=[0e516d0e0b273811951ffa8eec5dc1cb20f4f4105f286d320bf7c057c01d9707] pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:00.643905000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:00.643905    1713 kuberuntime_manager.go:629] "Container of pod is not in the desired state and shall be started" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651112000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651112    1713 volume_manager.go:373] "Waiting for volumes to attach and mount for pod" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651305000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651305    1713 volume_manager.go:404] "All volumes are attached and mounted for pod" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651343000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651343    1713 scope.go:111] "RemoveContainer" containerID="f9bf8c53c45ce925fbc8089b76a59f4827850659eea32939c4b9d23305926f50"
04:23:01.651388000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651388    1713 kuberuntime_manager.go:629] "Container of pod is not in the desired state and shall be started" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651428000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651428    1713 kuberuntime_manager.go:705] "computePodActions got for pod" podActions={KillPod:false CreateSandbox:false SandboxID:0e516d0e0b273811951ffa8eec5dc1cb20f4f4105f286d320bf7c057c01d9707 Attempt:0 NextInitContaierToStart:nil ContainersToStart:[0] ContainersToKill:map[] EphemeralContainersToStart:[]} pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651561000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651561    1713 kuberuntime_manager.go:916] "Checking backoff for container in pod" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651778000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651778    1713 kuberuntime_manager.go:926] "Back-off restarting failed container" err="back-off 10s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)"
04:23:01.651880000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.651880    1713 kuberuntime_manager.go:850] "Backing Off restarting container in pod" containerType="container" container="&Container{Name:httpd,Image:k8s.gcr.io/e2e-test-images/httpd:2.4.38-1,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:A11,Value:11,ValueFrom:nil,},EnvVar{Name:A12,Value:12,ValueFrom:nil,},EnvVar{Name:A16,Value:16,ValueFrom:nil,},EnvVar{Name:A17,Value:17,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-hqjkx,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,}" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:01.651950000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       E0601 04:23:01.651950    1713 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"httpd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)\"" pod="deployment-4339/webserver-5f6fd478c7-p2tpc" podUID=90c911ce-654d-4930-bd90-63f0cfb9ca26
04:23:01.652104000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:01.652104    1713 event.go:291] "Event occurred" object="deployment-4339/webserver-5f6fd478c7-p2tpc" kind="Pod" apiVersion="v1" type="Warning" reason="BackOff" message="Back-off restarting failed container"
04:23:05.169504000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:05.169504    1713 status_manager.go:595] "Status for pod updated successfully" pod="deployment-4339/webserver-5f6fd478c7-p2tpc" statusVersion=3 status={Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [httpd]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [httpd]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.138.0.3 PodIP:10.64.3.73 PodIPs:[{IP:10.64.3.73}] StartTime:2021-06-01 04:22:58 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:httpd State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2021-06-01 04:23:00 +0000 UTC,FinishedAt:2021-06-01 04:23:00 +0000 UTC,ContainerID:containerd://f9bf8c53c45ce925fbc8089b76a59f4827850659eea32939c4b9d23305926f50,}} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2021-06-01 04:22:59 +0000 UTC,FinishedAt:2021-06-01 04:22:59 +0000 UTC,ContainerID:containerd://780401432814c34137236416564bd1738b9455e76214adf6db1be0116f4c503d,}} Ready:false RestartCount:1 Image:k8s.gcr.io/e2e-test-images/httpd:2.4.38-1 ImageID:k8s.gcr.io/e2e-test-images/httpd@sha256:b913fa234cc3473cfe16e937d106b455a7609f927f59031c81aca791e2689b50 ContainerID:containerd://f9bf8c53c45ce925fbc8089b76a59f4827850659eea32939c4b9d23305926f50 Started:0xc000a01a6c}] QOSClass:BestEffort EphemeralContainerStatuses:[]}
04:23:06.264129000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:06.264129    1713 status_manager.go:595] "Status for pod updated successfully" pod="deployment-4339/webserver-5f6fd478c7-p2tpc" statusVersion=4 status={Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [httpd]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [httpd]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-06-01 04:22:58 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.138.0.3 PodIP:10.64.3.73 PodIPs:[{IP:10.64.3.73}] StartTime:2021-06-01 04:22:58 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:httpd State:{Waiting:&ContainerStateWaiting{Reason:CrashLoopBackOff,Message:back-off 10s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26),} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2021-06-01 04:23:00 +0000 UTC,FinishedAt:2021-06-01 04:23:00 +0000 UTC,ContainerID:containerd://f9bf8c53c45ce925fbc8089b76a59f4827850659eea32939c4b9d23305926f50,}} Ready:false RestartCount:1 Image:k8s.gcr.io/e2e-test-images/httpd:2.4.38-1 ImageID:k8s.gcr.io/e2e-test-images/httpd@sha256:b913fa234cc3473cfe16e937d106b455a7609f927f59031c81aca791e2689b50 ContainerID:containerd://f9bf8c53c45ce925fbc8089b76a59f4827850659eea32939c4b9d23305926f50 Started:0xc002319a2c}] QOSClass:BestEffort EphemeralContainerStatuses:[]}
04:23:13.732530000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:13.732530    1713 kuberuntime_manager.go:629] "Container of pod is not in the desired state and shall be started" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:13.740940000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:13.740940    1713 event.go:291] "Event occurred" object="deployment-4339/webserver-5f6fd478c7-p2tpc" kind="Pod" apiVersion="v1" type="Normal" reason="Pulled" message="Container image \"k8s.gcr.io/e2e-test-images/httpd:2.4.38-1\" already present on machine" "Container of pod is not in the desired state and shall be started" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:14.757523000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:14.757523    1713 kuberuntime_manager.go:705] "computePodActions got for pod" podActions={KillPod:false CreateSandbox:false SandboxID:0e516d0e0b273811951ffa8eec5dc1cb20f4f4105f286d320bf7c057c01d9707 Attempt:0 NextInitContainerToStart:nil ContainersToStart:[0] ContainersToKill:map[] EphemeralContainersToStart:[]} pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:14.757647000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:14.757647    1713 kuberuntime_manager.go:916] "Checking backoff for container in pod" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:14.757840000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:14.757840    1713 kuberuntime_manager.go:926] "Back-off restarting failed container" err="back-off 20s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)" 
04:23:14.757944000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:14.757944    1713 kuberuntime_manager.go:850] "Backing Off restarting container in pod" containerType="container" container="&Container{Name:httpd,Image:k8s.gcr.io/e2e-test-images/httpd:2.4.38-1,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:A11,Value:11,ValueFrom:nil,},EnvVar{Name:A12,Value:12,ValueFrom:nil,},EnvVar{Name:A16,Value:16,ValueFrom:nil,},EnvVar{Name:A17,Value:17,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-hqjkx,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,}" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"                                                                                                                                                                                                                                                       
04:23:14.758002000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       E0601 04:23:14.758002    1713 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"httpd\" with CrashLoopBackOff: \"back-off 20s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)\"" pod="deployment-4339/webserver-5f6fd478c7-p2tpc" podUID=90c911ce-654d-4930-bd90-63f0cfb9ca26   
04:23:40.989694000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989694    1713 volume_manager.go:373] "Waiting for volumes to attach and mount for pod" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.989751000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989751    1713 volume_manager.go:404] "All volumes are attached and mounted for pod" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.989768000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989768    1713 scope.go:111] "RemoveContainer" containerID="d48a2b8891b83ba0fa9568dcab4ec1960f04d0c4bd2022da5bc6e8be496dfd94"
04:23:40.989798000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989798    1713 kuberuntime_manager.go:629] "Container of pod is not in the desired state and shall be started" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.989824000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989824    1713 kuberuntime_manager.go:705] "computePodActions got for pod" podActions={KillPod:false CreateSandbox:false SandboxID:0e516d0e0b273811951ffa8eec5dc1cb20f4f4105f286d320bf7c057c01d9707 Attempt:0 NextInitContainerToStart:nil ContainersToStart:[0] ContainersToKill:map[] EphemeralContainersToStart:[]} pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.989904000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.989904    1713 kuberuntime_manager.go:916] "Checking backoff for container in pod" containerName="httpd" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.990069000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.990069    1713 kuberuntime_manager.go:926] "Back-off restarting failed container" err="back-off 40s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)"
04:23:40.990176000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       I0601 04:23:40.990176    1713 kuberuntime_manager.go:850] "Backing Off restarting container in pod" containerType="container" container="&Container{Name:httpd,Image:k8s.gcr.io/e2e-test-images/httpd:2.4.38-1,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:A11,Value:11,ValueFrom:nil,},EnvVar{Name:A12,Value:12,ValueFrom:nil,},EnvVar{Name:A16,Value:16,ValueFrom:nil,},EnvVar{Name:A17,Value:17,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-hqjkx,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,}" pod="deployment-4339/webserver-5f6fd478c7-p2tpc"
04:23:40.990223000 [/artifacts/bootstrap-e2e-minion-group-71xx/kubelet.log]       E0601 04:23:40.990223    1713 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"httpd\" with CrashLoopBackOff: \"back-off 40s restarting failed container=httpd pod=webserver-5f6fd478c7-p2tpc_deployment-4339(90c911ce-654d-4930-bd90-63f0cfb9ca26)\"" pod="deployment-4339/webserver-5f6fd478c7-p2tpc" podUID=90c911ce-654d-4930-bd90-63f0cfb9ca26
04:23:41.171352000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0601 04:23:41.171352       9 replica_set.go:438] Pod webserver-5f6fd478c7-p2tpc updated, objectMeta {Name:webserver-5f6fd478c7-p2tpc GenerateName:webserver-5f6fd478c7- Namespace:deployment-4339 SelfLink: UID:90c911ce-654d-4930-bd90-63f0cfb9ca26 ResourceVersion:8084 Generation:0 CreationTimestamp:2021-06-01 04:22:58 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[name:httpd pod-template-hash:5f6fd478c7] Annotations:map[] OwnerReferences:[{APIVersion:apps/v1 Kind:ReplicaSet Name:webserver-5f6fd478c7 UID:9869d12b-1484-4db5-b0ea-030f7f73f5e5 Controller:0xc000c50757 BlockOwnerDeletion:0xc000c50758}] Finalizers:[] ClusterName: ManagedFields:[{Manager:kube-controller-manager Operation:Update APIVersion:v1 Time:2021-06-01 04:22:58 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:name":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"9869d12b-1484-4db5-b0ea-030f7f73f5e5\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"httpd\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"A11\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A12\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A16\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A17\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:securityContext":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} Subresource:} {Manager:kubelet Operation:Update APIVersion:v1 Time:2021-06-01 04:23:40 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:phase":{},"f:podIP":{},"f:podIPs":{".":{},"k:{\"ip\":\"10.64.3.73\"}":{".":{},"f:ip":{}}},"f:startTime":{}}} Subresource:status}]} -> {Name:webserver-5f6fd478c7-p2tpc GenerateName:webserver-5f6fd478c7- Namespace:deployment-4339 SelfLink: UID:90c911ce-654d-4930-bd90-63f0cfb9ca26 ResourceVersion:8112 Generation:0 CreationTimestamp:2021-06-01 04:22:58 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[name:httpd pod-template-hash:5f6fd478c7] Annotations:map[] OwnerReferences:[{APIVersion:apps/v1 Kind:ReplicaSet Name:webserver-5f6fd478c7 UID:9869d12b-1484-4db5-b0ea-030f7f73f5e5 Controller:0xc002ebfab7 BlockOwnerDeletion:0xc002ebfab8}] Finalizers:[] ClusterName: ManagedFields:[{Manager:kube-controller-manager Operation:Update APIVersion:v1 Time:2021-06-01 04:22:58 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:name":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"9869d12b-1484-4db5-b0ea-030f7f73f5e5\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"httpd\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"A11\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A12\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A16\"}":{".":{},"f:name":{},"f:value":{}},"k:{\"name\":\"A17\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:securityContext":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}} Subresource:} {Manager:kubelet Operation:Update APIVersion:v1 Time:2021-06-01 04:23:41 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:phase":{},"f:podIP":{},"f:podIPs":{".":{},"k:{\"ip\":\"10.64.3.73\"}":{".":{},"f:ip":{}}},"f:startTime":{}}} Subresource:status}]}.
04:23:41.171699000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0601 04:23:41.171699       9 replica_set_utils.go:59] Updating status for : deployment-4339/webserver-5f6fd478c7, replicas 6->6 (need 6), fullyLabeledReplicas 6->6, readyReplicas 5->4, availableReplicas 5->4, sequence No: 7->7

build.log

Jun  1 04:24:08.371: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:20, Replicas:6, UpdatedReplicas:6, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:2, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118220, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118107, loc:(*time.Location)(0x9e27c40)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"webserver-5f6fd478c7\" is progressing."}, v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}}, CollisionCount:(*int32)(nil)}
Jun  1 04:24:10.364: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:20, Replicas:6, UpdatedReplicas:6, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:2, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118220, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118107, loc:(*time.Location)(0x9e27c40)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"webserver-5f6fd478c7\" is progressing."}, v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}}, CollisionCount:(*int32)(nil)}
Jun  1 04:24:12.364: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:20, Replicas:6, UpdatedReplicas:6, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:2, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118251, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118251, loc:(*time.Location)(0x9e27c40)}}, Reason:"ProgressDeadlineExceeded", Message:"ReplicaSet \"webserver-5f6fd478c7\" has timed out progressing."}}, CollisionCount:(*int32)(nil)}
Jun  1 04:24:14.470: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:20, Replicas:6, UpdatedReplicas:6, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:2, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118221, loc:(*time.Location)(0x9e27c40)}}, Reason:"MinimumReplicasUnavailable", Message:"Deployment does not have minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"False", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118251, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758118251, loc:(*time.Location)(0x9e27c40)}}, Reason:"ProgressDeadlineExceeded", Message:"ReplicaSet \"webserver-5f6fd478c7\" has timed out progressing."}}, CollisionCount:(*int32)(nil)}

Status not updated

One test (https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-containerd/1399452929412304896) fails because of the following issue:

Here ProgressDeadlineExceeded does not occur, but the test also fails because of waiting for deployment complete status.

The deployment status and LastUpdateTime is stuck and doesn't change in whole 5 minutes.

May 31 20:16:33.804: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:17, Replicas:4, UpdatedReplicas:0, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:0, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:"Available", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088657, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088657, loc:(*time.Location)(0x9e27c40)}}, Reason:"MinimumReplicasAvailable", Message:"Deployment has minimum availability."}, v1.DeploymentCondition{Type:"Progressing", Status:"True", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088680, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088626, loc:(*time.Location)(0x9e27c40)}}, Reason:"ReplicaSetUpdated", Message:"ReplicaSet \"webserver-6ff9f944cc\" is progressing."}}, CollisionCount:(*int32)(nil)}

The last step is May 31 20:11:32.712: INFO: 19: triggering a new rollout for deployment "webserver" and it adds {"name":"A19","value":"19"} env variable to the pod definition,
But from the logs I do not see a creation of a new ReplicaSet. Only older ReplicaSets which were already present before the rollout are logged.
This seems like a bug IMO. My hypothesis is that deployment controller somehow misses the creation of a new ReplicaSet.
And ProgressDeadlineExceeded doesn't occur as well, so it seems it accepts the old ReplicaSet while keeping the UpdatedReplicas on 0?

Nevertheless, I couldn't reproduce this. I tried to simulate the same order of previously random operations as captured by the log with a new rollout at the end.
I have run it many times in parallel for long period of time, but could not reproduce it.

The replicaset and deployment controllers are woken up a few times and nothing interesting happens. No new replica appears.

May 31 20:11:32.712: INFO: 19: triggering a new rollout for deployment "webserver"
May 31 20:11:33.047: INFO: Updating deployment webserver # --- TIMESTAMP IS OFF since it gets logged after the update ---
20:11:33.029284000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:11:33.029284       8 deployment_controller.go:176] "Updating deployment" deployment="deployment-2764/webserver"
20:11:33.029331000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:11:33.029331       8 deployment_controller.go:576] "Started syncing deployment" deployment="deployment-2764/webserver" startTime="2021-05-31 20:11:33.02931119 +0000 UTC m=+1034.855073843"  
20:11:33.228225000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:11:33.228225       8 deployment_controller.go:176] "Updating deployment" deployment="deployment-2764/webserver"
20:11:33.228248556 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:11:33.228266       8 deployment_controller.go:576] "Started syncing deployment" deployment="deployment-2764/webserver" startTime="2021-05-31 20:11:33.228248556 +0000 UTC m=+1035.054011195"
20:11:33.228714000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:11:33.228714       8 deployment_controller.go:578] "Finished syncing deployment" deployment="deployment-2764/webserver" duration="445.944µs"  
May 31 20:11:33.339: INFO: Waiting for deployment "webserver" to be observed by the controller
May 31 20:11:33.534: INFO: Waiting for deployment "webserver" status
20:14:31.252074000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.252074       8 controller_utils.go:189] Controller expectations fulfilled &controller.ControlleeExpectations{add:0, del:0, key:"deployment-2764/webserver-768bfbd6dc", timestamp:time.Time{wall:0xc0256e99d045eea3, ext:1021098781185, loc:(*time.Location)(0x72f0c00)}}
20:14:31.252145000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.252145       8 replica_set.go:648] Finished syncing ReplicaSet "deployment-2764/webserver-768bfbd6dc" (93.347µs)
20:14:31.257660000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.257660       8 controller_utils.go:189] Controller expectations fulfilled &controller.ControlleeExpectations{add:0, del:0, key:"deployment-2764/webserver-6ff9f944cc", timestamp:time.Time{wall:0xc0256e9b0bc11a05, ext:1026022967144, loc:(*time.Location)(0x72f0c00)}}
20:14:31.257692000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.257692       8 replica_set.go:648] Finished syncing ReplicaSet "deployment-2764/webserver-6ff9f944cc" (33.658µs)
20:14:31.267205000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.267205       8 deployment_controller.go:176] "Updating deployment" deployment="deployment-2764/webserver"
20:14:31.267228812 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.267260       8 deployment_controller.go:576] "Started syncing deployment" deployment="deployment-2764/webserver" startTime="2021-05-31 20:14:31.267228812 +0000 UTC m=+1213.092991462"
20:14:31.267579000 [/artifacts/bootstrap-e2e-master/kube-controller-manager.log]  I0531 20:14:31.267579       8 deployment_controller.go:578] "Finished syncing deployment" deployment="deployment-2764/webserver" duration="342.329µs"   
May 31 20:16:33.804: FAIL: Unexpected error:
    <*errors.errorString | 0xc0039b5cb0>: {
        s: "error waiting for deployment \"webserver\" status to match expectation: deployment status: v1.DeploymentStatus{ObservedGeneration:17, Replicas:4, UpdatedReplicas:0, ReadyReplicas:4, AvailableReplicas:4, UnavailableReplicas:0, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088657, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088657, loc:(*time.Location)(0x9e27c40)}}, Reason:\"MinimumReplicasAvailable\", Message:\"Deployment has minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088680, loc:(*time.Location)(0x9e27c40)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63758088626, loc:(*time.Location)(0x9e27c40)}}, Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"webserver-6ff9f944cc\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}",

Other

I also found a bug that Pause is not tested properly (fixed by #102730)

@mkorbi
Copy link

mkorbi commented Jul 3, 2021

/milestone v1.22
looks like the flakes are gone, however would like to wait until the PRs are merged

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jul 3, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 31, 2021
@afirth
Copy link
Member

afirth commented Nov 3, 2021

looks like @atiratree crushed this, testgrid looks fine and both #102736 and #102730 are merged
/close
please reopen if I'm wrong

@k8s-ci-robot
Copy link
Contributor

@afirth: Closing this issue.

In response to this:

looks like @atiratree crushed this, testgrid looks fine and both #102736 and #102730 are merged
/close
please reopen if I'm wrong

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/flake Categorizes issue or PR as related to a flaky test. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/apps Categorizes an issue or PR as relevant to SIG Apps. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

9 participants