
Extended.[networking] services when using a plugin that isolates namespaces by default should allow connections to services in the default namespace from a pod in another namespace on the same node #14385

Closed
bparees opened this issue May 27, 2017 · 15 comments · Fixed by #15957 or #18355
Labels: component/networking, kind/test-flake, priority/P1

bparees commented May 27, 2017

/data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/services.go:51
Expected success, but got an error:
    <*errors.errorString | 0xc42201d200>: {
        s: "pod 'service-wget' terminated with failure: &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2017-05-26 23:15:05 -0400 EDT,FinishedAt:2017-05-26 23:15:08 -0400 EDT,ContainerID:docker://8bbb3a7e2f90e02eaec697dbc9988e3a69fabaa16157701f92258690bc078564,}",
    }
    pod 'service-wget' terminated with failure: &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2017-05-26 23:15:05 -0400 EDT,FinishedAt:2017-05-26 23:15:08 -0400 EDT,ContainerID:docker://8bbb3a7e2f90e02eaec697dbc9988e3a69fabaa16157701f92258690bc078564,}
/data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/services.go:50

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/816/

enj commented Jun 24, 2017

Seen a very similar failure in #14853: https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/1111/

Extended.[networking] services when using a plugin that isolates namespaces by default should allow connections from pods in the default namespace to a service in another namespace on a different node (from (networking_multitenant_true_01.xml))

Regression

Extended.[networking] services when using a plugin that isolates namespaces by default should allow connections from pods in the default namespace to a service in another namespace on a different node (from (networking_multitenant_true_01.xml))
Failing for the past 1 build (Since Failed#3636 )
Took 38 sec.
Stacktrace

/data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/services.go:63
Expected success, but got an error:
    <*errors.errorString | 0xc421f680b0>: {
        s: "pod 'service-wget' terminated with failure: &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2017-06-24 09:49:14 -0400 EDT,FinishedAt:2017-06-24 09:49:17 -0400 EDT,ContainerID:docker://fcfb3a9a2650720cea639dca7491eb0c75cef0f1b961e711d54131e1aaa69c5a,}",
    }
    pod 'service-wget' terminated with failure: &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2017-06-24 09:49:14 -0400 EDT,FinishedAt:2017-06-24 09:49:17 -0400 EDT,ContainerID:docker://fcfb3a9a2650720cea639dca7491eb0c75cef0f1b961e711d54131e1aaa69c5a,}
/data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/services.go:62

Standard Output

[BeforeEach] [Top Level]
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:51
[BeforeEach] when using a plugin that isolates namespaces by default
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/util.go:270
[BeforeEach] when using a plugin that isolates namespaces by default
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:120
STEP: Creating a kubernetes client
Jun 24 09:49:05.939: INFO: >>> kubeConfig: /tmp/openshift/networking-minimal/multitenant/openshift.local.config/master/admin.kubeconfig

STEP: Building a namespace api object
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] when using a plugin that isolates namespaces by default
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:120
STEP: Creating a kubernetes client
Jun 24 09:49:05.994: INFO: >>> kubeConfig: /tmp/openshift/networking-minimal/multitenant/openshift.local.config/master/admin.kubeconfig

STEP: Building a namespace api object
STEP: Waiting for a default service account to be provisioned in namespace
[It] should allow connections from pods in the default namespace to a service in another namespace on a different node
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/networking/services.go:63
Jun 24 09:49:08.138: INFO: Target pod IP:port is 10.128.0.32:8080
Jun 24 09:49:08.152: INFO: Endpoint e2e-tests-net-services1-r6fq6/service-vh2km is not ready yet
Jun 24 09:49:13.156: INFO: Target service IP:port is 172.30.106.115:8080
Jun 24 09:49:13.164: INFO: Waiting up to 5m0s for pod service-wget                                            status to be success or failure
Jun 24 09:49:13.166: INFO: No Status.Info for container 'service-wget-container' in pod 'service-wget' yet
Jun 24 09:49:13.166: INFO: Waiting for pod service-wget                                            in namespace 'e2e-tests-net-services2-kzkld' status to be 'success or failure'(found phase: "Pending", readiness: false) (1.808497ms elapsed)
Jun 24 09:49:18.169: INFO: Nil State.Terminated for container 'service-wget-container' in pod 'service-wget' in namespace 'e2e-tests-net-services2-kzkld' so far
Jun 24 09:49:18.169: INFO: Waiting for pod service-wget                                            in namespace 'e2e-tests-net-services2-kzkld' status to be 'success or failure'(found phase: "Running", readiness: true) (5.004233844s elapsed)
[AfterEach] when using a plugin that isolates namespaces by default
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:121
STEP: Collecting events from namespace "e2e-tests-net-services1-r6fq6".
STEP: Found 4 events.
Jun 24 09:49:23.194: INFO: At 2017-06-24 09:49:06 -0400 EDT - event for service-vh2km: {kubelet nettest-node-1} MissingClusterDNS: kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
Jun 24 09:49:23.194: INFO: At 2017-06-24 09:49:06.806125334 -0400 EDT - event for service-vh2km: {kubelet nettest-node-1} Pulled: Container image "gcr.io/google_containers/porter:cd5cb5791ebaa8641955f0e8c2a9bed669b1eaab" already present on machine
Jun 24 09:49:23.194: INFO: At 2017-06-24 09:49:06.84383135 -0400 EDT - event for service-vh2km: {kubelet nettest-node-1} Created: Created container
Jun 24 09:49:23.194: INFO: At 2017-06-24 09:49:06.968844726 -0400 EDT - event for service-vh2km: {kubelet nettest-node-1} Started: Started container
Jun 24 09:49:23.198: INFO: POD            NODE            PHASE    GRACE  CONDITIONS
Jun 24 09:49:23.198: INFO: service-vh2km  nettest-node-1  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-06-24 09:49:06 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2017-06-24 09:49:07 -0400 EDT  }]
Jun 24 09:49:23.198: INFO: 
Jun 24 09:49:23.201: INFO: 
Logging node info for node nettest-master-node
Jun 24 09:49:23.204: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-master-node,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-master-node,UID:fd1eb71b-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4598,Generation:0,CreationTimestamp:2017-06-24 09:35:29.562394451 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-master-node,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-master-node,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:true,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:22 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:22 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:22 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:22 -0400 EDT 2017-06-24 09:35:39 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.2} {InternalIP 172.17.0.2} {Hostname nettest-master-node}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:23.204: INFO: 
Logging kubelet events for node nettest-master-node
Jun 24 09:49:23.207: INFO: 
Logging pods the kubelet thinks is on node nettest-master-node
Jun 24 09:49:23.464: INFO: 
Latency metrics for node nettest-master-node
Jun 24 09:49:23.464: INFO: 
Logging node info for node nettest-node-1
Jun 24 09:49:23.467: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-node-1,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-node-1,UID:fb382d0b-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4596,Generation:0,CreationTimestamp:2017-06-24 09:35:26.373809879 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-node-1,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-node-1,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:false,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:19 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:19 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:19 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:19 -0400 EDT 2017-06-24 09:35:36 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.3} {InternalIP 172.17.0.3} {Hostname nettest-node-1}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[{[docker.io/openshift/origin-haproxy-router@sha256:29975c7c7521b5cd114bfa9c673b51c495916c73dc006caad779ae69c6dc7494 docker.io/openshift/origin-haproxy-router:latest] 989297072} {[docker.io/openshift/origin-pod@sha256:d4033f01475399bd540083b9fec0beb5d4928e8769149996d65743c605b080f9 docker.io/openshift/origin-pod:v3.6.0-alpha.2] 213317180} {[gcr.io/google_containers/hostexec@sha256:cab8d4e2526f8f767c64febe4ce9e0f0e58cd35fdff81b3aadba4dd041ba9f00 gcr.io/google_containers/hostexec:1.2] 13185747} {[gcr.io/google_containers/netexec@sha256:56c53846f44ea214e4aa5df37c9c50331f0b09e64a32cc7cf17c7e1808d38eef gcr.io/google_containers/netexec:1.7] 8016035} {[docker.io/openshift/hello-openshift@sha256:6e383758181bbab1f88ab7c6e8fa410a74806e478d28097ae0e87e2542940946 docker.io/openshift/hello-openshift:latest] 5635113} {[gcr.io/google_containers/porter@sha256:076acdada33f35b917c9eebe89eba95923601302beac57274985e418b70067e2 gcr.io/google_containers/porter:cd5cb5791ebaa8641955f0e8c2a9bed669b1eaab] 5010921} {[gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67 gcr.io/google_containers/busybox:latest] 2429728}],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:23.467: INFO: 
Logging kubelet events for node nettest-node-1
Jun 24 09:49:23.469: INFO: 
Logging pods the kubelet thinks is on node nettest-node-1
Jun 24 09:49:23.489: INFO: service-vh2km started at 2017-06-24 09:49:06 -0400 EDT (0+1 container statuses recorded)
Jun 24 09:49:23.489: INFO: 	Container service-vh2km-container ready: true, restart count 0
Jun 24 09:49:23.636: INFO: 
Latency metrics for node nettest-node-1
Jun 24 09:49:23.636: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m2.016163s}
Jun 24 09:49:23.636: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m2.016163s}
Jun 24 09:49:23.636: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m1.013787s}
Jun 24 09:49:23.636: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:30.835795s}
Jun 24 09:49:23.636: INFO: 
Logging node info for node nettest-node-2
Jun 24 09:49:23.638: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-node-2,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-node-2,UID:fb9cd30d-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4597,Generation:0,CreationTimestamp:2017-06-24 09:35:27.033419394 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-node-2,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-node-2,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:false,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:20 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:20 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:20 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:20 -0400 EDT 2017-06-24 09:35:37 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.4} {InternalIP 172.17.0.4} {Hostname nettest-node-2}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[{[docker.io/openshift/origin-haproxy-router@sha256:29975c7c7521b5cd114bfa9c673b51c495916c73dc006caad779ae69c6dc7494 docker.io/openshift/origin-haproxy-router:latest] 989297072} {[docker.io/openshift/origin-pod@sha256:d4033f01475399bd540083b9fec0beb5d4928e8769149996d65743c605b080f9 docker.io/openshift/origin-pod:v3.6.0-alpha.2] 213317180} {[gcr.io/google_containers/hostexec@sha256:cab8d4e2526f8f767c64febe4ce9e0f0e58cd35fdff81b3aadba4dd041ba9f00 gcr.io/google_containers/hostexec:1.2] 13185747} {[gcr.io/google_containers/netexec@sha256:56c53846f44ea214e4aa5df37c9c50331f0b09e64a32cc7cf17c7e1808d38eef gcr.io/google_containers/netexec:1.7] 8016035} {[docker.io/openshift/hello-openshift@sha256:6e383758181bbab1f88ab7c6e8fa410a74806e478d28097ae0e87e2542940946 docker.io/openshift/hello-openshift:latest] 5635113} {[gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67 gcr.io/google_containers/busybox:latest] 2429728} {[gcr.io/google_containers/mounttest@sha256:bec3122ddcf8bd999e44e46e096659f31241d09f5236bc3dc212ea584ca06856 gcr.io/google_containers/mounttest:0.8] 1450761}],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:23.638: INFO: 
Logging kubelet events for node nettest-node-2
Jun 24 09:49:23.640: INFO: 
Logging pods the kubelet thinks is on node nettest-node-2
Jun 24 09:49:23.812: INFO: 
Latency metrics for node nettest-node-2
Jun 24 09:49:23.812: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m1.01285s}
Jun 24 09:49:23.812: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m1.01285s}
Jun 24 09:49:23.812: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m1.01285s}
STEP: Dumping a list of prepulled images on each node
Jun 24 09:49:23.814: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-net-services1-r6fq6" for this suite.
Jun 24 09:49:33.915: INFO: namespace: e2e-tests-net-services1-r6fq6, resource: bindings, ignored listing per whitelist
[AfterEach] when using a plugin that isolates namespaces by default
  /data/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:121
STEP: Collecting events from namespace "e2e-tests-net-services2-kzkld".
STEP: Found 5 events.
Jun 24 09:49:33.926: INFO: At 2017-06-24 09:49:13 -0400 EDT - event for service-wget: {kubelet nettest-node-2} MissingClusterDNS: kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
Jun 24 09:49:33.926: INFO: At 2017-06-24 09:49:13.76058865 -0400 EDT - event for service-wget: {kubelet nettest-node-2} Pulling: pulling image "gcr.io/google_containers/busybox"
Jun 24 09:49:33.926: INFO: At 2017-06-24 09:49:14.718743967 -0400 EDT - event for service-wget: {kubelet nettest-node-2} Pulled: Successfully pulled image "gcr.io/google_containers/busybox"
Jun 24 09:49:33.926: INFO: At 2017-06-24 09:49:14.766144822 -0400 EDT - event for service-wget: {kubelet nettest-node-2} Created: Created container
Jun 24 09:49:33.926: INFO: At 2017-06-24 09:49:14.884544859 -0400 EDT - event for service-wget: {kubelet nettest-node-2} Started: Started container
Jun 24 09:49:33.929: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jun 24 09:49:33.929: INFO: 
Jun 24 09:49:33.931: INFO: 
Logging node info for node nettest-master-node
Jun 24 09:49:33.932: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-master-node,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-master-node,UID:fd1eb71b-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4633,Generation:0,CreationTimestamp:2017-06-24 09:35:29.562394451 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-master-node,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-master-node,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:true,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:32 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:32 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:32 -0400 EDT 2017-06-24 09:35:29 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:32 -0400 EDT 2017-06-24 09:35:39 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.2} {InternalIP 172.17.0.2} {Hostname nettest-master-node}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:33.932: INFO: 
Logging kubelet events for node nettest-master-node
Jun 24 09:49:33.934: INFO: 
Logging pods the kubelet thinks is on node nettest-master-node
Jun 24 09:49:34.032: INFO: 
Latency metrics for node nettest-master-node
Jun 24 09:49:34.032: INFO: 
Logging node info for node nettest-node-1
Jun 24 09:49:34.034: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-node-1,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-node-1,UID:fb382d0b-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4631,Generation:0,CreationTimestamp:2017-06-24 09:35:26.373809879 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-node-1,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-node-1,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:false,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:29 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:29 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:29 -0400 EDT 2017-06-24 09:35:26 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:29 -0400 EDT 2017-06-24 09:35:36 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.3} {InternalIP 172.17.0.3} {Hostname nettest-node-1}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[{[docker.io/openshift/origin-haproxy-router@sha256:29975c7c7521b5cd114bfa9c673b51c495916c73dc006caad779ae69c6dc7494 docker.io/openshift/origin-haproxy-router:latest] 989297072} {[docker.io/openshift/origin-pod@sha256:d4033f01475399bd540083b9fec0beb5d4928e8769149996d65743c605b080f9 docker.io/openshift/origin-pod:v3.6.0-alpha.2] 213317180} {[gcr.io/google_containers/hostexec@sha256:cab8d4e2526f8f767c64febe4ce9e0f0e58cd35fdff81b3aadba4dd041ba9f00 gcr.io/google_containers/hostexec:1.2] 13185747} {[gcr.io/google_containers/netexec@sha256:56c53846f44ea214e4aa5df37c9c50331f0b09e64a32cc7cf17c7e1808d38eef gcr.io/google_containers/netexec:1.7] 8016035} {[docker.io/openshift/hello-openshift@sha256:6e383758181bbab1f88ab7c6e8fa410a74806e478d28097ae0e87e2542940946 docker.io/openshift/hello-openshift:latest] 5635113} {[gcr.io/google_containers/porter@sha256:076acdada33f35b917c9eebe89eba95923601302beac57274985e418b70067e2 gcr.io/google_containers/porter:cd5cb5791ebaa8641955f0e8c2a9bed669b1eaab] 5010921} {[gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67 gcr.io/google_containers/busybox:latest] 2429728}],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:34.034: INFO: 
Logging kubelet events for node nettest-node-1
Jun 24 09:49:34.036: INFO: 
Logging pods the kubelet thinks is on node nettest-node-1
Jun 24 09:49:34.163: INFO: 
Latency metrics for node nettest-node-1
Jun 24 09:49:34.163: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m2.016163s}
Jun 24 09:49:34.163: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m2.016163s}
Jun 24 09:49:34.163: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m1.013787s}
Jun 24 09:49:34.163: INFO: {Operation: Method:pod_start_latency_microseconds Quantile:0.99 Latency:30.835795s}
Jun 24 09:49:34.163: INFO: 
Logging node info for node nettest-node-2
Jun 24 09:49:34.165: INFO: Node Info: &Node{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:nettest-node-2,GenerateName:,Namespace:,SelfLink:/api/v1/nodes/nettest-node-2,UID:fb9cd30d-58e1-11e7-b2f8-0242ed886f07,ResourceVersion:4632,Generation:0,CreationTimestamp:2017-06-24 09:35:27.033419394 -0400 EDT,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/os: linux,kubernetes.io/hostname: nettest-node-2,},Annotations:map[string]string{volumes.kubernetes.io/controller-managed-attach-detach: true,},OwnerReferences:[],Finalizers:[],ClusterName:,},Spec:NodeSpec{PodCIDR:,ExternalID:nettest-node-2,ProviderID:aws:////i-0f53595a2904191a5,Unschedulable:false,Taints:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16388919296 0} {<nil>} 16004804Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Allocatable:ResourceList{cpu: {{4 0} {<nil>} 4 DecimalSI},memory: {{16284061696 0} {<nil>} 15902404Ki BinarySI},pods: {{40 0} {<nil>} 40 DecimalSI},},Phase:,Conditions:[{OutOfDisk False 2017-06-24 09:49:30 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2017-06-24 09:49:30 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2017-06-24 09:49:30 -0400 EDT 2017-06-24 09:35:27 -0400 EDT KubeletHasNoDiskPressure kubelet has no disk pressure} {Ready True 2017-06-24 09:49:30 -0400 EDT 2017-06-24 09:35:37 -0400 EDT KubeletReady kubelet is posting ready status}],Addresses:[{LegacyHostIP 172.17.0.4} {InternalIP 172.17.0.4} {Hostname nettest-node-2}],DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:06480741201a4d2db4b65ef51888dac7,SystemUUID:EC284834-9DE9-4589-4ABF-F25A7F48EB4F,BootID:2363077f-393f-4e45-b8bc-d65b7bf50704,KernelVersion:3.10.0-514.21.1.el7.x86_64,OSImage:Fedora 25 (Twenty Five),ContainerRuntimeVersion:docker://1.12.6,KubeletVersion:v1.6.1+5115d708d7,KubeProxyVersion:v1.6.1+5115d708d7,OperatingSystem:linux,Architecture:amd64,},Images:[{[docker.io/openshift/origin-haproxy-router@sha256:29975c7c7521b5cd114bfa9c673b51c495916c73dc006caad779ae69c6dc7494 docker.io/openshift/origin-haproxy-router:latest] 989297072} {[docker.io/openshift/origin-pod@sha256:d4033f01475399bd540083b9fec0beb5d4928e8769149996d65743c605b080f9 docker.io/openshift/origin-pod:v3.6.0-alpha.2] 213317180} {[gcr.io/google_containers/hostexec@sha256:cab8d4e2526f8f767c64febe4ce9e0f0e58cd35fdff81b3aadba4dd041ba9f00 gcr.io/google_containers/hostexec:1.2] 13185747} {[gcr.io/google_containers/netexec@sha256:56c53846f44ea214e4aa5df37c9c50331f0b09e64a32cc7cf17c7e1808d38eef gcr.io/google_containers/netexec:1.7] 8016035} {[docker.io/openshift/hello-openshift@sha256:6e383758181bbab1f88ab7c6e8fa410a74806e478d28097ae0e87e2542940946 docker.io/openshift/hello-openshift:latest] 5635113} {[gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67 gcr.io/google_containers/busybox:latest] 2429728} {[gcr.io/google_containers/mounttest@sha256:bec3122ddcf8bd999e44e46e096659f31241d09f5236bc3dc212ea584ca06856 gcr.io/google_containers/mounttest:0.8] 1450761}],VolumesInUse:[],VolumesAttached:[],},}
Jun 24 09:49:34.166: INFO: 
Logging kubelet events for node nettest-node-2
Jun 24 09:49:34.167: INFO: 
Logging pods the kubelet thinks is on node nettest-node-2
Jun 24 09:49:34.283: INFO: 
Latency metrics for node nettest-node-2
Jun 24 09:49:34.283: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.9 Latency:2m1.01285s}
Jun 24 09:49:34.283: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.5 Latency:2m1.01285s}
Jun 24 09:49:34.283: INFO: {Operation:sync Method:pod_worker_latency_microseconds Quantile:0.99 Latency:2m1.01285s}
STEP: Dumping a list of prepulled images on each node
Jun 24 09:49:34.285: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-net-services2-kzkld" for this suite.
Jun 24 09:49:44.323: INFO: namespace: e2e-tests-net-services2-kzkld, resource: bindings, ignored listing per whitelist

bparees commented Jul 21, 2017

bparees commented Jul 27, 2017

bparees commented Jul 29, 2017

danwinship commented

Missed this in the original report before:

StartedAt:2017-05-26 23:15:05 -0400 EDT,FinishedAt:2017-05-26 23:15:08 -0400 EDT

i.e., the pod only ran for 3 seconds, despite the command being `wget --timeout=10 ...`

Logs from the latest flaked run show:

Jul 29 22:12:39 nettest-node-1 dockerd-current[151]: Connecting to 172.30.126.86:8080 (172.30.126.86:8080)
Jul 29 22:12:42 nettest-node-1 dockerd-current[151]: wget: can't connect to remote host (172.30.126.86): No route to host

Not exactly sure what triggers "no route to host" as opposed to a normal connect timeout, but it's definitely something that happens sometimes with service IP addresses... From the logs it looks like the iptables and OVS rules are both set up correctly.

We don't do a readiness check on the webserver pod, but it was started 8 seconds before the wget command ran in this case, so I don't think that's the problem.
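
For context, the failing step boils down to a short-lived busybox pod that runs a single `wget` against the service and passes or fails on its exit code. A minimal sketch of what such a probe pod looks like (illustrative only — helper name, exact wget flags, and import paths are assumptions, not the actual code in services.go):

```go
package networking

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// wgetPod builds a run-once pod that fetches http://ip:port with a 10-second
// timeout; the test then waits for the pod to reach success or failure.
func wgetPod(namespace, ip string, port int) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "service-wget", Namespace: namespace},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "service-wget-container",
				Image: "gcr.io/google_containers/busybox",
				Command: []string{"wget", "-q", "-O", "-", "--timeout=10",
					fmt.Sprintf("http://%s:%d", ip, port)},
			}},
		},
	}
}
```

A 3-second run ending in exit code 1 therefore points at wget failing fast (the ICMP error behind "no route to host") rather than reaching its own 10-second timeout.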

danwinship self-assigned this and unassigned knobunc on Aug 1, 2017
danwinship commented

> Not exactly sure what triggers "no route to host" as opposed to a normal connect timeout but it's definitely a thing that happens sometimes with service IP addresses

OK, it looks like this happens when there is no iptables rule for the service IP. In that case OVS forwards the packet to the node via tun0, but the node has no iptables rule to rewrite it; the node's routing table then says that 172.30.0.0/16 should be forwarded out tun0, which is where the packet came from, so the node won't re-route it there and instead responds with an ICMP error.

But the logs seem to show that the iptables rule was there...
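
Since the question is whether the NAT rule for the service IP existed at the moment of the failure, here is a hedged debugging sketch of how one might check for exactly that (not part of the test suite; the helper name is made up):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hasServiceRule dumps the nat table and reports whether any KUBE-SERVICES
// rule mentions the given cluster IP. If nothing matches, traffic to that IP
// is never rewritten and gets bounced back toward tun0 as described above.
func hasServiceRule(clusterIP string) (bool, error) {
	out, err := exec.Command("iptables-save", "-t", "nat").CombinedOutput()
	if err != nil {
		return false, fmt.Errorf("iptables-save failed: %v: %s", err, out)
	}
	for _, line := range strings.Split(string(out), "\n") {
		if strings.Contains(line, "KUBE-SERVICES") && strings.Contains(line, clusterIP) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := hasServiceRule("172.30.126.86") // service IP from the flaked run above
	fmt.Println(ok, err)
}
```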

bparees commented Aug 1, 2017

openshift-merge-robot added a commit that referenced this issue Aug 26, 2017
Automatic merge from submit-queue (batch tested with PRs 15942, 15940, 15957, 15858, 15946)

Make service e2e tests retry to avoid flakes

This is an attempt to fix #14385; given that our tests tend to flake but the upstream service tests don't, it seems like we should make our tests more like theirs. So this replaces our `checkConnectivityToHost` code with code mostly copied from the upstream `execSourceipTest` (which, among other things, retries on failure until the timeout is reached).

There are actually a lot of changes we could make to our tests to use new upstream code, but I wanted to keep this simple for now to avoid introducing new flakes.

Fixes #14385 (hopefully)
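
For reference, the retry shape that change moves to looks roughly like the following (a sketch only: the real change copies upstream's `execSourceipTest` and runs the probe from inside a pod, whereas this just shows the poll-until-deadline structure with hypothetical names):

```go
package networking

import (
	"fmt"
	"net"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// checkConnectivityWithRetry keeps retrying a TCP connection to the service
// until it succeeds or the overall timeout expires, so a short window while
// kube-proxy is still programming iptables does not fail the test outright.
func checkConnectivityWithRetry(ip string, port int, timeout time.Duration) error {
	addr := fmt.Sprintf("%s:%d", ip, port)
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			return false, nil // not reachable yet; poll again
		}
		conn.Close()
		return true, nil
	})
}
```
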
jsafrane commented Sep 8, 2017

`wget: can't connect to remote host (172.30.38.87): No route to host` happened again in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16232/test_pull_request_origin_extended_networking_minimal/7177/

openshift-merge-robot added a commit that referenced this issue Oct 22, 2017
Automatic merge from submit-queue.

Try to collect some debug info if e2e service test fails

Attempting to debug #14385; the logs don't show anything suspicious so let's try to get more logs

@knobunc One thing I noticed is that the test is using the hybrid proxier, so if you have any ideas on possible failure modes with that (or additional debug commands that might be useful with it), let me know.
danwinship commented

OK, in https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/test_branch_origin_extended_networking/1051/, the test fails, and we see that the expected iptables rules for the service don't exist. In fact, weirdly, there is no rule at all for the service (neither a functioning redirect rule nor a "no endpoints -> -j REJECT" rule).

From the node-1 logs:

I1028 00:18:55.658300     297 kubelet.go:1861] SyncLoop (UPDATE, "api"): "service-6wdqz_e2e-tests-net-services1-2czlv(94943bae-bb75-11e7-83eb-0242bb87345f)"
I1028 00:18:55.665635     297 config.go:224] Calling handler.OnServiceAdd
I1028 00:18:55.665742     297 proxier.go:320] Adding new service port "e2e-tests-net-services1-2czlv/service-6wdqz:" at 172.30.97.146:8080/TCP
I1028 00:18:55.665779     297 proxier.go:1018] Syncing iptables rules
I1028 00:18:55.668874     297 config.go:124] Calling handler.OnEndpointsAdd
I1028 00:18:55.668935     297 roundrobin.go:276] LoadBalancerRR: Setting endpoints for e2e-tests-net-services1-2czlv/service-6wdqz: to [10.130.0.27:8080]
I1028 00:18:55.669065     297 roundrobin.go:100] LoadBalancerRR service "e2e-tests-net-services1-2czlv/service-6wdqz:" did not exist, created
I1028 00:18:55.669148     297 proxier.go:882] Setting endpoints for "e2e-tests-net-services1-2czlv/service-6wdqz:" to [10.130.0.27:8080]
I1028 00:18:55.720094     297 iptables.go:314] running iptables-save [-t filter]
I1028 00:18:55.727359     297 iptables.go:314] running iptables-save [-t nat]
I1028 00:18:55.732888     297 iptables.go:369] running iptables-restore [--noflush --counters]
I1028 00:18:55.739575     297 proxier.go:993] syncProxyRules took 73.83774ms
I1028 00:18:55.739612     297 bounded_frequency_runner.go:221] sync-runner: ran, next possible in 0s, periodic in 30s
I1028 00:18:55.739656     297 proxier.go:1018] Syncing iptables rules
I1028 00:18:55.780044     297 iptables.go:314] running iptables-save [-t filter]
I1028 00:18:55.784373     297 iptables.go:314] running iptables-save [-t nat]
I1028 00:18:55.789180     297 iptables.go:369] running iptables-restore [--noflush --counters]
I1028 00:18:55.796048     297 healthcheck.go:232] Not saving endpoints for unknown healthcheck "e2e-tests-net-services1-2czlv/service-6wdqz"
I1028 00:18:55.796086     297 proxier.go:993] syncProxyRules took 56.458509ms

The node-2 logs look similar but don't have the "Not saving endpoints for unknown healthcheck" error. It looks like this might be a bad interaction between the OnServiceAdd and OnEndpointsAdd handlers, which run roughly simultaneously in different goroutines.
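
To make the suspected ordering concrete, here is a deliberately simplified sketch (an illustration, not the actual proxier code): the two handlers arrive on separate goroutines and each trigger a rule sync; the flake looks like a case where the sync that should have seen both the service and its endpoints still left the service IP without any rule.

```go
package main

import "sync"

// fakeProxier mimics the shape of the kube-proxy handlers involved. The real
// proxier rewrites the KUBE-SERVICES chain from a snapshot of its services
// and endpoints maps on every syncRules call.
type fakeProxier struct {
	mu        sync.Mutex
	services  map[string]string   // service -> cluster IP
	endpoints map[string][]string // service -> endpoint IPs
}

func (p *fakeProxier) OnServiceAdd(name, clusterIP string) {
	p.mu.Lock()
	p.services[name] = clusterIP
	p.mu.Unlock()
	p.syncRules()
}

func (p *fakeProxier) OnEndpointsAdd(name string, ips []string) {
	p.mu.Lock()
	p.endpoints[name] = ips
	p.mu.Unlock()
	p.syncRules()
}

func (p *fakeProxier) syncRules() {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Stand-in for the iptables-save / iptables-restore of the KUBE-SERVICES chain.
}

func main() {
	p := &fakeProxier{services: map[string]string{}, endpoints: map[string][]string{}}
	var wg sync.WaitGroup
	wg.Add(2)
	// The two handlers fire near-simultaneously from separate goroutines, as in
	// the node-1 log above (OnServiceAdd at 00:18:55.665, OnEndpointsAdd at .668).
	go func() { defer wg.Done(); p.OnServiceAdd("service-6wdqz", "172.30.97.146") }()
	go func() { defer wg.Done(); p.OnEndpointsAdd("service-6wdqz", []string{"10.130.0.27:8080"}) }()
	wg.Wait()
}
```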

smarterclayton commented

@danwinship I've seen this happen a few times recently; I'm assuming this isn't "new"?

danwinship commented

Hm... #18258 actually shows something different. It looks like it's probably the kernel-namespaces-leaking-into-random-goroutines bug discussed in #15991. The logs are full of "impossible" networking errors, like the KUBE-SERVICES chain temporarily not existing when trying to add the rules for this service:

Jan 23 23:44:50 ci-prtest-5a37c28-14885-ig-n-bfkx origin-node[2198]: I0123 23:44:50.299415    2198 proxier.go:329] Adding new service port "e2e-tests-net-services1-zx9wf/service-xmp4d:" at 172.30.155.166:8080/TCP
Jan 23 23:44:50 ci-prtest-5a37c28-14885-ig-n-bfkx origin-node[2198]: E0123 23:44:50.317131    2198 proxier.go:1030] Failed to ensure that filter chain INPUT jumps to KUBE-SERVICES: error checking rule: exit status 2: iptables v1.4.21: Couldn't load target `KUBE-SERVICES':No such file or directory

and lots of

Jan 23 23:45:49 ci-prtest-5a37c28-14885-ig-n-bfkx origin-node[2198]: E0123 23:45:49.447158    2198 cni.go:259] Error adding network: CNI request failed with status 400: 'error on port veth2cb87c6e: "could not open network device veth2cb87c6e (No such device)"

I'm pretty sure I didn't see any evidence of #15991 in the logs the last time I looked at one of these flakes. But it seems odd that there would be two completely unrelated causes of the same flake (and we had even suspected #15991 of being the cause of this flake before), so it's possible that I just missed it earlier.

The good news is that the golang 1.10 runtime fixes this (golang/go#20676).
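
The failure mode in #15991 comes from a pattern like the following (an illustrative sketch, not the actual SDN code): a goroutine locks its OS thread, calls setns() to switch that thread into a pod's network namespace, and exits without restoring it. Before Go 1.10 the runtime could hand that thread back to the scheduler, so unrelated goroutines would then run their iptables/CNI calls in the wrong namespace; Go 1.10 instead destroys a thread whose goroutine exits while still locked.

```go
package main

import (
	"os"
	"runtime"

	"golang.org/x/sys/unix"
)

// runInPodNetns runs fn on a thread switched into the network namespace at
// nsPath (e.g. /proc/<pid>/ns/net). Exiting the goroutine while the thread is
// still locked and still inside the pod namespace is what leaked state into
// the rest of the process on pre-1.10 runtimes.
func runInPodNetns(nsPath string, fn func() error) error {
	errCh := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		f, err := os.Open(nsPath)
		if err != nil {
			errCh <- err
			return
		}
		defer f.Close()
		if err := unix.Setns(int(f.Fd()), unix.CLONE_NEWNET); err != nil {
			errCh <- err
			return
		}
		errCh <- fn() // thread is never switched back or unlocked
	}()
	return <-errCh
}

func main() {
	_ = runInPodNetns("/proc/self/ns/net", func() error { return nil })
}
```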

openshift-merge-robot added a commit that referenced this issue Feb 2, 2018
Automatic merge from submit-queue (batch tested with PRs 18376, 18355).

Move pod-namespace calls out of process

As discussed in #15991, we need to move all operations in the pod's network namespace out of process, due to a golang issue that allows setns() calls in a locked thread to leak into other threads, causing random lossage as operations intended for the main network namespace end up running in other namespaces instead. (This is fixed in golang 1.10 but we need a fix before then.)

Fixes #15991
Fixes #14385
Fixes #13108
Fixes #18317
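
The shape of the fix in #18355 is to push the namespace switch into a short-lived child process rather than a goroutine of the node daemon, so no thread of the main process ever calls setns(). A hedged sketch of that approach, using nsenter as an illustrative stand-in for the actual helper:

```go
package main

import (
	"fmt"
	"os/exec"
)

// runInPodNetnsExternally joins the network namespace at nsPath in a child
// process, runs one command there, and exits; the parent's threads never
// change namespace, so nothing can leak between goroutines.
func runInPodNetnsExternally(nsPath string, args ...string) ([]byte, error) {
	cmd := exec.Command("nsenter", append([]string{"--net=" + nsPath, "--"}, args...)...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return out, fmt.Errorf("nsenter %v: %v: %s", args, err, out)
	}
	return out, nil
}

func main() {
	out, err := runInPodNetnsExternally("/proc/1/ns/net", "ip", "addr")
	fmt.Println(string(out), err)
}
```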