OADM diagnostic SDN errors #18392

Closed
parthasarathi-DXC opened this issue Feb 1, 2018 · 7 comments
Assignees
Labels
kind/question, lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed), sig/networking

Comments

@parthasarathi-DXC

parthasarathi-DXC commented Feb 1, 2018

After multiple reinstallations (on newly built RHEL7 VMs), oadm diagnostics reports two SDN errors, and I am unable to create pods. Kindly help me resolve these errors.

### oc version

oc v3.6.173.0.49
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://master1.cat.test:8443
openshift v3.6.173.0.49
kubernetes v1.6.1+5115d708d7

Steps To Reproduce
  1. oadm diagnostics
Results

failed

       Info:  Service account token successfully authenticated to master
       ERROR: [DP1014 from diagnostic PodCheckAuth@openshift/origin/pkg/diagnostics/pod/auth.go:174]
              Request to integrated registry timed out; this typically indicates network or SDN problems.


ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:119]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service: Failed to run network diags test pods, failed: 21, total: 24
Expected Result

Pass

Additional Information

Attached error and oc get all output.
putty.log
oadm_diag_error.txt

@parthasarathi-DXC
Author

Hi Team,

Greetings. Any updates on this?

@sosiouxme
Member

@parthasarathi-DXC are you noticing any problems aside from diagnostics reporting errors?

Diagnostics are intended to point you in a helpful direction and give information when something is going wrong. It would be nice if they could say exactly what is going wrong and why, but it's unusual for problems to be so simply solved, so we kind of have to settle for pointers.

Here we're seeing a few things going wrong.

  1. Error logs in the registry indicate it's having trouble authorizing against the OpenShift API: Get user failed with error: Get https://172.30.0.1:443/oapi/v1/users/~: Service Unavailable -- that's pretty interesting; you may want to rsh into the registry pod and try a connection manually. I'm honestly not sure what would cause that. Check the registry logs around that point, as well as the master logs / API audit logs. "Service Unavailable" tends to point at something wrong with the API, or a proxy in the middle, or something like that -- you'd get a more helpful error if the request were actually reaching the service.
  2. PodCheckAuth reporting Request to integrated registry timed out -- this may be related to the previous problem, or could be networking-related. Unfortunately, when networking is broken the main symptom is simply that connections fail, and you have to dig a lot deeper to figure out why.
  3. NetworkCheck failed with Failed to run network diags test pods, failed: 21, total: 24 -- I would like this diagnostic to be a lot more helpful when the test pods fail to run; 3.7 should be a bit better about this, so you might want to try a 3.7 client. Three of the 24 pods did run, so it would be interesting to see what happened to the others. You could list the projects while NetworkCheck is running, see which pods are succeeding or failing in the network-check projects, and go from there.
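The three checks above can be sketched as shell commands. The pod label selector, the service IP (172.30.0.1, taken from the error output above), and the `network-diag` project prefix are assumptions; adjust them for your cluster:

```shell
#!/bin/sh
# Sketch of the manual checks suggested above (OpenShift 3.x client assumed).
# Guarded so it is a no-op on machines without an oc client installed.
if command -v oc >/dev/null 2>&1; then
  # 1. From inside the registry pod, test the API connection manually.
  #    (docker-registry=default is the label the default registry dc applies.)
  REGISTRY_POD=$(oc get pods -n default -l docker-registry=default -o name | head -1)
  oc rsh -n default "$REGISTRY_POD" curl -kv https://172.30.0.1:443/healthz

  # 2. Check recent registry logs around the authorization failures.
  oc logs -n default "$REGISTRY_POD" --tail=50

  # 3. While `oadm diagnostics NetworkCheck` runs in another terminal,
  #    list the temporary diagnostic projects and their pod states to see
  #    which of the 24 test pods are failing and why.
  oc get projects | grep network-diag
  oc get pods --all-namespaces | grep network-diag
  RESULT="checks attempted"
else
  RESULT="oc client not found; run these steps on a cluster host"
fi
echo "$RESULT"
```

For the failing test pods found in step 3, `oc describe pod <name>` and `oc get events -n <network-diag project>` would be the next place to look.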

@0xmichalis
Contributor

@openshift/sig-networking

@sosiouxme
Member

/unassign

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 8, 2018
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

8 participants