NetworkCheck blocks keyboard interrupt #16847

sosiouxme · 2017-10-12T19:52:21Z

There is some code in the NetworkCheck diagnostic to handle an interrupt; however, it has the effect of blocking interrupts until the diagnostic completes, which is not user-friendly.

Version

$ oc version
oc v3.7.0-alpha.1+86fefdf-1060-dirty
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth

But this has probably been true since the diagnostic was introduced.

Steps To Reproduce

oc adm diagnostics NetworkCheck
Ctrl-C

Current Result

Continues until the diagnostic completes

Expected Result

Aborts

pravisankar · 2017-10-13T02:56:22Z

@sosiouxme I could not reproduce the issue on my local machine (using dind cluster). What's the issue with the current code?

[ravi@dhcp-16-230 origin]$ oc version
oc v3.7.0-alpha.1+c80317c-1034-dirty
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth

[ravi@dhcp-16-230 origin]$ oc adm diagnostics NetworkCheck
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/tmp/openshift-dind-cluster/openshift/openshift.local.config/master/admin.kubeconfig'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

^CERROR: [DNet2006 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:137]
       Creating network diagnostic pod "network-diag-pod-bg916" on node "openshift-node-1" with command "openshift infra network-diagnostic-pod -l 1" failed: namespaces "network-diag-ns-2ml4c" not found

[Note] Summary of diagnostics execution (version v3.7.0-alpha.1+c80317c-1034-dirty):
[Note] Errors seen: 1
[ravi@dhcp-16-230 origin]$

sosiouxme · 2017-10-13T13:02:41Z

The current code does nothing to end the diagnostic when an interrupt is received. The goroutine performs cleanup and exits, but nothing else happens until the diagnostic completes whatever it was doing (in your case, it completed with an error because the namespace had been deleted out from under it; if the pods are already created, it takes a lot longer to finish). Worse, the signal handler is still directing interrupts to the channel, but nothing is listening to that channel, so any further interrupts go to limbo.

$ oc adm diagnostics  NetworkCheck  ClusterRoleBindings
[...]
[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
       
^C^C^C^C^C^CERROR: [DNet2006 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:137]
       Creating network diagnostic pod "network-diag-pod-x1vhb" on node "ip-172-18-0-248.ec2.internal" with command "openshift infra network-diagnostic-pod -l 1" failed: pods "network-diag-pod-x1vhb" is forbidden: unable to create new content in namespace network-diag-ns-ss4qd because it is being terminated.
       
[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
       
^C^C^C^C^C^C^C^CInfo:  clusterrolebinding/cluster-readers has more subjects than expected.
       
       Use the `oc adm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
[...]

pravisankar · 2017-10-13T16:05:49Z

@sosiouxme Thanks, now I understand the actual issue and the fix.
Main changes:

Running the actual check in a goroutine along with a channel lets the diagnostic terminate faster when interrupted.
Stopping the signal notification once a diagnostic catches the signal allows other diagnostic checks to react to the signal if needed.

Automatic merge from submit-queue (batch tested with PRs 16848, 16874).

Fix some diagnostic error handling (NetworkCheck and DiagnosticPod)

Fixes #16847

A keyboard interrupt on the NetworkCheck diagnostic will actually abort it (giving it a chance to clean up) and proceed to the next diagnostic. The same is done for DiagnosticPod (which previously did not catch the signal and clean up at all).

sosiouxme self-assigned this Oct 12, 2017

sosiouxme added a commit to sosiouxme/origin that referenced this issue Oct 12, 2017

NetworkCheck: handle interrupt

0ccc922

fixes openshift#16847

sosiouxme mentioned this issue Oct 12, 2017

Fix some diagnostic error handling (NetworkCheck and DiagnosticPod) #16848

Merged

openshift-merge-robot closed this as completed in #16848 Oct 16, 2017
