Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move pod-namespace calls out of process #18355

Merged

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Jan 30, 2018

As discussed in #15991, we need to move all operations in the pod's network namespace out of process, due to a golang issue that allows setns() calls in a locked thread to leak into other threads, causing random lossage as operations intended for the main network namespace end up running in other namespaces instead. (This is fixed in golang 1.10 but we need a fix before then.)

Fixes #15991
Fixes #14385
Fixes #13108
Fixes #18317
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1539987

@danwinship danwinship added kind/bug Categorizes issue or PR as related to a bug. component/networking sig/networking labels Jan 30, 2018
@danwinship danwinship requested a review from dcbw January 30, 2018 19:46
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 30, 2018
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 30, 2018
@knobunc
Copy link
Contributor

knobunc commented Jan 31, 2018

I hate that we have to do this, but the code looks as reasonable as it can be.

It was only being used in one remaining case (to get the pod IP when
tearing down a pod), but we can just always use the fallback code to
get that information from OVS instead.
This requires two OVS commands rather than one but it saves us having
to parse all that dump-flows output.
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2018
@danwinship
Copy link
Contributor Author

/test extended_networking

@danwinship
Copy link
Contributor Author

/test extended_networking
/test unit

@danwinship danwinship changed the title [WIP] Move pod-namespace calls out of process Move pod-namespace calls out of process Feb 1, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 1, 2018
@danwinship
Copy link
Contributor Author

@dcbw I think this is ready for review.

@smarterclayton
Copy link
Contributor

/test gcp

@smarterclayton
Copy link
Contributor

/test gcp

The DNS flake is currently undiagnosed, but doesn't appear to be network related.

However, I notice when that test fails I see the following things in the logs on that node for the specific pod (dns-test):

Feb 01 18:20:44 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: I0201 18:20:44.622016   10940 kubelet.go:1852] SyncLoop (DELETE, "api"): "dns-test-8a31e45e-077c-11e8-9376-0e8c3420a542_e2e-tests-dns-g9vhj(8a3a5941-077c-11e8-b172-42010a8e0005)"                                                                   
Feb 01 18:20:44 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: I0201 18:20:44.622162   10940 kubelet.go:1846] SyncLoop (REMOVE, "api"): "dns-test-8a31e45e-077c-11e8-9376-0e8c3420a542_e2e-tests-dns-g9vhj(8a3a5941-077c-11e8-b172-42010a8e0005)"                                                                   
Feb 01 18:20:44 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: W0201 18:20:44.688273   10940 docker_sandbox.go:340] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "dns-test-8a31e45e-077c-11e8-9376-0e8c3420a542_e2e-tests-dns-g9vhj": Unexpected command output nsenter: cannot open /proc/49301/ns/net: No such file or directory
Feb 01 18:20:44 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: with error: exit status 1                                                                                                                                                                                                                           

and then a little later for a different pod:

Feb 01 18:20:46 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: W0201 18:20:46.019187   10940 cni.go:242] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "b4042e74db575730b2de71c100638609f216e87c625dd6267e11d9b3a5fbc207"                               
Feb 01 18:20:46 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: I0201 18:20:46.053343   10940 ovs.go:147] Error executing ovs-ofctl: 2018-02-01T18:20:46Z|00001|ofp_util|WARN|Negative value -1 is not a valid port number.                                                                                          
Feb 01 18:20:46 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: ovs-ofctl: -1: port value out of range for in_port                                                                                                                                                                                                   
Feb 01 18:20:46 ci-prtest-5a37c28-15429-ig-n-qzmw ovs-vsctl[56743]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 veth94082da1 -- set Interface veth94082da1 external-ids=sandbox=35d61949b758ff0d3479e24da1bc26406b577cb30430b23bfd75a09d4c414a1c                                                     
Feb 01 18:20:46 ci-prtest-5a37c28-15429-ig-n-qzmw ovs-vsctl[56850]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists clear port veth94082da1 qos                                                                                                                                                                      

@smarterclayton
Copy link
Contributor

/retest

@smarterclayton
Copy link
Contributor

/test gcp

@smarterclayton
Copy link
Contributor

I launched 4 jobs for GCP in parallel, that should give good coverage on the original flake issue.

@smarterclayton
Copy link
Contributor

One of them failed on a web console startup failure - that could be network related (web console is the only component that in theory needs pod network started during the install). Rest of the runs went fine.

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, knobunc

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@danwinship
Copy link
Contributor Author

Feb 01 18:20:44 ci-prtest-5a37c28-15429-ig-n-qzmw origin-node[10940]: W0201 18:20:44.688273 10940 docker_sandbox.go:340] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "dns-test-8a31e45e-077c-11e8-9376-0e8c3420a542_e2e-tests-dns-g9vhj": Unexpected command output nsenter: cannot open /proc/49301/ns/net: No such file or directory

That's not a new thing and I don't think it indicates anything actually going wrong with the pod: it's just kubelet trying to get a status update on a pod that it has already started tearing down. (We ought to fix it to be better about that...)

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue (batch tested with PRs 18376, 18355).

@openshift-merge-robot openshift-merge-robot merged commit 1927cae into openshift:master Feb 2, 2018
@smarterclayton
Copy link
Contributor

Can you open a bug for that and assign it to Seth? I think it happens on every pod termination now.

@danwinship
Copy link
Contributor Author

filed #18414 for the pod termination vs pod status error

func (p *cniPlugin) CmdAdd(args *skel.CmdArgs) (types.Result, error) {
body, err := p.doCNI("http://dummy/", newCNIRequest(args))
func (p *cniPlugin) doCNIServerAdd(req *cniserver.CNIRequest, hostVeth string) (types.Result, error) {
req.Env["OSDN_HOSTVETH"] = hostVeth
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to do this in the environment rather than just stuff it into the JSON by extending CNIRequest?


func (p *cniPlugin) CmdAdd(args *skel.CmdArgs) error {
req := newCNIRequest(args)
ifname := req.Env["CNI_IFNAME"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.IfName and args.Netns. But you can ignore the "" check if you want as these are already checked for you in the 'skel' package.

},
}
index := 0
result030.IPs[0].Interface = &index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be " = current.Int(0)" now.

@@ -323,64 +329,58 @@ func (oc *ovsController) SetPodBandwidth(hostVeth, sandboxID string, ingressBPS,
return nil
}

func (oc *ovsController) getPodDetailsBySandboxID(sandboxID string) (int, string, string, error) {
func (oc *ovsController) getPodDetailsBySandboxID(sandboxID string) (int, net.IP, error) {
Copy link
Contributor

@dcbw dcbw Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what if we just stuff the PodIP into external-ids? Then we don't need to dump and parse flows all the time.

(edit: removed bit about getting ofport via ovs-vsctl which was wrong...)

openshift-merge-robot added a commit that referenced this pull request Feb 26, 2018
Automatic merge from submit-queue (batch tested with PRs 18737, 18418).

Minor cleanups to sdn-cni-plugin

Based on belated comments on #18355:

> So what if we just stuff the PodIP into external-ids? Then we don't need to dump and parse flows all the time.

I had thought about that, but then we'd have to change ovs.Find() to return multiple columns (`ofport` and `external-ids`) and we'd have to parse the external-ids to separate out the sandboxID from the podIP. Though, admittedly, there's already code to parse external-ids in fake_ovs now anyway...

Also, we're only parsing a single flow now, because we request "in_port=%d" in the dump-flows.

> Any reason to do [hostVeth] in the environment rather than just stuff it into the JSON by extending CNIRequest?

I actually did do it that way first, and then changed it. Something about adding a field to CNIRequest had weird unexpected side effects. IIRC, it would have required a whole bunch of changes to the unit tests or something like that.

But I can change it if you prefer.
@danwinship danwinship deleted the sdn-out-of-process branch March 7, 2018 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. sig/networking size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
6 participants