Skip to content
This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Support removing networking from stopped VM #661

Merged
merged 1 commit into from
Aug 17, 2020
Merged

Support removing networking from stopped VM #661

merged 1 commit into from
Aug 17, 2020

Conversation

innobead
Copy link
Contributor

Fix #660

@innobead innobead requested a review from twelho as a code owner August 15, 2020 16:23
pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
@darkowlzz
Copy link
Contributor

darkowlzz commented Aug 17, 2020

Thanks for working on this bug. This could fix the VM restart issue due to the same duplicate IP allocation error as seen in #504 (comment) .

Copy link
Contributor

@twelho twelho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @innobead 👋

Thanks for digging into this 👍
I have just some nits, and @darkowlzz has some points that would be good to fix before merging. Otherwise it's looking very good!

pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
pkg/operations/remove.go Outdated Show resolved Hide resolved
@innobead
Copy link
Contributor Author

@darkowlzz @twelho thanks for the review 👍 All comments/suggestions resolved.

@twelho
Copy link
Contributor

twelho commented Aug 17, 2020

Hi @innobead, thanks for your contributions! I have one more improvement to get rid of the isRunning boolean for removal entirely, since our usage of it was not correct in the first place. Could you apply the following patch using git apply and do one more commit? Thanks!

diff --git a/pkg/network/cni/cni.go b/pkg/network/cni/cni.go
index 7f953da1..c3e6877d 100644
--- a/pkg/network/cni/cni.go
+++ b/pkg/network/cni/cni.go
@@ -189,7 +189,7 @@ func cniToIgniteResult(r *gocni.CNIResult) *network.Result {
        return result
 }
 
-func (plugin *cniNetworkPlugin) RemoveContainerNetwork(containerID string, isRunning bool) (err error) {
+func (plugin *cniNetworkPlugin) RemoveContainerNetwork(containerID string) (err error) {
        if err = plugin.initialize(); err != nil {
                return err
        }
@@ -208,13 +208,10 @@ func (plugin *cniNetworkPlugin) RemoveContainerNetwork(containerID string, isRun
                return nil
        }
 
-       netnsPath := ""
-       if isRunning {
-               netnsPath = fmt.Sprintf(netNSPathFmt, c.PID)
-               if c.PID == 0 {
-                       log.Info("CNI failed to retrieve network namespace path, PID was 0")
-                       return nil
-               }
+       netnsPath := fmt.Sprintf(netNSPathFmt, c.PID)
+       if c.PID == 0 {
+               netnsPath = "" // Using a blank netns triggers a best effort cleanup
+               log.Warn("CNI failed to retrieve network namespace path, PID was 0")
        }
 
        return plugin.cni.Remove(context.Background(), containerID, netnsPath)
diff --git a/pkg/network/docker/docker.go b/pkg/network/docker/docker.go
index e3377baa..4d419267 100644
--- a/pkg/network/docker/docker.go
+++ b/pkg/network/docker/docker.go
@@ -44,7 +44,7 @@ func (plugin *dockerNetworkPlugin) SetupContainerNetwork(containerID string, _ .
        }, nil
 }
 
-func (*dockerNetworkPlugin) RemoveContainerNetwork(_ string, _ bool) error {
+func (*dockerNetworkPlugin) RemoveContainerNetwork(_ string) error {
        // no-op for docker, this is handled automatically
        return nil
 }
diff --git a/pkg/network/types.go b/pkg/network/types.go
index 89273292..f91ce053 100644
--- a/pkg/network/types.go
+++ b/pkg/network/types.go
@@ -21,7 +21,7 @@ type Plugin interface {
        SetupContainerNetwork(containerID string, portmappings ...meta.PortMapping) (*Result, error)
 
        // RemoveContainerNetwork is the method called before a container using the network plugin can be deleted
-       RemoveContainerNetwork(containerID string, isRunning bool) error
+       RemoveContainerNetwork(containerID string) error
 }
 
 type Result struct {
diff --git a/pkg/operations/remove.go b/pkg/operations/remove.go
index 1fbf907b..2786c254 100644
--- a/pkg/operations/remove.go
+++ b/pkg/operations/remove.go
@@ -83,7 +83,7 @@ func StopVM(vm *api.VM, kill, silent bool) error {
        }
 
        // Remove VM networking
-       if err = removeNetworking(util.NewPrefixer().Prefix(vm.GetUID()), true); err != nil {
+       if err = removeNetworking(util.NewPrefixer().Prefix(vm.GetUID())); err != nil {
                log.Warnf("Failed to cleanup networking for stopped container %s %q: %v", vm.GetKind(), vm.GetUID(), err)
 
                return err
@@ -116,7 +116,7 @@ func StopVM(vm *api.VM, kill, silent bool) error {
        return nil
 }
 
-func removeNetworking(containerID string, isRunning bool) error {
+func removeNetworking(containerID string) error {
        log.Infof("Removing the container with ID %q from the %q network", containerID, providers.NetworkPlugin.Name())
-       return providers.NetworkPlugin.RemoveContainerNetwork(containerID, isRunning)
+       return providers.NetworkPlugin.RemoveContainerNetwork(containerID)
 }

@innobead
Copy link
Contributor Author

innobead commented Aug 17, 2020

@twelho I can clean up a bit by using this patch but I doubt if this will cause some side effect.

Because namespace path == "" will be tackled in cni client lib when VM is not running. So after the patch applied, removing the CNI network will be failed.

if err := network.Remove(ctx, ns); err != nil {
// Based on CNI spec v0.7.0, empty network namespace is allowed to
// do best effort cleanup. However, it is not handled consistently
// right now:
// https://github.com/containernetworking/plugins/issues/210
// TODO(random-liu): Remove the error handling when the issue is
// fixed and the CNI spec v0.6.0 support is deprecated.
if path == "" && strings.Contains(err.Error(), "no such file or directory") {
continue
}
return err

@twelho
Copy link
Contributor

twelho commented Aug 17, 2020

Upon stopping a VM the container running ignite-spawn will be deleted, and before that happens the network is removed. We do an InspectContainer call in RemoveContainerNetwork in pkg/network/cni/cni.go, so the container needs to exist at that point, thus we can fill in netnsPath. If the PID is zero for some reason, we can queue a best effort cleanup based on the code block you linked, which should not fail even though the netns is already removed.

Since you already pass true to isRunning currently, the patch is functionally equivalent except for always calling the removal, with a blank netnsPath if the returned PID is 0. If the container has been externally removed, RemoveContainerNetwork will stop at InspectContainer, both before and after the patch.

I tested this locally using Docker:

$ sudo ignite --runtime docker start a
INFO[0001] Networking is handled by "cni"               
INFO[0001] Started Firecracker VM "a6c4217ec5f07295" in a container with ID "22a0997d7119e75c17ed3ff6b9a34f46eba82d1bdebe29e3035b6191091c6939" 
$ sudo docker ps -a
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS                  PORTS               NAMES
22a0997d7119        weaveworks/ignite:dev   "/usr/local/bin/igni…"   2 seconds ago       Up 2 seconds                                ignite-a6c4217ec5f07295
$ sudo docker stop ignite-a6c4217ec5f07295 # Stopping will automatically remove the container using Docker's autoremove
ignite-a6c4217ec5f07295
$ sudo ignite --runtime docker stop a
WARN[0000] VM "a6c4217ec5f07295" is not running but trying to cleanup networking for stopped container 
INFO[0000] Removing the container with ID "ignite-a6c4217ec5f07295" from the "cni" network 
INFO[0000] CNI failed to retrieve network namespace path: Error: No such container: ignite-a6c4217ec5f07295
$

and using containerd:

$ sudo ignite start b
INFO[0000] Networking is handled by "cni"               
INFO[0000] Started Firecracker VM "b7ddccf23b066051" in a container with ID "ignite-b7ddccf23b066051" 
$ sudo ctr -n firecracker task ls
TASK                       PID       STATUS    
ignite-b7ddccf23b066051    126854    RUNNING
$ sudo ctr -n firecracker task kill ignite-b7ddccf23b066051
$ sudo ctr -n firecracker task ls
TASK                       PID       STATUS    
ignite-b7ddccf23b066051    126854    STOPPED
$ sudo ctr -n firecracker task rm ignite-b7ddccf23b066051
$ sudo ignite stop b
WARN[0000] VM "b7ddccf23b066051" is not running but trying to cleanup networking for stopped container 
INFO[0000] Removing the container with ID "ignite-b7ddccf23b066051" from the "cni" network 
INFO[0000] CNI failed to retrieve network namespace path: no running task found: task ignite-b7ddccf23b066051 not found: not found 
$

Both return an exit code of 0, this is intended.

@innobead
Copy link
Contributor Author

Thanks for the testing 👍, the patch applied.

@stealthybox
Copy link
Contributor

LGTM 👍

@stealthybox stealthybox merged commit c17a99c into weaveworks:master Aug 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to restart a stopped VM caused by a handled signal
4 participants