Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove rkt pods when exiting #3562

Merged
merged 2 commits into from
Nov 17, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ BUG FIXES:
* core: Fixed an issue where the leader server could get into a state where it
was no longer performing the periodic leader loop duties after a barrier
timeout error [GH-3402]
* core: Fixes an issue with jobs that have `auto_revert` set to true, where
reverting to a previously stable job that fails to start up causes an
infinite cycle of reverts [GH-3496]
* api: Apply correct memory default when task's do not specify memory
explicitly [GH-3520]
* cli: Fix passing Consul address via flags [GH-3504]
Expand All @@ -38,8 +41,7 @@ BUG FIXES:
[GH-3502]
* client: Fix allocation accounting in GC and trigger GCs on allocation
updates [GH-3445]
* core: Fixes an issue with jobs that have `auto_revert` set to true, where reverting
to a previously stable job that fails to start up causes an infinite cycle of reverts [GH-3496]
* driver/rkt: Remove pods on shutdown [GH-3562]
* template: Fix issue where multiple environment variable templates would be
parsed incorrectly when contents of one have changed after the initial
rendering [GH-3529]
Expand Down
29 changes: 26 additions & 3 deletions client/driver/rkt.go
Original file line number Diff line number Diff line change
Expand Up @@ -227,6 +227,22 @@ func rktManifestMakePortMap(manifest *appcschema.PodManifest, configPortMap map[
return portMap, nil
}

// rktRemove pod after it has exited.
func rktRemove(uuid string) error {
errBuf := &bytes.Buffer{}
cmd := exec.Command(rktCmd, "rm", uuid)
cmd.Stdout = ioutil.Discard
cmd.Stderr = errBuf
if err := cmd.Run(); err != nil {
if msg := errBuf.String(); len(msg) > 0 {
return fmt.Errorf("error removing pod %q: %s", uuid, msg)
}
return err
}

return nil
}

// NewRktDriver is used to create a new rkt driver
func NewRktDriver(ctx *DriverContext) Driver {
return &RktDriver{DriverContext: *ctx}
Expand Down Expand Up @@ -671,9 +687,9 @@ func (d *RktDriver) Open(ctx *ExecContext, handleID string) (DriverHandle, error
}
exec, pluginClient, err := createExecutorWithConfig(pluginConfig, d.config.LogOutput)
if err != nil {
d.logger.Println("[ERROR] driver.rkt: error connecting to plugin so destroying plugin pid and user pid")
d.logger.Println("[ERR] driver.rkt: error connecting to plugin so destroying plugin pid and user pid")
if e := destroyPlugin(id.PluginConfig.Pid, id.ExecutorPid); e != nil {
d.logger.Printf("[ERROR] driver.rkt: error destroying plugin and executor pid: %v", e)
d.logger.Printf("[ERR] driver.rkt: error destroying plugin and executor pid: %v", e)
}
return nil, fmt.Errorf("error connecting to plugin: %v", err)
}
Expand Down Expand Up @@ -771,7 +787,7 @@ func (h *rktHandle) run() {
close(h.doneCh)
if ps.ExitCode == 0 && werr != nil {
if e := killProcess(h.executorPid); e != nil {
h.logger.Printf("[ERROR] driver.rkt: error killing user process: %v", e)
h.logger.Printf("[ERR] driver.rkt: error killing user process: %v", e)
}
}

Expand All @@ -781,6 +797,13 @@ func (h *rktHandle) run() {
}
h.pluginClient.Kill()

// Remove the pod
if err := rktRemove(h.uuid); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking: Should it use the created resources feature so that cleanup gets retried?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to match Docker's behavior and this is how we do it there. I'll update rkt to use created resources.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated this route and it's a bit more work than I'd like to put in this PR. It involves returning CreatedResources from Driver.Start which isn't already happening and ensuring TaskDriver cleans up containers exactly when desired. Following Dockers lead for now seems safest.

h.logger.Printf("[ERR] driver.rkt: error removing pod %q - must gc manually: %v", h.uuid, err)
} else {
h.logger.Printf("[DEBUG] driver.rkt: removed pod %q", h.uuid)
}

// Send the results
h.waitCh <- dstructs.NewWaitResult(ps.ExitCode, 0, werr)
close(h.waitCh)
Expand Down
41 changes: 37 additions & 4 deletions client/driver/rkt_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"fmt"
"io/ioutil"
"os"
"os/exec"
"path/filepath"
"reflect"
"strings"
Expand Down Expand Up @@ -166,16 +167,16 @@ func TestRktDriver_Start_Wait(t *testing.T) {
if err != nil {
t.Fatalf("err: %v", err)
}
defer resp.Handle.Kill()
handle := resp.Handle.(*rktHandle)
defer handle.Kill()

// Update should be a no-op
err = resp.Handle.Update(task)
if err != nil {
if err := handle.Update(task); err != nil {
t.Fatalf("err: %v", err)
}

// Signal should be an error
if err = resp.Handle.Signal(syscall.SIGTERM); err == nil {
if err := resp.Handle.Signal(syscall.SIGTERM); err == nil {
t.Fatalf("err: %v", err)
}

Expand All @@ -187,6 +188,18 @@ func TestRktDriver_Start_Wait(t *testing.T) {
case <-time.After(time.Duration(testutil.TestMultiplier()*15) * time.Second):
t.Fatalf("timeout")
}

// Make sure pod was removed #3561
var stderr bytes.Buffer
cmd := exec.Command(rktCmd, "status", handle.uuid)
cmd.Stdout = ioutil.Discard
cmd.Stderr = &stderr
if err := cmd.Run(); err == nil {
t.Fatalf("expected error running 'rkt status %s' on removed container", handle.uuid)
}
if out := stderr.String(); !strings.Contains(out, "no matches found") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems brittle if the returned error message ever changes, but other than checking for a non empty error message I don't have a better idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I hate resorting to strings.Contains, but if it ever fails due to a string change it should be a quick and easy fix (and caught by Travis, not a user at runtime).

t.Fatalf("expected 'no matches found' but received: %s", out)
}
}

func TestRktDriver_Start_Wait_Skip_Trust(t *testing.T) {
Expand Down Expand Up @@ -576,3 +589,23 @@ func TestRktDriver_HandlerExec(t *testing.T) {
t.Fatalf("error killing handle: %v", err)
}
}

func TestRktDriver_Remove_Error(t *testing.T) {
if !testutil.IsTravis() {
t.Parallel()
}
if os.Getenv("NOMAD_TEST_RKT") == "" {
t.Skip("skipping rkt tests")
}

ctestutils.RktCompatible(t)

// Removing a non-existent pod should return an error
if err := rktRemove("00000000-0000-0000-0000-000000000000"); err == nil {
t.Fatalf("expected an error")
}

if err := rktRemove("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"); err == nil {
t.Fatalf("expected an error")
}
}