
WIP Fix incorrect exit codes from Exec #7965

Closed
wants to merge 4 commits

Conversation

matthewavery (Contributor):

Fixes #5692

NOTE: I still have to add the test I have written for CI; I am not done tweaking it.

This PR is not quite complete. It still has a few problems around context handling, which are causing a CDE.

This PR adds TaskWaitStart to the portlayer as a means for a portlayer call to wait for a Task to "begin". It also changes TaskWait to wait for the indicated task to exit/terminate before returning. This will allow users of the portlayer to properly wait on a task and get the correct exit code when performing an inspect against the task. To do this, I added an additional API target, /tasks/start, to the portlayer, which can be used with a task ID.
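For illustration, here is a rough sketch of how a personality-side caller might sequence these calls. WaitStartTask and WaitTask use the ContainerProxy signatures shown in the review below; waitForExecExit and InspectTask are hypothetical names used only to show the intended ordering, not code from this PR.

// Hypothetical wrapper showing the intended ordering; not code from this PR.
func waitForExecExit(op trace.Operation, proxy *ContainerProxy, cid, cname, eid string) (int, error) {
	// Wait for the task to actually begin before doing anything that assumes
	// it is running (e.g. attaching streams).
	if err := proxy.WaitStartTask(op, cid, cname, eid); err != nil {
		return -1, err
	}

	// Wait for the task to exit/terminate.
	if err := proxy.WaitTask(op, cid, cname, eid); err != nil {
		return -1, err
	}

	// Only after the exit wait is an inspect guaranteed to report the final
	// exit code. InspectTask is a stand-in for the real inspect path.
	ec, err := proxy.InspectTask(op, cid, eid)
	if err != nil {
		return -1, err
	}
	return ec.ExitCode, nil
}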

op.Debugf("waiting for exec(%s) to complete", id)

// NOTE: Do we need to go to the property collector for all of these? The container base is from the exec cache... will it contain the new changes like Started?
startedKey := extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0]
Contributor Author (matthewavery):

It is likely that we do not really need to go to extraconfig for Started and StartTime, since we have the exec task info right there with the container base. Thoughts?

Additionally, since the Wait func for the exec config looks for changes in the exec config, it is possible that the exec finishes before we start waiting for changes. Thoughts/suggestions on this?
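One way to avoid that finish-before-wait race is to read the recorded state after registering for change notifications but before blocking. A minimal, generic sketch (the names done, alreadyExited, and waitForExit are invented for this example and are not VIC APIs; imports of fmt and time are assumed):

// waitForExit blocks until the exec is observed to have exited. done must be
// registered for change notifications before this is called; alreadyExited
// reads the currently recorded state.
func waitForExit(done <-chan struct{}, alreadyExited func() bool, timeout time.Duration) error {
	// Check the recorded state first: an exec that finished before this call
	// will never produce another change notification.
	if alreadyExited() {
		return nil
	}

	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out waiting for exec to exit")
	}
}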


codecov-io commented May 18, 2018

Codecov Report

Merging #7965 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #7965   +/-   ##
=======================================
  Coverage   26.07%   26.07%           
=======================================
  Files          37       37           
  Lines        5243     5243           
=======================================
  Hits         1367     1367           
  Misses       3769     3769           
  Partials      107      107


matthewavery requested review from cgtexmex and hickeng on May 18, 2018 15:38
"/tasks/start": {
"put": {
"description": "Initiates a wait for the specified task to start",
"summary": "Initiates a waot for the specified task to start",
Contributor:

waot

@@ -297,6 +297,7 @@ func (c *ContainerBackend) ContainerExecInspect(eid string) (*backend.ExecInspec
return nil, err
}

log.Debugf("exit code was reported as %d", ec.ExitCode)
Member:

I'm commenting not because this is significant but because of an ongoing effort to get more concise log messages:

log.Debugf("exit code: %d", ec.ExitCode)

@@ -544,7 +545,7 @@ func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string,
// wait the Task to start
config := &models.TaskWaitConfig{
Handle: handle,
ID: eid,
TaskID: eid,
Member:

This is stuttering - the ID field is already part of a config named Task.... Yes, we stutter in a lot of places, but it's against best practice, so we shouldn't be deliberately adding stutters.

Member:

Also, please remove the strange capitalization of Container in the tasks.WaitNotFound error case below.

@@ -563,7 +564,40 @@ func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string,
}

return nil
}

func (c *ContainerProxy) WaitStartTask(op trace.Operation, cid string, cname string, eid string) error {
Member:

Add function comments for this and WaitTask to note what they do and how they're different. The similar naming makes this important and it's good practice anyway.
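Possible wording for those comments, using the signatures from the diffs in this PR (bodies elided; the exact phrasing is only a suggestion):

// WaitStartTask blocks until the task identified by eid on container cid has
// been observed to start. It returns as soon as the task is running; it does
// not wait for the task to finish.
func (c *ContainerProxy) WaitStartTask(op trace.Operation, cid string, cname string, eid string) error

// WaitTask blocks until the task identified by eid on container cid has
// exited/terminated, so that a subsequent inspect reports the final exit code
// rather than a still-running task.
func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string, eid string) error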

if err != nil {
switch err := err.(type) {
case *tasks.WaitNotFound:
return errors.InternalServerError(fmt.Sprintf("the Container(%s) has been shutdown during execution of the exec operation", cid))
Member:

remove strange Container capitalization, here and in original.

@@ -241,7 +243,7 @@ func (handler *TaskHandlersImpl) InspectHandler(params tasks.InspectParams) midd

res := &models.TaskInspectResponse{
ID: t.ID,
Running: t.Started == "true",
Running: t.Started == "true" && t.Detail.StartTime > t.Detail.StopTime && handle.Runtime.PowerState == types.VirtualMachinePowerStatePoweredOn,
Member:

This deep inspection of Handle in the swagger handlers violates the opaque nature of the handle.
I suggest adding a task.State(op, handle, id) method that returns an enum:

Inactive // not bound, therefore won't run (this is the state after Create but before Start)
Active   // bound (this is the state after Start but before it's been successfully launched - all primary sessions would be in this state)
Running  // the logic you have here
Stopped  // not running, not clean exit

In fact, having looked at the swagger changes, I'm edging towards not having separate WaitStart and Wait calls - it was convoluted just to talk about it, and now that I see it in code it's confusing.

It should be simple to alter the logic so that Wait becomes "wait for a state change vs. the supplied handle state"; then it's easily paired with inspect to decide whether to wait again for the next state change.
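A minimal sketch of the suggested helper. The stand-in types, the constant names, and the TaskState function name are invented for illustration and do not reflect the real portlayer types:

// The types below are simplified stand-ins for illustration only; they are
// not the real portlayer handle or session types.
type PowerState string

const PoweredOn PowerState = "poweredOn"

// TaskInfo stands in for the persisted per-task session record.
type TaskInfo struct {
	Bound     bool   // the task has been committed to the handle
	Started   string // "true" once the launch attempt succeeded
	StartTime int64
	StopTime  int64
}

// HandleInfo stands in for the runtime view carried by the handle.
type HandleInfo struct {
	PowerState PowerState
	Tasks      map[string]TaskInfo
}

// State is the lifecycle enum suggested above.
type State int

const (
	Inactive State = iota // not bound, therefore won't run (after Create, before Start)
	Active                // bound, but not yet successfully launched
	Running               // launched, not yet stopped, and the VM is powered on
	Stopped               // no longer running
)

// TaskState mirrors the suggested task.State(op, handle, id); op and the real
// handle type are omitted so the sketch stays self-contained. Keeping this in
// the task package lets the swagger handlers treat the handle as opaque.
func TaskState(h HandleInfo, id string) State {
	t, ok := h.Tasks[id]
	switch {
	case !ok || !t.Bound:
		return Inactive
	case t.Started != "true":
		return Active
	case t.StartTime > t.StopTime && h.PowerState == PoweredOn:
		return Running
	default:
		return Stopped
	}
}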

case *task.TaskPowerStateError:
op.Errorf("The container was in an invalid power state for the wait operation: %s", err.Error())
return tasks.NewWaitStartPreconditionRequired().WithPayload(
&models.Error{Message: err.Error()})
Member:

Not a fan of the wrapping - any chance you could have a slightly higher value for your columns?

Contributor Author (matthewavery):

The wrapping of the encapsulation? I can just put it all on the same line. I think gofmt did this automatically, or I might have done it out of habit. I will move it all onto one line.

"tags": [
"tasks"
"/tasks/start": {
"put": {
Member:

This call doesn't change state so should not be a "put". A "get" is the more natural fit.

Contributor Author (matthewavery):

Agreed. Done on a new branch.

"/tasks/start": {
"put": {
"description": "Initiates a wait for the specified task to start",
"summary": "Initiates a waot for the specified task to start",
Member:

s/waot/wait/

@@ -23,7 +23,7 @@ import (
"github.com/vmware/vic/pkg/trace"
)

// Wait waits the task to start
// Wait waits for the task to complete, it will retry reading the appropriate task keys until they meet the criteria for a "finished" Task
Member:

// Wait will block until the task is complete. If the task is not yet started it will <fill in what it will do>

Your comment is referencing details that are unknown in this package - there's no indication in the function about keys or similar. When we shift from guestinfo keys to namespacedb then we will not be waiting on keys... yet this function should require no changes as the implementation details are wrapped up in the WaitForX functions. This is a good indicator that the comment detail is misplaced.

defer trace.End(trace.Begin(id, op))

// guestinfo key that we want to wait for
key := extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0]
return c.waitFor(op, key)
}

// Waits for the exec'd task to complete.
//TODO: we really need structured Task style errors so that we can be more explicit in our handling upstream.
func (c *containerBase) WaitForExec(op trace.Operation, id string) error {
Member:

We spoke directly about this, but here for posterity...

I think altering this so that we're just waiting for a change in a specified set of keys will be the most effective way to implement what we're after (WaitForKeyChange(op trace.Operation, currentKeyValues map[string]string) maybe).

I'd also request a containerBase.TaskWait() or similar function that does the calls to extraconfig.CalculateKeys before calling the general function to wait for the change. This is so that the task package doesn't start depending on the implementation of the config store.
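A sketch of that split, assuming hypothetical helpers (TaskWait, WaitForKeyChange, currentValue) and an assumed Detail.StopTime key path; only the CalculateKeys usage is taken from this PR:

// Hypothetical containerBase.TaskWait: only this layer knows how tasks are
// laid out in the config store; WaitForKeyChange and currentValue are assumed
// helpers, not existing VIC functions.
func (c *containerBase) TaskWait(op trace.Operation, id string) error {
	// Keys that describe this task's observable state; computed here so the
	// task package never learns about the config store layout.
	keys := []string{
		extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0],
		extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Detail.StopTime", id), "")[0],
	}

	// Snapshot the values the caller currently observes so the wait means
	// "changed relative to this snapshot" and a change that already happened
	// is not missed.
	current := make(map[string]string, len(keys))
	for _, key := range keys {
		current[key] = c.currentValue(op, key)
	}

	// Generic wait that knows nothing about guestinfo keys vs namespacedb.
	return c.WaitForKeyChange(op, current)
}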
