
WIP Fix incorrect exit codes from Exec #7965

Closed
wants to merge 4 commits

Conversation

matthewavery (Contributor):

Fixes #5692

NOTE: I still have to add the test I have written for CI; I am not done tweaking it.

This PR is not quite complete. It still has a few problems around context handling, which are causing a CDE.

This PR adds TaskWaitStart to the portlayer as a means for a portlayer call to wait for a Task to "begin". It also changes TaskWait to wait for the indicated task to exit/terminate before returning. This will allow users of the portlayer to properly wait on a task and get the correct exit code when performing an inspect against the task. To do this, I added an additional API target, /tasks/start, to the portlayer, which can be used with a task ID.
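For illustration, here is a rough sketch of how a personality-side caller might sequence these calls. WaitStartTask and WaitTask use the ContainerProxy signatures shown in the review below; waitForExecExit and InspectTask are hypothetical names used only to show the intended ordering, not code from this PR.

// Hypothetical wrapper showing the intended ordering; not code from this PR.
func waitForExecExit(op trace.Operation, proxy *ContainerProxy, cid, cname, eid string) (int, error) {
	// Wait for the task to actually begin before doing anything that assumes
	// it is running (e.g. attaching streams).
	if err := proxy.WaitStartTask(op, cid, cname, eid); err != nil {
		return -1, err
	}

	// Wait for the task to exit/terminate.
	if err := proxy.WaitTask(op, cid, cname, eid); err != nil {
		return -1, err
	}

	// Only after the exit wait is an inspect guaranteed to report the final
	// exit code. InspectTask is a stand-in for the real inspect path.
	ec, err := proxy.InspectTask(op, cid, eid)
	if err != nil {
		return -1, err
	}
	return ec.ExitCode, nil
}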

op.Debugf("waiting for exec(%s) to complete", id)

// NOTE: Do we need to go to the property collector for all of these? The container base is from the exec cache... will it contain the new changes like Started?
startedKey := extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0]
Contributor Author (matthewavery):

It is likely that we do not really need to go to extraconfig for Started and StartTime, since we have the exec task info right there with the container base. Thoughts?

Additionally, since the Wait func for the exec config looks for changes in the exec config, it is possible that the exec finishes before we start waiting for changes. Thoughts/suggestions on this?
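One way to avoid that finish-before-wait race is to read the recorded state after registering for change notifications but before blocking. A minimal, generic sketch (the names done, alreadyExited, and waitForExit are invented for this example and are not VIC APIs; imports of fmt and time are assumed):

// waitForExit blocks until the exec is observed to have exited. done must be
// registered for change notifications before this is called; alreadyExited
// reads the currently recorded state.
func waitForExit(done <-chan struct{}, alreadyExited func() bool, timeout time.Duration) error {
	// Check the recorded state first: an exec that finished before this call
	// will never produce another change notification.
	if alreadyExited() {
		return nil
	}

	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out waiting for exec to exit")
	}
}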


codecov-io commented May 18, 2018

Codecov Report

Merging #7965 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #7965   +/-   ##
=======================================
  Coverage   26.07%   26.07%           
=======================================
  Files          37       37           
  Lines        5243     5243           
=======================================
  Hits         1367     1367           
  Misses       3769     3769           
  Partials      107      107


matthewavery requested review from cgtexmex and hickeng on May 18, 2018 15:38
"/tasks/start": {
"put": {
"description": "Initiates a wait for the specified task to start",
"summary": "Initiates a waot for the specified task to start",
Contributor:

waot

@@ -297,6 +297,7 @@ func (c *ContainerBackend) ContainerExecInspect(eid string) (*backend.ExecInspec
return nil, err
}

log.Debugf("exit code was reported as %d", ec.ExitCode)
Member:

I'm commenting not because this is significant but because of an ongoing effort to get more concise log messages:

log.Debugf("exit code: %d", ec.ExitCode)

@@ -544,7 +545,7 @@ func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string,
// wait the Task to start
config := &models.TaskWaitConfig{
Handle: handle,
ID: eid,
TaskID: eid,
Member:

This is stuttering - the ID field is already part of a config named Task.... Yes, we stutter in a lot of places, but it's against best practice, so we shouldn't be deliberately adding stutters.

Member:

Also, please remove the strange capitalization of Container in the tasks.WaitNotFound error case below.

@@ -563,7 +564,40 @@ func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string,
}

return nil
}

func (c *ContainerProxy) WaitStartTask(op trace.Operation, cid string, cname string, eid string) error {
Member:

Add function comments for this and WaitTask to note what they do and how they're different. The similar naming makes this important and it's good practice anyway.
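Possible wording for those comments, using the signatures from the diffs in this PR (bodies elided; the exact phrasing is only a suggestion):

// WaitStartTask blocks until the task identified by eid on container cid has
// been observed to start. It returns as soon as the task is running; it does
// not wait for the task to finish.
func (c *ContainerProxy) WaitStartTask(op trace.Operation, cid string, cname string, eid string) error

// WaitTask blocks until the task identified by eid on container cid has
// exited/terminated, so that a subsequent inspect reports the final exit code
// rather than a still-running task.
func (c *ContainerProxy) WaitTask(op trace.Operation, cid string, cname string, eid string) error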

if err != nil {
switch err := err.(type) {
case *tasks.WaitNotFound:
return errors.InternalServerError(fmt.Sprintf("the Container(%s) has been shutdown during execution of the exec operation", cid))
Member:

remove strange Container capitalization, here and in original.

@@ -241,7 +243,7 @@ func (handler *TaskHandlersImpl) InspectHandler(params tasks.InspectParams) midd

res := &models.TaskInspectResponse{
ID: t.ID,
Running: t.Started == "true",
Running: t.Started == "true" && t.Detail.StartTime > t.Detail.StopTime && handle.Runtime.PowerState == types.VirtualMachinePowerStatePoweredOn,
Member:

This deep inspection of Handle in the swagger handlers violates the opaque nature of the handle.
I suggest adding a task.State(op, handle, id) method that returns an enum:

Inactive // not bound, therefore won't run (this is the state after Create but before Start)
Active   // bound (this is the state after Start but before it's been successfully launched - all primary sessions would be in this state)
Running  // the logic you have here
Stopped  // not running, not clean exit

In fact, having looked at the swagger changes, I'm edging towards not having separate WaitStart and Wait calls - it was convoluted just to talk about it, and now that I see it in code it's confusing.

It should be simple to alter the logic so that Wait becomes "wait for a state change vs. the supplied handle state"; then it's easily paired with inspect to decide whether to wait again for the next state change.
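A minimal sketch of the suggested helper. The stand-in types, the constant names, and the TaskState function name are invented for illustration and do not reflect the real portlayer types:

// The types below are simplified stand-ins for illustration only; they are
// not the real portlayer handle or session types.
type PowerState string

const PoweredOn PowerState = "poweredOn"

// TaskInfo stands in for the persisted per-task session record.
type TaskInfo struct {
	Bound     bool   // the task has been committed to the handle
	Started   string // "true" once the launch attempt succeeded
	StartTime int64
	StopTime  int64
}

// HandleInfo stands in for the runtime view carried by the handle.
type HandleInfo struct {
	PowerState PowerState
	Tasks      map[string]TaskInfo
}

// State is the lifecycle enum suggested above.
type State int

const (
	Inactive State = iota // not bound, therefore won't run (after Create, before Start)
	Active                // bound, but not yet successfully launched
	Running               // launched, not yet stopped, and the VM is powered on
	Stopped               // no longer running
)

// TaskState mirrors the suggested task.State(op, handle, id); op and the real
// handle type are omitted so the sketch stays self-contained. Keeping this in
// the task package lets the swagger handlers treat the handle as opaque.
func TaskState(h HandleInfo, id string) State {
	t, ok := h.Tasks[id]
	switch {
	case !ok || !t.Bound:
		return Inactive
	case t.Started != "true":
		return Active
	case t.StartTime > t.StopTime && h.PowerState == PoweredOn:
		return Running
	default:
		return Stopped
	}
}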

case *task.TaskPowerStateError:
op.Errorf("The container was in an invalid power state for the wait operation: %s", err.Error())
return tasks.NewWaitStartPreconditionRequired().WithPayload(
&models.Error{Message: err.Error()})
Member:

Not a fan of the wrapping - any chance you could have a slightly higher value for your columns?

Contributor Author (matthewavery):

The wrapping of the encapsulation? I can just put it all on the same line. I think gofmt did this automatically, or I might have done it out of habit. I will move it all onto one line.

"tags": [
"tasks"
"/tasks/start": {
"put": {
Member:

This call doesn't change state so should not be a "put". A "get" is the more natural fit.

Contributor Author (matthewavery):

Agreed. Done on a new branch.

"/tasks/start": {
"put": {
"description": "Initiates a wait for the specified task to start",
"summary": "Initiates a waot for the specified task to start",
Member:

s/waot/wait/

@@ -23,7 +23,7 @@ import (
"github.com/vmware/vic/pkg/trace"
)

// Wait waits the task to start
// Wait waits for the task to complete, it will retry reading the appropriate task keys until they meet the criteria for a "finished" Task
Member:

// Wait will block until the task is complete. If the task is not yet started it will <fill in what it will do>

Your comment is referencing details that are unknown in this package - there's no indication in the function about keys or similar. When we shift from guestinfo keys to namespacedb then we will not be waiting on keys... yet this function should require no changes as the implementation details are wrapped up in the WaitForX functions. This is a good indicator that the comment detail is misplaced.

defer trace.End(trace.Begin(id, op))

// guestinfo key that we want to wait for
key := extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0]
return c.waitFor(op, key)
}

// Waits for the exec'd task to complete.
//TODO: we really need structured Task style errors so that we can be more explicit in our handling upstream.
func (c *containerBase) WaitForExec(op trace.Operation, id string) error {
Member:

We spoke directly about this, but here for posterity...

I think altering this so that we're just waiting for a change in a specified set of keys will be the most effective way to implement what we're after (WaitForKeyChange(op trace.Operation, currentKeyValues map[string]string) maybe).

I'd also request a containerBase.TaskWait() or similar function that does the calls to extraconfig.CalculateKeys before calling the general function to wait for the change. This is so that the task package doesn't start depending on the implementation of the config store.
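A sketch of that split, assuming hypothetical helpers (TaskWait, WaitForKeyChange, currentValue) and an assumed Detail.StopTime key path; only the CalculateKeys usage is taken from this PR:

// Hypothetical containerBase.TaskWait: only this layer knows how tasks are
// laid out in the config store; WaitForKeyChange and currentValue are assumed
// helpers, not existing VIC functions.
func (c *containerBase) TaskWait(op trace.Operation, id string) error {
	// Keys that describe this task's observable state; computed here so the
	// task package never learns about the config store layout.
	keys := []string{
		extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Started", id), "")[0],
		extraconfig.CalculateKeys(c.ExecConfig, fmt.Sprintf("Execs.%s.Detail.StopTime", id), "")[0],
	}

	// Snapshot the values the caller currently observes so the wait means
	// "changed relative to this snapshot" and a change that already happened
	// is not missed.
	current := make(map[string]string, len(keys))
	for _, key := range keys {
		current[key] = c.currentValue(op, key)
	}

	// Generic wait that knows nothing about guestinfo keys vs namespacedb.
	return c.WaitForKeyChange(op, current)
}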
