Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nomad exec part 1: plumbing and docker driver #5632

Merged
merged 15 commits into from
May 9, 2019

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Apr 30, 2019

Context

This PR is the first of PRs that aim to introduce nomad exec: a command allowing operators to execute commands into running tasks that run anywhere in cluster in any driver that supports exec. Think docker exec but that can run against task running remotely and with exec/java tasks as well.

The PRs are relatively very large, and best reviewed by examining individual commits along with the commit messages that dig deeper into implementation choices.

This PR: Plumbing and initial driver support

This PR introduces an API endpoint for executing arbitrary commands on running tasks, that forwards the API call to the nomad client running the task.

The endpoint uses websockets for processing user inputs and streaming back process output. The agent handler would forward the stream entities all the way to client. Some care is given to reuse the same serialization bits all the way to drivers, and only unpack the stream at the very last moment (driver or executor); this allows us to reduce transfer/conversion overhead and in some cases allow reuse of buffers.

When ACLs are enabled, an operator must have "alloc-exec" capability enabled to exec into any command. If the task is a raw_exec or using a driver that has no isolation, "alloc-node-exec" capability.

In this PR, we add support for docker as it's the most trivial implementation. Follow up PRs would add support for other drivers.

High level walk through and key ideas:

(please review the commit messages for details)

The PR starts with few helper changes that are relevant for subsequent significant changes - wanted to get them out of the way before getting into the meaty commits:

6d711d0:

  • introduces the core structs and interfaces involved. It introduces two (one high-level and one low-level) interfaces for drivers intending to support nomad exec. Drivers support is opt-in, and existing drivers can compile without any modification.

  • Also introduces the websocket streaming entities (e.g. ExecTaskStreamingRequest and ExecTaskStreamingResponse) that will get reused in the API as well in e78f7ce#diff-bafd99d510459580cfc433bada3fde5cR535 .

4ade58e and 781b255:

  • contain the client ultimate endpoint handler and the boilerplate necessary for forwarding connection from api agent handler to the client running allocation.
  • they perform basic checks (e.g. checking ACL, running state) and then delegate to a driver handler that inspects if driver supports nomad exec and invokes the appropriate call

58a4c71

  • adds the agent websocket handler that unwraps the API call and translate the websocket interface to the RPC method.

e78f7ce

  • Adds an Allocations.Exec API convenience method for invoking the API.

d144553

  • adds a test harness to be used for all drivers - mostly modeled after docker exec behavior in terms of tty/children-process behaviors.

a8e460a:

  • adds docker support for nomad exec as a working driver example to vet this implementation

Mahmood Ali added 5 commits April 30, 2019 10:23
This helper returns the token as well as the ACL policy, to be used in a later
commit for logging the token info associated with nomad exec invocation.
Test helper that allows registration of jobs when ACL is activated.
Extract command parsing and execution mocking into a separate struct.  Also,
allow mocking of different fs_isolation for testing.
This adds `alloc-exec` capability to allow operator to execute command into a
running task.  Furthermore, it adds `alloc-node-exec` capability, required when
the alloc task is raw_exec or a driver with no FSIsolation.
In this commit, we add two driver interfaces for supporting `nomad exec`
invocation:

* A high level `ExecTaskStreamingDriver`, that operates on io reader/writers.
  Drivers should prefer using this interface
* A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of
  input structs; useful when a driver delegates handling to driver backend (e.g.
  across RPC/grpc).

The interfaces are optional for a driver, as `nomad exec` support is opt-in.
Existing drivers continue to compile without exec support, until their
maintainer add such support.

Furthermore, we create protobuf structures to represent exec stream entities:
`ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`.  We aim to reuse the
protobuf generated code as much as possible, without translation to avoid
conversion overhead.

`ExecTaskStream` abstract fetching and sending stream entities.  It's influenced
by the grpc bi-directional stream interface, to avoid needing any adapter.  I
considered using channels, but the asynchronisity and concurrency makes buffer
reuse too complicated, which would put more pressure on GC and slows exec operation.
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review is still a WIP. Just dumping a first batch of comments.

api/allocations.go Outdated Show resolved Hide resolved
api/allocations.go Outdated Show resolved Hide resolved
api/allocations.go Show resolved Hide resolved
return -1, ctx.Err()
case frame, ok := <-output:
if !ok {
return -1, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is closing the output chan an expected way to signal an exit? If not we should probably return an error so it doesn't appear the -1 exit signal came from the remote side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope - the only legitimate expected way to exit is by receiving a frame with exit results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to use -2 here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it return an error then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point - i'll make it return an error

case frame.Exited && frame.Result != nil:
return frame.Result.ExitCode, nil
default:
// unexpected event, TODO: log it?!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log or error. Logging seems more reasonable in this case as it allows us to tuck more metadata into the frames in the future without completely breaking old clients.

Copy link
Contributor Author

@notnoop notnoop May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear how to log in the cli side of things. Do we have a pattern of cli logging? Also, logging can be very disruptive, consider the case where you are running in vim/cli-with-rich-interface, having log messages emited may cause display to go weird. That was my concern with all the other places where I would have logged if we were in the agent side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right! I forgot we can't really log in the api package. Let's update the comment.

Suggested change
// unexpected event, TODO: log it?!
// Unexpected event! Can't log as the user may be in an interactive session.
// Silently drop unexpected frames to provide graceful degradation for future protocol changes.

}
}

// TestAlloc_ExecStreaming_ACL_WithIsolation_Image assets that token only needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TestAlloc_ExecStreaming_ACL_WithIsolation_Image assets that token only needs
// TestAlloc_ExecStreaming_ACL_WithIsolation_Image asserts that token only needs

}
}

// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot assets that token only needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot assets that token only needs
// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot asserts that token only needs

}
}

// TestAlloc_ExecStreaming_ACL_WithIsolation_None assets that token needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TestAlloc_ExecStreaming_ACL_WithIsolation_None assets that token needs
// TestAlloc_ExecStreaming_ACL_WithIsolation_None asserts that token needs

}

var frame drivers.ExecTaskStreamingResponseMsg
json.Unmarshal(msg.Payload, &frame)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check error

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass complete! Overall structure looks great, just worried about some of the trickier multiple-goroutines-handling-errors cases.

return err
}

stream.Send(drivers.NewExecStreamingResponseExit(result.ExitCode))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle error or comment why we don't need to

client/pluginmanager/drivermanager/manager.go Show resolved Hide resolved
command/agent/alloc_endpoint.go Outdated Show resolved Hide resolved
command/agent/alloc_endpoint.go Outdated Show resolved Hide resolved
command/agent/alloc_endpoint.go Outdated Show resolved Hide resolved
plugins/drivers/execstreaming.go Outdated Show resolved Hide resolved
plugins/drivers/execstreaming.go Outdated Show resolved Hide resolved
plugins/drivers/execstreaming.go Outdated Show resolved Hide resolved
plugins/drivers/proto/driver.proto Show resolved Hide resolved
plugins/drivers/server.go Outdated Show resolved Hide resolved
@notnoop notnoop requested a review from schmichael May 8, 2019 20:49
Width: int(msg.TtySize.Width),
}:
case <-ctx.Done():
// process terminated before resize is processed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you missing a return here?

msg, err := stream.Recv()
if err == io.EOF {
return
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle non EOF errors here

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A couple minor nitpicks and questions remain, but none are blockers.

Mahmood Ali added 10 commits May 9, 2019 16:49
Also add a helper that converts the adapts the high level interface to the
low-level interface of nomad exec interfaces.
Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking
the relevant task handler for execution.
This adds a websocket endpoint for handling `nomad exec`.

The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.

Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer.  Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
Add consolidated testing package to serve as conformance tests for all drivers.
Adds nomad exec support in our API, by hitting the websocket endpoint.

We introduce API structs that correspond to the drivers streaming exec structs.

For creating the websocket connection, we reuse the transport setting from api
http client.
@notnoop notnoop force-pushed the f-nomad-exec-parts-01-base branch from bffe3d2 to 980e4f5 Compare May 9, 2019 20:49
@notnoop notnoop merged commit 5abbee5 into master May 9, 2019
@notnoop notnoop deleted the f-nomad-exec-parts-01-base branch May 18, 2019 00:47
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants