Propose a feature to troubleshoot running pods #649

verb · 2017-05-22T16:49:49Z

This feature allows troubleshooting of running pods by running a new "Debug Container" in the pod namespaces.

This proposal was originally opened and reviewed in kubernetes/kubernetes#35584.

This proposal needs LGTM by the following SIGs:

- SIG Node
- SIG CLI
- SIG Auth
- API Reviewer

Work in Progress:

- Prototype kubectl attach for debug containers
- Talk to sig-api-machinery about /debug subresource semantics

verb · 2017-05-22T16:50:03Z

/assign @dashpole

dashpole · 2017-05-22T16:55:35Z

/lgtm

derekwaynecarr · 2017-05-22T16:57:20Z

I think this proposal needs wider review since it comes with API changes.

derekwaynecarr · 2017-05-22T16:59:12Z

Related to feature kubernetes/enhancements#277

derekwaynecarr · 2017-05-22T17:03:30Z

contributors/design-proposals/troubleshoot-running-pods.md

+```
+type PodStatus struct {
+        ...
+        DebugContainerStatuses []ContainerStatus


since this feature is alpha, doesn't the field name need to reflect that?

That would make sense, I just didn't know that was how to do it. I know there's been a lot of discussion in kubernetes/kubernetes#30819, but I haven't followed it. It doesn't look like there's consensus, but I can try to cherry-pick something that might work.

I think we do not need to update the field name; how about using annotations as init containers did, such as an annotation named pod.alpha.kubernetes.io/debug-containers? You can get more detail from https://github.com/kubernetes/community/blob/master/contributors/design-proposals/container-init.md

We should definitely not store something as important as debug container info in an annotation. Those are also compatibility nightmares when they transition to supported fields.

@kubernetes/sig-api-machinery-misc for guidance on alpha field names and gating

@liggitt can you please give more detail on this? I found that init containers used this approach to go from alpha to beta to GA; why can't debug containers?

It did, and it has been a very painful transition, with very poor user experience (c.f. kubernetes/kubernetes#45627 (comment))

Got it, thanks @liggitt

derekwaynecarr · 2017-05-22T17:04:21Z

I think a section that covers the security concerns associated with this feature would be appreciated for future readers.

liggitt · 2017-05-23T04:26:17Z

contributors/design-proposals/troubleshoot-running-pods.md

+This creates an interactive shell in a pod which can examine and signal all
+processes in the pod. It has access to the same network and IPC as processes in
+the pod. It can access the filesystem of other processes by `/proc/$PID/root`,
+and enter aribitrary namespaces of another container via `nsenter` when


is it possible to hop from one container to another within a pod with nsenter today? is the proposed debug container more powerful than a bash prompt inside an existing container obtained via exec today?

nsenter is possible today with CAP_SYS_ADMIN. Debug containers aren't different from containers in Kubernetes today except that they aren't in a pod spec. If you were to add the following container to a pod spec:

    - name: shell
      image: debian
      stdin: true
      tty: true
      securityContext:
        capabilities:
          add:
          - SYS_ADMIN

Then, as long as the pod shares a PID namespace, you can attach to that container and `nsenter` other processes.

SYS_ADMIN wouldn't be granted to Debug Containers by default, but it should be an option.

Kubernetes master with a new enough docker version uses a shared pid namespace by default now.

liggitt · 2017-05-23T04:27:36Z

contributors/design-proposals/troubleshoot-running-pods.md

+
+The process for creating a Debug Container is:
+
+1.  `kubectl` constructs a `v1.Container` based on command line flags and


v1.Container is not a top-level object (it has no objectmeta or typemeta)... can you describe the wrapper object that would be posted?

Added description of a new object PodDebugContainer

liggitt · 2017-05-23T04:30:04Z

contributors/design-proposals/troubleshoot-running-pods.md

+
+1.  `kubectl` constructs a `v1.Container` based on command line flags and
+    `POST`s it to `/api/v1/namespaces/$NS/pods/$POD/debug`.
+1.  The API server performs admission control and proxies the connection to the


this means that all admission plugins that gate on pods, pods/exec, and pods/attach would need to be updated to guard a new kind. Expanding the surface area an admission plugin needs to protect will become a bigger deal when we have out of tree admission plugins (the mechanism for which is in progress)

I've noted this in a new security considerations section. It sounds like it will be good to get this change in prior to admission plugins becoming GA, though I would hope the plugins will deny resources they don't recognize.

> I would hope the plugins will deny resources they don't recognize.

they do not, they select the resources/subresources they guard (and admission plugins are already GA, it's the externalized ones that are being developed)
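To make the concern concrete: a plugin that selects the resource/subresource pairs it guards never inspects a request to a subresource it doesn't list. A minimal sketch, assuming a hypothetical `guardedPlugin` and a simplified `Attributes` struct rather than the real `k8s.io/apiserver` admission interfaces:

```go
package main

import "fmt"

// Attributes is a simplified stand-in for the request attributes an
// admission plugin sees: the resource and subresource being accessed.
type Attributes struct {
	Resource    string
	Subresource string
}

// guardedPlugin is a hypothetical plugin that only inspects the
// resource/subresource pairs it explicitly lists.
type guardedPlugin struct {
	guards map[[2]string]bool
}

// Handles reports whether the plugin will inspect this request at all.
func (p guardedPlugin) Handles(a Attributes) bool {
	return p.guards[[2]string{a.Resource, a.Subresource}]
}

func main() {
	p := guardedPlugin{guards: map[[2]string]bool{
		{"pods", ""}:       true,
		{"pods", "exec"}:   true,
		{"pods", "attach"}: true,
	}}
	for _, sub := range []string{"exec", "attach", "debug"} {
		// "debug" is not listed, so the plugin lets it through unexamined.
		fmt.Printf("pods/%s guarded: %v\n", sub, p.Handles(Attributes{"pods", sub}))
	}
}
```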

liggitt · 2017-05-23T04:30:45Z

contributors/design-proposals/troubleshoot-running-pods.md

+    is used because `/debug` is already used by the kubelet. `/podDebug` was
+    chosen to parallel existing endpoints like `/containerLogs`.
+1.  The kubelet instructs the Generic Runtime Manager (this feature is only
+    implemented for the CRI) to create a Debug Container.


how is availability of this feature determined, if only some CRI implementations support it?

Perhaps this was poorly worded. What I was trying to say is that this is implemented using the CRI and not the legacy runtimes. It doesn't require any changes to the CRI and will work with any runtime implementing the interface. I've updated the wording in my copy.

liggitt · 2017-05-23T04:31:26Z

contributors/design-proposals/troubleshoot-running-pods.md

+It is an error to attempt to create a Debug Container with the same name as a
+container that exists in the pod spec. There are no limits on the number of
+Debug Containers that can be created in a pod, but exceeding a pod's resource
+allocation may cause it to be evicted.


clarify what would be evicted? the pod? the debug container?

The pod would be evicted. Updated the doc.

liggitt · 2017-05-23T04:56:21Z

contributors/design-proposals/troubleshoot-running-pods.md

+    to streaming.
+
+It is an error to attempt to create a Debug Container with the same name as a
+container that exists in the pod spec. There are no limits on the number of


what component detects this error and what response is returned in that case?

The API server detects the error and returns a BadRequest; see:

https://github.com/kubernetes/kubernetes/pull/46243/files#diff-c73f80ad83608f18657d22a06950d929R470

will update doc.

liggitt · 2017-05-23T04:59:02Z

contributors/design-proposals/troubleshoot-running-pods.md

+    policy.
+*   Explicit reattaching isn't implemented. Instead a `kubectl debug` invocation
+    will implicitly reattach if there is an existing, running container with the
+    same name. In this case container configuration will be ignored.


ignoring specified information and implicitly reattaching seems confusing, and not like something we'd want long-term

I agree 100%

so if this moves from alpha to beta to stable, how do you maintain skew compatibility with older kubectl clients' debug implementations without propagating the implicit throw-away-info-and-reattach behavior into the stable version of the feature?

I think if we just expand kubectl attach to support debug containers, that would solve the problem.

either way this has to be resolved prior to moving out of alpha

liggitt · 2017-05-23T05:02:11Z

contributors/design-proposals/troubleshoot-running-pods.md

+
+`startContainer()` will be updated to write a new label
+`io.kubernetes.container.type` to the runtime. Existing containers will be
+started with a type of `REGULAR` or `INIT`. When added in a subsequent step,


does this mean a 1.7 kubelet managing containers without this label (that were started by a 1.6 kubelet) will be confused about whether they are debug containers?

No, it's backwards compatible (and there's a test). The REGULAR and INIT labels aren't used by anything initially and the kubelet behavior only differs when Type == "DEBUG". (Type will be an empty string for a container that existed prior to the feature being enabled.)
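The compatibility argument fits in a few lines: a container started by an older kubelet has no `io.kubernetes.container.type` label, the map lookup returns `""`, and only an exact `"DEBUG"` value changes behavior. `isDebugContainer` is a hypothetical helper; the label name and values come from the proposal text quoted above (the value was later renamed to `EPHEMERAL`).

```go
package main

import "fmt"

// containerTypeLabel is the runtime label named in the proposal.
const containerTypeLabel = "io.kubernetes.container.type"

// Label values written by the updated startContainer(), per the proposal.
const (
	containerTypeRegular = "REGULAR"
	containerTypeInit    = "INIT"
	containerTypeDebug   = "DEBUG"
)

// isDebugContainer reports whether debug-specific kubelet behavior applies.
// Containers started before the feature existed have no label, so the
// lookup yields "" and they are treated like any other container.
func isDebugContainer(labels map[string]string) bool {
	return labels[containerTypeLabel] == containerTypeDebug
}

func main() {
	legacy := map[string]string{} // started by an older kubelet: no type label
	regular := map[string]string{containerTypeLabel: containerTypeRegular}
	debug := map[string]string{containerTypeLabel: containerTypeDebug}
	fmt.Println(isDebugContainer(legacy), isDebugContainer(regular), isDebugContainer(debug))
}
```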

liggitt · 2017-05-23T05:06:03Z

contributors/design-proposals/troubleshoot-running-pods.md

+
+### Additional Constraints
+
+1.  Non-interactive workloads are explicitly supported. There are no plans to


If the container run and attach steps are distinct, how is stdout/stderr coordinated so that the attach request obtains the first byte written to each? Is the first attach special? How do subsequent or additional attaches behave w.r.t. previous output from the debug container's entrypoint?

You're right, there's a race here since the runtime is buffering the output. The container starts and its initial output goes to the buffer (visible via kubectl log) and then the attach picks up mid-stream. This is fine for the interactive troubleshooting but might be a problem for non-interactive workloads.

if @ncdc has learned anything, it is that if an output buffer race can happen, it will, and that if it might be a problem, it will.

@liggitt the typical process for starting a docker container is (outside of kubernetes):

1. create container
2. attach to container
3. start container

If you must start the container prior to attaching (which tends to be the case for things like kubectl run), then your only option to make sure you see all prior output is to specify logs=true when attaching. This has downsides: last time I checked, you can't limit the output to e.g. the last 100 lines, and if you have a TTY, I'm not sure what happens in that case. Also note, this isn't currently available in the version of the Docker API vendored into Kubernetes.

@ncdc What's the longer term direction here? It would be significantly more work, but we could theoretically add create/attach/start functionality for Debug Containers. I wouldn't want to do it for an MVP, though.

I just wrote in another comment about this, but I think given the way the kubelet sync loop works, it will be very difficult to achieve create/attach/start without blocking one of the kubelet workers. We will need to discuss with sig-node if we want to pursue this.
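The ordering problem can be seen in a toy model, which is not the Docker or CRI API: output written before a client attaches lands only in the log buffer (roughly what `kubectl logs` or `logs=true` would replay), while an attached stream sees only what is written afterwards. `fakeContainer` and `run` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// fakeContainer is a toy model of container output, not a runtime API.
type fakeContainer struct {
	log      bytes.Buffer    // everything the process wrote
	attached []*bytes.Buffer // streams of clients currently attached
}

// write records output in the log and copies it to every attached stream.
func (c *fakeContainer) write(s string) {
	c.log.WriteString(s)
	for _, a := range c.attached {
		a.WriteString(s)
	}
}

// attach returns a stream that only sees output written after this call.
func (c *fakeContainer) attach() *bytes.Buffer {
	b := &bytes.Buffer{}
	c.attached = append(c.attached, b)
	return b
}

// run models create/attach/start (attachFirst=true) versus
// create/start/attach, returning what the client saw and what was logged.
func run(attachFirst bool) (seen, logged string) {
	c := &fakeContainer{}
	var stream *bytes.Buffer
	if attachFirst {
		stream = c.attach()
	}
	c.write("banner\n") // the entrypoint's first output, written at start
	if !attachFirst {
		stream = c.attach() // the client joins mid-stream
	}
	c.write("prompt$ ")
	return stream.String(), c.log.String()
}

func main() {
	for _, attachFirst := range []bool{true, false} {
		seen, logged := run(attachFirst)
		fmt.Printf("attachFirst=%v seen=%q logged=%q\n", attachFirst, seen, logged)
	}
}
```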

liggitt · 2017-05-23T05:07:59Z

contributors/design-proposals/troubleshoot-running-pods.md

+1.  Non-interactive workloads are explicitly supported. There are no plans to
+    supported detached workloads, but doing so would be trivial with an
+    `attach=false` flag.
+1.  There are no guaranteed resources for ad-hoc troubleshooting. If


this seems like it would make debugging a pod that was memory constrained pretty difficult.

I agree, but in practice that's not been a problem we've had so far and we have to start somewhere. This could be improved in the future with the planned vertical pod autoscaling feature.

liggitt · 2017-05-23T05:26:38Z

how do debug containers interact with graceful termination of pods?

verb · 2017-05-23T18:15:58Z

@liggitt Debug containers receive the same signals as other containers in the pod for lifecycle events. They differ only in that they aren't deleted when syncing pod spec while the pod is alive.

verb · 2017-05-23T23:11:39Z

/assign @pwittrock
/assign @liggitt

ncdc · 2017-05-24T13:25:26Z

contributors/design-proposals/troubleshoot-running-pods.md

+```
+
+It would be reasonable for Kubernetes to provide a default container name and
+image1, making the minimal possible debug command:


Corrected, thanks.

ncdc · 2017-05-24T13:26:20Z

contributors/design-proposals/troubleshoot-running-pods.md

+image1, making the minimal possible debug command:
+
+```
+kubectl debug -it target-pod


Do you think you'd ever want to run kubectl debug and not attach stdin and use a tty?

I wouldn't, no, but then again I'm not sure why the option exists in kubectl exec. I can imagine wanting to run kubectl debug target-pod -- netstat -an, but only if I'll definitely get the first byte of the output stream and that would of course work just as well with a tty.

Updated the document to specify that -i & -t will be enabled by default.

ncdc · 2017-05-24T13:27:17Z

contributors/design-proposals/troubleshoot-running-pods.md

+This creates an interactive shell in a pod which can examine and signal all
+processes in the pod. It has access to the same network and IPC as processes in
+the pod. It can access the filesystem of other processes by `/proc/$PID/root`,
+and enter aribitrary namespaces of another container via `nsenter` when


Kubernetes master with a new enough docker version uses a shared pid namespace by default now.

ncdc · 2017-05-24T13:32:36Z

contributors/design-proposals/troubleshoot-running-pods.md

+replace a Debug Container that has exited by re-using a Debug Container name. It
+is an error to attempt to replace a Debug Container that is still running.
+
+One way in which `kubectl debug` differs from `kubectl exec` is the ability to


kubectl attach supports attaching to both init and normal containers. Would you want to expand it to support debug containers too? That would require the least amount of coding.

Ahh, well this is a compelling alternative. I may have misunderstood your intention, but not streaming the /debug subresource and instead relying solely on attach would solve several problems. It would sidestep (though not solve) output stream coordination and allow kubectl to generate the container configuration, which is more flexible. Off the top of my head:

- `kubectl debug` would be a two-step operation: run the debug container, followed by an optional attach. The optional attach would better support non-interactive workloads.
- The API server can't check that the debug container exists by examining Pod.Spec as it does for regular/init containers, but it should be able to check Pod.Status.DebugContainerStatuses. It's the same story for kubectl.
- Output stream coordination would then be solved for Debug Containers when/if it's solved for attach.

Great idea, I'll prototype it.

The only true way to ensure you don't miss any output is create-container, attach-container, start-container, in that order. That's how docker run works. For something like kubectl run and probably kubectl debug too, we can't really do that, because of the way the kubelet sync loop works (kubectl waits until it sees the pod is Running before attaching). Well, I guess we could potentially do that, but it would require pausing a sync loop iteration until the remote client (kubectl) attaches, which isn't ideal.

Added this info to the doc, thanks!

ncdc · 2017-05-24T13:35:01Z

contributors/design-proposals/troubleshoot-running-pods.md

+### Killing Debug Containers
+
+Debug containers will not be killed automatically until the pod (specifically,
+the pod sandbox) is destroyed. Unlike `kubectl exec`, Debug Containers will not


There is roundabout support in newer docker versions for killing exec sessions. It now records the pid of the process that was exec'd, and we could use that information to do a kill.

ncdc · 2017-05-24T13:35:39Z

contributors/design-proposals/troubleshoot-running-pods.md

+the pod sandbox) is destroyed. Unlike `kubectl exec`, Debug Containers will not
+receive an EOF if their connection is interrupted. Instead, Debug Containers
+must be reattached to exit a running process. This could be tricky if the
+process does not allocate a TTY, in this case a second Debug Container could be


Same comment as above re why would you not allocate a tty?

Unless there's a reason that non-interactive workloads might shun a TTY, I have no argument for not having one. I was just following the perceived convention of kubectl exec.

I would view kubectl debug as an interactive utility for debugging a running pod, whereas I see kubectl exec as a tool in my toolbox that might or might not require user interaction. Although the more I think about it, typically when I do use kubectl exec it's to get a shell, in which case I almost always do -it. There's a recent issue proposing that exec defaults these to true: #46300.

@ncdc @liggitt since kubectl debug is mainly a tool for getting a shell with a TTY, how much do we care about output stream coordination when kubectl will already say "press enter to get a prompt"? I think it's reasonable to defer to a future general solution rather than trying to solve this for debug.

No objections to deferring

ncdc · 2017-05-24T13:39:37Z

contributors/design-proposals/troubleshoot-running-pods.md

+
+### Additional Constraints
+
+1.  Non-interactive workloads are explicitly supported. There are no plans to


@liggitt the typical process for starting a docker container is (outside of kubernetes):

1. create container
2. attach to container
3. start container

If you must start the container prior to attaching (which tends to be the case for things like kubectl run), then your only option to make sure you see all prior output is to specify logs=true when attaching. This has downsides: last time I checked, you can't limit the output to e.g. the last 100 lines, and if you have a TTY, I'm not sure what happens in that case. Also note, this isn't currently available in the version of the Docker API vendored into Kubernetes.

ncdc · 2017-05-24T13:41:10Z

contributors/design-proposals/troubleshoot-running-pods.md

+    policy.
+*   Explicit reattaching isn't implemented. Instead a `kubectl debug` invocation
+    will implicitly reattach if there is an existing, running container with the
+    same name. In this case container configuration will be ignored.


I think if we just expand kubectl attach to support debug containers, that would solve the problem.

verb · 2017-05-31T00:07:45Z

contributors/design-proposals/troubleshoot-running-pods.md

+of Debug Containers is reported via a new field in `v1.PodStatus`, described in
+a subsequent section.
+
+#### Alternative: Extending `/exec`


@dchen1107 @lavalamp @smarterclayton @pwittrock

For pod troubleshooting we need to choose between the object-based approach described above (suggested by @smarterclayton ) and the exec-based approach described below (suggested by @lavalamp and resembling the "image exec" approach of the original proposal, interestingly).

mikedanese · 2017-06-23T11:54:17Z

contributors/design-proposals/troubleshoot-running-pods.md

+   13 ?        Ss     0:00 bash
+   26 ?        Ss+    0:00 /neato
+  107 ?        R+     0:00 ps x
+root@debug-image:~# cat /proc/26/root/etc/resolv.conf


nit: I don't think this works with docker.

This updates the Pod Troubleshooting Design Proposal for recent developments in the community and to reflect the consensus from the API review: using the existing /exec endpoint as a starting point for this feature.

lavalamp · 2017-09-26T22:02:19Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+        ...
+        // DebugName is the name of the Debug Container. Its presence will cause
+        // exec to create a Debug Container rather than performing a runtime exec.
+        DebugName string `json:"debugName,omitempty" ...`


I vote for making a sub section:

    type PodExecOptions struct {
        ...
        EphemeralContainer *PodExecEphemeralContainerSpec
    }

    type PodExecEphemeralContainerSpec struct {
        Name  string
        Image string
    }

SGTM, though in my naive prototype the sub section wasn't populated from the HTTP params.

I'm not very familiar with the api machinery so I probably just missed something. I see where queryparams flattens the struct based on JSON names. Do I need to write a custom converter somewhere to get it back from params to an object? Could you point me in the right direction?

yeah, nested structs parsed from query params won't work cleanly (kubernetes/kubernetes#21476)

Earlier in the text it said this was not using query params--I thought it was using a message sent after the SPDY channel was opened.

lavalamp · 2017-09-27T16:59:47Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+We will extend `v1.Pod`'s `/exec` subresource to support "executing" container
+images. The current `/exec` endpoint must implement `GET` to support streaming
+for all clients. We don't want to encode a (potentially large) `v1.Container` as
+an HTTP parameter, so we must extend `v1.PodExecOptions` with the specific


Oh, I see. I misunderstood this line.

lavalamp · 2017-09-27T17:00:46Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+type PodExecOptions struct {
+        ...
+        // Run Command in an ephemeral container which shares some namespaces with Container.
+        EphemeralContainer PodExecEphemeralContainerSpec


This needs to be a pointer. Uh, but I will suggest something else since sub structs are indeed currently broken.

    // PodExecOptions is the query options to a Pod's remote exec call
    type PodExecOptions struct {
        ...
        // EphemeralContainerName is the name of an ephemeral container in which the
        // command ought to be run. Either both EphemeralContainerName and
        // EphemeralContainerImage fields must be set, or neither.
        EphemeralContainerName *string `json:"ephemeralContainerName,omitempty" ...`

        // EphemeralContainerImage is the image of an ephemeral container in which the command
        // ought to be run. Either both EphemeralContainerName and EphemeralContainerImage
        // fields must be set, or neither.
        EphemeralContainerImage *string `json:"ephemeralContainerImage,omitempty" ...`
    }

Renamed as suggested
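Because nested structs do not round-trip through query parameters (kubernetes/kubernetes#21476), the suggestion above keeps the new fields flat. A minimal sketch of validating and encoding such options, assuming a trimmed `PodExecOptions` stand-in; `toQuery` is a hand-written illustration, not the apimachinery query-parameter codec:

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// PodExecOptions is a trimmed stand-in with the flat fields suggested above.
type PodExecOptions struct {
	Command                 []string
	EphemeralContainerName  *string
	EphemeralContainerImage *string
}

// validate enforces "either both ephemeral fields are set, or neither".
func (o PodExecOptions) validate() error {
	if (o.EphemeralContainerName == nil) != (o.EphemeralContainerImage == nil) {
		return errors.New("ephemeralContainerName and ephemeralContainerImage must be set together")
	}
	return nil
}

// toQuery flattens the options into URL query parameters using their JSON names.
func (o PodExecOptions) toQuery() url.Values {
	v := url.Values{}
	for _, c := range o.Command {
		v.Add("command", c) // repeated parameter, one per argv element
	}
	if o.EphemeralContainerName != nil {
		v.Set("ephemeralContainerName", *o.EphemeralContainerName)
	}
	if o.EphemeralContainerImage != nil {
		v.Set("ephemeralContainerImage", *o.EphemeralContainerImage)
	}
	return v
}

func main() {
	name, image := "debugger", "debian"
	opts := PodExecOptions{Command: []string{"bash"}, EphemeralContainerName: &name, EphemeralContainerImage: &image}
	if err := opts.validate(); err != nil {
		panic(err)
	}
	fmt.Println(opts.toQuery().Encode()) // Encode sorts parameters by key
}
```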

lavalamp · 2017-09-27T20:38:01Z

/lgtm

k8s-github-robot · 2017-09-27T20:38:46Z

Automatic merge from submit-queue.

liggitt · 2017-09-28T01:28:41Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+```
+type PodStatus struct {
+        ...
+        DebugStatuses []DebugStatus


should this mirror the exec option names? ephemeral statuses, etc?

We should use consistent naming

Argh, yes, I was in a rush and didn't see this section. @verb can you change it to:

    type PodStatus struct {
        EphemeralContainerStatuses []v1.ContainerStatus
    }

Hm. Actually were you trying to represent all exec actions with this?

(note I edited my comment above. I can't see anything you need that isn't already in v1.ContainerStatus.)

@lavalamp I'd like to at least have command and args, which aren't part of ContainerStatus.

liggitt · 2017-09-28T01:31:00Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+
+1.  `kubectl` invokes the debug API as described in the preceding section.
+1.  The API server checks for name collisions with existing containers, performs
+    admission control and proxies the connection to the kubelet's


note that all admission plugins that do anything related to checking containers, images, etc, would need to be updated to check ephemeral images specified in exec options now

This is noted in the "Security Considerations" section.

liggitt · 2017-09-28T01:33:22Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+requests and the kubelet must return an error to all but one request.
+
+There are no limits on the number of Debug Containers that can be created in a
+pod, but exceeding a pod's resource allocation may cause the pod to be evicted.


this also would bypass admission plugins that set resource limits/range on containers. please describe the container spec that would result from an ephemeral exec request
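The proposal text quoted above requires the kubelet to reject all but one of several concurrent requests for the same Debug Container name. A minimal sketch using a mutex-guarded map; `debugRegistry` and `claim` are hypothetical, and a real implementation would also need to allow re-using the name of a Debug Container that has exited, which this sketch omits.

```go
package main

import (
	"fmt"
	"sync"
)

// debugRegistry is a hypothetical kubelet-side record of debug container
// names per pod, guarded by a mutex so concurrent requests are serialized.
type debugRegistry struct {
	mu    sync.Mutex
	names map[string]bool // key: pod UID + "/" + container name
}

// claim returns nil for the first request for a name and an error for the rest.
func (r *debugRegistry) claim(podUID, name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	key := podUID + "/" + name
	if r.names[key] {
		return fmt.Errorf("debug container %q already exists in pod %s", name, podUID)
	}
	r.names[key] = true
	return nil
}

func main() {
	r := &debugRegistry{names: map[string]bool{}}
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if r.claim("pod-uid", "debugger") == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("successful requests:", succeeded)
}
```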

liggitt · 2017-09-28T01:34:51Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+1.  `KillPod()` already operates on all running containers returned by the
+    runtime.
+1.  Containers created prior to this feature being enabled will have a
+    `containerType` of `""`. Since this does not match `"DEBUG"` the special


DEBUG or EPHEMERAL?

Yes please change references in the API to "ephemeral" everywhere.

To be clear, this is a private label in the kubelet's runtime manager and not part of the API. I've updated it to EPHEMERAL for consistency, though.

smarterclayton

I don't see a discussion of how this will be secure (users with access to CAP_SYS_ADMIN)

smarterclayton · 2017-09-28T01:30:17Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+*   Exited Debug Containers will be garbage collected as regular containers and
+    may disappear from the list of Debug Container Statuses.
+*   Security Context for the Debug Container is not configurable. It will always
+    be run with `CAP_SYS_PTRACE` and `CAP_SYS_ADMIN`.


So only a cluster admin should ever use debug containers?

Also, pods don't have security context set in many cases. Exec implicitly escalating to root is bad.

This should be configurable at debug time, but that depends on the API. Some of my proposed API changes addressed this, but SIG Node and the API reviewers deadlocked on which was best. Our compromise was to proceed to alpha with the minimum possible API change, which doesn't include a configurable security context.

Created kubernetes/kubernetes#53188 to track this.

/cc @thockin

> Our compromise was to proceed to alpha with the minimum possible API change, which doesn't include a configurable security context.

you have a pointer to where that discussion happened? no one from the auth/psp side was involved afaik

I don't understand, you've been involved since the first draft of the proposal 10 months ago, and your input has always been most welcome. It's not too late, what would you like to see changed?

@verb how bad would it be to pass a full v1.Container?

@thockin Not bad; it's what the kubelet does internally, and I've had it working in a prototype.

We would use the API described in Alternative 1 to POST a v1.Container (wrapped in a new top level object) to a new /debug subresource. Then the client would perform a separate /attach.

The API reviewers had concerns about this being a novel use of the API. Nothing else POSTs to a subresource.

Both the API reviewers and SIG Node reviewers had concerns about using v1.Container being confusing or communicating the wrong intent to the user. Debug Containers are not general purpose containers and should not be used to build services or for routine operations. Some prefer to consider "extended exec" rather than "configurable container".

Most of the fields of v1.Container do not apply to Debug Containers and should be rejected if configured (lifecycle, livenessProbe, ports, readinessProbe, resources, stdin, stdinOnce, terminationMessagePath, terminationMessagePolicy, tty, volumeMounts). We can do this with a validation whitelist, though it would be simpler to pass a securityContext rather than a full v1.Container.
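A minimal sketch of such a validation whitelist, assuming a stand-in `Container` with only a few `v1.Container` fields and an assumed allowed set of `Name`, `Image`, `Command`, and `Args`; any other field with a non-zero value is rejected. `validateDebugContainer` is a hypothetical name.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Probe stands in for v1.Probe; its contents don't matter here.
type Probe struct{}

// Container is a stand-in with a handful of v1.Container's fields.
type Container struct {
	Name           string
	Image          string
	Command        []string
	Args           []string
	Ports          []int32
	LivenessProbe  *Probe
	ReadinessProbe *Probe
	Stdin          bool
	TTY            bool
}

// allowedDebugFields is an assumed whitelist; everything else must be unset.
var allowedDebugFields = map[string]bool{
	"Name": true, "Image": true, "Command": true, "Args": true,
}

// validateDebugContainer rejects any non-whitelisted field that has a non-zero value.
func validateDebugContainer(c Container) error {
	v := reflect.ValueOf(c)
	t := v.Type()
	var rejected []string
	for i := 0; i < t.NumField(); i++ {
		name := t.Field(i).Name
		if !allowedDebugFields[name] && !v.Field(i).IsZero() {
			rejected = append(rejected, name)
		}
	}
	if len(rejected) > 0 {
		return fmt.Errorf("fields not allowed for debug containers: %s", strings.Join(rejected, ", "))
	}
	return nil
}

func main() {
	c := Container{Name: "debugger", Image: "debian", TTY: true, Ports: []int32{8080}}
	fmt.Println(validateDebugContainer(c))
}
```

Iterating over all struct fields with `reflect` means a field added to the struct later is rejected by default, which is the usual argument for a whitelist over a list of forbidden fields.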

> The API reviewers had concerns about this being a novel use of the API. Nothing else POSTs to a subresource.

I thought scheduler POSTs to pod/binding subresource, clients POST to pod/eviction subresource, etc. Or is there a distinction between POST and PUT here? (Which I can never keep straight.)

@davidopp Oh, that's good news then. When I prototyped this ~6 months ago I recall needing a couple of changes in the apiserver in order to make it work, but maybe those were specific to upgrading a connection to streaming after a POST. It's been a long ride.

Let's figure out a way to move this forward. Since we already had agreement among the reviewers at the time, and now we want to renegotiate that agreement based on new reviewers, I suggest that this take the form of a new PR amending the proposal, where the new and old reviewers can work out their conflicting requirements. I'll prepare a diff.

> > The API reviewers had concerns about this being a novel use of the API. Nothing else POSTs to a subresource.
>
> I thought scheduler POSTs to pod/binding subresource, clients POST to pod/eviction subresource, etc. Or is there a distinction between POST and PUT here? (Which I can never keep straight.)

Not sure about subresources, but there are issues with both PUT and POST as part of websocket requests (not all clients support them)

smarterclayton · 2017-09-28T01:33:56Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+```
+type PodStatus struct {
+        ...
+        DebugStatuses []DebugStatus


We should use consistent naming

liggitt · 2017-09-28T01:36:29Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+particular, they should enforce the same container image policy on the `Image`
+parameter as is enforced for regular containers. During the alpha phase we will
+additionally support a container image whitelist as a kubelet flag to allow
+cluster administrators to easily constraint debug container images.


What security context settings (uid/gid, SELinux, AppArmor) will the debug container have? How will admission plugins that constrain/force those (like PodSecurityPolicy) govern an ephemeral container?

edit: just saw https://github.com/kubernetes/community/pull/649/files#diff-5cfb31b40ca47511743d0545d5697aa0R394

can we determine the equivalent v1.Container (including securitycontext) that would correspond to the ephemeral container? If so, we could see if a PodSecurityPolicy would allow the pod with a container with those settings/permissions

I thought @verb had worked this out with @derekwaynecarr already?

@liggitt Compatibility with admission plugins is a top priority and a strict requirement. The implementation will depend a little on how the Kubernetes API settles, but it will be one of:

- The client may end up providing the v1.Container that creates the Ephemeral Container.
- If we stick with the imperative, exec-style API, we can do exactly as you suggest and provide the v1.Container based on the PodExecOptions.
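Following @liggitt's suggestion, the exec-style API could derive an equivalent `v1.Container` from `PodExecOptions` and run existing pod admission checks against it. A minimal sketch, assuming simplified stand-in types (not the real `k8s.io/api` definitions) and the alpha defaults quoted earlier in this thread: `CAP_SYS_PTRACE` and `CAP_SYS_ADMIN` added, image pull policy `Always`. `ephemeralContainerFor` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the relevant v1 types; not the real k8s.io/api definitions.
type Capabilities struct{ Add []string }

type SecurityContext struct{ Capabilities *Capabilities }

type Container struct {
	Name            string
	Image           string
	Command         []string
	ImagePullPolicy string
	SecurityContext *SecurityContext
	Stdin, TTY      bool
}

// PodExecOptions is trimmed to the fields used here.
type PodExecOptions struct {
	Stdin, TTY              bool
	Command                 []string
	EphemeralContainerName  *string
	EphemeralContainerImage *string
}

// ephemeralContainerFor derives the container an ephemeral exec request
// would create, using the alpha defaults quoted in this thread.
func ephemeralContainerFor(o PodExecOptions) (Container, error) {
	if o.EphemeralContainerName == nil || o.EphemeralContainerImage == nil {
		return Container{}, errors.New("not an ephemeral exec request")
	}
	return Container{
		Name:            *o.EphemeralContainerName,
		Image:           *o.EphemeralContainerImage,
		Command:         o.Command,
		ImagePullPolicy: "Always", // PullAlways; not configurable in the alpha
		SecurityContext: &SecurityContext{
			// CAP_SYS_PTRACE and CAP_SYS_ADMIN, named without the CAP_ prefix as in pod specs
			Capabilities: &Capabilities{Add: []string{"SYS_PTRACE", "SYS_ADMIN"}},
		},
		Stdin: o.Stdin,
		TTY:   o.TTY,
	}, nil
}

func main() {
	name, image := "debugger", "debian"
	c, err := ephemeralContainerFor(PodExecOptions{
		Stdin: true, TTY: true, Command: []string{"bash"},
		EphemeralContainerName: &name, EphemeralContainerImage: &image,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name, c.Image, c.ImagePullPolicy, c.SecurityContext.Capabilities.Add)
}
```

A plugin such as PodSecurityPolicy could then evaluate the pod with this container appended; that admission step is not shown.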

liggitt · 2017-09-28T01:47:52Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+*   Security Context for the Debug Container is not configurable. It will always
+    be run with `CAP_SYS_PTRACE` and `CAP_SYS_ADMIN`.
+*   Image pull policy for the Debug Container is not configurable. It will
+    always be run with `PullAlways`.


this prevents offline installations with pre-pulled images

Good point. Created kubernetes/kubernetes#53189 to track this.

lavalamp · 2017-09-28T03:37:39Z

contributors/design-proposals/node/troubleshoot-running-pods.md

+        DebugStatuses []DebugStatus
+}
+
+type DebugStatus struct {


I don't see the necessity of this.

If you wanted to represent the way(s) in which the container is "dirty" after exec/attach/portforward, etc., I don't think this is the way to go.

Yes, that's something we want to represent along with details about what command was run by exec (whether traditional exec or in an ephemeral container). This isn't a blocker for the alpha implementation, though, so if you think this is the wrong approach then I'll remove this bit from the proposal and we can figure out the correct way later.

/cc @dchen1107 @thockin

I think we do want some way to represent the taintedness (we should also do this for exec).

@r2d4 Can you work on the best way to represent taintedness in v1.PodStatus?

@verb ack looking into it

ndeloof · 2018-06-25T07:05:32Z

Hi @verb , I'm glad to see your repeated effort to get this feature added to k8s.

I understand the initial use-case for this effort is related to diagnostic on a running Pod, but would like to share with you another use-case which would benefit this exact same improvement:

I'm working on Jenkins integration in k8s; we run containerized builds as pods. During the build execution, a new container might be required. In many cases this can be identified before the build is scheduled and set as part of the Pod's spec, but in some cases the required image is dynamically selected as part of the build. Also, being able to run containers as part of the build just like developers do on their workstations helps make the build script reproducible and portable.

With a new API to add (transient) containers to a Pod, I could provide the glue code for build scripts to control such additional containers. The current practice for most users is to run a privileged (!) DinD container to host the build, or to bind-mount docker.sock from the host (!). As you can guess, I'd prefer we don't rely on this :-\

Hope this will help understand potential use-cases this feature could support.

smarterclayton · 2018-07-02T22:23:55Z

I've been thinking along these lines: we don't want to add infinite features to Kubernetes pods that allow them to orchestrate containers, but we also don't want to make pods so inflexible that people build external container orchestration. We have clearly stated that the pod abstraction is not the Borg Job/Tasks abstraction (directed graph), but allowing people to implement directed graph operations within a single pod has utility. If instead a pod could use a node-local API to add and remove containers within the limits the pod has defined (security boundaries, resources, secrets, volumes), with the orchestration of those containers done by talking to the kubelet from within the pod, then we could potentially continue to leverage CRI for intra-pod isolation AND avoid needing to add an infinite number of features to the pod API. I think it deserves some thought as an approach.

smarterclayton · 2018-07-03T03:46:50Z

Note that I'm specifically saying: Once ephemeral containers are available, we have the rough shape of an api to further subdivide a pod (subcontainer division). We could potentially put a whole class of problems ("how do i orchestrate a set of containers within the pod") into the same pot by having the user run a container that spawns ephemeral containers (in whatever fashion the user wants) and using a combination of process and container controls to manage subdivision. I.e. instead of kube implementing a task like mechanism within a pod, make it possible for the user to safely implement that themselves. The ephemeral pod API then becomes an "in pod orchestrator" which exposes the container runtime in a kube-like fashion to the user, rather than forcing the user to implement nested containers (which are invisible to kube) or having kube continue to add feature after feature to approximate a directed task graph.

avanier · 2018-10-22T14:00:29Z

I have a keen interest in this feature. Is there any progress to report on this outside of this ticket?

Also, to echo @ndeloof's mention of Jenkins builds, I have to say this is indeed quite useful. I've been running Concourse at a few shops over recent years, and using Pivotal's Garden OCI runtime, they achieve exactly that.

verb · 2018-10-22T15:53:09Z

@avanier kubernetes/enhancements#277 might be better for tracking progress of this feature. The API change is under review in kubernetes/kubernetes#59416; once it merges, there should be quick progress.

Automatic merge from submit-queue.

Propose a feature to troubleshoot running pods

This feature allows troubleshooting of running pods by running a new "Debug Container" in the pod namespaces.

This proposal was originally opened and reviewed in kubernetes/kubernetes#35584.

This proposal needs LGTM by the following SIGs:

- [ ] SIG Node
- [ ] SIG CLI
- [ ] SIG Auth
- [x] API Reviewer

Work in Progress:

- [x] Prototype `kubectl attach` for debug containers
- [x] Talk to sig-api-machinery about `/debug` subresource semantics

k8s-ci-robot added the cncf-cla: yes label May 22, 2017

k8s-ci-robot assigned dashpole May 22, 2017

verb mentioned this pull request May 22, 2017

Propose feature for troubleshooting running pods kubernetes/kubernetes#35584

Closed

k8s-ci-robot added the lgtm label May 22, 2017

derekwaynecarr removed the lgtm label May 22, 2017

derekwaynecarr assigned smarterclayton May 22, 2017

derekwaynecarr requested review from liggitt and derekwaynecarr May 22, 2017 16:57

derekwaynecarr reviewed May 22, 2017

liggitt reviewed May 23, 2017

k8s-ci-robot assigned liggitt and pwittrock May 23, 2017

verb mentioned this pull request May 23, 2017

Ephemeral Containers kubernetes/enhancements#277

Closed

23 tasks

ncdc suggested changes May 24, 2017

verb changed the title ~~Propose a feature to troubleshoot running pods~~ WIP: Propose a feature to troubleshoot running pods May 24, 2017

xiangpengzhao mentioned this pull request May 30, 2017

sidecar exec kubernetes/kubernetes#27464

Closed

verb changed the title ~~WIP: Propose a feature to troubleshoot running pods~~ Propose a feature to troubleshoot running pods May 31, 2017

verb commented May 31, 2017

mikedanese reviewed Jun 23, 2017

verb added 3 commits September 25, 2017 11:07

Add analysis of standalone pod alternative

33a5520

Switch Pod Troubleshooting API back to /exec

bcd232f

This updates the Pod Troubleshooting Design Proposal for recent developments in the community and to reflect the consensus from the API review: using the existing /exec endpoint as a starting point for this feature.

Move Pod Troubleshooting proposal into node subdir

8136e3d
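For reference, a minimal sketch of how a client addresses the existing `pods/exec` subresource that commit bcd232f takes as the starting point for this feature. It uses only Go's standard `net/url` package and the existing `v1.PodExecOptions` query parameters (`container`, `command`, `stdin`, `stdout`, `stderr`, `tty`), with each command argument sent as its own `command` parameter. The debug-container fields this proposal adds to `PodExecOptions` are deliberately not shown, the function name `execURL` is hypothetical, and the host is a placeholder. A real client also needs authentication and a streaming upgrade (SPDY or WebSocket), which this sketch omits.

```go
package main

import (
	"fmt"
	"net/url"
)

// execURL (hypothetical helper) builds the URL for the existing pods/exec
// subresource. Query parameter names follow v1.PodExecOptions; each element
// of command is added as a separate "command" parameter.
func execURL(host, namespace, pod, container string, command []string, tty bool) string {
	q := url.Values{}
	q.Set("container", container)
	for _, arg := range command {
		q.Add("command", arg)
	}
	q.Set("stdin", "true")
	q.Set("stdout", "true")
	if tty {
		// A terminal carries output on one stream, so this sketch requests
		// stderr separately only when no TTY is allocated.
		q.Set("tty", "true")
	} else {
		q.Set("stderr", "true")
	}
	u := url.URL{
		Scheme:   "https",
		Host:     host,
		Path:     fmt.Sprintf("/api/v1/namespaces/%s/pods/%s/exec", namespace, pod),
		RawQuery: q.Encode(), // Encode sorts parameters by key
	}
	return u.String()
}

func main() {
	fmt.Println(execURL("apiserver.example:6443", "default", "web-0", "app", []string{"sh", "-c", "ps aux"}, true))
}
```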

verb force-pushed the pod-troubleshooting branch from bbca2a1 to 8136e3d September 25, 2017 09:09

lavalamp reviewed Sep 26, 2017

lavalamp reviewed Sep 27, 2017

Name PodExecOptions fields as recommended by API reviewer

473c49d

verb force-pushed the pod-troubleshooting branch from 1dd396f to 473c49d September 27, 2017 20:10

k8s-ci-robot added the lgtm label Sep 27, 2017

k8s-github-robot merged commit c4d900e into kubernetes:master Sep 27, 2017

liggitt reviewed Sep 28, 2017

smarterclayton reviewed Sep 28, 2017

liggitt reviewed Sep 28, 2017

lavalamp reviewed Sep 28, 2017

verb mentioned this pull request Oct 20, 2017

Add a container type to the runtime manager's container status kubernetes/kubernetes#45442

Merged

2 tasks

briansmith mentioned this pull request Feb 8, 2018

Stop including debugging utilities in the Conduit images linkerd/linkerd2#293

Closed

danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023

Adding knabben as member (kubernetes#649)

362f347


The process for creating a Debug Container is:

1. `kubectl` constructs a `v1.Container` based on command line flags and


### Additional Constraints

1. Non-interactive workloads are explicitly supported. There are no plans to
