Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] PoC support for shim v2 to run kata containers v2 #11238

Closed

Conversation

EduardoVega
Copy link
Contributor

This PR is a PoC to integrate the shim v2 as a container monitor to run kata containers version 2. A very basic functionality has been added to create, run and stop containers. Some actions are a bit buggy, mainly those related to attach, interactive and tty but the options can be used.

Fixes #8579

All the code changes are based on https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go

An important change to consider is that the version of github.com/golang/protobuf had to be downgraded due to this issue containerd/ttrpc#62

Testing information

Host:

root@default:/home/foo# uname -a
Linux default 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
root@default:/home/foo# cat /etc/*release*
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

Kata version:

root@default:/home/foo# /opt/kata/bin/kata-runtime version
  : 2.0.3

Examples:

root@default:/home/foo/podman# ./bin/podman run --runtime /opt/kata/bin/containerd-shim-kata-v2 alpine uname -a
time="2021-08-16T22:45:07.152992075-05:00" level=warning msg="sandbox's cgroup won't be updated: cgroup path is empty" name=containerd-shim-v2 pid=929 sandbox=3b208b6c615a source=virtcontainers subsystem=sandbox
Linux 3b208b6c615a 5.4.71 #1 SMP Fri Apr 9 19:03:10 UTC 2021 x86_64 Linux

Bugs:

  • ctr-c must be pressed to exit the attached mode
  • podman stop and rm must be run since podman thinks that the container is still running
root@default:/home/foo/podman# ./bin/podman run --runtime /opt/kata/bin/containerd-shim-kata-v2 -it alpine sh
time="2021-08-16T22:49:15.560226053-05:00" level=warning msg="sandbox's cgroup won't be updated: cgroup path is empty" name=containerd-shim-v2 pid=1271 sandbox=8cf1b8d3a0c8 source=virtcontainers subsystem=sandbox
                            / # time="2021-08-16T22:49:15.585753670-05:00" level=error msg="post event" error="failed to publish event: exit status 125" name=containerd-shim-v2 pid=1271 sandbox=8cf1b8d3a0c8 source=containerd-kata-shim-v2
                                                     time="2021-08-16T22:49:15.606020074-05:00" level=error msg="post event" error="failed to publish event: exit status 125" name=containerd-shim-v2 pid=1271 sandbox=8cf1b8d3a0c8 source=containerd-kata-shim-v2

/ # uname -a
Linux 8cf1b8d3a0c8 5.4.71 #1 SMP Fri Apr 9 19:03:10 UTC 2021 x86_64 Linux
/ # hostname
8cf1b8d3a0c8
/ # id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)

Bugs:

  • when trying to exit the container the terminal gets stuck and returns the error ERRO[0061] Failed to close stdin for container "8cf1b8d3a0c8" error="invalid argument: Invalid exec id: invalid argument"

Signed-off-by: Eduardo Vega edvegavalerio@gmail.com

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 17, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: EduardoVega
To complete the pull request process, please assign tomsweeneyredhat after the PR has been reviewed.
You can assign the PR to them by writing /assign @tomsweeneyredhat in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Aug 17, 2021

@EduardoVega Could you check to see how much the size of Podman grows with this PR?

@TomSweeneyRedHat
Copy link
Member

I love wee little PRs.
Test system doesn't like it at the moment.

@EduardoVega
Copy link
Contributor Author

@rhatdan

go version go1.15.14 linux/amd64

main

-rwxrwxr-x. 1 edu edu 53M Aug 18 19:53 ./bin/podman

shimv2 branch

-rwxrwxr-x. 1 edu edu 56M Aug 18 20:04 ./bin/podman

@TomSweeneyRedHat

My bad, first time working on something this big, I can definitely make any change to improve this.

@rhatdan
Copy link
Member

rhatdan commented Aug 20, 2021

What is the benefit of this over executing kata OCI Runtime directly? Is there any way to do this without sucking in so much code.

@mheon
Copy link
Member

mheon commented Aug 20, 2021

I think the benefit is that the Kata runtime is deprecated and going away 😄

I'll try and do a full review this afternoon

@rhatdan
Copy link
Member

rhatdan commented Aug 21, 2021

Adding three megabytes in size in order to talk to a socket, seems a little extreme.

func isShimv2(name string, paths []string) bool {
// Check if oci runtime name is shimv2
// i.e containerd.shim.kata.v2
r, _ := regexp.Compile(shimv2NameRegex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use string contains or match operations did not understand the use-case of regexp it seems costly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'll replace this with a new field in containers.conf to specify when shimv2 is used as suggested in the other comment.


// Check if any of the oci runtime paths is shimv2
// i.e /path/to/containerd-shim-kata-v2
r, _ = regexp.Compile(shimv2BinaryRegex)
Copy link
Collaborator

@flouthoc flouthoc Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same could we try string match or contains instead of regexp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this will change.

@@ -441,7 +441,8 @@ func makeRuntime(ctx context.Context, runtime *Runtime) (retErr error) {
runtime.defaultOCIRuntime = ociRuntime
}
}
logrus.Debugf("Using OCI runtime %q", runtime.defaultOCIRuntime.Path())
logrus.Debugf("Using OCI runtime name: %q", runtime.defaultOCIRuntime.Name())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Using OCI runtime %s with path %s instead of two log lines.

libpod/shimv2.go Outdated
"syscall"
"time"

tasktypes "github.com/containerd/containerd/api/types/task"
Copy link
Collaborator

@flouthoc flouthoc Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure can we abstract this into podman, vendoring seems too much just for a small use-case. I maybe wrong could you help me understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup Size matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try to abstract this into podman. thanks

@rhatdan
Copy link
Member

rhatdan commented Aug 27, 2021

How does Podman monitor this.

podman run -d --runtime=kata ...

How do we get notified that the container exited, and what is the status? No conmon to watch it. How do we cleanup the storage?


// isShimv2 verifies if the oci runtime needs to use shimv2 daemon
// to create containers i.e kata containers v2
func isShimv2(name string, paths []string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of regexing, it's probably safer to do something in containers.conf indicating this is a shimv2 runtime - explicitly set it in the config file for safety.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's definitely better. I have seen that this is referred as a runtime_type=vm or runtime_version=v2 , is there any preference for this ?

Should this be also configured as a flag when running podman ? i.e podman run --runtime_type=v2 ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just have it be --runtime, let's not add more flags.

libpod/shimv2.go Outdated
@@ -0,0 +1,776 @@
// +build linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a dedicated build tag for this - kata, maybe - so we can optional build out support.

libpod/shimv2.go Outdated
return "", fmt.Errorf("not supported")
}

func (r *shimv2) UpdateContainerStatus(ctr *Container) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only way Kata updates us on containers exiting? IE, We have to poll them to see when containers exited? My primary concerns are round container exit - we don't have a Conmon, so we don't have a cleanup process and the container will not be unmounted into the user manually invokes a Podman command that updates its status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, Podman has gone away, so something needs to record the status of the container. In my opinion this is still conmon. Just we need a way for conmon to wait for container exit.

Copy link
Contributor Author

@EduardoVega EduardoVega Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the shimv2 daemon when is started, uses a ttrpc server exposed by containerd to publish events back.

I tried to mimic what cri-o was doing since both rely on conmon, and I think this is not being set when kata v2 is used. I will try to understand more about how cri-o handles kata container events

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 1, 2021

@EduardoVega: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@EduardoVega EduardoVega changed the title PoC support for shim v2 to run kata containers v2 [WIP] PoC support for shim v2 to run kata containers v2 Sep 22, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2021
@EduardoVega
Copy link
Contributor Author

EduardoVega commented Oct 8, 2021

Updates

After some more research, I was able to understand better how the Shimv2 can be used with Podman, so the following diagram will try to explain it

                                                          Kata container
                                                         ┌──────────────┐
                                                         │      VM      │
┌───────────────┐           ┌─────────────┐              │              │
│               │  creates  │             │  creates     │ ┌──────────┐ │
│    Podman     ├──────────►│   Shimv2    ├─────────────►│ │          │ │
│               │           │             │              │ │ container│ │
└───────────────┘           └──────┬──────┘              │ │          │ │
                                   │                     │ └──────────┘ │
                                   │publishes events     │              │
                                   │                     └──────────────┘
                                   ▼
                           ┌───────────────┐
                           │               │
                           │    Podman     │
                           │               │
                           └───────────────┘
  1. Podman validates in containers.conf if runtime supports Shimv2
  2. If supported, Podman creates Shimv2 daemon
  3. Shimv2 manages container operations (create, start, stop, ...)
  4. Shimv2 publishes container events to publish-binary (Podman) that is passed as an arg during Shimv2 creation
    • Podman will provide a new sub command publish that will receive and use this events to update the state of the container

So based on these findings and the feedback provided in this PR, these could be the following items to be worked

  1. Add a new property in containers.conf to determine which runtimes support the Shimv2 implementation
  2. Add small workflow to run a container with --detach flag, handle events, stop and delete container
    • Size matters, if code can be abstracted it will be instead of vendor modules
    • Add flag to enable this support during compilation time
  3. Add support for remaining operations and execution modes (--tty, --interactive, ...)
    • Small PRs to make review process easier
  4. Add testing for Podman and Shimv2

Finally, this past months I've been very busy but I should be able to get more progress on this now. Thanks

@rhatdan
Copy link
Member

rhatdan commented Oct 8, 2021

This sounds good, but we still need to be able to run traditional OCI containers and shimv2 containers at the same time.

podman ps
Needs to show both.

It needs to be easy to switch between the two

podman --runtime shimv2 run ....
podman --runtime crun run ....

Not sure we want to call this shimv2, perhaps katav2 or something like that might be easier for users to understand.

@EduardoVega
Copy link
Contributor Author

@rhatdan understood.

The shimv2 is just an interface that can be implemented by any runtime project, examples:

This is an example of how this could be run

containers.conf

runtime_supports_shimv2 = ["katav2"]

katav2 = [
             "/opt/kata/bin/containerd-shim-kata-v2"
]

Run Podman

podman run --runtime katav2 ...

Let me know if I'm missing something, thanks.

@rhatdan
Copy link
Member

rhatdan commented Oct 8, 2021

No that looks perfect.

This PR is a PoC to integrate the shimv2 as a container runtime to run
kata containers version 2. A very basic functionality has been added to
create, run and stop containers. Some actions are a bit buggy, mainly
those related to attach, interactive and tty but the options can be used.

Signed-off-by: Eduardo Vega <edvegavalerio@gmail.com>
@EduardoVega EduardoVega force-pushed the 8579-shimv2-integration branch from ddf8f4d to 2b8eeb8 Compare November 3, 2021 21:38
@github-actions
Copy link

github-actions bot commented Dec 4, 2021

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Dec 6, 2021

@EduardoVega Still working on this?

@rhatdan rhatdan removed the stale-pr label Dec 6, 2021
@github-actions
Copy link

github-actions bot commented Jan 6, 2022

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jan 6, 2022

Since I have not heard back, I am going to close this PR, if/when someone wants to work on it, we can reopeon or create a new PR.

@rhatdan rhatdan closed this Jan 6, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. stale-pr
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support shim v2 e.g. for Kata Containers 2.0
5 participants