CC: pause image newly pulled by the snapshotter is stored in an unexpected location #5781
This PR skips the test `Test can pull an unencrypted image inside the guest` for IBM Z Secure Execution until containerd is updated to v1.7.

Fixes: kata-containers#5781
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
By the way, this issue also shows up on other platforms and has surfaced across multiple PRs. It is likely to affect users deploying our upcoming release as well.
If this issue also occurs on other platforms, it would affect users running a cluster on containerd 1.6.x that was created without the snapshotter. What do you think? @stevenhorsman @fidencio
So I think there are potentially two separate things going on, which may or may not be related:
- issues which we've seen a few times on different platforms, and
- which we've only seen on the s390x system.

So either they are unrelated, or we simply wouldn't know, because most of the `key already exists` errors have happened on the AMD nodes, which don't run the same tests. Should we track these as separate issues?
Yeah, I was thinking the same while writing the comment. I would say the latter doesn't seem @fitzthum wanted to bring to the table. We have to discuss whether the
In the kubernetes agent_image test we currently have a check:

```
echo "Check the image was not pulled in the host"
local pod_id=$(kubectl get pods -o jsonpath='{.items..metadata.name}')
retrieve_sandbox_id
rootfs=($(find /run/kata-containers/shared/sandboxes/${sandbox_id}/shared \
    -name rootfs))
[ ${#rootfs[@]} -eq 1 ]
```

to ensure that the image hasn't been pulled onto the host. The check expects exactly one rootfs because we found that the pause image was always pulled on the host, presumably because it is needed to create the pod sandbox. With the introduction of the nydus-snapshotter code, we've found that on some systems (SE and TDX) the pause image appears to be in a different location, so the check should accept either 1 or 0. See kata-containers#5781, which tracks this.

We don't have time to understand this fully now, so we just want the tests to pass while still checking that we don't have both the pause image and the test pod's container image pulled. Therefore, set the check to pass if 1 or 0 rootfs directories are found in `/run/kata-containers/shared/sandboxes/`.

Fixes: kata-containers#5790
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
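A minimal sketch of the relaxed check described in the commit message, written as a standalone helper so it can be tried outside bats. The function name `check_rootfs_count` is hypothetical and not taken from the kata-containers tests; in the real test its argument would be `/run/kata-containers/shared/sandboxes/${sandbox_id}/shared`. It counts with `find | wc -l` rather than a bash array so it also runs under a POSIX shell, and the guard for a missing directory is an extra assumption beyond the original check.

```bash
# Hypothetical helper, not the code merged for kata-containers#5790.
# Succeeds when at most one entry named "rootfs" exists below the given
# shared directory: the pause image rootfs may or may not be present, but
# the test pod's container image must never be unpacked on the host as well.
check_rootfs_count() {
    local shared_dir="$1"
    local count

    # A missing directory means nothing could be inspected; fail loudly
    # instead of silently counting zero matches.
    if [ ! -d "$shared_dir" ]; then
        echo "no such directory: $shared_dir" >&2
        return 1
    fi

    # Count matches; the arithmetic expansion strips any padding from wc.
    count=$(( $(find "$shared_dir" -name rootfs | wc -l) ))

    # 0 or 1 passes (pause image absent or present); 2 or more fails.
    [ "$count" -le 1 ]
}
```

In the bats test this would stand in for the `[ ${#rootfs[@]} -eq 1 ]` line, e.g. `check_rootfs_count "/run/kata-containers/shared/sandboxes/${sandbox_id}/shared"`. Moving from `-eq 1` to `-le 1` is what lets the test pass whether or not the pause image lands in the shared directory.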
I found that test 4 failed because of a stale kata process on the TDX CI machine while running the operator tests:
After I killed the stale kata process, all the tests (including test 4) passed.
@BbolroC This could also be the reason test 4 fails on the SE machine.
Thanks @ChengyuZhu6. I will check today, after the Kata AC meeting, whether that is the cause on SE (I have something scheduled before then).
@ChengyuZhu6 @stevenhorsman @fidencio I've confirmed that the 4th test
Thanks, this means when we move this into
Description of problem
With a config of `IMAGE_OFFLOAD_TO_GUEST=yes` and `FORKED_CONTAINERD=no`, pod creation under IBM Z SE sometimes gets stuck in a `CreateContainerError` state with the following error:

This is a known issue with upstream containerd `v1.6.8` (#5775 (comment)). A quick remedy would be to remove the `pause` image and let the snapshotter pull it again. However, the newly pulled image is stored in an unexpected location (the expected location is `/run/kata-containers/shared/sandboxes/${sandbox_id}/shared`) as follows:
is expected) as follows:This leads to a test failure for
Test can pull an unencrypted image inside the guest
.tests/integration/kubernetes/confidential/agent_image.bats
Line 71 in 61806ee
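To see where a re-pulled pause image actually ended up, one could list every `rootfs` entry under a base directory and report those outside the expected per-sandbox `shared` directory. This is a hypothetical debugging sketch: `find_unexpected_rootfs` is not part of the kata-containers tests, and using `/run/kata-containers` as the base directory is an assumption.

```bash
# Hypothetical debugging helper, not part of the kata-containers tests.
# Prints every entry named "rootfs" below base_dir that does NOT live
# under expected_dir (e.g. the sandbox's .../shared directory).
find_unexpected_rootfs() {
    local base_dir="$1"
    local expected_dir="$2"

    find "$base_dir" -name rootfs 2>/dev/null |
        while IFS= read -r path; do
            case "$path" in
                "$expected_dir"/*) ;;        # quoted prefix matches literally: skip
                *) printf '%s\n' "$path" ;;  # anywhere else: report it
            esac
        done
}
```

For example, `find_unexpected_rootfs /run/kata-containers "/run/kata-containers/shared/sandboxes/${sandbox_id}/shared"` would print only the rootfs paths that the test's `find` does not see.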
This could be resolved by bumping containerd to v1.7, but that is not an option at the moment.
The error appears to occur only on http://jenkins.katacontainers.io/job/kata-containers-CCv0-ubuntu-20.04-s390x-SE-daily/. We could skip the test until the update is finished.
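The proposed skip could be expressed as a guard like the one below. This is a sketch under stated assumptions, not the change made in the PR: the function names and the `SE_ENABLED` variable are hypothetical, the containerd version is passed in by the caller rather than detected, and the version comparison relies on `sort -V` from GNU coreutils.

```bash
# Hypothetical guard, not the code from the PR that skips this test.
# Succeeds when the given containerd version is older than v1.7.0, the
# release this issue expects to resolve the pause image problem.
containerd_older_than_1_7() {
    local version="${1#v}"   # accept both "v1.6.8" and "1.6.8"
    # sort -V orders version strings; if 1.7.0 is not the first of the two,
    # the given version sorts before it and is therefore older.
    [ "$(printf '%s\n' "$version" "1.7.0" | sort -V | head -n 1)" != "1.7.0" ]
}

# Succeeds (meaning "skip") only on IBM Z (s390x) with Secure Execution
# enabled and an old containerd; every other setup keeps running the test.
should_skip_pull_test() {
    local arch="$1" se_enabled="$2" containerd_version="$3"
    [ "$arch" = "s390x" ] &&
        [ "$se_enabled" = "yes" ] &&
        containerd_older_than_1_7 "$containerd_version"
}
```

Inside the bats test this might be used as `if should_skip_pull_test "$(uname -m)" "${SE_ENABLED:-no}" "$CONTAINERD_VERSION"; then skip "containerd < v1.7 on IBM Z SE (kata-containers#5781)"; fi`, where `CONTAINERD_VERSION` is likewise a hypothetical variable.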