-
Notifications
You must be signed in to change notification settings - Fork 376
storage: create k8s emptyDir inside VM #1485
Conversation
We handle system directories differently, if its a bind mount we mount the guest system directory to the container mount and skip the 9p share mount. However, we should not do this for docker volumes which are directories created by Docker. This introduces a Docker specific check, but that is the only information available to us at the OCI layer. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
k8s host empty-dir is equivalent to docker volumes. For this case, we should just use the host directory even for system directories. Move the isEphemeral function to virtcontainers to not introduce cyclic dependency. Fixes kata-containers#1417 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Thanks for the PR @awprice. Travis is picking up some issues if you can take a look, while do the actual review. |
/test |
3b27b22
to
9628c24
Compare
@bergwolf can you PTAL? |
@@ -311,7 +311,7 @@ func TestIsEphemeralStorage(t *testing.T) { | |||
} | |||
defer os.RemoveAll(dir) | |||
|
|||
sampleEphePath := filepath.Join(dir, k8sEmptyDir, "tmp-volume") | |||
sampleEphePath := filepath.Join(dir, K8sEmptyDir, "tmp-volume") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is k8sEmptyDir, not K8sEmptyDir, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's this const -
runtime/virtcontainers/mount.go
Line 347 in 9628c24
K8sEmptyDir = "kubernetes.io~empty-dir" |
Which is also used here -
runtime/virtcontainers/mount.go
Line 389 in 9628c24
if storageType == K8sEmptyDir { |
/test |
virtcontainers/kata_agent.go
Outdated
if mnt.Type == kataLocalDevType { | ||
// Set the mount source path to a the desired directory point in the VM. | ||
// In this case it is located in the sandbox directory. | ||
mounts[idx].Source = filepath.Join(kataGuestSharedDir, sandboxID, kataLocalDevType, filepath.Base(mnt.Source)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if two empty-dirs have the same Base? Could you make sure that this path is unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't possible for two emptyDirs to have the same base, see example pod spec:
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
containers:
- name: test
image: ubuntu
args:
- sleep
- "3600"
volumes:
- name: test123
emptyDir:
medium: ""
- name: test123
emptyDir:
medium: ""
$ k create -f test.yaml
The Pod "test" is invalid: spec.volumes[1].name: Duplicate value: "test123"
Should I still make it unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kataGuestSharedDir
is actually tmpfs in the guest. Is it desirable to use memory for k8s empty dir? The other options are to use guest rootfs or container rootfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is NOT desirable to use memory for k8s emptyDir (unless it is of medium: Memory
) and in this case it is NOT on a tmpfs. The full path for the emptyDir directory will be: /run/kata-containers/shared/containers/<sandbox id>/local/<emptyDir name>
, which is on whatever storage the sandbox container (i.e. the pause container is using).
See comment by @mcastelino:
Using the pause container for holding on to the ephemeral volumes makes a lot of sense. It keeps the individual container behaviour unchanged. This also allows the volume to grow to the limit afforded by the graph driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awprice OK, I see. You are relying on the fact that the sandbox id is the same as the first container id. We don't guarantee that but it is also the current fact. Please add some comments above the lines to highlight it so that we don't forget to fix it in case we need to change it.
While this PR would help us to get away from 9p, this PR does introduce creating directories in the rootfs of the sandbox, which happens to be the pause container today. So, this PR adds some explicit dependency for the pause container to be present. cc @bergwolf |
Yep, this change definitely does add a dependency on the pause container. That being said, this is a Kubernetes specific change and when not using Kubernetes, won't affect the operation of kata. There doesn't seem to be any talk of removing the pause container anytime soon as it's relied upon for setting up the network namespace for the pod. |
pkg/katautils/create.go
Outdated
ociSpec.Mounts[idx].Type = "ephemeral" | ||
} | ||
if vc.Isk8sHostEmptyDir(mnt.Source) { | ||
ociSpec.Mounts[idx].Type = "local" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it better to replace "local" with vc.kataLocalDevType ? The same to "ephemeral",even through it isn't changed by this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea it makes sense to do that, unfortunately vc.kataLocalDevType
& vc.kataEphemeralDevType
are private, should I make them public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please.
Using the pause container for holding on to the ephemeral volumes makes a lot of sense. It keeps the individual container behaviour unchanged. This also allows the volume to grow to the limit afforded by the graph driver. |
Looks like the tests are failing as we are explicitly checking for 9p mount for host directory based empty-dir volumes in our k8s integration tests: @awprice Can you a raise a PR in the tests repo to fix that? |
yes, when you open the PR in the tests repo please add a |
Depends-on: github.com/kata-containers/runtime#1485 Fixes kata-containers#1432 Signed-off-by: Alex Price <aprice@atlassian.com>
I've raised kata-containers/tests#1433 in the tests repo with I'm not sure the change will fix the issue with the tests failing however, I think they might be reliant on the agent changes - kata-containers/agent#521 We can see in the tests that the test pod for the emptyDir tests doesn't transition into the ready condition -
This is most likely because the agent doesn't know how to provision "local" storage types and doesn't create the pod properly. |
@awprice I almost forgot about the agent PR! The agent PR will need to be merged first and then this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @awprice !
9628c24
to
7c83990
Compare
ffb4685
to
30aa64b
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to include @amshinde or her PR will be merged first?
Nice failing as expected, need agent PR merged first |
@jcvenegas Let's include my commits in this PR. I am going to close the other one. |
This introduces a new storage type: local. Local storage type will tell the kata-agent to create an empty directory in the sandbox directory within the VM. K8s host emptyDirs will then use the local storage type and mount it inside each container. By doing this, we utilise the storage medium that the sandbox uses. In most cases this will be 9p. If the VM is using device mapper for container storage, the containers will benefit from the better performance of device mapper for host emptyDir. Fixes kata-containers#1472 Signed-off-by: Alex Price <aprice@atlassian.com>
30aa64b
to
76c4639
Compare
/test |
Looks like the tests are going to fail until kata-containers/agent#521 is merged so we should prioritise merging that PR first. |
I think next step is to merge the tests PR and then this, right? |
/test |
@chavafg Is it possible to re-run the tests for this PR? I'd like to see if they are passing now that the agent PR is merged. |
/retest |
Ok. That worked. Let's see how this goes @awprice |
@ganeshmaharaj :( they seem to have all failed, but with a java exception? They didn't even manage to get to the emptyDir tests. |
@chavafg @GabyCT Seeing some very unusual java exceptions here, these look like an issue with the Azure instance itself.
We need to investigate these. Meanwhile I am going to try running the CI one more time. |
The Java exceptions are an artifact after the point of failure - they are just (normally) the jenkins stack complaining that the test aborted. Normally we need to look at the lines just above them and:
but, I think what maybe happened here is the jenkins server had to go down for some maintenance - and all CIs have now passed. |
Depends-on: github.com/kata-containers/runtime#1485 Fixes kata-containers#1432 Signed-off-by: Alex Price <aprice@atlassian.com>
func isEmptyDir(path string) bool { | ||
splitSourceSlice := strings.Split(path, "/") | ||
if len(splitSourceSlice) > 1 { | ||
storageType := splitSourceSlice[len(splitSourceSlice)-2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check would not work if container is mounting sub path in the emptyDir volume. In that case subpath would be append to the host path to be mounted and len(splitSourceSlice)-2 will not have kubernetes.io~empty-dir string.
when 9p based, this seems to still leave it as 9p... why isn't it using storage in the vm itself in this case? |
This PR changes the storage location for emptyDir to be in the rootfs of the sandbox, i.e. the pause container. In your case the rootfs of the sandbox is 9p. If you are using devicemapper, then the rootfs of the sandbox will benefit from the speed of devicemapper. The storage of the VM itself is actually a ram disk, see comment by @mcastelino on #1472 -
|
That doesn't do what #1472 requested at all then. :( What about making a qcow2 volume for the emptydirs? |
With kata-containers#1485, we moved the default medium empty-dir creation to the sandbox rootfs. This worked for devicemapper, but in case of overlay the "local" directory was being created outside the sandbox rootfs. As a result we were seeing the behaviour seen in kata-containers#1818. Fixes kata-containers#1818 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
With kata-containers#1485, we moved the default medium empty-dir creation to the sandbox rootfs. This worked for devicemapper, but in case of overlay the "local" directory was being created outside the sandbox rootfs. As a result we were seeing the behaviour seen in kata-containers#1818. Fixes kata-containers#1818 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Some notes about this PR:
This introduces a new storage type: local. Local storage type will
tell the kata-agent to create an empty directory in the sandbox
directory within the VM.
K8s host emptyDirs will then use the local storage type and mount it
inside each container. By doing this, we utilise the storage medium
that the sandbox uses. In most cases this will be 9p.
If the VM is using device mapper for container storage, the containers
will benefit from the better performance of device mapper for
host emptyDir.
Fixes #1472
Signed-off-by: Alex Price aprice@atlassian.com