Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

storage: create k8s emptyDir inside VM #1485

Merged
merged 3 commits into from
Apr 12, 2019

Conversation

awprice
Copy link
Contributor

@awprice awprice commented Apr 5, 2019

Some notes about this PR:


This introduces a new storage type: local. Local storage type will
tell the kata-agent to create an empty directory in the sandbox
directory within the VM.

K8s host emptyDirs will then use the local storage type and mount it
inside each container. By doing this, we utilise the storage medium
that the sandbox uses. In most cases this will be 9p.

If the VM is using device mapper for container storage, the containers
will benefit from the better performance of device mapper for
host emptyDir.

Fixes #1472

Signed-off-by: Alex Price aprice@atlassian.com

We handle system directories differently, if its a bind mount
we mount the guest system directory to the container mount and
skip the 9p share mount.
However, we should not do this for docker volumes which are directories
created by Docker.

This introduces a Docker specific check, but that is the only
information available to us at the OCI layer.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
k8s host empty-dir is equivalent to docker volumes.
For this case, we should just use the host directory even
for system directories.

Move the isEphemeral function to virtcontainers to not
introduce cyclic dependency.

Fixes kata-containers#1417

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
@awprice awprice marked this pull request as ready for review April 5, 2019 00:59
@egernst
Copy link
Member

egernst commented Apr 5, 2019

Thanks for the PR @awprice. Travis is picking up some issues if you can take a look, while do the actual review.

@egernst
Copy link
Member

egernst commented Apr 5, 2019

/test

@egernst
Copy link
Member

egernst commented Apr 5, 2019

@bergwolf can you PTAL?

@@ -311,7 +311,7 @@ func TestIsEphemeralStorage(t *testing.T) {
}
defer os.RemoveAll(dir)

sampleEphePath := filepath.Join(dir, k8sEmptyDir, "tmp-volume")
sampleEphePath := filepath.Join(dir, K8sEmptyDir, "tmp-volume")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is k8sEmptyDir, not K8sEmptyDir, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's this const -

K8sEmptyDir = "kubernetes.io~empty-dir"

Which is also used here -

if storageType == K8sEmptyDir {

@devimc
Copy link

devimc commented Apr 5, 2019

/test

if mnt.Type == kataLocalDevType {
// Set the mount source path to a the desired directory point in the VM.
// In this case it is located in the sandbox directory.
mounts[idx].Source = filepath.Join(kataGuestSharedDir, sandboxID, kataLocalDevType, filepath.Base(mnt.Source))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if two empty-dirs have the same Base? Could you make sure that this path is unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't possible for two emptyDirs to have the same base, see example pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: ubuntu
      args:
        - sleep
        - "3600"
  volumes:
    - name: test123
      emptyDir:
        medium: ""
    - name: test123
      emptyDir:
        medium: ""
$ k create -f test.yaml
The Pod "test" is invalid: spec.volumes[1].name: Duplicate value: "test123"

Should I still make it unique?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kataGuestSharedDir is actually tmpfs in the guest. Is it desirable to use memory for k8s empty dir? The other options are to use guest rootfs or container rootfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is NOT desirable to use memory for k8s emptyDir (unless it is of medium: Memory) and in this case it is NOT on a tmpfs. The full path for the emptyDir directory will be: /run/kata-containers/shared/containers/<sandbox id>/local/<emptyDir name>, which is on whatever storage the sandbox container (i.e. the pause container is using).

See comment by @mcastelino:

Using the pause container for holding on to the ephemeral volumes makes a lot of sense. It keeps the individual container behaviour unchanged. This also allows the volume to grow to the limit afforded by the graph driver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awprice OK, I see. You are relying on the fact that the sandbox id is the same as the first container id. We don't guarantee that but it is also the current fact. Please add some comments above the lines to highlight it so that we don't forget to fix it in case we need to change it.

@amshinde
Copy link
Member

amshinde commented Apr 6, 2019

While this PR would help us to get away from 9p, this PR does introduce creating directories in the rootfs of the sandbox, which happens to be the pause container today. So, this PR adds some explicit dependency for the pause container to be present.

cc @bergwolf

@awprice
Copy link
Contributor Author

awprice commented Apr 8, 2019

While this PR would help us to get away from 9p, this PR does introduce creating directories in the rootfs of the sandbox, which happens to be the pause container today. So, this PR adds some explicit dependency for the pause container to be present.

cc @bergwolf

Yep, this change definitely does add a dependency on the pause container. That being said, this is a Kubernetes specific change and when not using Kubernetes, won't affect the operation of kata. There doesn't seem to be any talk of removing the pause container anytime soon as it's relied upon for setting up the network namespace for the pod.

ociSpec.Mounts[idx].Type = "ephemeral"
}
if vc.Isk8sHostEmptyDir(mnt.Source) {
ociSpec.Mounts[idx].Type = "local"
Copy link
Member

@lifupan lifupan Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it better to replace "local" with vc.kataLocalDevType ? The same to "ephemeral",even through it isn't changed by this commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it makes sense to do that, unfortunately vc.kataLocalDevType & vc.kataEphemeralDevType are private, should I make them public?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please.

@mcastelino
Copy link
Contributor

Using the pause container for holding on to the ephemeral volumes makes a lot of sense. It keeps the individual container behaviour unchanged. This also allows the volume to grow to the limit afforded by the graph driver.

@amshinde
Copy link
Member

amshinde commented Apr 8, 2019

Looks like the tests are failing as we are explicitly checking for 9p mount for host directory based empty-dir volumes in our k8s integration tests:
https://github.com/kata-containers/tests/blob/master/integration/kubernetes/k8s-empty-dirs.bats#L26

@awprice Can you a raise a PR in the tests repo to fix that?
@chavafg This PR would then need to carry a depends-on tag for the PR in the tests repo for the CI to pass?

@chavafg
Copy link
Contributor

chavafg commented Apr 8, 2019

yes, when you open the PR in the tests repo please add a Depends-on: github.com/kata-containers/runtime#nnn on your commit message where nnn is your PR number here. You can also add the tag here using the PR on the tests repo.

awprice added a commit to awprice/kata-tests that referenced this pull request Apr 9, 2019
Depends-on: github.com/kata-containers/runtime#1485

Fixes kata-containers#1432

Signed-off-by: Alex Price <aprice@atlassian.com>
@awprice
Copy link
Contributor Author

awprice commented Apr 9, 2019

@amshinde & @chavafg

I've raised kata-containers/tests#1433 in the tests repo with depends-on in the commit message.

I'm not sure the change will fix the issue with the tests failing however, I think they might be reliant on the agent changes - kata-containers/agent#521

We can see in the tests that the test pod for the emptyDir tests doesn't transition into the ready condition -

not ok 1 Empty dir volumes
# (in test file k8s-empty-dirs.bats, line 22)
#   `kubectl wait --for=condition=Ready pod "$pod_name"' failed
# INFO: k8s configured to use trusted and untrusted annotations
# pod/sharevol-kata created
# error: timed out waiting for the condition
# pod "sharevol-kata" deleted

This is most likely because the agent doesn't know how to provision "local" storage types and doesn't create the pod properly.

@amshinde
Copy link
Member

amshinde commented Apr 9, 2019

@awprice I almost forgot about the agent PR! The agent PR will need to be merged first and then this one.

Copy link
Member

@bergwolf bergwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @awprice !

@awprice
Copy link
Contributor Author

awprice commented Apr 9, 2019

Thanks for the feedback @bergwolf and @lifupan! I've addressed your comments and updated the PR.

@awprice awprice force-pushed the k8s-empty-dir-local branch 2 times, most recently from ffb4685 to 30aa64b Compare April 10, 2019 07:01
@bergwolf
Copy link
Member

/test

Copy link
Member

@jcvenegas jcvenegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to include @amshinde or her PR will be merged first?

@jcvenegas
Copy link
Member

1..1
not ok 1 Empty dir volumes
# (in test file k8s-empty-dirs.bats, line 22)
#   `kubectl wait --for=condition=Ready pod "$pod_name"' failed
# INFO: k8s configured to use trusted and untrusted annotations
# pod/sharevol-kata created
# error: timed out waiting for the condition
# pod "sharevol-kata" deleted

Nice failing as expected, need agent PR merged first

@amshinde
Copy link
Member

@jcvenegas Let's include my commits in this PR. I am going to close the other one.

This introduces a new storage type: local. Local storage type will
tell the kata-agent to create an empty directory in the sandbox
directory within the VM.

K8s host emptyDirs will then use the local storage type and mount it
inside each container. By doing this, we utilise the storage medium
that the sandbox uses. In most cases this will be 9p.

If the VM is using device mapper for container storage, the containers
will benefit from the better performance of device mapper for
host emptyDir.

Fixes kata-containers#1472

Signed-off-by: Alex Price <aprice@atlassian.com>
@lifupan
Copy link
Member

lifupan commented Apr 11, 2019

/test

@awprice
Copy link
Contributor Author

awprice commented Apr 11, 2019

Looks like the tests are going to fail until kata-containers/agent#521 is merged so we should prioritise merging that PR first.

@chavafg
Copy link
Contributor

chavafg commented Apr 12, 2019

I think next step is to merge the tests PR and then this, right?

@awprice
Copy link
Contributor Author

awprice commented Apr 12, 2019

/test

@awprice
Copy link
Contributor Author

awprice commented Apr 12, 2019

@chavafg Is it possible to re-run the tests for this PR? I'd like to see if they are passing now that the agent PR is merged.

@ganeshmaharaj
Copy link
Contributor

/retest

@ganeshmaharaj
Copy link
Contributor

Ok. That worked. Let's see how this goes @awprice

@awprice
Copy link
Contributor Author

awprice commented Apr 12, 2019

@ganeshmaharaj :( they seem to have all failed, but with a java exception? They didn't even manage to get to the emptyDir tests.

@amshinde
Copy link
Member

@chavafg @GabyCT Seeing some very unusual java exceptions here, these look like an issue with the Azure instance itself.

Running command '/usr/bin/docker [docker run --cidfile /tmp/cid798623508/d3KddNfu1XXIn7VEoFV1c9jWWMomP3 --runtime kata-runtime --name d3KddNfu1XXIn7VEoFV1c9jWWMomP3 --rm --cap-drop linux_immutable centos sh -c capsh --print]'
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'fedora28-azurec446b0' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)

We need to investigate these.

Meanwhile I am going to try running the CI one more time.
/test

@grahamwhaley
Copy link
Contributor

The Java exceptions are an artifact after the point of failure - they are just (normally) the jenkins stack complaining that the test aborted. Normally we need to look at the lines just above them and:

  • see what stage of the scripts failed
  • if we find a point in the script where it can fail but not give us enough clues, then we should PR an improvement to the scripts ;-)

but, I think what maybe happened here is the jenkins server had to go down for some maintenance - and all CIs have now passed.
I am seeing a big green merge button! - but, I'd feel more comfortable if somebody who has been tracking this PR series did the actual merge ;-)

@amshinde amshinde merged commit 9b622b7 into kata-containers:master Apr 12, 2019
@awprice awprice deleted the k8s-empty-dir-local branch April 15, 2019 00:41
@egernst egernst mentioned this pull request Apr 16, 2019
chavafg pushed a commit to chavafg/tests-1 that referenced this pull request May 2, 2019
Depends-on: github.com/kata-containers/runtime#1485

Fixes kata-containers#1432

Signed-off-by: Alex Price <aprice@atlassian.com>
func isEmptyDir(path string) bool {
splitSourceSlice := strings.Split(path, "/")
if len(splitSourceSlice) > 1 {
storageType := splitSourceSlice[len(splitSourceSlice)-2]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check would not work if container is mounting sub path in the emptyDir volume. In that case subpath would be append to the host path to be mounted and len(splitSourceSlice)-2 will not have kubernetes.io~empty-dir string.

@kfox1111
Copy link

when 9p based, this seems to still leave it as 9p... why isn't it using storage in the vm itself in this case?

@awprice
Copy link
Contributor Author

awprice commented May 19, 2019

when 9p based, this seems to still leave it as 9p... why isn't it using storage in the vm itself in this case?

This PR changes the storage location for emptyDir to be in the rootfs of the sandbox, i.e. the pause container. In your case the rootfs of the sandbox is 9p. If you are using devicemapper, then the rootfs of the sandbox will benefit from the speed of devicemapper.

The storage of the VM itself is actually a ram disk, see comment by @mcastelino on #1472 -

The VM rootfs today is a NVDIMM or initrd. So the VM rootfs is not backed by any host side writable storage.

@kfox1111
Copy link

That doesn't do what #1472 requested at all then. :(

What about making a qcow2 volume for the emptydirs?

amshinde added a commit to amshinde/kata-runtime that referenced this pull request Jun 24, 2019
With kata-containers#1485, we moved the default medium empty-dir creation to the
sandbox rootfs. This worked for devicemapper, but in case of overlay
the "local" directory was being created outside the sandbox rootfs.
As a result we were seeing the behaviour seen in kata-containers#1818.

Fixes kata-containers#1818

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
amshinde added a commit to amshinde/kata-runtime that referenced this pull request Jun 27, 2019
With kata-containers#1485, we moved the default medium empty-dir creation to the
sandbox rootfs. This worked for devicemapper, but in case of overlay
the "local" directory was being created outside the sandbox rootfs.
As a result we were seeing the behaviour seen in kata-containers#1818.

Fixes kata-containers#1818

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubernetes EmptyDir should be not be using 9p