
ci(brigade.js): add e2e job #955

Merged: 2 commits merged into brigadecore:master from ci/e2e on Oct 29, 2019
Conversation

@vdice (Contributor) commented on Jul 19, 2019

Update 10/28/19: The various kinks/issues seem to have been ironed out and, as seen in the check suite for this PR, the e2e job is running successfully. This PR is now ready for review.

This is where I'm at with adding @dgkanatsios's e2e tests to CI. The only real change needed to enable running in a container was the bin dir setup.

As kind requires a Docker daemon to run (Kubernetes In Docker!), I went the route of running the tests in Docker-in-Docker fashion, using an image that wraps docker:stable-dind with some utilities we need (https://github.com/vdice/go-dind).
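Such a wrapper image can stay quite small. A minimal sketch of what one might look like (the package list here is an assumption for illustration, not the actual contents of go-dind):

# Hypothetical wrapper image; the real quay.io/vdice/go-dind may differ.
FROM docker:stable-dind
# docker:stable-dind is Alpine-based, so extra utilities come from apk.
RUN apk add --no-cache bash curl git go make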

The part I'm stuck on is getting the kind cluster to properly launch in the context of a container in a k8s pod, as is the case when running via Brigade.

It variously fails during setup, as in:

...
Creating cluster "kind" ...
 • Ensuring node image (kindest/node:v1.15.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.15.0) 🖼
 • Preparing nodes 📦  ...
time="2019-07-19T21:48:29.177656516Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/62dc5e06a6120ab4ac36556056afd6f40d00cdf079c55f64a9b9a6eb0978da1d/shim.sock" debug=false pid=427
 ✓ Preparing nodes 📦
 • Creating kubeadm config 📜  ...
 ✓ Creating kubeadm config 📜
 • Starting control-plane 🕹️  ...
 ✓ Starting control-plane 🕹️
 • Installing CNI 🔌  ...
 ✗ Installing CNI 🔌
time="2019-07-19T21:49:25.656763378Z" level=info msg="shim reaped" id=62dc5e06a6120ab4ac36556056afd6f40d00cdf079c55f64a9b9a6eb0978da1d
time="2019-07-19T21:49:25.666962354Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Error: failed to create cluster: failed to apply overlay network: exit status 1

Or, cluster creation may succeed but any attempt to contact the API server fails...

As a comparison, the kind cluster launches just fine (and e2e tests pass!) when running via the same Docker image directly (using Docker for Mac):

 $ docker run --privileged -v ${PWD}:/go/src/github.com/brigadecore/brigade -w /go/src/github.com/brigadecore/brigade -it quay.io/vdice/go-dind:v0.1.0 sh -c 'dockerd-entrypoint.sh & make e2e'
...
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.15.0) 🖼
⢎⡀ Preparing nodes 📦 INFO[2019-07-19T21:20:06.612233900Z] shim containerd-shim started                  address="/containerd-shim/moby/fa31099ed91c5dd68def0d6ba6b032cf5fce3528650fbf8aa4658db92ce7134c/shim.sock" debug=false pid=436
 ✓ Preparing nodes 📦
 ✓ Creating kubeadm config 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
...
Deployment brigade-server-kashti successful. All 1 replicas are ready.
-----Creating a test project-----
Project ID: brigade-5b55ed522537b663e178f751959d234fd650d626f33f70557b2e82
-----Checking if the test project secret was created-----
-----Running a Build on the test project-----
Event created. Waiting for worker pod named "brigade-worker-01dg62jmx3zryxze72t9sntayg".
Build: 01dg62jmx3zryxze72t9sntayg, Worker: brigade-worker-01dg62jmx3zryxze72t9sntayg
prestart: no dependencies file found
[brigade] brigade-worker version: 1.1.0
[brigade:k8s] Creating PVC named brigade-worker-01dg62jmx3zryxze72t9sntayg
==> handling an 'exec' event
[brigade:app] after: default event handler fired
[brigade:app] beforeExit(2): destroying storage
[brigade:k8s] Destroying PVC named brigade-worker-01dg62jmx3zryxze72t9sntayg
-----Cleaning up-----
Deleting cluster "kind" ...

So perhaps k8s-level logic is somehow interfering with running kind (in a Docker container in a k8s pod)? (For my testing, I've been using a pretty stock AKS cluster to run the Brigade job off of this branch...)

Thoughts/ideas @dgkanatsios , others?

@netlify netlify bot commented on Jul 19, 2019

Deploy preview for brigade-docs ready!

Built with commit 7ad63ab

https://deploy-preview-955--brigade-docs.netlify.com

@vdice force-pushed the ci/e2e branch 2 times, most recently from 77bc606 to c20a3a5 on July 19, 2019 22:03
@vdice (Contributor, Author) commented on Jul 19, 2019

The run here got further, but failed while loading images into the cluster and then on the helm3 upgrade --install (with a vague error message, presumably because some or all images didn't exist, or because the kind cluster was in fact in a failed state, as mentioned above...)

@dgkanatsios (Contributor) commented on Jul 29, 2019

From the log, I see that only these three images fail to load:

Error: failed to load image: exit status 1
kind not installed or error loading image: brigadecore/brigade-generic-gateway:c20a3a5
Loading brigadecore/brigade-vacuum:c20a3a5
Loading brigadecore/brig:c20a3a5
Loading brigadecore/brigade-worker:c20a3a5
Error: failed to load image: exit status 1
kind not installed or error loading image: brigadecore/brigade-worker:c20a3a5
Loading brigadecore/git-sidecar:c20a3a5
Error: failed to load image: exit status 1
kind not installed or error loading image: brigadecore/git-sidecar:c20a3a5

So, is it safe to assume that all the other images have been loaded correctly? Weird. Is there any way to freeze the cluster at this stage and run kubectl cluster-info and/or docker images to see what's going on (i.e. kind cluster state + whether the images were created correctly)?
Maybe also temporarily remove the @ from the kind load command to see the actual error status (see the sketch below)?
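For context: make suppresses echoing of any recipe line prefixed with @, so dropping the @ makes make print the exact command it runs before any failure. A hypothetical Makefile rule illustrating this (the target name is assumed, not taken from the actual Makefile):

# Hypothetical rule; without the leading '@', make echoes the full kind
# invocation, so a failure shows exactly what was run.
load-worker-image:
	kind load docker-image brigadecore/brigade-worker:c20a3a5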

@radu-matei (Contributor) commented on Aug 4, 2019

After some digging, it turns out kind needs a few extra mounts in order to work properly in a pod: host path mounts for /lib/modules and /sys/fs/cgroup, and an emptyDir volume mounted at /var/lib/docker.

Here's an example of a working pod specification that can be used to start a kind cluster (note that the image used here doesn't contain the kubectl and kind binaries, so you have to get them before trying this out):

---
apiVersion: v1
kind: Pod
metadata:
  name: dind-k8s
spec:
  containers:
    - name: dind
      image: radumatei/golang-dind:1.11-dev
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - name: dind-storage
          mountPath: /var/lib/docker
  volumes:
  - name: modules
    hostPath:
      path: /lib/modules
      type: Directory
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
      type: Directory
  - name: dind-storage
    emptyDir: {}

This means that with the current Brigade release, we can't set up a pod with these mounts. However, #966 and brigadecore/brigadier#22 add support for this (although some checks are still needed). Here's how the above would translate into a Brigade job:

const { events, Job } = require("brigadier")

events.on("exec", (e, p) => {
    const docker = new Job("dind", "radumatei/golang-dind:1.11-dev")
    docker.privileged = true;
    docker.volumeConfig = [
        {
            mount: {
                name: "modules",
                mountPath: "/lib/modules",
                readOnly: true
            },
            volume: {
                name: "modules",
                hostPath: {
                    path: "/lib/modules",
                    type: "Directory"
                }
            }
        },

        {
            mount: {
                name: "cgroup",
                mountPath: "/sys/fs/cgroup",
            },
            volume: {
                name: "cgroup",
                hostPath: {
                    path: "/sys/fs/cgroup",
                    type: "Directory"
                }
            }
        },
        {
            mount: {
                name: "docker-graph-storage",
                mountPath: "/var/lib/docker",
            },
            volume: {
                name: "docker-graph-storage",
                emptyDir: {}
            }
        }
    ]

    docker.tasks = [
        "dockerd-entrypoint.sh &",
        "sleep 20",
        "curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl",
        "chmod +x kubectl",
        "mv kubectl /go/bin/",
        "wget https://github.com/kubernetes-sigs/kind/releases/download/v0.4.0/kind-linux-amd64",
        "chmod +x kind-linux-amd64",
        "mv kind-linux-amd64 /go/bin/kind",
        "docker run hello-world",
        "kind create cluster",
        `export KUBECONFIG="$(kind get kubeconfig-path)"`,
        "kubectl cluster-info",
        "unset $(env | grep KUBERNETES_ | xargs)",
        "kubectl get pods -w"
    ];

    docker.run()
})

Once we agree on the structure of the public API in #966, I'll go ahead and create a new library, brigade-utils (similar to the GitHub library), that completely abstracts all of this and exposes a simple Kind() job you can use to run your E2E tests.

Edit: see brigadecore/brigade-utils#20
Edit 2: you can now use radumatei/golang-kind:1.11-0.4, which already comes with kind v0.4 and Go 1.11.
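For reference, once that's released, usage could look roughly like this (a sketch only; the exact KindJob API is whatever lands in brigadecore/brigade-utils#20, and the task names here are assumptions):

const { events } = require("brigadier")
const { KindJob } = require("@brigadecore/brigade-utils")

events.on("exec", () => {
    // KindJob wraps the privileged DinD setup and volume mounts shown above
    let kind = new KindJob("e2e")
    // assumed pattern: append project-specific tasks once the cluster is up
    kind.tasks.push(
        "kind load docker-image brigadecore/brigade-worker:latest",
        "make e2e"
    )
    kind.run()
})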

@radu-matei added this to the 1.2 milestone on Aug 5, 2019
@vdice force-pushed the ci/e2e branch 3 times, most recently from 945f23f to 98bd045 on August 5, 2019 23:47
@radu-matei (Contributor) commented

ref: kubernetes-sigs/kind#303 (comment)

@vdice removed this from the 1.2 milestone on Aug 13, 2019
@vdice force-pushed the ci/e2e branch 3 times, most recently from 77fad9f to b8320d3 on October 8, 2019 23:52
@vdice (Contributor, Author) commented on Oct 8, 2019

@radu-matei I dusted this branch off... I'm using the latest version of the brigade-utils library (0.2.0), but it appears we probably need a new release to pick up the KindJob goodness from brigadecore/brigade-utils#20?

@radu-matei (Contributor) commented

You are right - I did merge the PR, but didn’t release a new version to NPM yet.
I’ll do that manually, but we should really look into that NPM job :)

@vdice (Contributor, Author) commented on Oct 15, 2019

We're getting closer! The e2e job itself runs great when executing the event locally, sans the usual GH Check notifications. But when wrapped with the latter, I'm currently seeing brigadecore/brigade-utils#29 (which is why the check results never report back here). cc @radu-matei

@vdice force-pushed the ci/e2e branch 4 times, most recently from abc5371 to 2800558 on October 28, 2019 19:42
@vdice marked this pull request as ready for review on October 28, 2019 21:23
@vdice changed the title from "WIP ci(brigade.js): add e2e job" to "ci(brigade.js): add e2e job" on Oct 28, 2019
@vdice (Contributor, Author) commented on Oct 28, 2019

E2E job is passing and this is now ready for review!

@radu-matei (Contributor) commented

🎉
I'm really happy this is working now!
Big thanks @vdice and @dgkanatsios!

I propose we monitor the cluster for a few runs to make sure there aren't any memory / cgroups leaks, then we can go ahead and add some more comprehensive testing (like testing local dependencies, for example).

@vdice merged commit a787590 into brigadecore:master on Oct 29, 2019
@vdice deleted the ci/e2e branch on October 29, 2019 19:26