Introduce e2e tests for kubernetes story #15651

Closed
8 tasks done
dmitryax opened this issue Oct 25, 2022 · 21 comments
Labels
ci-cd (CI, CD, testing, build issues); enhancement (New feature or request); never stale (Issues marked with this label will be never staled and automatically removed); processor/k8sattributes (k8s Attributes processor); receiver/k8scluster; receiver/k8sevents; receiver/kubeletstats

Comments

@dmitryax
Member

dmitryax commented Oct 25, 2022

Problem statement

Kubernetes-related components, especially the k8s attributes processor, are used extensively in the field, yet they don't have good e2e test coverage. Recently a significant change introduced a few regressions. We want to make sure that CI/CD catches such regressions going forward.

Proposed solution

We need to introduce a GitHub Actions job that spins up a k8s cluster with a collector built from the current branch.

Proposed steps:

  1. Build a collector Docker image with the required k8s-related components.
  2. Spin up a kind k8s cluster.
  3. Spin up a collector together with applications generating a predictable load: metrics, logs, and traces. We can use https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-demo. The collector should be configured with the file exporter.
  4. Parse the OTLP JSON files written by the collector and compare them against expected values, e.g. k8s pod/deployment/namespace UIDs, names, labels, annotations, etc.
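To make step 4 more concrete, here is a minimal sketch of what the comparison could look like. It assumes the file exporter writes one OTLP JSON object per line, and it matches values against regular expressions because pod names and UIDs change between runs. The function names, the sample payload, and the expected patterns are all hypothetical and only serve as illustration; the real test would live next to the component it covers.

```python
import json
import re


def resource_attributes(otlp_json_line):
    """Return one {key: value} dict per resource found in a single OTLP JSON line."""
    payload = json.loads(otlp_json_line)
    resources = []
    # The top-level key depends on the signal type (traces, metrics, logs).
    for signal_key in ("resourceSpans", "resourceMetrics", "resourceLogs"):
        for entry in payload.get(signal_key, []):
            attrs = {}
            for kv in entry.get("resource", {}).get("attributes", []):
                value = kv.get("value", {})
                # An OTLP AnyValue carries one typed field, e.g. {"stringValue": "x"}.
                attrs[kv["key"]] = next(iter(value.values()), None)
            resources.append(attrs)
    return resources


def failed_keys(attrs, expected_patterns):
    """Return the keys that are missing or whose value does not match its regex."""
    return sorted(
        key
        for key, pattern in expected_patterns.items()
        if key not in attrs or not re.fullmatch(pattern, str(attrs[key]))
    )


# Hypothetical sample line and expectations, for illustration only.
SAMPLE_LINE = json.dumps({
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "k8s.pod.name", "value": {"stringValue": "my-app-abc12"}},
            {"key": "k8s.namespace.name", "value": {"stringValue": "default"}},
        ]},
        "scopeSpans": [],
    }]
})

EXPECTED = {
    "k8s.pod.name": r"my-app-[a-z0-9]+",
    "k8s.namespace.name": r"default",
    "k8s.pod.uid": r"[0-9a-f-]{36}",
}

if __name__ == "__main__":
    for attrs in resource_attributes(SAMPLE_LINE):
        print(failed_keys(attrs, EXPECTED))  # prints ['k8s.pod.uid']
```

Reporting the failing keys rather than a plain true/false should make CI failures easier to read when a regression drops a single attribute.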

Action items

Once it's introduced in the contrib repo, we can potentially reuse the same tests in the helm repo.

@dmitryax dmitryax added enhancement New feature or request ci-cd CI, CD, testing, build issues processor/k8sattributes k8s Attributes processor receiver/k8scluster needs triage New item requiring triage receiver/k8sevents labels Oct 25, 2022
@dmitryax
Member Author

@povilasv as we discussed, PTAL if it makes sense and if you can work on it

@povilasv
Contributor

This is cool! Makes sense to me. Do we have something similar somewhere else I can use to bootstrap this?

I can find some time to work on this

@dmitryax
Member Author

> This is cool! Makes sense to me. Do we have something similar somewhere else I can use to bootstrap this?

https://github.com/open-telemetry/opentelemetry-helm-charts has some smoke tests spinning up a cluster and installing the charts, so they can be used as a starting point

@dmitryax
Member Author

@povilasv please let us know if you have time to work on this. Otherwise, we will see if someone else can take it.

@povilasv
Contributor

Hey, I've started to look into this.

Sorry for the delay, I had some public holidays and other work stuff :)

@povilasv
Contributor

@dmitryax I've started playing with the k8s e2e test framework: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16263/files#diff-92cca7ce2020408b9bcd2fd0eb8d457505f31cebf5e124e2b9cf6f053dd7f19eR47

ATM I have it spinning up a simple collector deployment. I kind of like the way it's structured.

Do you have any opinions on the e2e framework? :)

@povilasv
Contributor

povilasv commented Nov 22, 2022

Writing update:

k cp example-opentelemetry-collector-57774cbc85-fczc6:/files .
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "143fceb59597166894af5872393f5d6de43ed69825fb54cc74f7732e11ec576e": OCI runtime exec failed: exec failed: unable to start container process: exec: "tar": executable file not found in $PATH: unknown

I'm still thinking about how to work around this.

Sorry folks for blocking PRs, somehow I keep hitting obstacle after obstacle with no good solution :/

@fatsheep9146
Contributor

  • Should we make a pipeline that pushes the image before we spin up a kind cluster? (I tried to do that but got docker login error)

For this question, could we build the image only locally and, instead of pushing it to Docker Hub, use kind load docker-image to load the local image onto the kind node? @povilasv

@fatsheep9146
Contributor

Any progress? @povilasv

@fatsheep9146
Contributor

Do you have time for this, @povilasv? If not, I'm willing to continue this work.

@povilasv
Contributor

povilasv commented Dec 30, 2022

Hey @fatsheep9146, I got sick and am now sidetracked by other projects. It would be best if you could take over. Sorry and thank you!

@dmitryax
Member Author

> For this question, could we consider only build image locally, but instead of pushing to docker hub, we just use kind load docker-image to load local image to kind node?

I agree, we can build a small image locally with needed components only and use it in the kind cluster for testing.
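A minimal sketch of that idea, assuming Docker and kind are installed on the CI runner. The image tag `otelcontribcol:e2e` and the cluster name `e2e-test` are hypothetical placeholders, and the helper only prints the commands unless `dry_run` is turned off.

```python
import subprocess


def local_image_commands(image, cluster, context_dir="."):
    """Return the argv lists that build an image and side-load it into a kind cluster."""
    return [
        # Build the image locally; nothing is pushed to a registry.
        ["docker", "build", "-t", image, context_dir],
        ["kind", "create", "cluster", "--name", cluster],
        # Copy the local image onto the nodes of the kind cluster.
        ["kind", "load", "docker-image", image, "--name", cluster],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and fail fast."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical image tag and cluster name, for illustration only.
    run_all(local_image_commands("otelcontribcol:e2e", "e2e-test"))
```

One thing to watch for: with a `:latest` tag Kubernetes defaults to always pulling the image, so an explicit tag (or `imagePullPolicy: IfNotPresent`) is probably needed for the side-loaded image to be picked up.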

@fatsheep9146
Contributor

> For this question, could we consider only build image locally, but instead of pushing to docker hub, we just use kind load docker-image to load local image to kind node?

> I agree, we can build a small image locally with needed components only and use it in the kind cluster for testing.

Yes, I have already tried it this way, but ran into some problems that still need to be solved.

@dmitryax
Member Author

dmitryax commented Jan 19, 2023

@fatsheep9146 Thank you for taking care of it! And feel free to share your findings here

@fatsheep9146
Contributor

Hi @dmitryax @povilasv, see #17410.
I added a first version of the workflow to verify the proposed solution.

The whole workflow consists of the following steps:

  1. Check out the collector-contrib code and build the otel-collector-contrib image.
  2. Start a test kind cluster.
  3. Use the kind load docker-image command to load the otel-collector-contrib image into the kind cluster.
  4. Fetch the collector Helm chart and install it into kind with the newly built image. Some details:
    • Use fileexporter to export the trace data.
    • Run the collector in statefulset mode to make it easier to copy the trace data out of the pod.
  5. Start a pod (via job/statefulset/deployment) with the telemetrygen image to produce raw telemetry data and export it to the collector.
    • I did not use opentelemetry-demo as the proposed solution suggests, because the whole demo is too large to run. I tested it on my own virtual machine with 8U16G, and it still took minutes to start everything.
  6. Use the kubectl cp command to copy the trace data out of the pod into processor/k8sattributeprocessor/testdata.
  7. cd to processor/k8sattributeprocessor and run the test TestJobE2E, which scans the trace data and checks that the attributes were added correctly.

Since this version is only meant to verify the solution, some of the code is not very carefully written. Feel free to give any comments.
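For readers who want to try steps 4 and 6 by hand, they can be sketched roughly as follows. This is not the code from the PR: the chart value names (`mode`, `image.repository`, `image.tag`), the release name, and the pod and directory names are assumptions made for illustration and should be checked against the chart's values file. Note also that `kubectl cp` needs a `tar` binary inside the container, as the error earlier in this thread shows.

```python
import subprocess


def deploy_and_fetch_commands(image_repo, image_tag, pod, remote_dir, local_dir,
                              release="e2e-collector"):
    """Return argv lists that install the collector chart and copy its output files."""
    return [
        ["helm", "repo", "add", "open-telemetry",
         "https://open-telemetry.github.io/opentelemetry-helm-charts"],
        # Value names below are assumed; verify them against the chart's values.yaml.
        ["helm", "install", release, "open-telemetry/opentelemetry-collector",
         "--set", "mode=statefulset",
         "--set", f"image.repository={image_repo}",
         "--set", f"image.tag={image_tag}",
         "--wait"],
        # kubectl cp shells out to tar inside the container.
        ["kubectl", "cp", f"{pod}:{remote_dir}", local_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and fail fast."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical image, pod, and path names, for illustration only.
    run_all(deploy_and_fetch_commands(
        "otelcontribcol", "e2e", "collector-0", "/data", "./testdata"))
```

Statefulset mode presumably helps here because StatefulSet pods get stable, ordinal-based names, so the pod name passed to `kubectl cp` does not have to be looked up on every run.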

@fatsheep9146
Contributor

@dmitryax
Member Author

dmitryax commented Feb 3, 2023

This is an amazing start! Thanks a lot, @fatsheep9146!

@dmitryax
Member Author

dmitryax commented Feb 6, 2023

@fatsheep9146 @povilasv I created items for other k8s components. Feel free to add more tickets or take those that are unassigned

@fatsheep9146
Contributor

> @fatsheep9146 @povilasv I created items for other k8s components. Feel free to add more tickets or take those that are unassigned

I will start working on the other items :)

@fatsheep9146 fatsheep9146 added the never stale Issues marked with this label will be never staled and automatically removed label Feb 7, 2023
@povilasv
Contributor

povilasv commented Feb 8, 2023

This is awesome :)

By the way, should we also enhance the k8sattribute tests and check for common cases like Deployments / StatefulSets / DaemonSets?

@fatsheep9146
Contributor

> By the way, should we also enhance the k8sattribute tests and check for common cases like Deployments / StatefulSets / DaemonSets?

Yes, I agree with that, feel free to make a contribution :)
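One way such per-workload checks could be organized is a small table that maps each workload kind to the owner attribute it is expected to produce. The sketch below is only an illustration: it assumes the processor is configured to extract these attributes, and the helper name and the sample values are hypothetical.

```python
# Owner attribute expected for each workload kind.
OWNER_ATTRIBUTE = {
    "Deployment": "k8s.deployment.name",
    "StatefulSet": "k8s.statefulset.name",
    "DaemonSet": "k8s.daemonset.name",
}

# Attributes expected regardless of the workload kind.
COMMON_ATTRIBUTES = ("k8s.pod.name", "k8s.namespace.name")


def missing_attributes(workload_kind, attrs):
    """Return the expected attribute keys that are absent for this workload kind."""
    required = COMMON_ATTRIBUTES + (OWNER_ATTRIBUTE[workload_kind],)
    return [key for key in required if key not in attrs]


if __name__ == "__main__":
    # Hypothetical attributes, as if taken from telemetry sent by a DaemonSet pod.
    sample = {"k8s.pod.name": "gen-x7k2p", "k8s.namespace.name": "default"}
    print(missing_attributes("DaemonSet", sample))  # prints ['k8s.daemonset.name']
```

A table-driven layout like this keeps each new workload kind to a one-line addition, which may make it easier for several contributors to extend the tests in parallel.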

3 participants