Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing: Save pod logs and resources after e2e test run #234

Merged
merged 2 commits into from
Sep 7, 2023

Conversation

jmprusi
Copy link
Member

@jmprusi jmprusi commented May 29, 2023

Description

Generate a summary of logs/resources after e2e runs.

  • Disabled by default, to enable, pass the ARTIFACT_PATH env var when running the e2e tests: ARTIFACT_PATH=/tmp/e2e/ make test

  • Enabled on GitHub CI, you can access the artifacts via the "e2e / e2e-kind (pull_request)" check/action.

image

Current output of a e2e test run:

.
├── argocd-operator-system
│   └── argocd-operator-controller-manager-deployment.yaml
├── catalogd-system
│   ├── catalogd-controller-manager-77b88f9f46-85hg2-kube-rbac-proxy-logs.txt
│   ├── catalogd-controller-manager-77b88f9f46-85hg2-manager-logs.txt
│   └── catalogd-controller-manager-deployment.yaml
├── cert-manager
│   ├── cert-manager-5f76f6fd49-4926c-cert-manager-logs.txt
│   ├── cert-manager-cainjector-68698c5958-nvxzg-cert-manager-logs.txt
│   ├── cert-manager-cainjector-deployment.yaml
│   ├── cert-manager-deployment.yaml
│   ├── cert-manager-webhook-5c9c98975c-pplrr-cert-manager-logs.txt
│   └── cert-manager-webhook-deployment.yaml
├── crdvalidator-system
│   ├── crd-validation-webhook-67795fdc9d-nvz57-crd-validation-webhook-logs.txt
│   └── crd-validation-webhook-deployment.yaml
├── default
├── local-path-storage
│   ├── local-path-provisioner-684f458cdd-5vsfh-local-path-provisioner-logs.txt
│   └── local-path-provisioner-deployment.yaml
├── operator-controller-system
│   ├── operator-controller-controller-manager-5fdb4997d8-l8sqp-kube-rbac-proxy-logs.txt
│   ├── operator-controller-controller-manager-5fdb4997d8-l8sqp-manager-logs.txt
│   └── operator-controller-controller-manager-deployment.yaml
├── operator-qztk2xvw-bundleDeployment.yaml
├── operator-qztk2xvw-bvww52-bundle.yaml
├── operator-qztk2xvw-operator.yaml
├── rukpak-system
│   ├── core-66d8b5494d-vr494-kube-rbac-proxy-logs.txt
│   ├── core-66d8b5494d-vr494-manager-logs.txt
│   ├── core-deployment.yaml
│   ├── helm-provisioner-68b8df478-p8gp7-kube-rbac-proxy-logs.txt
│   ├── helm-provisioner-68b8df478-p8gp7-manager-logs.txt
│   ├── helm-provisioner-deployment.yaml
│   ├── rukpak-webhooks-5cd9dddf98-f6gx8-webhooks-logs.txt
│   └── rukpak-webhooks-deployment.yaml
└── test-catalog-catalogsource.yaml

9 directories, 29 files

TODO:

  • Enable/Disable
  • Custom e2e artifacts path
  • Download artifact on GitHub actions.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2023
@jmprusi jmprusi force-pushed the jmprusi/test_logs branch 2 times, most recently from 790d618 to 98e238d Compare May 30, 2023 13:39
@jmprusi jmprusi marked this pull request as ready for review May 30, 2023 13:56
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 30, 2023
@jmprusi jmprusi changed the title Expose logs/resources after test run Testing: Save pod logs and resources after e2e test run May 30, 2023
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
test/e2e/install_test.go Outdated Show resolved Hide resolved
test/e2e/install_test.go Outdated Show resolved Hide resolved
test/e2e/install_test.go Outdated Show resolved Hide resolved

// Get all namespaces
namespaces, err := kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
Expect(err).To(Not(HaveOccurred()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global: I can't remember - does Expect abort the test immediately if the expectation fails? If so, I'd recommend distinguishing between a hard-failure (e.g. couldn't create kubeClient, couldn't create artifactPath) and soft failures (couldn't list something, couldn't marshal something, etc.). For the soft failures, I'd suggest logging the issue and continuing, so you try to capture as much as you can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect does abort immediately

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a useful logger... fmt.Printf is good enough? what do you propose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://onsi.github.io/ginkgo/#logging-output has details on what's available

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, PTAL.

test/e2e/install_test.go Outdated Show resolved Hide resolved
test/e2e/install_test.go Outdated Show resolved Hide resolved
test/e2e/install_test.go Outdated Show resolved Hide resolved
Comment on lines 229 to 236
buf := make([]byte, 1024)
for {
_, err := logs.Read(buf)
if err != nil {
break
}
}

// Save logs to artifact path
err = os.WriteFile(filepath.Join(namespaceArtifactPath, pod.Name+"-"+container.Name+"-logs.txt"), buf, 0644)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut is telling me this will only read 1024 bytes of the log at a time, and only write out the last 1024 bytes? (Although golang surprises me sometimes.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding is that this will read from the logs 1024 bytes at a time until it encounters an error (ideally and EOF error to signal the end of the file, but the error check doesn't seem to differentiate - and maybe there isn't a need to) and then will break and write the output. I would personally prefer to substitute this for loop with a buf, err := io.ReadAll(logs) as it should handle all the reading for us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs can be very long and artifact gathers that get OOMKilled write nothing :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I previously suggested a small buffer and io.Copy...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be reading it wrong, but it looks like io.ReadAll() internally reads a small buffer at a time: https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/io/io.go;drc=dc8e2a6a8ec94f2c98ba20edd57932eba284efb1;l=690

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree with the suggestion of io.Copy for writing as the stream is read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It starts small, but because this function returns the entire contents, io.ReadAll continually increases the size of the buffer as it reads more data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using io.Copy now, let me know if that's better

Comment on lines 23 to 31
ARTIFACT_PATH=/tmp/artifacts make e2e

- uses: cytopia/upload-artifact-retry-action@v0.1.7
if: ${{ always() }}
with:
name: e2e-artifacts
path: /tmp/artifacts/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to make the artifact path a variable we define once and use in multiple places. Some docs on github action (environment) variables: https://docs.github.com/en/actions/learn-github-actions/variables#about-variables

test/e2e/install_test.go Outdated Show resolved Hide resolved
test/e2e/install_test.go Show resolved Hide resolved

// Get all namespaces
namespaces, err := kubeClient.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
Expect(err).To(Not(HaveOccurred()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect does abort immediately

Comment on lines 229 to 236
buf := make([]byte, 1024)
for {
_, err := logs.Read(buf)
if err != nil {
break
}
}

// Save logs to artifact path
err = os.WriteFile(filepath.Join(namespaceArtifactPath, pod.Name+"-"+container.Name+"-logs.txt"), buf, 0644)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding is that this will read from the logs 1024 bytes at a time until it encounters an error (ideally and EOF error to signal the end of the file, but the error check doesn't seem to differentiate - and maybe there isn't a need to) and then will break and write the output. I would personally prefer to substitute this for loop with a buf, err := io.ReadAll(logs) as it should handle all the reading for us.

@jmprusi jmprusi force-pushed the jmprusi/test_logs branch 2 times, most recently from c69e58f to 97bba47 Compare May 31, 2023 12:59
test/e2e/install_test.go Outdated Show resolved Hide resolved
@jmprusi jmprusi force-pushed the jmprusi/test_logs branch 3 times, most recently from 3a5881c to 5f6a597 Compare June 1, 2023 11:06
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2023
@jmprusi jmprusi force-pushed the jmprusi/test_logs branch 3 times, most recently from 6234c66 to 393c7b8 Compare June 1, 2023 17:47
@jmprusi jmprusi requested review from ncdc and everettraven June 1, 2023 21:54
@jmprusi jmprusi requested a review from a team as a code owner July 19, 2023 08:34
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 19, 2023
@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Patch coverage has no change and project coverage change: -4.42% ⚠️

Comparison is base (1ec4a23) 81.64% compared to head (b9824db) 77.23%.

❗ Current head b9824db differs from pull request most recent head 29b8758. Consider uploading reports for the commit 29b8758 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #234      +/-   ##
==========================================
- Coverage   81.64%   77.23%   -4.42%     
==========================================
  Files          21       18       -3     
  Lines         937      795     -142     
==========================================
- Hits          765      614     -151     
- Misses        119      146      +27     
+ Partials       53       35      -18     
Flag Coverage Δ
e2e ?
unit 77.23% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 6 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2023
@jmprusi jmprusi force-pushed the jmprusi/test_logs branch 2 times, most recently from 20aaceb to ca58e6d Compare July 31, 2023 10:49
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2023
@joelanford
Copy link
Member

What's the status of this PR? Functionality like this would help shed light on this apparent e2e code coverage change. #336 (comment)

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 21, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2023
const (
defaultTimeout = 30 * time.Second
defaultPoll = 1 * time.Second
testCatalogRef = "localhost/testdata/catalogs/test-catalog:e2e"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed - e2e_suite_test.go currently has consts for this, and a helper function getCatalogImageRef which supports optionally changing the catalog image via an env var.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// - bundle
// - bundledeployments
// - catalogsources

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -222,3 +243,163 @@ var _ = Describe("Operator Install", func() {

})
})

// getArtifactsOutput gets all the artifacts from the test run and saves them to the artifact path.
// right now it will save:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// right now it will save:
// Currently it saves:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ncdc
Copy link
Member

ncdc commented Sep 7, 2023

@jmprusi if you can rebase and make my requested changes, we can merge

Signed-off-by: Joaquim Moreno Prusi <joaquim@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2023
Signed-off-by: Joaquim Moreno Prusi <joaquim@redhat.com>
@ncdc ncdc added this pull request to the merge queue Sep 7, 2023
Merged via the queue into operator-framework:main with commit 2560015 Sep 7, 2023
9 checks passed
@jmprusi jmprusi deleted the jmprusi/test_logs branch September 8, 2023 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants