Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert dumpDiag to use kind export logs #591

Merged
merged 4 commits into from
Apr 25, 2023
Merged

Conversation

astoycos
Copy link
Contributor

Today in KTF we manually step through all the pods/objects in the cluster to get debug information. KIND now supports simply running kind export logs which does a better job of giving us a cluster debug overview. Switch the DumpDiagnostics function to use this command for cluster log dumping instead.

Intentionally leave the logic which allows each addon to write their own specific cleanup functionality.

@astoycos astoycos requested review from a team and shaneutt as code owners March 21, 2023 15:52
@CLAassistant
Copy link

CLAassistant commented Mar 21, 2023

CLA assistant check
All committers have signed the CLA.

@astoycos astoycos marked this pull request as draft March 21, 2023 15:52
@astoycos
Copy link
Contributor Author

testing before moving out of draft state

@astoycos astoycos temporarily deployed to gcloud March 21, 2023 15:59 — with GitHub Actions Inactive
@astoycos astoycos temporarily deployed to gcloud March 21, 2023 16:19 — with GitHub Actions Inactive
@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2023

Codecov Report

Patch coverage has no change and project coverage change: -47.94 ⚠️

Comparison is base (5e019ce) 57.89% compared to head (ef75273) 9.95%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #591       +/-   ##
==========================================
- Coverage   57.89%   9.95%   -47.94%     
==========================================
  Files          43      43               
  Lines        3451    3455        +4     
==========================================
- Hits         1998     344     -1654     
- Misses       1189    3056     +1867     
+ Partials      264      55      -209     
Flag Coverage Δ
integration-test 9.95% <0.00%> (-47.94%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/clusters/diagnostics.go 0.00% <0.00%> (-41.99%) ⬇️
pkg/clusters/types/kind/cluster.go 44.87% <0.00%> (-30.13%) ⬇️
pkg/clusters/types/kind/utils.go 24.75% <0.00%> (-21.92%) ⬇️

... and 35 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@astoycos
Copy link
Contributor Author

Example dump from a KIND cluster can be seen in artifacts here: https://github.com/redhat-et/bpfd/actions/runs/4481401852

@astoycos astoycos temporarily deployed to gcloud March 21, 2023 17:57 — with GitHub Actions Inactive
@astoycos astoycos temporarily deployed to gcloud March 21, 2023 19:18 — with GitHub Actions Inactive
@czeslavo
Copy link
Contributor

From the functional perspective that looks good to me now. 🎉

The integration tests seem to be failing because of missing access to secrets, we need to figure out a way around it to make the PR green.

In the meantime, we will also need a changelog entry for the changes. Bonus points from me for an integration test for Kind implementation that would ensure the dump functionality indeed works as expected if you have time for it. :)

@czeslavo czeslavo temporarily deployed to gcloud March 22, 2023 13:15 — with GitHub Actions Inactive
@czeslavo
Copy link
Contributor

Rebased onto main after making the workflow skip tests that require secrets: #597

@astoycos astoycos marked this pull request as ready for review March 22, 2023 13:53
@astoycos
Copy link
Contributor Author

Thanks @czeslavo! I tested against a KIND cluster as mentioned above, but don't have a GKE cluster to test against :(

@czeslavo
Copy link
Contributor

Thanks @czeslavo! I tested against a KIND cluster as mentioned above, but don't have a GKE cluster to test against :(

Understood, let's skip GKE for now. What I meant was adding an automated integration test to our test suite under test/integration for Kind. You could take some inspiration from test/integration/kind_cluster_test.go and kind_version_test.go. I'd imagine the test could go as follows:

  1. Create a Kind cluster.
  2. Call its DumpDiagnostics method.
  3. Ensure that the returned directory exists and is not empty.

You can run an individual integration test using TEST_RUN=YourTestName make test.integration.

@astoycos
Copy link
Contributor Author

+1 sorry I just saw your previous comment, I'll try and get to that Int test today!

@shaneutt
Copy link
Contributor

Just checking in on this one, any help needed?

@astoycos
Copy link
Contributor Author

Yep sorry for the snail pace I started this locally just need to finish it up...

Copy link
Contributor

@rainest rainest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test run with this on https://github.com/Kong/kubernetes-ingress-controller/actions/runs/4694072692?pr=3886 and it looks the describe info and resource dumps are getting lost with this change.

diagnostics-integration-tests-enterprise-postgres.zip has logs in the new KIND logs format, but no kubectl_get_all.yaml or kubectl_describe_all.txt. Am I missing somewhere else they got moved, or are those indeed getting lost somehow?

@randmonkey
Copy link
Contributor

randmonkey commented Apr 19, 2023

I did a test run with this on https://github.com/Kong/kubernetes-ingress-controller/actions/runs/4694072692?pr=3886 and it looks the describe info and resource dumps are getting lost with this change.

diagnostics-integration-tests-enterprise-postgres.zip has logs in the new KIND logs format, but no kubectl_get_all.yaml or kubectl_describe_all.txt. Am I missing somewhere else they got moved, or are those indeed getting lost somehow?

I think the dump of kubectl get all and kubectl describe all are removed.

@astoycos
Copy link
Contributor Author

astoycos commented Apr 20, 2023

Updated to add back kubectl-decribe-all and kubectl-get-all for ALL diagnostic runs, Also added an integration test and updated the existing addon one.

Just need to make sure TestKindDiagnosticDump and TestKongAddonDiagnostics passes correctly in GH actions.

@astoycos astoycos temporarily deployed to gcloud April 21, 2023 02:29 — with GitHub Actions Inactive
@shaneutt
Copy link
Contributor

Looks like we need to rebase for latest main but then otherwise tests are passing @astoycos 👍

@randmonkey randmonkey temporarily deployed to gcloud April 21, 2023 09:27 — with GitHub Actions Inactive
randmonkey
randmonkey previously approved these changes Apr 21, 2023
pkg/clusters/diagnostics.go Outdated Show resolved Hide resolved
test/integration/kind_diagnostics_test.go Outdated Show resolved Hide resolved
Today in KTF we manually step through all the pods/objects in the
cluster to get debug information. KIND now supports simply
running `kind export logs` which does a better job of giving us
a cluster debug overview. Switch the DumpDiagnostics function to
use this command for cluster log dumping instead.

Intentionally leave the logic which allows each addon to write their own
specific cleanup functionality.

Also continue to export the `kubectl_describe_all.txt` and `kubectl_get_all.yaml`
files for all cluster types.

Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
Add new integration test for using `kind export logs` in DumpDiagnostics.
Update old addon test to use the new filepath for pod logs.

Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
Add godoc comment to DumpAllDescribeAll function.

use t.cleanup instead of defer.

Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
randmonkey
randmonkey previously approved these changes Apr 25, 2023
Copy link
Member

@pmalek pmalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing major from my side, just 2 nits to consider

pkg/clusters/diagnostics.go Outdated Show resolved Hide resolved
pkg/clusters/diagnostics.go Outdated Show resolved Hide resolved
czeslavo
czeslavo previously approved these changes Apr 25, 2023
Copy link
Contributor

@czeslavo czeslavo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Standardize function signatures for all cluster diagnostics functions.

Standardize name of diagnostic output directory from a mixture of "output"
and "outDir" to just "outdir".

Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
@astoycos astoycos dismissed stale reviews from czeslavo and randmonkey via d66cb8b April 25, 2023 13:17
@shaneutt shaneutt requested review from pmalek and czeslavo April 25, 2023 14:29
@astoycos astoycos temporarily deployed to gcloud April 25, 2023 14:29 — with GitHub Actions Inactive
Copy link
Member

@pmalek pmalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@shaneutt shaneutt enabled auto-merge (squash) April 25, 2023 14:32
@shaneutt shaneutt merged commit ba60f3a into Kong:main Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants