rpk: Improve k8s bundle errors + better admin API fallback #19473
Merged
r-vasquez requested review from twmb, gene-redpanda and Deflaimun as code owners, June 11, 2024 01:09
r-vasquez force-pushed the improve-k8s-bundle branch from e26c7af to 06b2eb1, June 11, 2024 01:10
twmb previously approved these changes, Jun 11, 2024
r-vasquez added kind/enhance (New feature or request), area/k8s and removed area/k8s labels, Jun 11, 2024
JakeSCahill reviewed, Jun 11, 2024
@r-vasquez when you get the chance, can you rebase this PR on top of tip of |
r-vasquez force-pushed the improve-k8s-bundle branch from 06b2eb1 to 2f62cd4, June 11, 2024 17:18
twmb previously approved these changes, Jun 11, 2024
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/50122#01900911-f28e-4ad4-8a01-59c50aa2f669 |
Most of the time this step fails due to a permission error.
If a user provides a configuration file without redpanda.data_directory, rpk won't know where to find the controller log dirs. We now provide a better error message instead of:
* lstat redpanda/controller/0_0: no such file or directory
Either way, a configuration file (redpanda.yaml) without a data_directory is an invalid config file,
When a command fails to run, rpk will return:
- couldn't save 'foo.txt': exit status 1
and will save stderr in foo.txt for full debugging. This is not clear, so users may be lost about what happened and won't know how to get past this error. We are adding a hint that points to where the rest of the error is (which might be multiple lines of text).
Clusters deployed with helm/operator will now have the rpk section of the redpanda.yaml filled with the Admin API addresses of the cluster. We fall back to these addresses in case rpk can't discover the Admin API addresses using the k8s API.
Now we check whether the authenticated user account is authorized to collect the k8s resources needed for the debug bundle process. If not, we avoid running all the steps and instead provide a single, meaningful error message with a hint on how to solve this (link to our docs).
r-vasquez force-pushed the improve-k8s-bundle branch from 2f62cd4 to e779bf3, June 12, 2024 16:23
twmb approved these changes, Jun 12, 2024
/backport v24.1.x |
This was referenced Jun 18, 2024
Debug bundles are often collected when things are not working properly, so it is normal that rpk debug bundle hits some errors along the collection steps. This PR aims to improve the error messages and provide better hints when errors occur, focusing on the Kubernetes experience.
Fixes #18057
Main Changes:
- /proc/slabinfo collection often fails because rpk debug bundle is not being executed with root permissions.
- A missing redpanda.data_directory in the configuration file (redpanda.yaml). This is also necessary to start Redpanda, so it is often a sign of a corrupted or invalid config file. The error we were printing was not a clear indication of that.
Backports Required
Release Notes
Improvements