-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PESDLC-1113 Copy logs from agent and RP pods #17772
Conversation
EC2 test run:
|
/ci-repeat 1 |
new failures in https://buildkite.com/redpanda/redpanda/builds/48123#018f085c-7474-4b3e-b2ac-9d1bd0d8dae4:
new failures in https://buildkite.com/redpanda/redpanda/builds/48230#018f119a-6ef6-4e46-8a17-b7cbef6e856e:
|
97ac409
to
5eaa5ef
Compare
/ci-repeat 1 |
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48230#018f11b9-e027-4032-843c-412a6f75cd07 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48709#018f40f3-a1d2-4340-9aaa-98448686dddb ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48709#018f40e3-aea2-40b8-8a37-2b638952338f ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48856#018f5aa3-ebf1-4884-951e-bd3e87bc614b ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48856#018f5df0-c8dd-447e-906a-0e826719a6a4 |
c72decb
to
39550ff
Compare
/ci-repeat 1 |
Linter error are from separate module and not related to this change
|
/ci-repeat 1 |
21bdc15
to
53314a2
Compare
tests/rptest/clients/kubectl.py
Outdated
for line in process.stdout: # type: ignore | ||
yield line | ||
except Exception as e: | ||
raise RuntimeError(f'Failed to capture output') from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value does this exception chaining add? Actually I find the error confusing because it's not really the capture that failed in almost all cases but the process itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a way to show some helpful errors if a process failed with something meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, there should be separate ticket/task to update this to return actual error from shell. Right now underlying error masks it under "return code 1" exception and this is useful when something wrong with the environment
Implement simple class to track time and log milestones with ease.
Implement simple script that extracts specific log lines from rp log based on broker name and time.
To eliminate additional function call, just implement fake account class inside Cloud broker with similar functionality
To unify calls to node shell, put it inside CloudBroker class. This way actual shell pod could be left intact and not being created/deleted for each activity
Enable KubeNodeShell to be initialized in demand and clean when needed
Use CloudBroker initialization to inject script onto broker node
Copy agent logs first Then use ThreadPoolExecutor to copy logs from brokers Use stopwatch to track time
f0e51fc
to
91d418b
Compare
Rebased with latest changes from @travisdowns on capture errors from cmd output |
91d418b
to
0cb51c8
Compare
Also refactor capture cmd to support latest updates to cmd
0cb51c8
to
97928f5
Compare
@@ -16,6 +25,10 @@ def __init__(self, pod, kubectl, logger) -> None: | |||
self._meta = pod['metadata'] | |||
self.name = self._meta['name'] | |||
|
|||
# Backward compatibility | |||
# Various classes will use this to hash and compare nodes | |||
self.account(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this account
need to be a class, that we instantiate?
Actually I don't really understand what is happening here at all :).
After this line, the CloudBroker object is unchanged right?
Looking further down, may be the intent is something like self.account = account(pod)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is a mimic of ducktape mechanism that is defined here.
Ducktape uses separate class to define sophisticated way to do SSH to node and run commands: Linux_RemoteAccount
. Since we need to be consistent with normal logs extraction when searching for errors inside them in raise_on_badlogs
, we just defining fake class with one needed attribute: hostname
which is consumed in multiple places as if it is duck-tape's node.account. class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary cause for this dance is here.
Class BafLogLines used in separate module and there was a discussion couple of weeks back not to overcomplicate it with inheritance (which was original solution)
c074747
to
5ef6f10
Compare
5ef6f10
to
a8909aa
Compare
Take advantage of call to logs property and simulate DT log copy using streaming approach to eliminate huge memory consumptions.
Backports Required
Release Notes