Retrieving logs on pod ready #5229
Comments
Yes, previously it was only on watchLog.
This was touched on a little in #4741, but there wasn't any follow-up at that time. The default retry logic should now be responsible for 500 errors. That just leaves a narrow possibility of some 400 errors that we're taking responsibility to wait for - but with a check on pod status that looks for ready or succeeded. That existing check, even for watchLog, is problematic for determining an early exit, as you're noting. Options include:
|
I'm not sure which option is best. IMO, from the context of #4741 and from what I see in the described use case, if we get an error, I think we should just fail fast. Once the connection has been established, there shouldn't be any recovery attempt, since there's no way to determine what was already processed (that's the purpose of bookmarks in watches). If the connection has not yet been established, then we should fail fast with an Exception providing the reason (Pod ceased to exist, Pod is not accepting connections, and so on). |
It means there's no wait or retry for ready or any other condition when performing pod operations. |
I think that for Pod Log retrieval, we should basically do the same as kubectl then (don't bother), and fail fast with an exception. |
Just to make sure there's no confusion about what's there currently - the timeout has soft enforcement and defaults to 5 seconds. If the pod does not become ready or succeeded in that amount of time, the operation will still proceed. Granted, even 5 seconds could be too long an artificial wait if an early exit is expected. |
OK, I clearly missed this. Then we should improve the Javadoc to reflect this; both watchLog and getLog are subject to it. We should add a clarifying paragraph to all entries stating that by default it will wait 5 seconds for the Pod to be ready, but that you can change this with |
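For reference, a minimal sketch of how a caller sets that timeout today through the withLogWaitTimeout setter mentioned further down (the namespace and pod name are made up; note the unit confusion this thread describes, with the javadoc saying milliseconds while the wait logic treats the value as seconds):

    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    public class PodLogTimeoutExample {
      public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
          // The javadoc documents the argument as milliseconds, but per this
          // thread the current wait logic interprets it as seconds.
          String log = client.pods()
              .inNamespace("default")      // hypothetical namespace
              .withName("my-pod")          // hypothetical pod name
              .withLogWaitTimeout(5000)    // intended as 5 s per the docs
              .getLog();
          System.out.println(log);
        }
      }
    }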
BTW, the time units seem to be wrong too. Line 61 in f63507c
And in others it is used as seconds: Lines 125 to 126 in 1c3baf9
Lines 127 to 128 in bafc518
|
That came from the existing javadoc on withLogWaitTimeout - so that's always been wrong too :( |
Since ms is the publicly exposed unit (and is more fine-grained), I think it should be easy to change everything internally to match it
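As a sketch of what normalizing on milliseconds internally could look like (the helper and parameter names are made up for illustration; only waitUntilCondition and TimeUnit are existing API):

    import java.util.concurrent.TimeUnit;

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.client.dsl.PodResource;

    class LogWaitSketch {
      // Hypothetical helper: pass the configured timeout through as milliseconds
      // (the documented unit) instead of reinterpreting it as seconds.
      static Pod waitForPod(PodResource podOperation, long logWaitTimeoutMs) {
        return podOperation.waitUntilCondition(
            p -> p != null,             // placeholder condition; see the snippets below
            logWaitTimeoutMs,
            TimeUnit.MILLISECONDS);     // the quoted code below passes TimeUnit.SECONDS
      }
    }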
I had the exact same problem because of this: when the pod failed very fast, no logs could be retrieved, as the readiness condition no longer evaluated to true. The pod is log-streamable if it's in Failed, Succeeded, or Running, so it doesn't have to be ready. I worked around it by waiting until the pod reached one of those phases before streaming logs, with the following code:

    fun Pod.ready(): Boolean = when (status?.phase) {
        "Running", "Succeeded", "Failed" -> true
        else -> false
    }

    client.pods()
        .withLabel(JOB_ID_LABEL, "$jobId")
        .waitUntilCondition({ pod: Pod? -> pod?.ready() == true }, 150, TimeUnit.SECONDS)

Checking the status works very reliably. |
Can you elaborate on that - were you explicitly setting withReadyWaitTimeout? If so, the likely problem was the mismatch between the javadocs (ms) and the logic actually expecting seconds, compounded by the lack of fail-fast behavior. This issue should address that mismatch and improve the fail-fast nature of the check being performed. |
Okay, picture the following. I create a Pod. The Pod runs, for instance, a Python script. This Python script has a syntax error in the first line. This results in the Pod failing; it is now in the Failed phase. The problem now arises from this wait logic:

    try {
      // Wait for Pod to become ready or succeeded
      podOperation.waitUntilCondition(p -> {
        podRef.set(p);
        return p != null && (Readiness.isPodReady(p) || Readiness.isPodSucceeded(p));
      },
          logWaitTimeout,
          TimeUnit.SECONDS);
    }

If the Pod is already terminated, this condition never becomes true and the call just waits until the timeout expires. Even kubectl allows querying a Pod for logs if it's dead. The logic above works only if the Pod has not started yet, finished successfully, or is currently Running. |
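For comparison, a sketch (not the library's actual fix) of a condition that also treats a Failed Pod as log-streamable, along the lines of the Kotlin workaround above:

    import io.fabric8.kubernetes.api.model.Pod;

    final class LogStreamableCheck {
      // Accept Running, Succeeded, or Failed pods instead of only ready/succeeded,
      // so a pod that crashes immediately still satisfies the wait condition.
      static boolean isLogStreamable(Pod p) {
        if (p == null || p.getStatus() == null) {
          return false;
        }
        String phase = p.getStatus().getPhase();
        return "Running".equals(phase) || "Succeeded".equals(phase) || "Failed".equals(phase);
      }
    }

Substituted into the waitUntilCondition call above, this would let logs be fetched from a Pod that failed right away instead of the wait timing out.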
Yes, we're on the same page; that is what I'm referring to as better fail-fast behavior. |
If I understood you correctly, this means that as a user of the client I have to make sure on my own that the pod is queryable for logs, right? The lib is not doing any conditional waiting anymore until some ready state is reached? kubectl doesn't have this feature currently, but there are a few discussions about that, and it's a request from multiple users:
Can't we just adjust the existing |
also fixing the timeout units to match the javadocs
Please review #5245 |
Sorry for the buzz, looks good 😊 |
Description
In #4695 / #4637 we introduced some waiting logic for regular operations. One of these waits was related to log retrieval.
Although this seems like a good idea, it doesn't fit all purposes. Some Pods might have started but aren't ready, or might have failed, which makes them not ready. However, they might have logged something that can be crucial for detecting bugs and so on.
For all of these use cases, we should remove the Readiness check and replace it with something else that simply detects whether the Pod is alive and can be queried for logs.
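A sketch of what such a check could look like, assuming a single fail-fast inspection of the Pod with a descriptive exception instead of any waiting (the helper name is hypothetical):

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.client.KubernetesClientException;

    final class FailFastLogCheck {
      // Hypothetical: no waiting or retrying, just one check that reports the
      // reason, in the spirit of kubectl logs.
      static void assertLogRetrievable(Pod pod) {
        if (pod == null) {
          throw new KubernetesClientException("Pod no longer exists; cannot retrieve logs");
        }
        String phase = pod.getStatus() != null ? pod.getStatus().getPhase() : null;
        if (!"Running".equals(phase) && !"Succeeded".equals(phase) && !"Failed".equals(phase)) {
          throw new KubernetesClientException(
              "Pod is in phase " + phase + " and cannot serve logs yet");
        }
      }
    }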
/cc @shawkins