Policy Server Readiness Probe Fails after some time #239
Comments
Happened again, but now I have an error.
Can you share with us which policies are being enforced by the policy server?
@flavio monitor mode:
Found something in the controller logs.
Happened again with this error in the controller; maybe it's a controller problem?
Running with disallow-service-loadbalancer works, no errors, but also not much "traffic".
I think the root cause is in the controller.
@flavio the error happens when the kubewarden controller has readiness problems, so it's the controller, not the policy server.
I think we have to work on trying to reproduce this issue. As an idea: is the machine under high load/pressure? I see several pointers in this direction.
All of this leads me to think that the machine might be under high load/pressure or have slow disks; in such a situation it is common to see things behaving in a non-optimal way and to find this kind of error in the logs. Could that be it? In any case, we should try to reproduce the problem.
@ereslibre I'll try my best to help you reproduce this. In my environment I can reproduce it in about ~60 minutes. The environment is an Azure AKS cluster; I have no insight into the control plane, I don't even see the instances.
@ereslibre @flavio I updated to the latest versions and cannot reproduce this anymore, but maybe this is the real cause. I'm closing this one; the discussion is better suited to the controller bug.
Prior to this commit, we used the low-level hyper library to create our HTTP server. The code being used was pretty complex due to the low-level nature of this library. Moreover, the code wasn't robust enough: in certain cases the HTTP server could fail and cause the whole policy-server to drop incoming requests. This kind of failure was hard to reproduce, but some users have run into it. I was able to reproduce it too with minikube: malformed client TLS requests could cause the server to reject them and then enter an error state. Instead of implementing all the possible workarounds for these kinds of situations, this code now implements the HTTP and HTTPS server using the warp crate. Warp is built on top of hyper, but provides a ready-to-consume high-level API. Our core business is not implementing HTTP(S) servers, hence by using this library we make our code more robust and improve its overall quality. As a matter of fact, thanks to this change, a lot of obscure and repetitive code has been dropped. Also, a lot of top-level dependencies have been removed, because they are now pulled in via warp.

FIXES kubewarden#239

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
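For illustration, here is a minimal sketch of the warp-based approach the commit describes. The `/readiness` route and port 8443 come from the logs in this issue; the certificate paths, the shape of the validate route, and the always-allow handler are assumptions made for the example, not the actual policy-server code. It assumes warp with its `tls` feature, plus tokio and serde_json.

```rust
// A minimal sketch, NOT the actual policy-server code: an HTTPS server
// built with warp's high-level API, as the commit message describes.
// Assumes warp (with the "tls" feature), tokio, and serde_json.
use warp::Filter;

#[tokio::main]
async fn main() {
    // Readiness endpoint; the kubelet probes it on :8443 in this issue.
    let readiness = warp::path!("readiness").map(|| warp::reply());

    // Hypothetical validation route: accept an AdmissionReview and always
    // allow it, roughly what monitor mode reports in the logs above.
    let validate = warp::path!("validate" / String)
        .and(warp::body::json())
        .map(|_policy_id: String, review: serde_json::Value| {
            let uid = review["request"]["uid"].clone();
            warp::reply::json(&serde_json::json!({
                "apiVersion": "admission.k8s.io/v1",
                "kind": "AdmissionReview",
                "response": { "uid": uid, "allowed": true }
            }))
        });

    // TLS is terminated per connection by warp/hyper: a malformed client
    // handshake fails that single connection instead of wedging the server.
    warp::serve(readiness.or(validate))
        .tls()
        .cert_path("policy-server-cert.pem") // assumed path
        .key_path("policy-server-key.pem")   // assumed path
        .run(([0, 0, 0, 0], 8443))
        .await;
}
```

The relevant design point is the final call chain: TLS termination and connection handling live inside warp/hyper, so a bad client handshake can no longer leave a hand-rolled accept loop stuck in an error state.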
I ran into the issue while testing rc2. I found a fix :)
Is there an existing issue for this?
Current Behavior
After some time (~140 minutes) the Policy Server readiness probe fails.
Because there is no liveness probe, I have no idea whether the application still works.
We have run the policy server with tracing enabled; there is no error in the log.
Warning Unhealthy 52s (x92 over 157m) kubelet Readiness probe failed: Get "https://10.244.14.179:8443/readiness": dial tcp 10.244.14.179:8443: connect: connection refused
Latest Log Messages
2022-04-28T07:42:57.434789Z INFO validation{host="policy-server-default-7dbfb95cb8-nwh29" policy_id="clusterwide-allow-pod-privileged-psp-policy" kind="Pod" kind_group="" kind_version="v1" name="redacted-84f9d67d94-ljv5k" namespace="redacted" operation="CREATE" request_uid="7ac29b7c-b2ee-4bb3-84d1-462de6bf612a" resource="pods" resource_group="" resource_version="v1" subresource=""}:policy_eval: policy_server::worker: policy evaluation (monitor mode) policy_id="clusterwide-allow-pod-privileged-psp-policy" allowed_to_mutate=false response="ValidationResponse { uid: "7ac29b7c-b2ee-4bb3-84d1-462de6bf612a", allowed: true, patch_type: None, patch: None, status: Some(ValidationResponseStatus { message: Some(""), code: None }) }"
2022-04-28T07:42:57.434921Z DEBUG validation{host="policy-server-default-7dbfb95cb8-nwh29" policy_id="clusterwide-allow-pod-privileged-psp-policy" kind="Pod" kind_group="" kind_version="v1" name="redacted-84f9d67d94-ljv5k" namespace="redacted" operation="CREATE" request_uid="7ac29b7c-b2ee-4bb3-84d1-462de6bf612a" resource="pods" resource_group="" resource_version="v1" subresource="" allowed=true mutated=false}: policy_server::api: policy evaluated response="{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","response":{"uid":"7ac29b7c-b2ee-4bb3-84d1-462de6bf612a","allowed":true}}"
Expected Behavior
The Policy Server readiness probe does not fail without a reason.
Steps To Reproduce
Policy Server runs in an Azure AKS cluster
Default installation from the kubewarden Helm charts
Environment
Anything else?
No response