Self check always fail #863

Closed
slavoren opened this issue Aug 29, 2018 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@slavoren

Describe the bug:
The "self check" never passes when the Ingress Service uses NodePort and the public IP sits on an HAProxy instance (TCP mode) outside the Kubernetes cluster. We can simulate the check from the cert-manager container (kubectl exec) by fetching /.well-known/... with curl, and it succeeds. The same request also succeeds from outside the cluster.

Logs:

helpers.go:188 Found status change for Certificate "myip-secret" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-29 14:36:25.387757463 +0000 UTC m=+2049.620517469
sync.go:244 Error preparing issuer for certificate pwe/pwe-secret: http-01 self check failed for domain "www.example.com"
controller.go:190 certificates controller: Re-queuing item "default/myip-secret" due to error processing: http-01 self check failed for domain "www.example.com"

We replaced the real domain name with www.example.com in this bug report.

cert-manager works only when the public IP is on the Kubernetes cluster and the Ingress Service uses the LoadBalancer type.

Expected behaviour:
The self check passes with a NodePort Ingress Service.

Steps to reproduce the bug:

cat <<EOF > /root/nginx-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 31080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 31443
  selector:
    app: nginx-ingress
EOF


cat <<EOF > /root/letsencrypt-staging.yml
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  # Adjust the name here accordingly
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: name@example.com
    # Name of a secret used to store the ACME account private key from step 3
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    # Enable the HTTP-01 challenge provider
    http01: {}
EOF

cat <<EOF > /root/myip-ingress.yml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myip-ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - www.example.com
    secretName: myip-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myip-svc
          servicePort: 8080
EOF

# Nginx ingress
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/ns-and-sa.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/default-server-secret.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/nginx-config.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/rbac/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/daemon-set/nginx-ingress.yaml
kubectl create -f /root/nginx-ingress.yaml

# CertManager
kubectl create -f https://raw.githubusercontent.com/jetstack/cert-manager/master/contrib/manifests/cert-manager/with-rbac.yaml
kubectl create -f /root/letsencrypt-staging.yml

# MyApp
kubectl run myip --image=cloudnativelabs/whats-my-ip --replicas=1 --port=8080
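# The deployment created above is named "myip"; expose it as the Service "myip-svc" that the Ingress references
kubectl expose deployment myip --name=myip-svc --port=8080 --target-port=8080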
kubectl create -f /root/myip-ingress.yml
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /root/tls.key -out /root/tls.crt -subj "/CN=www.example.com"
kubectl create secret tls myip-secret --key /root/tls.key --cert /root/tls.crt

Anything else we need to know?:
It is not clear to us what exactly the self check expects to find: the fetch of the /.well-known key succeeds (confirmed via Wireshark), yet the self check runs again and again and keeps failing. More detail about the reason for the failure would be great.

Wireshark captured data - request from Cluster Node to HA proxy:

GET /.well-known/acme-challenge/B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k HTTP/1.1
Host: pwe.kube.freebox.cz
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Server: nginx/1.15.2
Date: Wed, 29 Aug 2018 14:42:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 87
Connection: keep-alive

B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k.6RElade5K0jHqS1ysziuv2Gm3_LgD-D9APNRg5k8sak

Environment details:

  • Kubernetes version v1.11.2
  • cert-manager version (v0.4.1)
  • nginx-ingress (v1.15.2)
  • Install method (primary via kubectl, but we also tried helm [following this guide - https://dzone.com/articles/secure-your-kubernetes-services-using-cert-manager] with the same result):

/kind bug

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 29, 2018
@AdrianRibao

The same happens to me. I'm setting up an HA cluster and this is blocking us from moving the apps.

We installed it with the Helm chart. Is there any workaround that would let us continue deploying our infrastructure?

@AdrianRibao

Fixed in my case. The problem was that the nginx configuration on the load balancer was redirecting connections on port 80 to port 443.
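
For anyone who wants to check for this, here is a minimal sketch that reports whether a plain-HTTP request gets redirected. The function names are made up for illustration, and the URL in the usage line is a placeholder.

```shell
# Print "<status code> <redirect target>" for a plain-HTTP request.
# The redirect target is empty when the server does not redirect.
http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code} %{redirect_url}\n' "$1"
}

# Succeed when the status line starts with a redirect code (301, 302, 307, 308).
is_redirect() {
  case "$1" in
    30[1278]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (placeholder URL):
#   status=$(http_status "http://www.example.com/.well-known/acme-challenge/test")
#   is_redirect "$status" && echo "port 80 is redirected: $status"
```

A redirect to https:// on the challenge path is the situation described in this comment.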

@julienfig

Same here.
I have an HA cluster behind an nginx reverse proxy (the DNS entry points to it), and the proxy forwards the HTTP/HTTPS ports to the public IPs of the Kubernetes nodes.

In the Kubernetes cluster, the ingress-nginx controller Service is configured like this:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  externalIPs:
  - public-IP-node1
  - public-IP-node2
  - public-IP-node3
  selector:
    app: ingress-nginx

With this setup, when I use cert-manager to get my cert, I always get a self check error (even though every ACME challenge checks out when I fetch it manually, both inside and outside the cluster).

If I point my DNS entry at the public IP of one of the Kubernetes nodes, everything works and the certificate is issued (but this is a big SPOF if the node the DNS entry points to goes down).

@slavoren slavoren changed the title Issue certificate with Self check always fail Sep 6, 2018
@retest-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2018
@raphaelpereira

The same happens here, but with DNAT from the public IP to an internal MetalLB load balancer.

@raphaelpereira

I found out that the problem was that the cluster wasn't able to resolve the domain name. I fixed that and it worked.
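
A quick way to test name resolution from inside the cluster is a throwaway pod. This is only a sketch: the pod name `dns-check`, the `busybox:1.36` image, and the function name are assumptions, not something from this thread.

```shell
# Resolve a hostname from a temporary pod that is deleted when the command exits.
resolve_in_cluster() {
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 -- nslookup "$1"
}

# Usage:
#   resolve_in_cluster www.example.com
```

If the lookup fails here but works on your laptop, the cluster DNS is the place to look.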

@ptjhuang

ptjhuang commented Jan 4, 2019

Solved this myself too, after a long time of messing about. The self-check is tricky because it depends on your network configuration. The cert-manager resolver tries to connect to itself to verify that Let's Encrypt can access the data at .well-known/acme-challenge/. This is deceptively complicated in many networks: the resolver has to connect to itself using a name that often resolves to a public IP address. Do a wget/curl to the .well-known/acme-challenge URL from the resolver container to see if it succeeds. In my case, I had to set up hairpin NAT at the router.
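
The manual check can be sketched roughly like this, assuming the self-check amounts to fetching the challenge URL over plain HTTP and comparing the body with the expected key. The function names are hypothetical, and the token and key must come from the challenge you are debugging.

```shell
# Build the URL that is fetched during HTTP-01 validation.
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Fetch the challenge for domain $1 and token $2, then compare the response
# body with the expected key $3. Returns 0 on a match, non-zero otherwise.
check_challenge() {
  body=$(curl -sS --max-time 10 "$(challenge_url "$1" "$2")") || return 1
  [ "$body" = "$3" ]
}

# Usage (run from a pod inside the cluster, not from your laptop):
#   check_challenge www.example.com "$TOKEN" "$EXPECTED_KEY" && echo reachable
```

Running it from inside a pod matters, because that is the path that breaks when the public IP is not reachable from within the cluster.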

Is it a good idea to optionally skip self-check?

@munnerz
Member

munnerz commented Jan 10, 2019

I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄

@munnerz munnerz closed this as completed Jan 10, 2019
@ptjhuang

Port 80 isn't the issue; that's a given. The IP address is, though. Any installation behind NAT is likely to fail without hairpin configuration. If the self-check can't be made optional, maybe mention this in the docs?

@intellix

intellix commented Apr 20, 2019

Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges

I guess this means "Cloudflare Always Use HTTPS" was causing this for me. Perhaps something about requiring port 80 and HTTP access to the domain here would be good: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html

@ghost

ghost commented May 1, 2019

Same issue here. Because of hairpinning, I would like to be able to disable the self-check or to provide the IP address of the load balancer.

@MichaelOrtho

The problem is in Kubernetes networking when you use a LoadBalancer provided by the hosting company. I use DigitalOcean. Kubernetes does not route in-cluster traffic through the LB's public interface, so nothing adds the PROXY protocol header (or terminates SSL, if you do that outside Kubernetes). I use PROXY protocol: once I enable it and update Nginx to handle it, everything works except cert-manager, which fails because it connects to the public domain name from inside the cluster. It works from my computer because I am outside and the LB adds the needed header, but not from within the cluster.

cert-manager is not to blame for this, but it would help a lot to have switches that tell the validator to send the PROXY protocol header or to skip validation for that domain.

For curl if I do (from inside the cluster):

curl -I https://myhost.domain.com

it fails.

If I do (from inside the cluster):

curl -I https://myhost.domain.com --haproxy-protocol

it works.

I was informed by the DigitalOcean team that there is a fix for this behavior. They added an annotation for the nginx-ingress controller Service that makes Kubernetes use a domain name for the load balancer instead of its public IP. That tricks Kubernetes into thinking the address is not "ours", so it routes the traffic out and around through the LB.

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
This is it: (I just added this one)

kind: Service
apiVersion: v1
metadata: 
  name: nginx-ingress-controller
  annotations: 
    service.beta.kubernetes.io/do-loadbalancer-hostname: "hello.example.com"

@vitobotta

@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks

@AlexsJones

@vitobotta I have found on Scaleway you need to restart coredns and it will usually succeed.

@vitobotta

@AlexsJones Not for me. I had to add the annotation below

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"

@btwiuse

btwiuse commented May 29, 2020

...
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
...

After changing externalTrafficPolicy: Local to externalTrafficPolicy: Cluster, I was able to perform self check.

Reason being, pod with the certificate-issuer wound up on a different node than the load balancer did, so it couldn’t talk to itself through the ingress.
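
The change can also be applied without editing the manifest. A minimal sketch, with a hypothetical function name; the namespace and Service name in the usage line are the ones from the original report:

```shell
# Patch externalTrafficPolicy on an existing Service.
# $1 = namespace, $2 = Service name, $3 = policy (Local or Cluster)
set_traffic_policy() {
  kubectl -n "$1" patch service "$2" --type merge \
    -p "{\"spec\":{\"externalTrafficPolicy\":\"$3\"}}"
}

# Usage:
#   set_traffic_policy nginx-ingress nginx-ingress Cluster
```

Keep in mind that with the Cluster policy the ingress controller may no longer see the original client source IP.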

@compumike

Hi all, I ran into the same issue. I've recently published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.

@shibumi

shibumi commented Jan 4, 2021

@munnerz I think you misunderstood the problem here. You wrote:

I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄

The problem is not that Let's Encrypt can't reach the LoadBalancer. The problem is that the cert-manager self-check can't reach it. The connection from LE to the LoadBalancer is fine, thanks to destination NAT. cert-manager inside the cluster, however, resolves the domain name to the external IP and tries to connect to it, and that fails in DNAT scenarios.

@munnerz there is already a whole project just for fixing this issue. Is there really no option to just disable self-checks?

@shibumi

shibumi commented Jan 4, 2021

Here is another possible solution:

You can use CoreDNS to serve deliberately "wrong" DNS records inside the cluster. Just create host aliases for the domains and point them at the internal cluster IPs. Then serve these host/IP pairs via:

hosts {
    fallthrough
}

in your coredns config. This way you can use the internal IP addresses inside of your cluster. You just have to maintain another list (or you might just automate this via a custom operator or script).
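
To make the shape of a filled-in block concrete, here is a small sketch that prints a `hosts` stanza with one inline entry. The function name is hypothetical, and the IP and hostname in the usage line are placeholders for your ingress controller's internal IP and your domain.

```shell
# Print a CoreDNS hosts block that maps one hostname to one IP.
# $1 = internal IP of the ingress controller, $2 = public hostname
hosts_block() {
  printf 'hosts {\n    %s %s\n    fallthrough\n}\n' "$1" "$2"
}

# Usage (placeholder values):
#   hosts_block 10.0.0.10 www.example.com
# Paste the output into the Corefile, which on many clusters lives in the
# "coredns" ConfigMap in kube-system (an assumption; check your distribution).
```

The `fallthrough` line lets names that are not listed continue to the rest of the CoreDNS configuration.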

@trantor1

In DNAT scenarios, just set the externalIPs of an ingress Service to your external IP addresses.

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ext
  namespace: nginx-ingress
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  selector:
    app: nginx-ingress-ext
  externalIPs:
    - 11.22.33.44

Kubernetes in iptables mode (a mostly standard setup) creates iptables rules that redirect cluster-internal requests for the external IPs to the appropriate Services.

$ sudo iptables-save  | grep 11.22.33.44
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m addrtype --dst-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK

@jeffmccune

jeffmccune commented May 21, 2021

Switching from ipvs to iptables mode solved this for me, see also kubernetes/kubernetes#75262

@exenin

exenin commented Sep 18, 2021

I was stuck with this selfcheck issue for the longest time.

Issue was:
My Kubernetes cluster is a VM running RKE on OpenStack. My internet gateway/router forwards all traffic on ports 443 and 80 to the cluster.
I was able to curl the challenge and get a response from my laptop.
My issue seemed to be that I could not curl the challenge from within a pod...

I just came back to say thank you to @ptjhuang for the solution on setting hairpin on the gateway/router. I wanted to just let people know that this worked for me since I felt lost for the longest time. Hope this inspires others to try this solution

@stigok

stigok commented Jan 23, 2022

As @vitobotta points out (though without much context), cert-manager running in a Scaleway Kubernetes cluster needs this annotation:

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"

This annotation should be applied to the LoadBalancer service created by ingress-nginx.

service.beta.kubernetes.io/scw-loadbalancer-use-hostname
This is the annotation that forces the use of the LB hostname instead of the public IP. This is useful when it is needed to not bypass the LoadBalancer for traffic coming from the cluster.

If you're configuring ingress-nginx with Helm, you can set the value controller.service.annotations.\"service\\.beta\\.kubernetes\\.io/scw-loadbalancer-use-hostname\" to "true"
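
If you are not using Helm, the annotation can also be set on an existing Service with kubectl. This is a sketch only: the function name is hypothetical, and the namespace and Service name in the usage line are assumptions, so substitute the ones from your ingress-nginx install.

```shell
# Add or overwrite one annotation on a Service.
# $1 = namespace, $2 = Service name, $3 = annotation key, $4 = value
annotate_service() {
  kubectl -n "$1" annotate service "$2" --overwrite "$3=$4"
}

# Usage (namespace and Service name are assumptions):
#   annotate_service ingress-nginx ingress-nginx-controller \
#     service.beta.kubernetes.io/scw-loadbalancer-use-hostname true
```

The same helper works for the DigitalOcean annotation mentioned earlier in the thread, with the hostname as the value.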
