http-01 self check failed for domain #656

Closed
AmbroiseCouissin opened this issue Jun 14, 2018 · 67 comments

Comments

@AmbroiseCouissin

AmbroiseCouissin commented Jun 14, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
I get the message: http-01 self check failed for domain ""

$ kubectl describe certificates website-cert

Name:         website-cert
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"website-cert","namespace":"default"},"spe...
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-06-14T14:56:48Z
  Generation:          0
  Resource Version:    14514530
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/website-cert
  UID:                 2a172bc7-6fe3-11e8-a23d-00163e0067a2
Spec:
  Acme:
    Config:
      Domains:
        <redacted>.com
      Http 01:
        Ingress:  ingress
  Common Name:
  Dns Names:
    <redacted>.com
  Issuer Ref:
    Name:       letsencrypt-issuer-staging
  Secret Name:  website-cert
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ
        Domain:     <redacted>.com
        Http 01:
          Ingress:  ingress
        Key:        VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0
        Token:      VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
        Type:       http-01
        URL:        https://acme-staging-v02.api.letsencrypt.org/acme/challenge/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ/135522965
        Wildcard:   false
      URL:          https://acme-staging-v02.api.letsencrypt.org/acme/order/6285995/2040425
  Conditions:
    Last Transition Time:  2018-06-14T14:56:56Z
    Message:               http-01 self check failed for domain "<redacted>.com"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age   From          Message
  ----    ------       ----  ----          -------
  Normal  CreateOrder  4s    cert-manager  Created new ACME order, attempting validation...

If I get all the events:

I0614 15:03:16.667525       1 controller.go:177] certificates controller: syncing item 'default/website-cert'
I0614 15:03:16.667660       1 sync.go:239] Preparing certificate default/website-cert with issuer
I0614 15:03:16.667674       1 acme.go:159] getting private key (letsencrypt-issuer-staging->tls.key) for acme issuer default/letsencrypt-issuer-staging
I0614 15:03:16.668072       1 logger.go:27] Calling GetOrder
I0614 15:03:16.876856       1 logger.go:52] Calling GetAuthorization
I0614 15:03:17.065635       1 logger.go:72] Calling HTTP01ChallengeResponse
I0614 15:03:17.065678       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/website-cert
I0614 15:03:17.065696       1 logger.go:47] Calling GetChallenge
I0614 15:03:17.266766       1 helpers.go:162] Found status change for Certificate "website-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-14 15:03:17.266752283 +0000 UTC m=+20046.828096097
I0614 15:03:17.266805       1 sync.go:241] Error preparing issuer for certificate default/website-cert: http-01 self check failed for domain "<redacted>.com"
E0614 15:03:17.272906       1 sync.go:168] [default/website-cert] Error getting certificate 'website-cert': secret "website-cert" not found
E0614 15:03:17.272958       1 controller.go:186] certificates controller: Re-queuing item "default/website-cert" due to error processing: http-01 self check failed for domain "<redacted>.com"

What you expected to happen:
The self check to succeed

How to reproduce it (as minimally and precisely as possible):
Here is my Ingress:

spec:
  tls:
    - hosts:
        - <redacted>.com
      secretName: website-cert
  rules:
    - host: <redacted>.com
      http:
        paths:
          - backend:
              servicePort: 80
              serviceName: website
            path: /
          - backend:
              servicePort: 8089
              serviceName: cm-acme-http-solver-7lvgt
            path: >-
              /.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
apiVersion: extensions/v1beta1
status:
  loadBalancer:
    ingress:
      - ip: {IP}
kind: Ingress
metadata:
  uid: 6c304201-6fe2-11e8-8294-00163e020142
  resourceVersion: '14515959'
  name: ingress
  creationTimestamp: '2018-06-14T14:51:30Z'
  selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/ingress
  generation: 4
  namespace: default

Here is my Issuer:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <redacted>

    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-issuer-staging
    http01: {}

Here is my certificate:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: website-cert
spec:
  secretName: website-cert
  dnsNames:
  - <redacted>.com
  acme:
    config:
    - http01:
        ingress: ingress
      domains:
      - <redacted>.com
  issuerRef:
    name: letsencrypt-issuer-staging

Anything else we need to know?:
When I navigate to

http://<redacted>.com/.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw

I get:

VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0

Also, if I look at the logs of the cm-acme pod:

2018/06/14 17:31:58 [<redacted>.com] Validating request. basePath=/.well-known/acme-challenge, token=VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
2018/06/14 17:31:58 [<redacted>.com] Comparing actual host '<redacted>.com' against expected '<redacted>.com'
2018/06/14 17:31:58 [<redacted>.com] Got successful challenge request, writing key...
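For anyone repeating this manual check, here is a minimal Python sketch of the same comparison: request the challenge URL over plain HTTP and compare the body with the Key shown by kubectl describe certificate. The function names are made up for illustration and this is not cert-manager's own code. As later comments in this thread suggest, the result may differ depending on where it runs (a workstation versus a pod inside the cluster).

```python
import urllib.request

ACME_PREFIX = "/.well-known/acme-challenge/"


def challenge_url(domain: str, token: str) -> str:
    """Build the plain-HTTP URL that an HTTP-01 check requests."""
    return f"http://{domain}{ACME_PREFIX}{token}"


def body_matches_key(body: str, key: str) -> bool:
    """True if the served body is exactly the expected key (token.thumbprint)."""
    return body.strip() == key


def self_check(domain: str, token: str, key: str, timeout: float = 10.0) -> bool:
    """Fetch the challenge URL and compare the body with the expected key.

    Network errors and non-2xx responses are raised by urllib as
    URLError / HTTPError and are left to the caller.
    """
    with urllib.request.urlopen(challenge_url(domain, token), timeout=timeout) as resp:
        return resp.status == 200 and body_matches_key(resp.read().decode(), key)
```

Calling self_check(domain, token, key) with the Domain, Token and Key values from the certificate status returns True only when the exact key is served with status 200.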

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Aliyun Container Service
  • Install tools:
  • Others:

I've been struggling for two days. It's probably something really stupid from my side :)

Any idea?

@AmbroiseCouissin
Author

The problem resolved itself today. I don't know how.

Thanks for cert-manager. It's really a great tool!

@arianitu

I'm running into the same thing. The logs say "writing key...", but when I look at the certificate, it says it's still validating.

Super buggy

@munnerz
Member

munnerz commented Jun 19, 2018 via email

@dunjoye4real

I am running into this same issue. How long does it take?

@AmbroiseCouissin
Author

It took me two to three days. But now, when I generate certificates for other subdomains, it takes less than a minute.

@oanogin

oanogin commented Jun 27, 2018

This is strange behavior. I have 2 DNS names (asd.team1.example.com and asd.another.com), and the interesting points are:

  • with the asd.team1.example.com domain, everything works fine
  • with certbot on this machine and both DNS names, everything works fine
  • both domains are also accessible (the HTTP version of the service works fine)

Only for asd.another.com can I not obtain a cert with cert-manager, even though certbot on the same machine works fine.

:(

@hekonsek

I encountered the same issue. Any address I choose for my app works, except a single one whose validation is blocked by the "http-01 self check failed for domain" error. In particular, http://foo.mydomain.com doesn't work, but for example http://foo-app.mydomain.com works like a charm and can be validated in less than a minute.

I'm trying to figure out from the logs what could cause this single subdomain to fail self check validation.

@gabx

gabx commented Jul 28, 2018

Same error as the OP. Here is my certificate.yaml file. The Certificate has been created, but since then, no TLS certificate has been issued by Let's Encrypt.

% cat longhorn-certificate.yaml 
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: longhorn-thetradinghall-com
  namespace: default
spec:
  secretName: longhorn-thetradinghall-com-tls
  issuerRef:
    name: letsencrypt-cluster
    kind: ClusterIssuer
  dnsNames:
  - longhorn.thetradinghall.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - longhorn.thetradinghall.com

@maresende

Same error as OP.(2)

@Zetanova

Zetanova commented Aug 3, 2018

I have just updated from v2.5.
I tried ingressClass: nginx and ingress: my-ingress.

With the latter, the ingress gets extended with the ACME challenge
and can be queried successfully in the browser.

cert-manager still logs:
http-01 self check failed for domain "mydomain.tt"

@Zetanova

Zetanova commented Aug 3, 2018

I could solve it.

The hairpin mode of the NLB in front of the cluster didn't work.

@ngo275

ngo275 commented Aug 10, 2018

I ran into the same problem, but I tried again after a while and it succeeded!
This is weird.

@Darwinyo

I just had the same problem today.
I have 5 domains to validate:
www.farmersflorals.com, farmersflorals.com, api.farmersflorals.com, identity.farmersflorasl.com, and blog.farmersflorals.com.

Only 3 of those validated: www.farmersflorals.com, farmersflorals.com, and identity.farmersflorals.com.

The others did not.

All of them point to the same IP and I can access all of them, but only 3 validated. That's weird.

I'm using Helm chart version 0.4.1
with ingress on GKE.

@Antiarchitect

Having the same issue. I have two clusters configured identically in terms of nginx-ingress and cert-manager, and the third one is lagging. All three domains failed the self check. I have clusters for prod and staging, and now it's QA's turn. Nothing works. The logs don't say anything useful.

@Antiarchitect

I'm on GCP using GKE. Removing nginx-ingress and turning it back on helped. The ephemeral external IP seems to have been preserved magically.

@innovia

innovia commented Aug 26, 2018

Same issue here: the challenge pods are up and running with no logs, and cert-manager is failing the self check.

I manually deleted the TLS secret and it successfully generated the cert.

I have tested a pod with the same service account name to create and update a secret and it succeeded, so it's not an RBAC issue.

Here's my log:

sync.go:127] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" is up to date
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-gp4h8'
logger.go:52] Calling GetChallenge
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-cr47k as it does not contain necessary annotations
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-cr47k"
logger.go:52] Calling GetChallenge
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-tmz4r'
logger.go:52] Calling GetChallenge
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-srrjm'
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-srrjm"
controller.go:195] certificates controller: Finished processing work item "backend-prod/web-backend-prod-tls"
controller.go:152] ingress-shim controller: syncing item 'backend-prod/backend-web-gunicorn-nginx-ingress-config'
service.go:35] No existing HTTP01 challenge solver service found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:124] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" already exists
helpers.go:188] Found status change for Certificate "web-backend-prod-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-25 19:55:25.649155183 +0000 UTC m=+23.060496450
sync.go:174] Certificate backend-prod/web-backend-prod-tls scheduled for renewal in -728 hours
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-gp4h8 as it does not contain necessary annotations
ingress.go:33] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=3600485562,certmanager.k8s.io/acme-http-token=1325141813
ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/x-server-backend-prod-tls". One will be created.
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-tmz4r"
ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-srrjm as it does not contain necessary annotations
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/backend-web-gunicorn-nginx-ingress-config"
controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-gp4h8"
pod.go:49] No existing HTTP01 challenge solver pod found for Certificate "backend-prod/web-backend-prod-tls". One will be created.
sync.go:282] Error preparing issuer for certificate backend-prod/web-backend-prod-tls: [http-01 self check failed for domain "web.backend.server.com", http-01 self check failed for domain "web.server.com"]

@stefanvladvoinea

I ran into the same issue today

@sjbarrio

I have the same issue too

@jpfaria

jpfaria commented Sep 1, 2018

me too

@xaralis

xaralis commented Sep 5, 2018

@munnerz Could this issue be reopened? Seems to be happening to a lot of people, myself included.

kubectl reports "http-01 self check failed" while the solver logs report "Got successful challenge request, writing key..." and seem to be stuck in a loop.

xaralis@h90-dockertest1-gateway1:~$ kubectl describe certificate cert-test-rancher-f-app-it-letsencrypt
Name:         cert-test-rancher-f-app-it-letsencrypt
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2018-09-05T12:16:48Z
  Generation:          1
  Resource Version:    5504
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/cert-test-rancher-f-app-it-letsencrypt
  UID:                 90110fb7-b105-11e8-8c33-00163e000206
Spec:
  Acme:
    Config:
      Domains:
        test.rancher.f-app.it
      Http 01:
        Ingress:
        Ingress Class:  nginx
  Common Name:          test.rancher.f-app.it
  Dns Names:
    test.rancher.f-app.it
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       letsencrypt-staging
  Secret Name:  test-rancher-f-app-it-letsencrypt-tls
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/fpiC_AFvxd3BK6450WXXWuu_18iHr0PQ6ewYHeT-e34
        Domain:     test.rancher.f-app.it
        Http 01:
          Ingress:
          Ingress Class:  nginx
        Key:              r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0.TIJdwGgLPcC8d-Ki7ofbRruiCs47RHeBVc2TttYrT34
        Token:            r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
        Type:             http-01
        URL:              https://acme-staging-v02.api.letsencrypt.org/acme/challenge/fpiC_AFvxd3BK6450WXXWuu_18iHr0PQ6ewYHeT-e34/167935148
        Wildcard:         false
      URL:                https://acme-staging-v02.api.letsencrypt.org/acme/order/6875195/7290623
  Conditions:
    Last Transition Time:  2018-09-05T12:23:17Z
    Message:               http-01 self check failed for domain "test.rancher.f-app.it"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age   From          Message
  ----    ------       ----  ----          -------
  Normal  CreateOrder  6m    cert-manager  Created new ACME order, attempting validation...

Solver log:

2018/09/05 12:19:37 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:39 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:39 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:39 [test.rancher.f-app.it] Got successful challenge request, writing key...
2018/09/05 12:19:40 [test.rancher.f-app.it] Validating request. basePath=/.well-known/acme-challenge, token=r7Lj-NqP1KWqmn76ccJdt-2nApm1WNFcVOUCjlyFzV0
2018/09/05 12:19:40 [test.rancher.f-app.it] Comparing actual host 'test.rancher.f-app.it' against expected 'test.rancher.f-app.it'
2018/09/05 12:19:40 [test.rancher.f-app.it] Got successful challenge request, writing key...

Cert-manager log (repeats this again and again):

I0905 12:31:22.735124       1 sync.go:242] Preparing certificate default/cert-test-rancher-f-app-it-letsencrypt with issuer
I0905 12:31:22.735137       1 acme.go:169] getting private key (letsencrypt-staging->tls.key) for acme issuer kube-system/letsencrypt-staging
I0905 12:31:22.735520       1 logger.go:27] Calling GetOrder
I0905 12:31:22.952199       1 logger.go:57] Calling GetAuthorization
I0905 12:31:23.137759       1 logger.go:77] Calling HTTP01ChallengeResponse
I0905 12:31:23.137792       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/cert-test-rancher-f-app-it-letsencrypt
I0905 12:31:23.137826       1 logger.go:52] Calling GetChallenge
I0905 12:31:23.356645       1 ingress.go:33] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=1490028511,certmanager.k8s.io/acme-http-token=923668044
I0905 12:31:23.356919       1 helpers.go:188] Found status change for Certificate "cert-test-rancher-f-app-it-letsencrypt" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-09-05 12:31:23.35691253 +0000 UTC m=+1534.926270721
I0905 12:31:23.357090       1 sync.go:244] Error preparing issuer for certificate default/cert-test-rancher-f-app-it-letsencrypt: http-01 self check failed for domain "test.rancher.f-app.it"
E0905 12:31:23.357267       1 sync.go:165] [default/cert-test-rancher-f-app-it-letsencrypt] Error getting certificate 'test-rancher-f-app-it-letsencrypt-tls': secret "test-rancher-f-app-it-letsencrypt-tls" not found
E0905 12:31:23.379806       1 controller.go:190] certificates controller: Re-queuing item "default/cert-test-rancher-f-app-it-letsencrypt" due to error processing: http-01 self check failed for domain "test.rancher.f-app.it"
I0905 12:32:23.379648       1 controller.go:181] certificates controller: syncing item 'default/cert-test-rancher-f-app-it-letsencrypt'

@munnerz
Member

munnerz commented Sep 5, 2018

Hey @xaralis - this issue has been closed because this error message is expected while the self check is failing, and issues like this can otherwise become catch-alls for common misconfigurations by users.

If you think you've encountered an actual bug in the self-checking flow, please try to put together a reproducible test case and open a new issue with instructions for how to reproduce it, so we can (1) encode that test case into an actual automated test and (2) fix that test 😄

There's a really wide variety of issues that can cause this message to be printed, although your case, with the self check pod clearly receiving requests, does seem odd. That said, the timestamps between the two logs differ by 12-13 minutes, so it seems like you may be looking at different self check attempts here.

We are trying to keep the repository's issue board clean of "support" related issues, so we would prefer that you post on Slack to help debug the problem. Once we've identified that it is in fact a bug, and not simply a misconfiguration, opening an issue will then be the best route so we can track and triage an actual fix 😄

@sjbarrio

sjbarrio commented Sep 5, 2018

My problem persists only on the prod server (https://acme-v02.api.letsencrypt.org/directory). On the staging server (https://acme-staging-v02.api.letsencrypt.org/directory) it works fine. How can I obtain more information?

@xaralis

xaralis commented Sep 5, 2018

@munnerz OK, I'll try Slack tomorrow if this doesn't fix itself. What is a reasonable amount of time to wait?

@saward

saward commented Sep 5, 2018

Just in case it's helpful, I had a situation where the well-known path was set for both my main ingress and the one created by cert-manager. I think what happened is that the path set for my main ingress was the chosen one, and was automatically redirecting to SSL and failing because the certificate wasn't found.

Removing the main ingress completely and recreating seemed to resolve the issue for me.

@innovia

innovia commented Sep 5, 2018

@saward What do you mean by the main ingress for the well-known path? Is this a bug? Did you manually set it up before cert-manager? For me, once the secret was deleted, it was created immediately on the already running challenge pod.

@saward

saward commented Sep 5, 2018

It might be a bug. I'll explain a bit more clearly, but I am not great with the terminology and concepts yet, so I may not describe things well.

I have my own ingress that I created with a few rules. cert-manager appears to create its own ingress for the domain with a rule matching a specific path, the 'well-known' path, used by Let's Encrypt to verify ownership.

While cert-manager was trying and failing with the self check, I checked all ingresses (kubectl describe ing). I noticed that a 'well-known' path rule existed for both the ingress I'd created as well as the one created by cert manager, even though I had never added such a rule to my own ingress. I can only assume that cert manager created the rule under both ingresses, but why and under what conditions, I'm not sure.

Edit: I just remembered, this may have been a result of me misconfiguring the certificate object, leading to the creation of an extraneous rule.

@xaralis

xaralis commented Sep 6, 2018

For the record, my problem was:

I've been following the Rancher HA setup guide, which suggests having a public-facing nginx load balancer. That is OK, but the problem is that their sample nginx config redirects all HTTP traffic to HTTPS. I had HTTPS enabled using their default self-signed certificate. That was obviously stopping Let's Encrypt from reaching the challenge URL.

So, if you bump into this, make sure your traffic either allows HTTP or has HTTPS with a trusted cert.
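A rough way to see whether plain HTTP gets redirected before the challenge path is reached is to request it without following redirects. The sketch below uses Python's standard http.client, which does not follow redirects on its own. probe and describe_response are illustrative names, not part of any tool mentioned in this thread.

```python
import http.client
from typing import Optional


def describe_response(status: int, location: Optional[str]) -> str:
    """Classify a single HTTP response by status code and Location header."""
    if 300 <= status < 400 and location and location.lower().startswith("https://"):
        return "redirects to HTTPS"
    if status == 200:
        return "served over HTTP"
    return f"unexpected status {status}"


def probe(host: str, path: str, timeout: float = 10.0) -> str:
    """Send one GET to port 80 and classify the first response (no redirect following)."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return describe_response(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

For example, probe("your.domain", "/.well-known/acme-challenge/<token>") reporting "redirects to HTTPS" would point at a setup like the one described above.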

@szymonpk

szymonpk commented Sep 6, 2018

@xaralis I had the same issue, and it wasn't the case. Nginx redirected http traffic to well-known location without a problem. Still, validation failed. I am using nginx-ingress-0.23.0 and cert-manager-v0.4.1 helm charts.

@ernoaapa

ernoaapa commented Sep 7, 2018

@saward I faced the same problem. What misconfiguration did you have?

@EIrwin

EIrwin commented Feb 25, 2019

I know there is a lot of chatter on this topic and wanted to give what I was seeing as well as what fixed it.

In my case, I have had ingress successfully set up with cert-manager for two domains, mydomain.com and www.mydomain.com, running for a while without an issue.

I recently added another host/rule/backend, api.mydomain.com, so that my ingress.yaml looks like the following:

kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - mydomain.io
        - www.mydomain.io
        - api.mydomain.io  # <-- THIS IS WHAT WAS ADDED
      secretName: letsencrypt-prod
  rules:
    - host: mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: www.mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: api.mydomain.io  # <-- THIS IS WHAT WAS ADDED
      http:
        paths:
          - backend:
              serviceName: api
              servicePort: 80

I also saw the following in the ingress logs

W0225 03:46:38.166926       7 controller.go:1080] Validating certificate against DNS names. This will be deprecated in a future version.
W0225 03:46:38.166932       7 controller.go:1085] SSL certificate "default/letsencrypt-prod" does not contain a Common Name or Subject Alternative Name for server "api.mydomain.io": x509: certificate is valid for mydomain.io, www.mydomain.io, not api.mydomain.io

Additionally (and what led me to this thread), the output of kubectl describe certificate showed there was an issue with the self check:

http-01 self check failed for domain "www.mydomain.io"

After trying different things, I ran a command to delete the letsencrypt-prod secret; within seconds it was regenerated, and now everything works.

kubectl delete secret letsencrypt-prod
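The nginx-ingress warning quoted above boils down to a set difference between the hosts listed in the Ingress tls section and the DNS names in the existing certificate. A small Python sketch of that comparison follows. The helper name is hypothetical and it does exact matching only, so wildcard names are not handled.

```python
def hosts_missing_from_cert(ingress_hosts, cert_dns_names):
    """Return the ingress hosts that the certificate does not list, in input order."""
    covered = {name.lower() for name in cert_dns_names}
    return [host for host in ingress_hosts if host.lower() not in covered]


# Values taken from the ingress and the warning quoted in this comment.
missing = hosts_missing_from_cert(
    ["mydomain.io", "www.mydomain.io", "api.mydomain.io"],
    ["mydomain.io", "www.mydomain.io"],
)
print(missing)
```

A non-empty result means the stored certificate predates the newly added host and a new one still has to be issued.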

@rpetteruti

Hello, I have the same problem described in this issue. I waited for 5 days, but cert-manager keeps looping on "http-01 self check failed for domain", and I don't know what I can do to figure out the problem. On the same machine, if I shut down the Docker environment and use the certbot client, everything works fine. I'm using the http-01 challenge.

@rbq

rbq commented Apr 17, 2019

I'd try curl-ing the challenge endpoint from within your cluster. Had a similar problem and in my case it was the missing NAT reflection (or split DNS) that prevented cert-manager inside my cluster from verifying that the challenge was available.

@bertoost

bertoost commented May 5, 2019

@rbq can you explain how to do that? I am facing more or less the same issue and getting these errors:

I0505 20:10:05.505800       1 controller.go:206] challenges controller: syncing item 'example/letsencrypt-3860812899-1'
I0505 20:10:05.506489       1 ingress.go:49] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=640746824,certmanager.k8s.io/acme-http-token=1208225418
I0505 20:10:05.545080       1 sync.go:176] propagation check failed: wrong status code '404', expected '200'

@rbq

rbq commented May 6, 2019

@bertoost My Ingress was available from my workstation via HTTP, yet cert-manager complained that it couldn't verify that the HTTP challenge it added was visible to letsencrypt. So I finally figured out that it couldn't reach my WAN address from behind the NAT.

To verify, I started a container with curl (something like kubectl run -it --rm my-test --namespace=test --image=ubuntu -- bash) and tried to request anything from my Ingress using its public DNS name: curl myapp.example.com.

But yours looks like a totally different problem to me—it seems to be stuck before it even gets to the self-check.
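The in-cluster test described above can also be scripted. Below is a sketch in Python (standard library only) that records what the current machine resolves the hostname to and which HTTP status it gets back. reachability_report is a made-up name; the idea is to run it once from a workstation and once from a pod inside the cluster and compare the two results.

```python
import socket
import urllib.error
import urllib.request


def reachability_report(host: str, path: str = "/", timeout: float = 5.0) -> dict:
    """Describe what this machine sees when requesting http://host/path."""
    report = {"host": host, "addresses": [], "status": None, "error": None}
    try:
        # Resolve first, so DNS differences (e.g. split DNS) become visible.
        infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
        report["addresses"] = sorted({info[4][0] for info in infos})
        with urllib.request.urlopen(f"http://{host}{path}", timeout=timeout) as resp:
            report["status"] = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        report["status"] = exc.code
    except OSError as exc:
        # DNS failure, timeout, connection refused, and similar.
        report["error"] = str(exc)
    return report
```

If the workstation run shows a status while the in-cluster run only shows an error such as a timeout, that would be consistent with the missing NAT reflection described above.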

@bertoost

bertoost commented May 6, 2019

Hm, okay. It's on my hosted VPS (not my local machine), and the weird thing is that I can access the host and view the website. Also, I have successfully requested certificates earlier for other projects, the same way, with the same setup. So I really don't understand why this one is not working.

@rbq

rbq commented May 6, 2019

@bertoost I think it would make sense to open a separate issue and post some configuration details.

@bertoost

bertoost commented May 6, 2019

Somehow it is working now. I just wanted to continue working on it, and suddenly it had a valid certificate retrieved from Let's Encrypt. Weird, but okay.

@alepaez

alepaez commented Jun 14, 2019

Just ran into this error:

wrong status code '404', expected '200'

This is my config:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

Found this on my nginx ingress logs:

conflicting server name "my.domain" on 0.0.0.0:80, ignored

"GET /.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c HTTP/1.1" 404 209 "http://my.domain/.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c" "Go-http-client/1.1" "-"

Just changed the host option on my ingress rule and the issue was fixed:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my2.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

After that I had to put it back in place.

@juwalter
Copy link

Also encountered could not reach 'http://HOST.domain.NET/.well-known/acme-challenge/NldjKBM648vvka9A7VCSIKqqFwBCxM2DP5rIBgNr80s': wrong status code '404', expected '200' in kubectl -n istio-system logs -f certmanager-1c1c1c1c1c1-xnxxnnxnx

After looking at all ingresses (kubectl get ingress --all-namespaces), I realized that istio had created its own ingress to intercept the .well-known/acme-challenge/ call from Let's Encrypt.

This "letsencrypt cm-acme-http-solver" ingress is temporary and is apparently there to intercept and answer the call to .well-known/acme-challenge/. Its rules configuration for matching a particular backend is identical to that of the original ingress needed for my service, except that the paths: section contains the very specific path-matching rule. My service initially had no path match and was probably chosen as the catch-all, preventing the ACME challenge from resolving.

Not working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80

Working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80
            path: /

(Notice the very last line, path: /.)

Not sure if this is just a lucky coincidence or if it is really needed. YMMV.

@MakG10

MakG10 commented Nov 16, 2019

My case: I changed the NS DNS records, but after the TTL expired, the nameserver set on the Kubernetes node was still pointing to the old server, which obviously returned 404 for the HTTP challenge. You can verify this using curl from the node machine.
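The curl check from the node can be sketched as a small shell helper. This is only a sketch: the function names challenge_url and challenge_status are made up for illustration and are not part of cert-manager, and the domain and token are placeholders to be copied from the failing challenge.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of cert-manager).

# Build the URL that the HTTP-01 self check requests.
# $1 = domain, $2 = challenge token (both taken from the failing challenge).
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Print only the HTTP status code returned for the challenge URL.
# A 404 here matches the "wrong status code '404'" error from the self check.
challenge_status() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$(challenge_url "$1" "$2")"
}
```

Running challenge_status with the same domain and token on the node and on a machine outside the cluster, then comparing the two status codes, should show whether the node still resolves the domain to the old server.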

As a quick workaround, I temporarily changed the nameserver in /etc/resolv.conf on the node that was running cert-manager to Google's 8.8.8.8 and set dnsPolicy to Default in the cert-manager deployment. I guess you could also set dnsConfig for the cert-manager deployment instead of modifying the node's resolv.conf.
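The dnsConfig alternative might look roughly like the sketch below. It has not been verified against this setup and rests on assumptions: the deployment is taken to be named cert-manager in the cert-manager namespace, and the helper names dns_patch and patch_cert_manager_dns are invented for illustration. With dnsPolicy set to "None", Kubernetes uses only the nameservers listed in dnsConfig.

```shell
#!/bin/sh
# Hypothetical helper: print a merge patch that makes the pods ignore the
# node's resolv.conf and use the given nameserver instead.
# dnsPolicy "None" requires dnsConfig to be set.
dns_patch() {
  printf '{"spec":{"template":{"spec":{"dnsPolicy":"None","dnsConfig":{"nameservers":["%s"]}}}}}\n' "$1"
}

# Apply the patch. Assumes the deployment is named "cert-manager" in the
# "cert-manager" namespace; adjust both to match your installation.
patch_cert_manager_dns() {
  kubectl -n cert-manager patch deployment cert-manager --type merge -p "$(dns_patch "$1")"
}
```

Calling patch_cert_manager_dns 8.8.8.8 would then restart the cert-manager pods with the new resolver. Note that with this policy, cluster-internal names are no longer resolved unless a cluster DNS server is listed as well.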

If there is a better solution, then I'd be happy to hear it.

@Bram-Zijp

Removing the NGINX ingress, the cert manager and the deployment that had a failing certificate, and adding it all back afterwards, fixed it for me too.

@ac10n

ac10n commented May 6, 2020

I had this problem while following a tutorial that suggested installing nginx-ingress as well as cert-manager using kubectl apply -f .

I installed everything using helm and things worked like a charm:

helm install my-nginx-ingress stable/nginx-ingress
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.0 --set installCRDs=true

@jsangco

jsangco commented May 26, 2020

I encountered this problem, and the cause turned out to be that I was setting loadBalancerSourceRanges on my ingress controller.

This caused the self check GET request to return a "connection timed out" error.

Removing the IP restrictions allowed the certificate to be successfully granted.
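To see whether such a restriction is in place, the field can be read from the controller's Service. The sketch below is an illustration only: show_source_ranges is a made-up helper name, and the service name and namespace are placeholders for whatever your ingress controller uses.

```shell
#!/bin/sh
# Hypothetical helper: print the source ranges configured on a
# LoadBalancer service. $1 = service name, $2 = namespace.
# Empty output means no IP restriction is set on the service.
show_source_ranges() {
  kubectl -n "$2" get service "$1" -o jsonpath='{.spec.loadBalancerSourceRanges}'
}
```

If the output lists CIDR ranges, it is worth checking whether the self check request could be arriving from an address outside them.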

@ghost

ghost commented May 31, 2020

> I had this problem, I was following a tutorial that suggested to install nginx-ingress as well as cert-manager using kubectl apply -f .
>
> I installed everything using helm and things worked like a charm:
>
> helm install my-nginx-ingress stable/nginx-ingress
> helm repo add jetstack https://charts.jetstack.io
> helm repo update
> helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.0 --set installCRDs=true

I was also using YAML files; installing with Helm fixed the issue.

@hvaoc

hvaoc commented Aug 5, 2020

The comments from @AlirezaHaghshenas and @jc-delvalle helped. For anyone who has wasted enough time using kubectl apply with YAML and hitting this issue, here is the full set of commands:

# Install Nginx Ingress using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install my-nginx-ingress ingress-nginx/ingress-nginx
kubectl --namespace default get services -o wide -w my-nginx-ingress-ingress-nginx-controller

# Install cert-manager using Helm
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v0.16.0 \
  --set installCRDs=true 

@bergkvist

@rbq I have the same problem as you it seems. I can't reach the endpoint from within the cluster, but I can reach it from my local workstation.

From within the cluster vs from local workstation:
[image]

How did you fix it?

@bergkvist

bergkvist commented Sep 8, 2020

Turns out the internal services in the cluster were not able to reach things within the cluster through the external IP because I had enabled PROXY protocol on my load balancer.

When I disabled PROXY protocol, the certificates were issued almost immediately.

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol":"false"}}}'
kubectl patch -ngitlab-managed-apps configmap/ingress-nginx-ingress-controller -p '{"data":{"use-proxy-protocol":"false"}}'

Which meant I could turn PROXY protocol back on:

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol":"true"}}}'
kubectl patch -ngitlab-managed-apps configmap/ingress-nginx-ingress-controller -p '{"data":{"use-proxy-protocol":"true"}}'

My application requires PROXY protocol in order to check users' IP addresses. Is there a way of fixing this without having to switch PROXY protocol off and on every 90 days to renew my certs?

@ashleydavies

ashleydavies commented Sep 11, 2020

Huge thanks @bergkvist, same issue here, appreciate you sharing the solution; spent all day trying everything except that it seems 😅

@stopsopa

stopsopa commented Sep 25, 2020

A patch for this is definitely needed, and it seems there are people and organizations willing and able to help:
kubernetes/kubernetes#66607 (comment)
I hope it will be addressed soon, because @bergkvist's solution is brilliant but also pretty nasty. At the moment, though, it looks like that's all we have.

@bergkvist

bergkvist commented Sep 25, 2020

@stopsopa So there is actually another alternative, which makes the self-checks work even with PROXY protocol enabled.

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-hostname":"example.com"}}}'

Notice that you have to explicitly write your hostname ("example.com") in order for the kubernetes iptables issue to be worked around. Not sure how this would work if you have multiple hostnames pointing to the same loadbalancer.

Subdomains work fine though (like www.example.com, subdomain.example.com etc.)

@stopsopa

stopsopa commented Sep 25, 2020

Thanks, @bergkvist. I spent two evenings pulling my hair out trying to make it work (cert-manager together with an IP address), and I had actually tried this too, but with a different configuration. I created a new cluster, tried only do-loadbalancer-hostname, and it finally worked. 🎉

@compumike

Hi all, I ran into the same issue. I've just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.

@nabsul

nabsul commented Oct 24, 2020

What a coincidence. Just today I published https://github.com/nabsul/k8s-letsencrypt with instructions on how to manually issue certificates in your Kubernetes cluster. The hope being that I'll only need to manually issue certs a few times until this issue is fixed.

I wish I'd seen @compumike's solution sooner!!

@richardthombs

@compumike Thanks so much!! 🥇

@ahmed-adly-khalil

I was able to fix this. The chain of issues started as follows:

I had the following annotations on my ingress:

nginx.ingress.kubernetes.io/use-regex: "true"

nginx.ingress.kubernetes.io/rewrite-target: /

This caused all URLs to be rewritten to /.
This caused cert-manager to fail its self check before communicating with Let's Encrypt.
This caused certificate generation not to start at all.
This also caused DNS resolution from inside the cluster to fail.

Commenting out these two lines made things work.
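For an Ingress that is already deployed, the same two annotations could also be dropped in place with kubectl annotate. This is only a sketch: remove_rewrite_annotations is a made-up helper name, and the ingress name and namespace are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: remove the two annotations from an Ingress.
# $1 = ingress name, $2 = namespace. A trailing "-" after an annotation
# key tells kubectl annotate to delete that annotation.
remove_rewrite_annotations() {
  kubectl -n "$2" annotate ingress "$1" \
    nginx.ingress.kubernetes.io/use-regex- \
    nginx.ingress.kubernetes.io/rewrite-target-
}
```

If the Ingress is managed from a manifest, the annotations need to be removed there too, or the next apply will bring them back.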

@SacMV

SacMV commented Apr 26, 2022

In my case:
Error message: cert manager challenge remote error: tls: unrecognized name
I added the following to my ingress annotations:
cert-manager.io/issue-temporary-certificate: "true"
acme.cert-manager.io/http01-edit-in-place: "true"

It worked.

@fabioespinosa

As @ahmed-adly-khalil said:

Removing

nginx.ingress.kubernetes.io/rewrite-target: /$1

Worked
