Cloudflare: external-dns takes over existing DNS record #3706

Closed
korobass opened this issue Jun 19, 2023 · 13 comments
Labels: kind/bug, lifecycle/rotten

Comments

@korobass

korobass commented Jun 19, 2023

What happened:
external-dns ignored an existing DNS record (created outside of external-dns) for a few days, from the 13th to the 15th of June. Then, on the 15th of June, it suddenly deleted that record and created a new one according to the Ingress definition deployed on the 13th of June.

What you expected to happen:
external-dns keeps ignoring the existing DNS record. We were ahead of a planned migration from one system to another, and this happened unexpectedly in production, based on an existing Ingress definition.

How to reproduce it (as minimally and precisely as possible):
I can't reproduce it by creating a new record, but as you can see from the Pod logs, it happened.
The related Ingress object and DNS record were created on the 13th of June (2023-06-13T08:30:21Z), and external-dns didn't apply any changes to the DNS record until the 15th of June (2023-06-15T10:11:38Z).

Anything else we need to know?:

{
  "action": "CREATE",
  "level": "info",
  "msg": "Changing record.",
  "record": "api.example.com",
  "time": "2023-06-15T10:09:54Z",
  "ttl": 1,
  "type": "CNAME",
  "zone": "zone_id"
}
{
  "action": "CREATE",
  "level": "error",
  "msg": "failed to create record: An A, AAAA, or CNAME record with that host already exists. For more details, refer to <https://developers.cloudflare.com/dns/manage-dns-records/troubleshooting/records-with-same-name/>. (81053)",
  "record": "api.example.com",
  "time": "2023-06-15T10:09:55Z",
  "ttl": 1,
  "type": "CNAME",
  "zone": "zone_id"
}
--
{
  "action": "DELETE",
  "level": "info",
  "msg": "Changing record.",
  "record": "api.example.com",
  "time": "2023-06-15T10:11:38Z",
  "ttl": 1,
  "type": "CNAME",
  "zone": "zone_id"
}
{
  "action": "CREATE",
  "level": "info",
  "msg": "Changing record.",
  "record": "api.example.com",
  "time": "2023-06-15T10:11:39Z",
  "ttl": 1,
  "type": "CNAME",
  "zone": "zone_id"
}
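
To make the sequence in these logs easier to spot in a larger log dump, here is a minimal Python sketch that scans JSON log entries for a DELETE directly followed by a CREATE of the same record. The entries are copied from the logs above (trimmed to the fields used); the function name `find_replacements` is made up for this illustration and is not part of external-dns.

```python
import json

# Log entries copied from the report above, trimmed to the fields used here.
LOG_LINES = [
    '{"action": "CREATE", "level": "info", "record": "api.example.com", "time": "2023-06-15T10:09:54Z", "type": "CNAME"}',
    '{"action": "CREATE", "level": "error", "record": "api.example.com", "time": "2023-06-15T10:09:55Z", "type": "CNAME"}',
    '{"action": "DELETE", "level": "info", "record": "api.example.com", "time": "2023-06-15T10:11:38Z", "type": "CNAME"}',
    '{"action": "CREATE", "level": "info", "record": "api.example.com", "time": "2023-06-15T10:11:39Z", "type": "CNAME"}',
]


def find_replacements(lines):
    """Return (record, type, delete_time, create_time) for each DELETE that is
    directly followed by a CREATE of the same record name and type."""
    entries = [json.loads(line) for line in lines]
    # "info" entries announce an attempted change; "error" entries report a failure.
    changes = [e for e in entries if e["level"] == "info"]
    found = []
    for prev, curr in zip(changes, changes[1:]):
        same_record = (prev["record"], prev["type"]) == (curr["record"], curr["type"])
        if same_record and prev["action"] == "DELETE" and curr["action"] == "CREATE":
            found.append((prev["record"], prev["type"], prev["time"], curr["time"]))
    return found


if __name__ == "__main__":
    for record, rtype, deleted, created in find_replacements(LOG_LINES):
        print(f"{rtype} {record}: deleted {deleted}, recreated {created}")
```

With the four entries above, this reports one replacement: the CNAME for api.example.com was deleted at 10:11:38Z and recreated at 10:11:39Z, after the plain CREATE at 10:09:54Z had failed.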

Environment:

  • External-DNS version (use external-dns --version): v20230529-v0.13.5
  • DNS provider: cloudflare
  • Others: EKS cluster 1.26
  • external-dns configuration:
containers:
     - args:
       - --log-level=info
       - --log-format=json
       - --interval=1m
       - --source=ingress
       - --policy=upsert-only
       - --registry=txt
       - --txt-owner-id=env-eks-cluster
       - --txt-prefix=env-
       - --domain-filter=example.com
       - --provider=cloudflare
       - --zone-id-filter=zone_id
       - --regex-domain-filter=.*.example.com
       - --cloudflare-proxied

korobass added the kind/bug label on Jun 19, 2023
@zeqk

zeqk commented Jun 19, 2023

I have the same problem too...

failed to create record: DNS Validation Error (1004)

time="2023-06-16T00:37:36Z" level=info msg="Changing record." action=DELETE record=hello-world-ingress.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecba
time="2023-06-16T00:37:37Z" level=info msg="Changing record." action=CREATE record=hello-world-ingress.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecba
time="2023-06-16T00:37:37Z" level=error msg="failed to create record: DNS Validation Error (1004)" action=CREATE record=hello-world-ingress.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecba
time="2023-06-16T00:37:37Z" level=info msg="Changing record." action=UPDATE record=hello-world-ingress.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecba
time="2023-06-16T00:37:37Z" level=info msg="Changing record." action=UPDATE record=a-hello-world-ingress.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecba

This is my manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydomaincomar-external-dns
  namespace: external-dns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mydomaincomar-external-dns
  template:
    metadata:
      labels:
        app: mydomaincomar-external-dns
    spec:
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns:v0.13.5
          args:
            - '--source=ingress'
            - '--domain-filter=mydomain.com.ar'
            - '--provider=cloudflare'
            - '--cloudflare-proxied'
            - '--cloudflare-dns-records-per-page=5000'
            - '--log-level=debug'
            - '--txt-owner-id=aks-itools-iprd-ue'
          env:
            - name: CF_API_TOKEN
              valueFrom:
                secretKeyRef:
                  name: cloudflare
                  key: mydomaincomar-token
                  optional: false
          resources:
            limits:
              cpu: 10m
              memory: 32Mi
            requests:
              cpu: 5m
              memory: 16Mi

The same problem occurs with v0.13.4 too.

@szuecs
Contributor

szuecs commented Jun 21, 2023

Can you please show the ingress resources that have the same hostname in spec including status?

@johngmyers
Contributor

It looks like the Cloudflare provider handles updates where the targets change by doing a delete followed by an insert.

So I suspect the source ingress changed its target. The resulting update was rendered as a delete followed by an insert. The provider then unexpectedly matched the delete request to the existing DNS record, and so deleted the existing DNS record.
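
To illustrate the suspected mechanism, here is a toy Python model. It is only a sketch of the hypothesis above, not external-dns or Cloudflare provider code: the `Record` class, the `managed` flag, the function name, and the target hostnames are all hypothetical. It shows how a delete step that matches on name and type alone would also remove a record created outside the controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    type: str
    target: str
    managed: bool  # hypothetical flag: True if the controller created the record


def apply_update_as_delete_then_create(zone, name, rtype, new_target):
    """Toy model of an update rendered as a delete followed by a create.

    The delete step matches on (name, type) only, so it also removes a
    record that was created outside the controller.
    """
    deleted = [r for r in zone if (r.name, r.type) == (name, rtype)]
    kept = [r for r in zone if (r.name, r.type) != (name, rtype)]
    kept.append(Record(name, rtype, new_target, managed=True))
    return kept, deleted


if __name__ == "__main__":
    # Hypothetical pre-existing record, created outside the controller.
    zone = [Record("api.example.com", "CNAME", "legacy.example.net", managed=False)]
    new_zone, deleted = apply_update_as_delete_then_create(
        zone, "api.example.com", "CNAME", "ingress-lb.example.net"
    )
    print("deleted:", deleted)
    print("zone now:", new_zone)
```

In this model the unmanaged record ends up in `deleted`, which mirrors the DELETE/CREATE pair in the original logs. Whether the real provider matches records this way is exactly what would need to be confirmed in the code.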

@zeqk

zeqk commented Jun 30, 2023

@szuecs yes, this is my ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world-ingress-static
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /static/$2
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - hello-world-ingress.mydomain.com.ar
      secretName: hello-world-ingress.mydomain.com.ar--tls
  rules:
    - host: hello-world-ingress.mydomain.com.ar
      http:
        paths:
          - path: /static(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: aks-helloworld-one
                port:
                  number: 80

And this other one has the same problem:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: loki-loki-distributed-gateway
  namespace: monitoring
  labels:
    app.kubernetes.io/component: gateway
    app.kubernetes.io/instance: loki
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki-distributed
    app.kubernetes.io/version: 2.6.1
    argo-tracking/instance: iprd-loki
    helm.sh/chart: loki-distributed-0.67.1
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - loki-gateway.mydomain.com.ar
      secretName: loki-gateway.mydomain.com.ar-tls
  rules:
    - host: loki-gateway.mydomain.com.ar
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: loki-loki-distributed-gateway
                port:
                  number: 80

time="2023-06-30T18:25:04Z" level=debug msg="Endpoints generated from ingress: monitoring/loki-loki-distributed-gateway: [loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 [] loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 []]"
time="2023-06-30T18:25:04Z" level=debug msg="Removing duplicate endpoint loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 []"
time="2023-06-30T18:25:04Z" level=info msg="Changing record." action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:25:05Z" level=error msg="failed to create record: DNS Validation Error (1004)" action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:25:05Z" level=info msg="Changing record." action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:25:05Z" level=info msg="Changing record." action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:25:05Z" level=info msg="Changing record." action=CREATE record=a-loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:04Z" level=debug msg="Endpoints generated from ingress: monitoring/loki-loki-distributed-gateway: [loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 [] loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 []]"
time="2023-06-30T18:26:04Z" level=debug msg="Removing duplicate endpoint loki-gateway.mydomain.com.ar 0 IN A  31.151.135.2 []"
time="2023-06-30T18:26:05Z" level=info msg="Changing record." action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:05Z" level=error msg="failed to create record: DNS Validation Error (1004)" action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=A zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:05Z" level=info msg="Changing record." action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:06Z" level=error msg="failed to create record: Record already exists. (81057)" action=CREATE record=loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:06Z" level=info msg="Changing record." action=CREATE record=a-loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5
time="2023-06-30T18:26:06Z" level=error msg="failed to create record: Record already exists. (81057)" action=CREATE record=a-loki-gateway.mydomain.com.ar ttl=1 type=TXT zone=c878d0959b80baf39244ea04f3bcecb5

The result: no A record is created.

image

@zeqk

zeqk commented Jun 30, 2023

I think I found the problem. If I try to use the Cloudflare API directly:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/zones/c8tdtrstrsatarstarstb5/dns_records \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer B0Ftrstarstsrtrastyma6' \
  --data '{
  "content": "198.51.100.4",
  "name": "loki-gateway.mydomain.com.ar",
  "proxied": true,
  "type": "A",
  "comment": "Domain verification record",
  "tags": [
    "owner:dns-team"
  ],
  "ttl": 3600
}'

this is the result

{"result":null,
"success":false,
"errors":[
    {"code":1004,"message":"DNS Validation Error","error_chain":[{"code":9300,"message":"DNS record has 1 tags, exceeding the quota of 0."}]}],
"messages":[]}

DNS record tags have a quota limit: https://developers.cloudflare.com/dns/manage-dns-records/reference/record-attributes/#record-tags

cloudflare/cloudflare-docs@5f28cc7#diff-cf34244c4a2521367ec10c503fd5ef084ad5ef94cc93a693ced66232f3a87175

Is there any way to see the error_chain in the container logs?
Does external-dns try to add the A record with a tag? If so, is there any way to avoid the use of record tags?
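
For anyone debugging this outside of external-dns, here is a small Python sketch that flattens the nested `error_chain` from a response body like the one above into a single readable line. The response text is copied from this comment; the function name `describe_errors` is made up for this example, and nothing here implies that external-dns logs the chain this way.

```python
import json

# Response body copied from the comment above.
RESPONSE_BODY = '''{"result":null,
"success":false,
"errors":[
    {"code":1004,"message":"DNS Validation Error","error_chain":[{"code":9300,"message":"DNS record has 1 tags, exceeding the quota of 0."}]}],
"messages":[]}'''


def describe_errors(body):
    """Flatten each top-level error and its nested error_chain into one line."""
    payload = json.loads(body)
    lines = []
    for err in payload.get("errors", []):
        line = f'{err["message"]} ({err["code"]})'
        # error_chain may be absent; treat a missing or null chain as empty.
        chain = err.get("error_chain") or []
        if chain:
            details = "; ".join(f'{c["message"]} ({c["code"]})' for c in chain)
            line = f"{line}: {details}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in describe_errors(RESPONSE_BODY):
        print(line)
```

Run against the body above, this prints the generic 1004 message together with the 9300 detail about the tag quota, which is the part missing from the container logs.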

@johngmyers
Contributor

I don't see any references to the Tags field of cloudflare.DNSRecord from external-dns code.

@zeqk

zeqk commented Jul 17, 2023

So why do I get the error "DNS Validation Error"? Is there a way to see more details of the error?

@johngmyers
Contributor

The "DNS Validation Error" was not reported in the initial description. It is probably a separate, unrelated issue.

@ashtonian

Still an issue.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 27, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned on Mar 27, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

