v.0.14.0 [AWS] External-DNS cannot remove records from 2 Route 53 hosted zones (InvalidChangeBatch: [The request contains an invalid set of changes]) #4241

Closed
leonardocaylent opened this issue Feb 7, 2024 · 25 comments · Fixed by #4296
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@leonardocaylent
Contributor

What happened:
External-DNS pod can create records but cannot delete records from 2 different hosted zones since 0.14.0. This doesn't happen on 0.13.6
What you expected to happen:
External-DNS detects A & TXT records on 2 Hosted zones and can remove them without making the pod crash
On version 0.14.0:
level=error msg="Failure in zone internal.dev.mydomain.com. [Id: /hostedzone/<HOSTEDZONE1>] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'A ....
On version 0.13.6 and earlier:

How to reproduce it (as minimally and precisely as possible):
Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
Install External-DNS 0.14.0 on EKS
Create an ingress whose host is testapplication.internal.dev.yourdomain.com
Wait for external-dns to detect the changes
External-DNS will create the records correctly in the 2 hosted zones
Remove the ingress created
Wait for external-dns to detect the changes
Error will show up in the external-dns pod logs:

Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº2>]
level=error msg="Failure in zone internal.dev.yourdomain.com. [Id: /hostedzone/<HostedZoneNº1>] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set

How to reproduce the expected/previous behaviour?:
Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
Install External-DNS 0.13.6 on EKS
Create an ingress whose host is testapplication.internal.dev.yourdomain.com
Wait for external-dns to detect the changes
External-DNS will create the records correctly in the 2 hosted zones
Remove the ingress created
Wait for external-dns to detect the changes
Success will show up in the external-dns pod logs:

msg="Applying provider record filter for domains: [internal.sandbox.yourdomain.com. sandbox.yourdomain.com.]"
msg="Desired change: DELETE cname-testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
msg="Desired change: DELETE cname-testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

Anything else we need to know?:
This works fine for ingresses that use only 1 hosted zone (it can be easily tested with the same ingress example using the host testapplication.dev.yourdomain.com)
Environment:

  • External-DNS version (use external-dns --version): 0.14.0
  • DNS provider:
  • Others: EKS 1.26
@leonardocaylent
Contributor Author

I can confirm this was introduced with #3747

@cilindrox

Can confirm, reverting to 0.13.6 addresses this issue, and UPSERTS for ALIAS on TXT work as expected.

@leonardocaylent
Contributor Author

@cilindrox Thank you for sharing that here. Can you confirm that your use case is the same, i.e. 2 hosted zones with overlapping names (such as internal.dev.yourdomain.com & dev.yourdomain.com)?

@cilindrox

correct, several instances of the above ^

We deploy this with another provider and 0.14.0 seems a-ok there. It's only Route53 that seems broken so far.

@leonardocaylent
Contributor Author

I tried to contact the author of #3747 but haven't had a response yet. I think #3747 needs to be rolled back, or we need a hotfix for this use case. We also tried using the prefix, but that doesn't resolve the issue.

@MitchIonascu

MitchIonascu commented Feb 16, 2024

I can confirm that the issue is also present for us since updating to the latest build. Reverting this to pre 0.14.0 fixed the issue.

@leonardocaylent
Contributor Author

Thank you for reporting this

@cronik
Contributor

cronik commented Feb 18, 2024

@leonardocaylent I can try to add a test case; I just need to know which records are in play, and possibly more logs to understand what's happening.

  • What are the current records?
  • What are the desired records?

(If you are generating custom builds for testing, you could log the plan to see what the generated plan records are before and after Calculate.)

plan = plan.Calculate()

Do you see a log message like:

Domain %s contains conflicting record type candidates; discarding CNAME record

@leonardocaylent
Contributor Author

Hi @cronik, here are the debugging logs for the 2 versions 0.13.6 and 0.14.0:

At creation (0.13.6)(Success):

level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

At removal (0.13.6)(Success):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"

At creation (0.14.0)(Success):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

At removal (0.14.0)(Failure):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=error msg="Failure in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: 4******c-8ca4-4c49-bb9c-3**********4"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=error msg="Failure in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: 0******5-4b9f-48b3-ac49-2**********c"
level=fatal msg="failed to submit all changes for the following zones: [/hostedzone/HostedZoneNº1 /hostedzone/HostedZoneNº2]"

@leonardocaylent
Contributor Author

leonardocaylent commented Feb 19, 2024

More information about the Delete Requests:
Success on DELETE records (0.13.6), 2 ChangeResourceRecordSets calls to AWS:

    "requestParameters": {
        "hostedZoneId": "Z*******************S",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1FL5HABSF5",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },
    "requestParameters": {
        "hostedZoneId": "Z*******************J",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },

Failure on DELETE records (0.14.0), 1 ChangeResourceRecordSets call to AWS:

 "errorMessage": "[The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']",
    "requestParameters": {
        "hostedZoneId": "Z*******************S",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },

It seems that version 0.14.0 puts 6 DELETEs into each batch (every record listed twice), where there should be 3 DELETEs per batch (3 records per hosted zone).
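
To make the duplication easy to spot, here is a minimal Go sketch that scans a change batch for record sets targeted by more than one DELETE. The `change` and `recordKey` types and the `duplicateDeletes` function are hypothetical simplifications written for this illustration; they are neither the AWS SDK types nor external-dns code. The sample data mirrors the failing 0.14.0 request above.

```go
package main

import "fmt"

// change is a hypothetical, simplified stand-in for one entry of a
// Route 53 change batch; it is not the AWS SDK type.
type change struct {
	Action string
	Name   string
	Type   string
}

// recordKey identifies a record set within a single hosted zone.
type recordKey struct {
	Name string
	Type string
}

// duplicateDeletes returns the record sets that are targeted by more
// than one DELETE in the same batch, in order of first repetition.
func duplicateDeletes(batch []change) []recordKey {
	seen := make(map[recordKey]int)
	var dups []recordKey
	for _, c := range batch {
		if c.Action != "DELETE" {
			continue
		}
		k := recordKey{Name: c.Name, Type: c.Type}
		seen[k]++
		if seen[k] == 2 { // report each duplicated record set once
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	// Same shape as the failing batch: every DELETE appears twice.
	batch := []change{
		{"DELETE", "cname-testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "cname-testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "A"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "A"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "TXT"},
	}
	for _, k := range duplicateDeletes(batch) {
		fmt.Printf("duplicate DELETE: %s %s\n", k.Type, k.Name)
	}
}
```

Run against the batch above, the sketch reports the same three record sets that appear in the InvalidChangeBatch error message.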

@Raffo
Contributor

Raffo commented Feb 21, 2024

Ack, I've seen this and will try to reproduce it and see if we can ship a fix. I was planning a release of the next version, I will consider this a showstopper if I manage to reproduce it. Will keep you posted, probably next week.

@leonardocaylent
Contributor Author

Thank you!
Here is the yaml file for quick-testing the issue:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 5
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: public.ecr.aws/l6m2t8p7/docker-2048
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: default
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:12345678:certificate/blablablablabla
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: dev-tools-ingress
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'

spec:
  ingressClassName: alb
  rules:
    - host: testdeploy.internal.dev.yourdomain.com
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: service-2048
              port:
                number: 80

@Raffo
Contributor

Raffo commented Feb 24, 2024

@leonardocaylent I am not sure I understand what the desired behavior should be. I haven't worked with overlapping zones, so I may be confused about what you actually want to happen. Can you give an example of what the behavior with overlapping zones was, what it is today, and what you expect it to be? I would personally assume that we don't double write records to zones that overlap.

@leonardocaylent
Contributor Author

@Raffo this is the behavior of overlapping zones on all versions of external-dns:

The hostname for the ingress is:
https://testdeploy.internal.sandbox.yourdomain.com/

The Route 53 Hosted Zones:

internal.sandbox.yourdomain.com (Type Private hosted zone)
sandbox.yourdomain.com (Type Public hosted zone)

The 3 Records on Hosted Zone internal.sandbox.yourdomain.com:

Type A: testdeploy.internal.sandbox.yourdomain.com (to k8s-sandboxtoolsingre-e8f5f***.elb)
Type TXT: testdeploy.internal.sandbox.yourdomain.com
Type TXT: cname-testdeploy.internal.sandbox.yourdomain.com
"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048"

The 3 Records on Hosted Zone sandbox.yourdomain.com:

Type A: testdeploy.internal.sandbox.yourdomain.com (to k8s-sandboxtoolsingre-e8f5f***.elb)
Type TXT: testdeploy.internal.sandbox.yourdomain.com
Type TXT: cname-testdeploy.internal.sandbox.yourdomain.com
"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048"

The records are identical in the 2 hosted zones. Creating the Route53 records in every hosted zone whose name ends in sandbox.yourdomain.com is expected; the issue that started in 0.14.0 is that external-dns is not able to delete them. If we had another private or public zone called yourdomain.com, we would probably have another 3 records in that hosted zone as well. It would be great if external-dns could know that we only want to create the records in the internal.sandbox.yourdomain.com private hosted zone, but I believe that for backward compatibility and other users' use cases the behavior may need to stay as it is right now, fixing only the grouping of the DELETEs in the ChangeResourceRecordSets API call.
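
The overlap described above can be illustrated with a small Go sketch: a hostname is assigned to every zone whose name is the hostname itself or one of its parent domains. `matchingZones` is a hypothetical helper written for this illustration under that assumption; it is not the external-dns implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchingZones returns every zone whose name equals the hostname or is
// a parent domain of it. Hypothetical helper for illustration only; it
// is not external-dns code.
func matchingZones(hostname string, zones []string) []string {
	host := strings.TrimSuffix(hostname, ".")
	var out []string
	for _, z := range zones {
		zone := strings.TrimSuffix(z, ".")
		// Require a label boundary so "xsandbox..." does not match "sandbox...".
		if host == zone || strings.HasSuffix(host, "."+zone) {
			out = append(out, z)
		}
	}
	return out
}

func main() {
	zones := []string{"internal.sandbox.yourdomain.com.", "sandbox.yourdomain.com."}
	// Falls under both zones, so records end up in both.
	fmt.Println(matchingZones("testdeploy.internal.sandbox.yourdomain.com", zones))
	// Falls under the public zone only.
	fmt.Println(matchingZones("testdeploy.sandbox.yourdomain.com", zones))
}
```

With the two zones from this issue, the first hostname matches both zones and the second matches only sandbox.yourdomain.com, which is why the single-zone case is unaffected.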

@leonardocaylent
Contributor Author

@Raffo @cronik
I have 2 important updates about this issue:

1) Found the culprit of this issue:
FilterEndpointsByOwnerID yields duplicate records for changes.Delete = endpoint.FilterEndpointsByOwnerID(p.OwnerID, changes.Delete) in plan.go

We could fix that by doing something like this:

func FilterEndpointsByOwnerID(ownerID string, eps []*Endpoint) []*Endpoint {
	filtered := []*Endpoint{}
	visited := make(map[EndpointKey]bool) // keys of endpoints already processed

	for _, ep := range eps {
		key := EndpointKey{DNSName: ep.DNSName, RecordType: ep.RecordType, SetIdentifier: ep.SetIdentifier}
		if visited[key] { // skip endpoints that were already seen
			log.Debugf(`Already loaded endpoint %v`, ep)
			continue
		}
		if endpointOwner, ok := ep.Labels[OwnerLabelKey]; !ok || endpointOwner != ownerID {
			log.Debugf(`Skipping endpoint %v because owner id does not match, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		} else {
			filtered = append(filtered, ep)
			log.Debugf(`Added endpoint %v because owner id matches, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		}
		visited[key] = true
	}

	return filtered
}

We will also add more granular Debug logs as they were super useful to fix this issue.

2) With the addition of "Update the OCI Provider to incorporate SoftError to avoid CrashLoopBackoff" #4229, the pod doesn't crash anymore, which is great:
    level=error msg="Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/HostedZoneNº1 /hostedzone/HostedZoneNº2]"

Waiting for thoughts/comments
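
For anyone who wants to try the deduplication idea outside the external-dns code base, here is a self-contained Go sketch of the same approach. The `endpoint` and `endpointKey` types and the `filterByOwner` function are simplified, hypothetical stand-ins for the types in the snippet above (the owner is a plain field here rather than a label), so this shows the technique and is not the proposed patch itself.

```go
package main

import "fmt"

// endpoint and endpointKey are simplified, hypothetical stand-ins for the
// types used in the snippet above; only the fields needed here are kept.
type endpoint struct {
	DNSName       string
	RecordType    string
	SetIdentifier string
	Owner         string
}

type endpointKey struct {
	DNSName       string
	RecordType    string
	SetIdentifier string
}

// filterByOwner keeps endpoints owned by ownerID and drops any endpoint
// whose (name, type, set identifier) key was already seen.
func filterByOwner(ownerID string, eps []endpoint) []endpoint {
	filtered := []endpoint{}
	visited := make(map[endpointKey]bool)
	for _, ep := range eps {
		key := endpointKey{ep.DNSName, ep.RecordType, ep.SetIdentifier}
		if visited[key] {
			continue // duplicate of an endpoint already processed
		}
		visited[key] = true
		if ep.Owner == ownerID {
			filtered = append(filtered, ep)
		}
	}
	return filtered
}

func main() {
	owner := "us-west-2:sandbox.yourdomain.com"
	eps := []endpoint{
		{"testdeploy.internal.sandbox.yourdomain.com", "A", "", owner},
		{"testdeploy.internal.sandbox.yourdomain.com", "A", "", owner}, // duplicate
		{"testdeploy.internal.sandbox.yourdomain.com", "TXT", "", owner},
		{"other.sandbox.yourdomain.com", "A", "", "someone-else"},
	}
	for _, ep := range filterByOwner(owner, eps) {
		fmt.Println(ep.RecordType, ep.DNSName)
	}
}
```

One design point worth noting: as in the snippet above, a key is marked as visited even when the owner does not match, so a first occurrence owned by someone else hides a later occurrence with the same key.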

@leonardocaylent
Contributor Author

Behavior with the fix:

level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: sandbox.yourdomain.com.)"

level=info msg="Applying provider record filter for domains: [sandbox.yourdomain.com. .sandbox.yourdomain.com. us-west-2.sandbox.yourdomain.com. .us-west-2.sandbox.yourdomain.com. 

level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id does not match, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id does not match, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"

@leonardocaylent
Contributor Author

Adding more details on the calls made in each file:

On Create at version 0.14.0 with the fix:

level=debug msg="Considering zone: /hostedzone/HZ1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"

On Delete at version 0.14.0 with the fix:

level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Start filter on plan.go"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*****8-6**2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*****8-6***2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="2- All changes.UpdateOld"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="Filter on txt.go"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="2- All changes.UpdateOld"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e***8-6**2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"

On Create at version 0.14.0 without the fix: Same behavior (no changes)

On Delete at version 0.14.0 without the fix:

level=debug msg="Start filter on plan.go"
level=debug msg="1 - All changes.Delete"
level=debug msg="Warning: Without the continue: Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
d does not match, found: \"\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="2- All changes.UpdateOld"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="Filter on txt.go"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="2- All changes.UpdateOld"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Warning: Without the continue: Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=error msg="Failure in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=error msg="Failure in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=error msg="Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/HZ2 /hostedzone/HZ1]"

It's creating the Route53 record twice, so that should maybe also be grouped, or fixed in another ticket.

@Raffo
Contributor

Raffo commented Mar 2, 2024

@leonardocaylent please open a PR with the proposed fix. I would love to understand what the impact of this change is, and it's hard to reason about it without a proposed code change.

@leonardocaylent
Contributor Author

@Raffo I'll open the PR. The changes would add the "Group by endpoint" that was needed in #3747:
1) Validate creating duplicated endpoints per hosted zone
2) Validate removing duplicated endpoints per hosted zone
3) Add tests if possible

@leonardocaylent
Contributor Author

@Raffo #4296 is ready for review. I needed to add the RecordType as part of the key because, without it, some necessary records were being skipped.
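A small sketch of why the record type has to be part of the deduplication key: the A record and its TXT registry record share the same DNS name, so a name-only key treats the TXT record as a duplicate. The `record` type and the `dedupe`, `byName`, and `byNameAndType` helpers are hypothetical and only illustrate the idea; this is not the code from the PR.

```go
package main

import "fmt"

type record struct {
	Name string
	Type string
}

// dedupe drops records whose key (as computed by keyFn) was already seen,
// preserving the input order.
func dedupe(records []record, keyFn func(record) string) []record {
	seen := make(map[string]bool)
	var out []record
	for _, r := range records {
		k := keyFn(r)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, r)
	}
	return out
}

func byName(r record) string        { return r.Name }
func byNameAndType(r record) string { return r.Name + "|" + r.Type }

func main() {
	name := "testdeploy.internal.sandbox.yourdomain.com"
	records := []record{
		{Name: name, Type: "A"},
		{Name: name, Type: "A"},   // true duplicate
		{Name: name, Type: "TXT"}, // registry record with the same name
	}
	// Name-only key: the TXT record is dropped along with the duplicate.
	fmt.Println(len(dedupe(records, byName)))
	// Name plus type: the duplicate A is dropped, the TXT record is kept.
	fmt.Println(len(dedupe(records, byNameAndType)))
}
```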

@mloiseleur
Contributor

> The behavior of creating the Route53 records on all the hosted zones that finishes in sandbox.yourdomain.com is expected

I'm not sure I agree with that 🤔.
DNS is a reference system; it's designed to be a source of truth.
I don't see how it can be a source of truth with duplicated records.
It doesn't work like that at the TLD level, and it shouldn't work like that at this domain level either.

=> IMHO, the expected behavior should be to create and delete records only in the internal.sandbox.yourdomain.com sub zone.

@mloiseleur
Contributor

@leonardocaylent Am I wrong to think there is an easy workaround?

I mean: if you run two different instances of external-dns, one per overlapping zone, then it may behave as (you) expect.

@leonardocaylent
Contributor Author

> > The behavior of creating the Route53 records on all the hosted zones that finishes in sandbox.yourdomain.com is expected
>
> I'm not sure to agree with that 🤔 . The DNS is a reference system, it's designed to be a source of truth. I don't see how it can be a source of truth with duplicated records. It doesn't work like that on TLD. It should also not work like that on this domain level.
>
> => IMHO, the expected behavior should be to create and delete records only in internal.sandbox.yourdomain.com sub zone.

@mloiseleur Maybe there is some confusion about what is "expected" versus how external-dns behaved in all previous versions. The bug was reported because external-dns lost the ability to delete Route53 records across multiple hosted zones with overlapping names, which wouldn't be needed if the record were created only in the correct/best-matching hosted zone (a feature that, I guess, external-dns doesn't have yet).
For example:
application.internal.sandbox.yourdomain.com is expected to exist only in the private Route53 hosted zone, and having the record also in sandbox.yourdomain.com is a consequence of that hosted zone's name matching the end of the record name.
A possible solution would be to insert/manage the records only in the best-matching candidate, but that would need a full regression test before being applied to current external-dns versions, since once this new feature is applied some of the old records would be ignored.
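As a rough sketch of what "best-matching candidate" could mean, the snippet below picks the zone with the longest name that is a suffix of the hostname. This is only an illustration of the idea, not an existing external-dns feature; `bestMatchingZone` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// bestMatchingZone returns the zone with the longest name that equals the
// hostname or is a dot-separated suffix of it. The boolean is false when no
// zone matches. Hypothetical helper for illustration only.
func bestMatchingZone(hostname string, zones []string) (string, bool) {
	host := strings.TrimSuffix(hostname, ".")
	best, bestLen := "", -1
	for _, z := range zones {
		zone := strings.TrimSuffix(z, ".")
		if host != zone && !strings.HasSuffix(host, "."+zone) {
			continue // zone name is not a suffix of the hostname
		}
		if len(zone) > bestLen {
			best, bestLen = z, len(zone)
		}
	}
	return best, bestLen >= 0
}

func main() {
	zones := []string{"sandbox.yourdomain.com.", "internal.sandbox.yourdomain.com."}
	zone, ok := bestMatchingZone("application.internal.sandbox.yourdomain.com", zones)
	fmt.Println(zone, ok)
}
```

Records that earlier versions created in the shorter zone would no longer be selected by such a rule, which is the regression risk mentioned above.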

@leonardocaylent
Contributor Author

> @leonardocaylent Am I wrong to think there is an easy workaround ?
>
> I mean : if you run two different instances of external-dns, one per overlapping zone, then it may behave as (you) expect.

@mloiseleur I considered doing something like that, but it would have a huge impact on people who have more than 1 overlapping hosted zone, or more than 5 EKS clusters. It would dramatically increase the number of pods and the amount of IaC code to maintain, and each deployment would need different filters. A possible solution is to put the FilterEndpointsByOwnerID change behind a feature flag, keeping it as an option to choose between the previous behavior and the new one. What do you think about that?

@leonardocaylent
Contributor Author

@mloiseleur Small update: there is a new commit on #4296 that is a good candidate to solve the issue without using feature flags
