external-dns is unable to sync AWS records #2793

Closed
bharathbethi opened this issue Jun 3, 2022 · 9 comments
Labels
kind/support Categorizes issue or PR as a support question. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@bharathbethi

Recently I tried migrating a Service of type LoadBalancer from one cluster to another, but the external-dns logs show the errors below:

level=error msg="Failure in zone icebrg.io. [Id: /hostedzone/XXXXX]"
time="2022-06-03T21:44:43Z" level=error msg="InvalidChangeBatch: [Tried to create resource record set [name='XXX, type='A'] but it already exists, Tried to create resource record set [name='XXXX', type='TXT'] but it already exists]\n\tstatus code: 400, request id: eb025016-7a79-42e4-81c9-c4a18c4a40d2"
time="2022-06-03T21:44:43Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/XXX]"

For the policy, I'm using upsert-only.

Please let me know if I'm missing something here.
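For context on the first error: in Route 53, a CREATE action fails if the record set already exists, DELETE fails if it is not found, and UPSERT creates or replaces the record set. A CREATE for a name/type pair that is already in the zone (here, presumably the record left behind by the old cluster) is therefore rejected. The toy model below is a minimal sketch of those semantics in Go; the `zone`, `key`, and `apply` names are hypothetical, it does not call the AWS SDK, and it simplifies DELETE (real Route 53 also requires the values to match).

```go
package main

import "fmt"

// Action mirrors the three Route 53 change actions.
// This is a toy in-memory model, not the AWS SDK.
type Action string

const (
	Create Action = "CREATE"
	Delete Action = "DELETE"
	Upsert Action = "UPSERT"
)

// key identifies a record set by name and type, as in the error messages.
type key struct {
	Name, Type string
}

// zone is a hypothetical stand-in for a hosted zone: record set key -> value.
type zone map[key]string

// apply applies a single change using simplified Route 53 semantics:
// CREATE fails if the record set exists, DELETE fails if it does not,
// and UPSERT creates or replaces it.
func (z zone) apply(a Action, k key, value string) error {
	_, exists := z[k]
	switch a {
	case Create:
		if exists {
			return fmt.Errorf("Tried to create resource record set [name='%s', type='%s'] but it already exists", k.Name, k.Type)
		}
		z[k] = value
	case Delete:
		if !exists {
			return fmt.Errorf("Tried to delete resource record set [name='%s', type='%s'] but it was not found", k.Name, k.Type)
		}
		delete(z, k)
	case Upsert:
		z[k] = value
	default:
		return fmt.Errorf("unknown action %q", a)
	}
	return nil
}

func main() {
	k := key{Name: "app.example.com.", Type: "A"}
	// The record set is already in the zone (e.g. left behind by the old cluster).
	z := zone{k: "192.0.2.10"}

	fmt.Println(z.apply(Create, k, "192.0.2.20")) // rejected: already exists
	fmt.Println(z.apply(Upsert, k, "192.0.2.20")) // accepted: prints <nil>
	fmt.Println(z[k])                             // 192.0.2.20
}
```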

@bharathbethi bharathbethi added the kind/support Categorizes issue or PR as a support question. label Jun 3, 2022
@cg-lkusnadi

cg-lkusnadi commented Jun 8, 2022

We are seeing the same error message with external-dns v0.12.0.
In addition, a "cname-" prefix is added to the TXT record, even though txtPrefix in the Helm values is set to "".

level=info msg="Desired change: DELETE cname-host.domain.io TXT [Id: /hostedzone/ZNNNNNNNNMMMU5]"
level=info msg="Desired change: DELETE host.domain.io A [Id: /hostedzone/ZNNNNNNNNMMMU5]"
level=info msg="Desired change: DELETE host.domain.io TXT [Id: /hostedzone/ZNNNNNNNNMMMU5]"

I believe the "cname-" prefix is causing the batch apply to fail.

Does anyone have an idea where the "cname-" prefix comes from? Thanks.

@tombokombo

tombokombo commented Jun 8, 2022

It now creates and deletes both the old and the new TXT owner records; the new record names include the record type, e.g. cname-:

// old TXT record format
txt := endpoint.NewEndpoint(im.mapper.toTXTName(r.DNSName), endpoint.RecordTypeTXT, r.Labels.Serialize(true)).WithSetIdentifier(r.SetIdentifier)
txt.ProviderSpecific = r.ProviderSpecific
// new TXT record format (containing record type)
txtNew := endpoint.NewEndpoint(im.mapper.toNewTXTName(r.DNSName, r.RecordType), endpoint.RecordTypeTXT, r.Labels.Serialize(true)).WithSetIdentifier(r.SetIdentifier)
txtNew.ProviderSpecific = r.ProviderSpecific
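To make the naming difference concrete, here is a simplified sketch of the two owner-record names. `oldTXTName` and `newTXTName` are hypothetical stand-ins for the mapper methods quoted above (`toTXTName`, `toNewTXTName`); they ignore suffixes, wildcards, and other details of the real implementation. With an empty txtPrefix and a record type of CNAME, the sketch yields the `cname-host.domain.io` name seen in the log earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// oldTXTName is a simplified, hypothetical version of the old owner-record
// naming: the configured prefix followed by the managed record's name.
func oldTXTName(prefix, dnsName string) string {
	return prefix + dnsName
}

// newTXTName is a simplified, hypothetical version of the new naming: the
// lower-cased record type and a dash are placed before the record name.
// Suffix and wildcard handling from the real code are deliberately omitted.
func newTXTName(prefix, dnsName, recordType string) string {
	return prefix + strings.ToLower(recordType) + "-" + dnsName
}

func main() {
	name := "host.domain.io"
	// With txtPrefix "", the old owner record shares the managed record's name,
	// while the new one gets a type-based prefix.
	fmt.Println(oldTXTName("", name))          // host.domain.io
	fmt.Println(newTXTName("", name, "CNAME")) // cname-host.domain.io
}
```

Because the old and new names differ, a zone populated by an older release contains only the first kind, which is why a later DELETE of the second kind can hit a record that was never created.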

Changes are sent to the provider in batches, and it looks like AWS rejects the whole batch when it hits a nonexistent record (the new one with the type prefix).

EDIT: The batch size is configurable. I would not recommend this in production, but you could pass --aws-batch-change-size=1. This lets creates and deletes go through; only the delete operations for nonexistent TXT owner records (the new ones with the type prefix) will fail. It works because a failing batch then contains only one record instead of all changes (the default batch size is 1000).
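The batching effect can be sketched with a toy model, assuming (as described above, and consistent with Route 53 treating each change batch as all-or-nothing) that one invalid change rejects its entire batch. The `change` type, its `exists` flag, and `submitInBatches` are hypothetical; the three changes mirror the DELETE log lines quoted earlier in this thread, where the new-format `cname-` TXT record does not exist.

```go
package main

import "fmt"

// change is a hypothetical, simplified record change; exists reports whether
// the target record set is already present in the zone.
type change struct {
	Action string // "CREATE" or "DELETE"
	Name   string
	exists bool
}

// valid reports whether this single change would be accepted:
// CREATE needs the record to be absent, DELETE needs it to be present.
func (c change) valid() bool {
	if c.Action == "DELETE" {
		return c.exists
	}
	return !c.exists
}

// submitInBatches splits changes into batches of at most size entries and
// applies each batch atomically: one invalid change rejects the whole batch.
// It returns how many changes were applied and how many batches failed.
func submitInBatches(changes []change, size int) (applied, failedBatches int) {
	if size < 1 {
		size = 1 // guard against a zero or negative batch size
	}
	for start := 0; start < len(changes); start += size {
		end := start + size
		if end > len(changes) {
			end = len(changes)
		}
		batch := changes[start:end]
		ok := true
		for _, c := range batch {
			if !c.valid() {
				ok = false
				break
			}
		}
		if ok {
			applied += len(batch)
		} else {
			failedBatches++
		}
	}
	return applied, failedBatches
}

func main() {
	// The new-format TXT record was never created, so deleting it is invalid.
	changes := []change{
		{"DELETE", "cname-host.domain.io TXT", false},
		{"DELETE", "host.domain.io A", true},
		{"DELETE", "host.domain.io TXT", true},
	}
	fmt.Println(submitInBatches(changes, 1000)) // 0 1
	fmt.Println(submitInBatches(changes, 1))    // 2 1
}
```

With the default size all three changes share one batch and nothing is applied; with a size of 1 only the batch holding the missing TXT record fails. The cost is one API call per change, which makes Route 53 API rate limits much easier to hit and is a plausible reason the setting is not recommended for production.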

@rust84

rust84 commented Jun 9, 2022

We are also seeing these errors, in sync mode.

time="2022-06-03T14:13:09Z" level=error msg="InvalidChangeBatch: [Tried to delete resource record set [name='cname-my-service-pr32.example.com.', type='TXT'] but it was not found]\n\tstatus code: 400, request id: 551d501f-32cb-4e8b-a952-00c5779f72f1"
time="2022-06-03T14:13:09Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/ABC123456789]"

The zone is used for testing, so I went as far as deleting all of the records, but the error came back. The issue does not occur on 0.11. It definitely seems to be related to #2157.

@k0da
Contributor

k0da commented Jun 9, 2022

I'm trying to reproduce this

@m15o

m15o commented Jun 13, 2022

I get the same error when v0.12.0 is running and I delete a resource that was created by external-dns v0.11.0.
Resources created by v0.11.0 do not seem to have prefixed TXT records until the resource is modified.

@k0da
Contributor

k0da commented Jun 13, 2022

Ignoring the error won't help here. I'm working on some sort of init container to perform a cleanup.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2022
@njuettner
Member

This is fixed in v0.12.2

@Raffo Raffo unpinned this issue Oct 17, 2022
@Pyrrha

Pyrrha commented Nov 14, 2022

Hello,

Still facing this issue today on AWS. Bumping from an old version to the latest (using the TXT registry) produces BatchError logs for the CREATE of cname-xxx records.

We resolved this, following #2897, by manually creating the TXT records. After that, external-dns succeeded in updating them.
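The manual workaround mentioned above amounts to pre-creating the new-format TXT owner records. A minimal sketch of how one might list the missing ones, assuming an empty txtPrefix and the `cname-` prefix seen in these logs; `missingNewTXT`, `newTXTPrefix`, and the sample values are hypothetical. Each new record copies the old record's value because, in the code quoted earlier in this thread, both formats are built from the same `r.Labels.Serialize(true)`. Actually creating the records (console, CLI, or SDK) is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// newTXTPrefix is the type prefix seen in the logs in this thread; adjust it
// if your zone shows a different one (assumption, not read from external-dns).
const newTXTPrefix = "cname-"

// missingNewTXT takes a zone's TXT records (name -> value) and the hostnames
// external-dns manages. For each hostname that has an old-format owner record
// (same name, empty txtPrefix) but no new-format one, it returns the new TXT
// name mapped to the old record's value.
func missingNewTXT(txt map[string]string, hosts []string) map[string]string {
	out := map[string]string{}
	for _, h := range hosts {
		owner, owned := txt[h]
		if !owned {
			continue // no old-format owner record: not ours, or a txtPrefix is in use
		}
		if _, exists := txt[newTXTPrefix+h]; !exists {
			out[newTXTPrefix+h] = owner
		}
	}
	return out
}

func main() {
	// An older zone: old-format owner records, plus one new-format record for b.
	val := `"heritage=external-dns,external-dns/owner=default"`
	txt := map[string]string{
		"a.example.com.":       val,
		"b.example.com.":       val,
		"cname-b.example.com.": val,
	}
	missing := missingNewTXT(txt, []string{"a.example.com.", "b.example.com.", "c.example.com."})

	names := make([]string, 0, len(missing))
	for n := range missing {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, n := range names {
		fmt.Println(n, missing[n])
	}
}
```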

10 participants