Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle the migration to the new TXT format: create missing records #2811

Merged

Conversation

alebedev87
Copy link
Contributor

@alebedev87 alebedev87 commented Jun 14, 2022

Description
Handle the migration to the new TXT record format automatically by ExternalDNS.

Should help fixing issue #2793.

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 14, 2022
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 14, 2022
@alebedev87 alebedev87 force-pushed the handle-missing-txt-records branch 2 times, most recently from 239daa2 to b19c1bd Compare June 14, 2022 11:25
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 14, 2022
@alebedev87 alebedev87 force-pushed the handle-missing-txt-records branch 2 times, most recently from 45e878d to af8dbdd Compare June 14, 2022 18:12
@alebedev87 alebedev87 marked this pull request as ready for review June 14, 2022 18:14
@alebedev87 alebedev87 changed the title [WIP] Handle the migration to the new TXT format: create missing records Handle the migration to the new TXT format: create missing records Jun 14, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 14, 2022
@alebedev87
Copy link
Contributor Author

/assign @k0da

@k8s-ci-robot
Copy link
Contributor

@alebedev87: GitHub didn't allow me to assign the following users: k0da.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @k0da

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@k0da k0da left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@k0da
Copy link
Contributor

k0da commented Jun 14, 2022

/assign @Raffo

@alebedev87
Copy link
Contributor Author

Tested the migration manually e2e.

Records created with v0.10.2 version:

$ ./build/external-dns --version
v0.10.2-2-gf4850c47

$ ./build/external-dns --provider=aws --source=service --zone-id-filter=Z0912008TL9E1WE4YS1J --log-level=debug --service-type-filter=ClusterIP --fqdn-template='{{.Name}}.alebedev.test.externaldns0120.info'
...
INFO[0006] Desired change: CREATE image-registry-operator.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE image-registry-operator.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE machine-approver.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE machine-approver.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE metrics.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE metrics.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE network-check-source.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE network-check-source.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE network-metrics-service.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE network-metrics-service.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE node-tuning-operator.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE node-tuning-operator.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE sdn-controller.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE sdn-controller.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE sdn.alebedev.test.externaldns0120.info A [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE sdn.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0007] 16 record(s) in zone alebedev.test.externaldns0120.info. [Id: /hostedzone/Z0912008TL9E1WE4YS1J] were successfully updated

Then v0.12.0 with this PR is run against the same hosted zone:

$ ./external-dns --version
v0.12.0-21-g25addee3-dirty

$ ./external-dns --provider=aws --source=service --zone-id-filter=Z0912008TL9E1WE4YS1J --log-level=debug --service-type-filter=ClusterIP --fqdn-template='{{.Name}}.alebedev.test.externaldns0120.info'
...
INFO[0006] Desired change: CREATE a-image-registry-operator.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-machine-approver.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-metrics.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-network-check-source.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-network-metrics-service.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-node-tuning-operator.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-sdn-controller.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] Desired change: CREATE a-sdn.alebedev.test.externaldns0120.info TXT [Id: /hostedzone/Z0912008TL9E1WE4YS1J] 
INFO[0006] 8 record(s) in zone alebedev.test.externaldns0120.info. [Id: /hostedzone/Z0912008TL9E1WE4YS1J] were successfully updated 
...
DEBU[0067] No endpoints could be generated from service openshift-kube-apiserver/apiserver 
DEBU[0067] Refreshing zones list cache                  
DEBU[0067] Considering zone: /hostedzone/Z0912008TL9E1WE4YS1J (domain: alebedev.test.externaldns0120.info.) 
INFO[0067] Applying provider record filter for domains: [alebedev.test.externaldns0120.info. .alebedev.test.externaldns0120.info.] 
INFO[0067] All records are already up to date 

@k0da
Copy link
Contributor

k0da commented Jun 15, 2022

should fix #2817

@alebedev87
Copy link
Contributor Author

/assign @njuettner

@seanmalloy
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 22, 2022
@Raffo
Copy link
Contributor

Raffo commented Jun 22, 2022

@alebedev87 can you tell us exactly how this is fixing the error and what is the exact error? Not this nor the issue linked are clear. I would also appreciate clear instructions on how to reproduce this so that we can verify. /cc @k0da

Are there some situations in which this bug doesn't happen? Trying to understand what was missed during testing.

@seanmalloy
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2022
@alebedev87
Copy link
Contributor Author

@seh , @Raffo : any comments on the latest change?

plan/plan.go Show resolved Hide resolved
registry/txt.go Outdated Show resolved Hide resolved
registry/txt.go Outdated Show resolved Hide resolved
registry/txt.go Outdated Show resolved Hide resolved
registry/txt.go Outdated
missingDesiredTXT = append(missingDesiredTXT, desiredTXT)
}
}
if len(desiredTXTs) > len(missingDesiredTXT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard condition is confusing. It's not possible that "missingDesiredTXT[s]" could be longer than "desiredTXTs." The former can only ever be shorter or the same length. Therefore, this condition is only false when the two slices are of the same length. What does it mean when they're of equal length? That there are no desired TXT records? That all of them are missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both slices are equal in lengths this means that the record is not managed by externaldns. Because if it wsa managed by externaldns, at least one TXT record would exist. I added additional comment to make this clearer.

However, this made we think of the records owned by different externaldns instances. I think that we should not migrate the TXT records for the other instances. I added a guarding condition for this here.

@Raffo , @seh : what do you think about the migration of other (defferent txt-owner-id) externaldns instances records?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should only manipulate records that are managed by this ExternalDNS instance.

When trying to transfer record ownership between instances (orphaning and adopting), we've written a tool to revise the corresponding TXT records. It would be nice it one could invoke ExternalDNS with a command-line flag (or run it in a different mode, as a one-time tool rather than a long-running service) to rewrite the TXT records from an orphaning ExternalDNS instance to an adopting ExternalDNS instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's something we can think of, in the followup PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all makes sense to me 👍

registry/txt.go Outdated Show resolved Hide resolved
@Raffo
Copy link
Contributor

Raffo commented Jul 13, 2022

@alebedev87 I'm good with the explanations and fixes, I'd defer to @seh's comments to be fixed, then I'm happy to approve and trigger the official release process.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 13, 2022
@alebedev87
Copy link
Contributor Author

Test after the addition of the check for the owner-id.

The state before (unmanaged records, old format TXT records only, TXT records from different owner):

$ gcloud dns record-sets list --zone=2424156220314232391
NAME                                                   TYPE   TTL    DATA
testmissingtxtrecords.example.info.                    NS     21600  ns-cloud-b1.googledomains.com.,ns-cloud-b2.googledomains.com.,ns-cloud-b3.googledomains.com.,ns-cloud-b4.googledomains.com.
testmissingtxtrecords.example.info.                    SOA    21600  ns-cloud-b1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
hello-openshift.testmissingtxtrecords.example.info.    A      300    10.217.4.183
hello-openshift.testmissingtxtrecords.example.info.    TXT    300    "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/hello-openshift/hello-openshift"
hello-openshift-2.testmissingtxtrecords.example.info.  A      300    10.217.5.222
hello-openshift-2.testmissingtxtrecords.example.info.  TXT    300    "heritage=external-dns,external-dns/owner=another,external-dns/resource=service/hello-openshift-2/hello-openshift-2"
unmanaged1.testmissingtxtrecords.example.info.         A      300    192.0.2.91
unmanaged2.testmissingtxtrecords.example.info.         CNAME  300    unmanaged1.testmissingtextrecords.example.info.

Run of external-dns:

./external-dns-50f196 --provider=google --google-project=xxx --zone-id-filter=2424156220314232391 --log-level=debug --fqdn-template='{{.Name}}.testmissingtxtrecords.example.info' --source=service --service-type-filter=ClusterIP --label-filter extdnspublish=yes --publish-internal-services
...
INFO[0006] Add records: a-hello-openshift.testmissingtxtrecords.example.info. TXT ["heritage=external-dns,external-dns/owner=default,external-dns/resource=service/hello-openshift/hello-openshift"] 300 
DEBU[0065] Matching zones against domain filters: []
...
DEBU[0065] Skipping endpoint hello-openshift-2.testmissingtxtrecords.example.info 300 IN A  10.217.5.222 [] because owner id does not match, found: "another", required: "default" 
DEBU[0065] Skipping endpoint unmanaged1.testmissingtxtrecords.example.info 300 IN A  192.0.2.91 [] because owner id does not match, found: "", required: "default" 
DEBU[0065] Skipping endpoint unmanaged2.testmissingtxtrecords.example.info 300 IN CNAME  unmanaged1.testmissingtextrecords.example.info [] because owner id does not match, found: "", required: "default" 
INFO[0065] All records are already up to date

The state after:

$ gcloud dns record-sets list --zone=2424156220314232391
NAME                                                   TYPE   TTL    DATA
testmissingtxtrecords.example.info.                    NS     21600  ns-cloud-b1.googledomains.com.,ns-cloud-b2.googledomains.com.,ns-cloud-b3.googledomains.com.,ns-cloud-b4.googledomains.com.
testmissingtxtrecords.example.info.                    SOA    21600  ns-cloud-b1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
a-hello-openshift.testmissingtxtrecords.example.info.  TXT    300    "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/hello-openshift/hello-openshift"
hello-openshift.testmissingtxtrecords.example.info.    A      300    10.217.4.183
hello-openshift.testmissingtxtrecords.example.info.    TXT    300    "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/hello-openshift/hello-openshift"
hello-openshift-2.testmissingtxtrecords.example.info.  A      300    10.217.5.222
hello-openshift-2.testmissingtxtrecords.example.info.  TXT    300    "heritage=external-dns,external-dns/owner=another,external-dns/resource=service/hello-openshift-2/hello-openshift-2"
unmanaged1.testmissingtxtrecords.example.info.         A      300    192.0.2.91
unmanaged2.testmissingtxtrecords.example.info.         CNAME  300    unmanaged1.testmissingtextrecords.example.info.

yuha0 added a commit to yuha0/home-infra that referenced this pull request Jul 17, 2022
@Raffo
Copy link
Contributor

Raffo commented Jul 20, 2022

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 20, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87, JAORMX, Raffo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3d7437c into kubernetes-sigs:master Jul 20, 2022
@azalenski-branch
Copy link

@Raffo Is there any estimate of when this is gonna be officially released?

@Raffo
Copy link
Contributor

Raffo commented Jul 22, 2022

@azalenski-branch I'll kick off a release next week.

@Raffo
Copy link
Contributor

Raffo commented Jul 26, 2022

I'm still having problems on the v0.12.1 release candidate image:

Applying provider record filter for domains: [external.dns. .external.dns.]"
time="2022-07-26T07:17:45Z" level=info msg="Desired change: UPSERT cname-externaldns-e2e.external.dns TXT [Id: /hostedzone/]"
time="2022-07-26T07:17:45Z" level=info msg="Desired change: UPSERT externaldns-e2e.external.dns A [Id: /hostedzone/]"
time="2022-07-26T07:17:45Z" level=info msg="Desired change: UPSERT externaldns-e2e.external.dns TXT [Id: /hostedzone/]"
time="2022-07-26T07:17:45Z" level=info msg="Desired change: CREATE cname-externaldns-e2e.external.dns TXT [Id: /hostedzone/]"
time="2022-07-26T07:17:46Z" level=error msg="Failure in zone external.dns. [Id: /hostedzone/]"
time="2022-07-26T07:17:46Z" level=error msg="InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-externaldns-e2e.external.dns.']\n\tstatus code: 400, request id: "

Could it be because it is trying to both upsert and create the same txt? I only have one service in the cluster with the annotation external-dns.alpha.kubernetes.io/hostname: externaldns-e2e.external.dns. This is a private hosted zone on AWS.

Is there anything that you think I can debug @alebedev87 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants