Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not replace TXT records with A/CNAME records in planner (#580) #581

Merged
merged 5 commits into from
Nov 14, 2018
Merged

Do not replace TXT records with A/CNAME records in planner (#580) #581

merged 5 commits into from
Nov 14, 2018

Conversation

jchv
Copy link
Contributor

@jchv jchv commented Jun 1, 2018

The Problem

If you have a zone that has unrelated TXT records, such as SPF records, you might think you can use the --txt-prefix option to allow the registry to continue working. However, this is not the case, because even with the registry fully disabled, the planner will see and treat TXT records equally. This means when it sees a record that has the same name as a planned A or CNAME record that does not yet exist, it will attempt to delete the TXT record even though it doesn't conflict.

This actually happens in planTable.getUpdates:

// TODO: allows record type change, which might not be supported by all dns providers
func (t planTable) getUpdates() (updateNew []*endpoint.Endpoint, updateOld []*endpoint.Endpoint) {
	for _, row := range t.rows {
		if row.current != nil && len(row.candidates) > 0 { //dns name is taken
			update := t.resolver.ResolveUpdate(row.current, row.candidates)
			// compare "update" to "current" to figure out if actual update is required
			if shouldUpdateTTL(update, row.current) || targetChanged(update, row.current) {
				inheritOwner(row.current, update)
				updateNew = append(updateNew, update)
				updateOld = append(updateOld, row.current)
			}
			continue
		}
	}
	return
}

As far as I can tell, it will happily include TXT records here, and you'll have plans that look like:

  • Delete: TXT myrecord.com
  • Create: A myrecord.com 1.2.3.4

The Solution

Since TXT records are the domain of the TXT registry, which does not use the Plan.Calculate method, I decided to implement the solution there, by filtering TXT records out. There may be other ways to go about doing this, probably in planTable.getUpdates, but I wanted to be as careful as possible to not disturb other functionality.

Closes #580

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 1, 2018
@jchv
Copy link
Contributor Author

jchv commented Jun 1, 2018

No idea why gometalinter deadlined on Travis. I ran it locally and everything completed pretty quickly.

@njuettner
Copy link
Member

njuettner commented Jun 1, 2018

Thanks for the PR.

Currently we have a problem running gometalinter in Travis. We already incresed the timeout to 120s but unfortunately it doesn't seem enough. However I will take a look and probably create a PR to increase the deadline. Afterwards I can retrigger the build again.

Update: I've retriggered the job, it seems to pass now

We will have a look on your PR anyway :).

plan/plan.go Outdated
filtered := []*endpoint.Endpoint{}

for _, record := range records {
switch record.RecordType {
Copy link
Member

@njuettner njuettner Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it only affect TXT records? If so it would make sense to rewrite the logic IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my intent here is to be explicit that only certain records "conflict." The truth is, the only record that "conflicts" with anything is CNAME. From RFC 1034 (emphasis mine:)

The domain system provides such a feature using the canonical name
(CNAME) RR. A CNAME RR identifies its owner name as an alias, and
specifies the corresponding canonical name in the RDATA section of the
RR. If a CNAME RR is present at a node, no other data should be
present; this ensures that the data for a canonical name and its aliases
cannot be different.
This rule also insures that a cached CNAME can be
used without checking with an authoritative server for other RR types.

So, if you have a CNAME record, then you MUST NOT have ANY other kind of record.

This actually brings a pretty interesting conundrum: It means that ExternalDNS can only create a CNAME record if no other record exists with this name, but it can create A records if any other record exists.

ExternalDNS today only cares about CNAME, A, and TXT records, and it only cares about TXT records for the TXT registry. To fix this bug properly, I argue one would have to include every kind of record and consider it. And in fact, I think that would probably not be a bad idea. However, it would only allow you to output a better error message, probably, since you can't actually do much to fix the problem itself from ExternalDNS's PoV.

So I opted to go for a simple solution of explicitly stating which records the planner cares about, which I believe is CNAME and A records. Specifically, the planner never wants to create CNAME records together with A records anyways, so logically this makes sense. Likewise, the planner probably never wants to delete a TXT record that it had nothing to do with, so this probably makes sense too.

As a sidenote... doesn't this mean the TXT registry should never have a blank prefix, since it could never work with CNAME records? I imagine this is probably an issue for Amazon ELB users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point and I agree.

There's currently also an IPv6 PR (#576) open, so it is also planned to support AAAA records.
Additionally to that there's another PR open #579 which maybe make sense to merge first.

We have a logic for CNAME's in AWS which will be handled as A or AAAA records. It will create ALIAS records pointing to ELB's and a TXT record as well.

Could you add some comment above the filterRecordsForPlan func?

Otherwise LGTM

@jchv
Copy link
Contributor Author

jchv commented Jun 3, 2018

Thanks for your input. I've added a few comments rationalizing the filtering function, hopefully this is sufficient.

@njuettner
Copy link
Member

/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2018
plan/plan.go Outdated
@@ -175,3 +175,27 @@ func shouldUpdateTTL(desired, current *endpoint.Endpoint) bool {
}
return desired.RecordTTL != current.RecordTTL
}

// filterRecordsForPlan removes records that are not relevant to the planner.
// Currently this just removes TXT records to prevent them from being
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"currently this just removes TXT records" --- I think the comment is now wrong as all records except A and CNAME are filtered out...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say currently because TXT, CNAME and A are the only record types that ExternalDNS looks at, as far as I can tell. Should I reword that?

@linki
Copy link
Member

linki commented Jun 4, 2018

@ideahitme Can you please leave your opinion here?

Ideally, this conversion from TXT to A

  • Delete: TXT myrecord.com
  • Create: A myrecord.com 1.2.3.4

shouldn't happen as there's no corresponding TXT record with the owner label.

@jchv
Copy link
Contributor Author

jchv commented Jun 4, 2018

Of course, that's assuming that you use the TXT registry; if you use the noop registry, you'd get this bad behavior either way. Frankly, I would like a registry that stores data elsewhere, but maybe that can be a different PR.

@jchv
Copy link
Contributor Author

jchv commented Jun 25, 2018

Hey @linki @hjacobs @njuettner, thanks for your help thusfar. It seems this PR fell through the cracks. Is there anything I can do to help push this through, such as particular improvements that could be made?

@ideahitme
Copy link

@linki I will take a look at this in the nearest time possible. Sorry I was extremely busy with travelling recently

@ideahitme
Copy link

@jchv is there a particular test which reveals the problem ? I currently see no difference if u use noop vs txt registry. Both will include custom TXT records and should not remove them as per ConflictResolver logic.

@jchv
Copy link
Contributor Author

jchv commented Jul 22, 2018

@ideahitme Firstly, it is worth noting that the problem may have been resolved by an unrelated change since this PR was made. I have not looked, but there is at least one conflict.

With that out of the way, I'll try to be as terse as possible.

  • The problem does not relate to registries. It occurs regardless of whether you use noop or TXT registry, and occurs even when a TXT prefix is set.
  • The problem does not occur when all records in question are managed by external-dns. Crucially, the problem is only exhibited when a TXT record exists, but an A or CNAME record does not, for a given domain, a condition that external-dns does not cause, but is useful to exist. In my case, it made it hard for me to use external-dns to manage my apex record, which needed a TXT record for SPF.
  • The problem occurs because external-dns keeps track of TXT records, which is done for the TXT registry but inadvertently is also read by the planner, which doesn't need it.

For an example of a test that (should) fail when this bug is present, see TestIgnoreTXT in this PR. I apologize if it actually does not exhibit the problem anymore, I created this PR when I was working at a different employer and no longer face this issue personally (but for their sake would like to see the bug fixed :)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2018
@arnisoph
Copy link
Contributor

arnisoph commented Oct 10, 2018

I can confirm that txt registry conflicts with exiting TXT (SPF) records and --txt-prefix doesn't seem to to anything.

See also #573

@arnisoph
Copy link
Contributor

arnisoph commented Nov 2, 2018

This bug (see also #573, txt-prefix doesn't work here) is becoming a pain in the ass for me. Is someone willing to get PR up2date or provide another PR? I could try get this done but I'm fairly new to the external-dns codebase.

@jchv
Copy link
Contributor Author

jchv commented Nov 2, 2018

Well, I haven't checked, but I'm guessing this is still the same problem and same solution. This PR still doesn't conflict according to GitHub. Is there anything that needs updating? I'm relatively certain of the approach as long as the planner hasn't been re-hauled.

@bracki
Copy link

bracki commented Nov 8, 2018

Is there anything that can be done to move this forward?
This patch makes the --txt-prefix option functional for domains where you have a TXT-record set. Otherwise --txt-prefix is useless and external-dns can't be used to setup apex domains. This is especially important when you use zone delegation to host subdomains of your domain with Route53 or some other provider.

@arnisoph
Copy link
Contributor

arnisoph commented Nov 9, 2018

👍 Yes this hits me on GCE Cloud DNS and using SPF in TXT records.

@njuettner
Copy link
Member

/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2018
@njuettner
Copy link
Member

/APPROVE

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njuettner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2018
@k8s-ci-robot k8s-ci-robot merged commit edc29a1 into kubernetes-sigs:master Nov 14, 2018
@arnisoph
Copy link
Contributor

nice! will test it soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External DNS attempts to delete unrelated TXT records.
9 participants