Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi target plan ( Page Not Found ) #404

Merged
merged 18 commits into from
Dec 14, 2017
Merged

Multi target plan ( Page Not Found ) #404

merged 18 commits into from
Dec 14, 2017

Conversation

ideahitme
Copy link

@ideahitme ideahitme commented Nov 28, 2017

Wanted to continue on #261 but unfortunately that branch is protected from push. Continue here

This PR is one of the steps outlined in #396 and does the following:

  • Each service/ingress sets additional label "Resource" to indicate who generated the endpoint
  • "Conflict Resolver" is added to abstract away the logic of resolving targets for the same hostname
  • "PerResource" resolver uses "Resource" label and acts as a "Conflict Resolver" by making targets the same as specified in the resource who first acquired a dns name (until it is deleted)
  • Plan is changed and tests rewritten. Now plan makes sure that dns provider will never get two endpoints which are requesting same dns name. However now plan allows to update record type (consequence of which we will have to test). It provides a consistent behaviour aligned with the proposal in add first proposal version for multi target #396
  • Registry adds the resource label into the txt records. It also parses the labels and correspondingly populates "Labels" map of each endpoint.
  • Docs outline the major breaking change introduced by this PR changelog should suffice

/cc everyone, even though not all checkboxes are checked a premature review would be nice to have to make sure we are on the same page
Please review @szuecs @linki

Stan Lagun and others added 5 commits July 5, 2017 22:54
With this change it becomes possible to work with endpoint
of empty type in packages other than "provider". Also
it seems logical for a smart property getter without side effects
to be a method rather than a function in different package
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 28, 2017
@ideahitme ideahitme changed the title [WIP] Multi target plan [WIP] Multi target plan ( Page Not Found ) Nov 28, 2017
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 29, 2017
@ideahitme
Copy link
Author

ideahitme commented Nov 29, 2017

@linki would u be available for review, PR should be functionally complete now. But need to test it on actual dns provider to make sure it works as expected.

@ideahitme ideahitme requested a review from linki November 29, 2017 15:48
@ideahitme
Copy link
Author

ideahitme commented Nov 29, 2017

I will also change Labels field to have a type, so that (de)serialization are available as methods - DONE

@szuecs
Copy link
Contributor

szuecs commented Nov 30, 2017

One question from reading the PR description: How does a migration from one resource to another works?

Example case:

I have a service type LoadBalancer defnition for "foo.example.org" and now I want to shift it to an Ingress for example (could be also 2 ingress object or 2 services). Can I add both and delete the old one such that external DNS will shift the DNS record from one to the other as "atomic" operation delete and create in one cycle or do you have to wait a minute are eventually offline for that time frame?

@ideahitme
Copy link
Author

@szuecs yes, it will happen as an atomic operation, from DNS provider point of view it will be:

  1. Updating targets and record type (if needed)
  2. Updating TXT record (to update resource label)

plan/plan.go Outdated
// TODO: allows record type change, which might not be supported by all dns providers
func (t planTable) getUpdates() (updateNew []*endpoint.Endpoint, updateOld []*endpoint.Endpoint) {
for _, row := range t.rows {
if row.current != nil && len(row.candidates) != 0 { //dns name is taken
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!= 0 should be > 0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, that makes more sense

if row.current != nil && len(row.candidates) > 0 { //dns name is taken
update := t.resolver.ResolveUpdate(row.current, row.candidates)
// compare "update" to "current" to figure out if actual update is required
if shouldUpdateTTL(update, row.current) || targetChanged(update, row.current) {
Copy link
Author

@ideahitme ideahitme Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we maybe not allow record type change, or make it optional, or provide an option to disallow it ?

@ideahitme ideahitme mentioned this pull request Dec 4, 2017
@ideahitme ideahitme changed the title [WIP] Multi target plan ( Page Not Found ) Multi target plan ( Page Not Found ) Dec 4, 2017
@dereulenspiegel
Copy link
Contributor

I just wanted to provide some feedback. This PR seems to work perfectly fine Google Cloud DNS. I have multiple ingress definitions with multiple targets each. Of course only one A record per DNS name is currently created, but this works as expected.
Now multiple Targets per Endpoint are missing.

@dereulenspiegel
Copy link
Contributor

I have started refactoring Endpoint.Target as []string (actually as its own type which is based on []string). You can see the progress here https://github.com/connctd/external-dns/tree/multi-target.

Right now all tests are green (but without putting much effort or thought into it, so there are probably a lot of new bugs), and ingress source and Google Cloud DNS provider should be able to handle multiple Targets on and endpoint. Now I have to work on the plan to make sure that the desired endpoint for a DNS name is selected.
My assumption right now is that CNAMEs are preferred over endpoints with A records and that endpoints with more A records are preferred over endpoints with fewer A records (if they come from the same ingress).
I hope that I can test this in a real world scenario this week.

@ideahitme
Copy link
Author

@dereulenspiegel I would postpone this, there are multiple PRs already trying to implement this. I would strongly advise to check open PRs first: #326 and #243 . I would not want anyone to work on something and later find that all efforts are wasted

As for the plan, if u based it on this PR, then it should require minimum changes

@dereulenspiegel
Copy link
Contributor

@ideahitme I checked these PRs actually, but I was assuming that they are stale.
Since I am quite far along, I will test my changes anyway including my custom conflict resolver. Probably this can still be interesting here.
Implementing this I hit the situation that in Ingress can still create multiple Endpoint instances. For example one for all IP addresses associated with it, and one for each hostname (through annotation or from the IngressStatus resource). Right now I return all of them and let the conflict resolver handle the selection.
My question would now be, if my assumption is correct that the Ingress resource can still create multiple Endpoint instances per Ingress or if this something which should not happen.

@dereulenspiegel
Copy link
Contributor

My fork is now actually working as expected on GKE with multiple ingress resources, each having multiple IP addresses.
@ideahitme The more I think about the conflict resolution I get the feeling we need to specify this more.
If we actually get multiple endpoints from one ingress with different record types, what is the preferred record type or is this condition an error?
If we have multiple CNAME endpoints which CNAME should be chosen?
How do we handle multiple endpoint with A records but with different sets of targets? Is the longest target list the preferred one or the one with the least changes to the current one (if it makes any difference at all)?
Right now I assume that it is possible to generate multiple endpoints per Ingress (at least the API of the Ingress resource implies this).
This implies from my point of view, that we

  • prefer CNAME based endpoints over A record based endpoints, because the CNAME based endpoints likely point to some kind of load balancer
  • prefer A record based endpoints with the longer target list over A record based endpoints with a shorter target list, because we can assume that both target lists satisfy our needs to route traffic, but the endpoint with the longer list provides more redundancy.
  • currently my conflict resolver will only change the record of an existing record of no candidates matches the given record type

But of course all my assumptions could be wrong, since I am rather new to the kubernetes thing.

Just to advertise my fork a bit :) In my version Targets on Endpoint is of a type Targets just like Labels which possibly provides a bit nicer API than handling a raw string slice.

@ideahitme
Copy link
Author

ideahitme commented Dec 6, 2017

@dereulenspiegel thanks for thoughtful approach :)

If we actually get multiple endpoints from one ingress with different record types, what is the preferred record type or is this condition an error?

I mentioned this point in my proposal. I think for now we can just ignore this case and let DNS provider fail, or simply prefer IP over Hostname in this sort of scenario, or the other way round. This scenario is not common so I would not spend too much time thinking of best approach

If we have multiple CNAME endpoints which CNAME should be chosen?

This one is a bit more tricky. Some DNS providers support this scenario, therefore I think we should leave all of them in the Targets list and let DNS provider logic do what it should

How do we handle multiple endpoint with A records but with different sets of targets? Is the longest target list the preferred one or the one with the least changes to the current one (if it makes any difference at all)?

If I understood you correctly: Similar to this PR if it is update then prefer targets of current resource. If not, then even simply hashing all targets and picking minimum will do (for me the only worry here is consistency). It could also be two-step decision algorithm: pick maximum length target list. If there are multiple such lists, hash and pick minimum/maximum (again consistency)

currently my conflict resolver will only change the record of an existing record of no candidates matches the given record type

It should only change record type if owning resource indicates record type change

P.S: you are welcome to ping me in slack if you'd like to discuss this. Help in this direction is definitely appreciated


// when we delete TXT records for which value has changed (due to new label) this would still work because
// !!! TXT record value is uniquely generated from the Labels of the endpoint. Hence old TXT record can be uniquely reconstructed
Copy link
Contributor

@szuecs szuecs Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment is here and in line 128. I am not sure if this makes sense to me. I think this is more a property of the ApplyChanges function of how it does this change or do I miss something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comment here because what is going on here is a bit of magic. With this PR change we cannot predict what is the value stored in the TXT record, and for the TXT record to be correctly deleted/updated we should make sure its value matches what is stored on DNS provider side. And this is done via regenerating txt record value from labels

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

filteredChanges.Delete = append(filteredChanges.Delete, txt)
}

// make sure TXT records are consistently updated as well
for _, r := range filteredChanges.UpdateNew {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 3 for loops are basically the same, can you refactor them with "extract method" and call them 3 times, for example:

 filteredChanges.Delete = append(filteredChanges.Delete, yourNewFunction()... )
 filteredChanges.UpdateNew = append(filteredChanges.UpdateNew, yourNewFunction()... )
 filteredChanges.UpdateOld = append(filteredChanges.UpdateOld, yourNewFunction()... )

@szuecs
Copy link
Contributor

szuecs commented Dec 11, 2017

from my side lgtm, only small cleanups suggested

// OwnerLabelKey is the name of the label that defines the owner of an Endpoint.
OwnerLabelKey = "owner"
// ResourceLabelKey is the name of the label that identifies k8s resource which wants to acquire the DNS name
ResourceLabelKey = "resource"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Owner was supposed to identify the owner of a resource. For us it was to say it's "the cluster" (or rather the - usually one - instance of external-dns). However now the owner is a particular resource in a particular namespace in a particular cluster. I wonder if it's confusing to separate them this way.

I would rather clarify it with either a combined owner resource (owner=extdns1/default/my-service, is that even easier to code?) or two owner related labels (ownerInstance=extdns1, ownerResource=default/my-service, just a rename then).

Although there's the concern of backwards compatibility...

@ideahitme let me know what you think.

Copy link
Author

@ideahitme ideahitme Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree, the naming is probably not the most descriptive.

However now the owner is a particular resource in a particular namespace in a particular cluster. I wonder if it's confusing to separate them this way.

I would still prefer them to be separated:

  • "owner" label signifies if the DNS Record can be modified at all
  • "resource" label helps to figure out how it should be updated

I also can see scenarios where "resource" label can be ignored altogether, so in my mind it is just a metadata, which allows to implement a "method" for determining resource-ownership, but it could be used for something else, or a different method could be implemented. That's why I thought it is better to keep the label name independent of how we use it.

With the previous statement though it is possibly better to rename "owner" label to something else, like "external-dns-instance", but since it would break compatibility, I would not do this change in this PR. However this change would be possible with some code change which would allow us to gradually migrate txt record values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

continue
}
key := strings.Split(token, "=")[0]
val := strings.Split(token, "=")[1]
Copy link
Member

@linki linki Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Kubernetes' label parsing functions? Seems easier to use: https://github.com/linki/chaoskube/blob/v0.6.1/main.go#L59 and we don't have to maintain this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try it out :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undone, because labels could not be parsed if "/" are present :(

@linki
Copy link
Member

linki commented Dec 12, 2017

For anyone who wants to test this the docker image is available at:

docker pull registry.opensource.zalan.do/teapot/external-dns:v0.4.8-24-ga0c3911

@linki
Copy link
Member

linki commented Dec 13, 2017

Tested on my GKE cluster and it works fine:

  • had existing Ingress with one IP as target
  • added another IP to ingress target => external-dns found both IPs but kept DNS records as is
  • dropped old IP from Ingress => external-dns switched to the new IP
  • added another IP again => external-dns found both IPs but kept DNS records as is

So that seems to work perfect.

@ideahitme
Copy link
Author

@linki couldn't use apimachinery package, due to the parsing failures for label values, such as "default/ingress/my-ingress" due to "/"

@ideahitme
Copy link
Author

ideahitme commented Dec 14, 2017

@linki f so let's get it merged ?

@linki
Copy link
Member

linki commented Dec 14, 2017

@ideahitme yes!

@linki linki merged commit ec07f45 into master Dec 14, 2017
@linki linki deleted the multi-target-plan branch December 14, 2017 10:21
@linki
Copy link
Member

linki commented Dec 14, 2017

🎉

@ideahitme
Copy link
Author

awesome!

ffledgling pushed a commit to ffledgling/external-dns that referenced this pull request Jan 18, 2018
* Make suitableType() be Endpoint method

With this change it becomes possible to work with endpoint
of empty type in packages other than "provider". Also
it seems logical for a smart property getter without side effects
to be a method rather than a function in different package

* Make plan computation work correctly with multi-target domains

* fix drawing

* drop comments

* fix boilerplate header

* fix comment

* fix the bug with empty map

* rework registry to support random lables

*  serialize->serializeLabel function rename

* golint for err variable naming

* add additional test

* add tests for current case where one resource can generate multiple endpoints

* make labels have its own type, add serialization as a method

* add comment for exported error

* use greater rather than not equal zero

* update changelog
linki pushed a commit that referenced this pull request Feb 21, 2018
* Endpoint.Target is now Endpoint.Targets. This is its own type representing mutliple targets for a single DNS name while adding some convenience for sorting and comparing

* Made everything compile and tests run through with the new Endpoint.Targets

* The ingress source can now properly handle multiple target ips per host

* Added custom conflict resolver, to better understand how conflict resolution has to work for me

* My custom conflict resolver behaves a bit different than the PerResource resolver, therefore I needed to modify the expected test result

Removed unnecessary FIXME

* The ingress source now creates CNAME endpoints with multiple targets to let the DNS provider decide how to handle multiple CNAME targets. This could be interesting for weighted targets etc.

* Adopted the expected results to the new way we create endpoints for CNAMEs

* Removed Add method from Targets since manipulating the slice through here is unnecessary complicated and doesn't deliver enough convenience

* Reverted ConflictResolver to the original one. There is some discussing to do what the best way is to handle conflicts

* Added missing documenting comment to IsLess of Targets

* Added documenting comments to Targets,Targets.Same and NewTargets to clarify their intention and usage

* Service source now also generates endpoints with multiple targets

* Service and Ingress source now sort all Targets for every Endpoint to make order of Targets predictable

* Endpoints generated by the Google Cloud DNS provider now also have sorted Targets to make order of Targets predictable

* Modified provider dyn to be able to compile with multi target changes

* Fixed small nitpicks, so my code is acceptable

* Fixed merge method after updating to new Targets. Replacing '!=' with .Same of course needs a boolean negation

* Tests for dyn provider now also use the new Targets instead of Target

* Simplified extractServiceIps as implied by linki to make it more readable

* ref: change service ClusterIP retrieval again

* Added entry to CHANGELOG.md describing the new features contained in this PR
grimmy pushed a commit to grimmy/external-dns that referenced this pull request Apr 10, 2018
…kubernetes-sigs#396 (kubernetes-sigs#418)

* Endpoint.Target is now Endpoint.Targets. This is its own type representing mutliple targets for a single DNS name while adding some convenience for sorting and comparing

* Made everything compile and tests run through with the new Endpoint.Targets

* The ingress source can now properly handle multiple target ips per host

* Added custom conflict resolver, to better understand how conflict resolution has to work for me

* My custom conflict resolver behaves a bit different than the PerResource resolver, therefore I needed to modify the expected test result

Removed unnecessary FIXME

* The ingress source now creates CNAME endpoints with multiple targets to let the DNS provider decide how to handle multiple CNAME targets. This could be interesting for weighted targets etc.

* Adopted the expected results to the new way we create endpoints for CNAMEs

* Removed Add method from Targets since manipulating the slice through here is unnecessary complicated and doesn't deliver enough convenience

* Reverted ConflictResolver to the original one. There is some discussing to do what the best way is to handle conflicts

* Added missing documenting comment to IsLess of Targets

* Added documenting comments to Targets,Targets.Same and NewTargets to clarify their intention and usage

* Service source now also generates endpoints with multiple targets

* Service and Ingress source now sort all Targets for every Endpoint to make order of Targets predictable

* Endpoints generated by the Google Cloud DNS provider now also have sorted Targets to make order of Targets predictable

* Modified provider dyn to be able to compile with multi target changes

* Fixed small nitpicks, so my code is acceptable

* Fixed merge method after updating to new Targets. Replacing '!=' with .Same of course needs a boolean negation

* Tests for dyn provider now also use the new Targets instead of Target

* Simplified extractServiceIps as implied by linki to make it more readable

* ref: change service ClusterIP retrieval again

* Added entry to CHANGELOG.md describing the new features contained in this PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/multiple-targets cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants