
Rebuild of the ConflictResolver #3640

Closed
wants to merge 1 commit

Conversation


@mabels mabels commented May 28, 2023

Description

After discovering that the ConflictResolver and the plan only work with a few record types, I decided to build a completely new one.

I know it's a big change, but I tried to fix the existing implementation and it failed: mainly, the merging and sorting were simply not there, and their absence prevents the merging of targets and labels. To get a defined order for merging, there is an internal switch to a wrapEndpoint which has only one Target instead of a list of targets.

I had to change the API (plan.Calculate) because there are cases the conflict resolver cannot resolve. These are mainly invalid inputs, such as multiple targets on CNAME or PTR record types. The change of the API and the behavior meant I also had to update the tests in plan and cloudflare_test.go.
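For illustration, a minimal sketch (not the actual patch) of the kind of check that makes the plan calculation return an error on invalid input; the helper name validateEndpoints is hypothetical, while endpoint.Endpoint, RecordType and Targets are the existing types in this repository:

package plan

import (
	"fmt"

	"sigs.k8s.io/external-dns/endpoint"
)

// validateEndpoints is a hypothetical helper: CNAME and PTR records may only
// carry a single target, so endpoints with more than one target are treated
// as invalid input and surface an error from the plan calculation.
func validateEndpoints(eps []*endpoint.Endpoint) error {
	for _, ep := range eps {
		if (ep.RecordType == "CNAME" || ep.RecordType == "PTR") && len(ep.Targets) > 1 {
			return fmt.Errorf("%s record %q must not have multiple targets", ep.RecordType, ep.DNSName)
		}
	}
	return nil
}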

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 28, 2023
@k8s-ci-robot
Contributor

Welcome @mabels!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 28, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mabels
Once this PR has been reviewed and has the lgtm label, please assign raffo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@szuecs
Contributor

szuecs commented May 28, 2023

I am not sure if we are able to make this change. It's too huge and hard to review.
I think you have to split it into multiple ones.
In general I believe CNAME is the most relevant record type, so from the description I would say I don't want to invest time into it.

Thanks for trying to make the project better.

@mabels
Author

mabels commented May 28, 2023

I am not sure if we are able to make this change. It's too huge and hard to review. I think you have to split it into multiple ones. In general I believe CNAME is the most relevant record type, so from the description I would say I don't want to invest time into it.

Thanks for trying to make the project better.

Hi,

thx for the quick response. I'm not sure how I should split up a completely new implementation.

I need almost every RecordType, which is why I dug into it.

So there are:

a lot of small changes in:
controller/controller.go
controller/controller_test.go
provider/aws/aws_test.go // api
provider/cloudflare/cloudflare_test.go // api and removal of record-type-specific code

medium but simple changes (naming and removals):
plan/plan.go
plan/plan_test.go // api change, the same block repeated over and over

and a new implementation with new tests (these should be treated as new files):
plan/conflict.go
plan/conflict_test.go

In my opinion it is a big change, but a split is not possible.

thx

meno


Contributor

@johngmyers johngmyers left a comment


It is not clear what you are trying to do.

Added test cases are IPv4-only, which makes me suspicious.

There should be smaller commits, with separate refactors in separate commits. Additions to the tests to cover existing behavior should be separate from refactors and behavioral changes. Each behavioral change should be in a separate commit. For example, the change to the resolver to pick at most one target for a CNAME should be its own commit. (Though PTR and CNAME can be in the same commit.)

@@ -97,17 +105,17 @@ func newPlanTable() planTable { // TODO: make resolver configurable
// current corresponds to the record currently occupying dns name on the dns provider
Contributor


This comment should be updated to reflect the change.

Comment on lines +128 to +129
recordType := strings.ToUpper(strings.TrimSpace(e.RecordType))
setIdentifier := strings.TrimSpace(e.SetIdentifier)
Contributor


Why this normalization? These values should already be normalized before creating the endpoint.Endpoint.

Comment on lines +151 to +157
// func (p *Plan) Calculate() *Plan {
// p, err := p.CalculateWithError()
// if err != nil {
// panic(fmt.Sprintf("CalculateWithError should not return an error:%v", err))
// }
// return p
// }
Contributor


Don't leave in commented-out code.

// return p
// }

func (p *Plan) CalculateWithError() (*Plan, error) {
Contributor


I don't think we should change the name. Just change the signature if need be.
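In other words, something along these lines (an illustrative stub, not the actual patch):

// Keep the existing name; only the signature changes so callers can handle
// unresolvable input instead of the planner panicking.
func (p *Plan) Calculate() (*Plan, error) {
	// ... existing calculation logic ...
	return p, nil
}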

@@ -186,32 +223,32 @@ func (p *Plan) Calculate() *Plan {
Current: p.Current,
Desired: p.Desired,
Changes: changes,
ManagedRecords: []string{endpoint.RecordTypeA, endpoint.RecordTypeAAAA, endpoint.RecordTypeCNAME},
Contributor


Why are you removing AAAA support?

// return p
// }

func (p *Plan) CalculateWithError() (*Plan, error) {
Contributor


Why is it necessary for the planner to be able to return an error?

foundCandidate = true
continue
}
changes.UpdateOld = append(changes.UpdateOld, oldEp)
Contributor


Previously, len(changes.UpdateOld) == len(changes.UpdateNew) was an invariant. This is breaking that invariant. For what purpose?
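For reference, a small illustrative sketch of how that invariant is normally kept (addUpdate is a made-up helper; plan.Changes and endpoint.Endpoint are the existing types):

// addUpdate appends old and new endpoints as a pair, so
// len(changes.UpdateOld) == len(changes.UpdateNew) always holds.
func addUpdate(changes *plan.Changes, oldEp, newEp *endpoint.Endpoint) {
	changes.UpdateOld = append(changes.UpdateOld, oldEp)
	changes.UpdateNew = append(changes.UpdateNew, newEp)
}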

},
},
UpdateNew: []*endpoint.Endpoint{
{
DNSName: "some-record.used.tld",
RecordType: endpoint.RecordTypeA,
Targets: endpoint.Targets{"1.1.1.1"},
// the new resolver will transfer existing labels
Contributor


The existing code copies only the owner label. If adding an assertion for that, then the test case should also include a second label and assert that the code does not (or does) copy that.
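A sketch of what such a fixture could look like; the second label key and its value are made up for illustration, while endpoint.OwnerLabelKey is the existing constant for the owner label:

// Hypothetical current record: it carries the owner label plus one extra
// label, so the test can assert whether the extra label is copied or not.
current := &endpoint.Endpoint{
	DNSName:    "some-record.used.tld",
	RecordType: endpoint.RecordTypeA,
	Targets:    endpoint.Targets{"1.1.1.1"},
	Labels: endpoint.Labels{
		endpoint.OwnerLabelKey: "owner-1",
		"custom/label":         "should-this-survive",
	},
}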

@@ -97,17 +105,17 @@ func newPlanTable() planTable { // TODO: make resolver configurable
// current corresponds to the record currently occupying dns name on the dns provider
// candidates corresponds to the list of records which would like to have this dnsName
type planTableRow struct {
current *endpoint.Endpoint
currents []*endpoint.Endpoint
Contributor


Why is it necessary to have more than one current? When would that happen?

t := newPlanTable()

if p.DomainFilter == nil {
p.DomainFilter = endpoint.MatchAllDomainFilters(nil)
}

// dnsname and recordtype, setIdentifier is used to group records together
Contributor


I cannot parse this comment. What does it mean?

@mabels
Author

mabels commented Jun 1, 2023

It is not clear what you are trying to do.

Added test cases are IPv4-only, which makes me suspicious.

There should be smaller commits, with separate refactors in separate commits. Additions to the tests to cover existing behavior should be separate from refactors and behavioral changes. Each behavioral change should be in a separate commit. For example, the change to the resolver to pick at most one target for a CNAME should be its own commit. (Though PTR and CNAME can be in the same commit.)

Hi,

super thx for your comments, I will integrate them. My goal here is not to enable a few particular record types; it's to enable all record types. I skipped rebuilding the whole test suite to check every supported record type and instead focused on getting the existing tests working with the new logic. I'm currently testing my changes against my real-life application and have discovered and improved a lot on this journey (not committed yet). It's mainly about the generated heritage TXT records, which cause some problems when records are bound directly to the zone. This leads to some nasty situations when those records are updated, so I changed them to be planned like all other records. That broke a lot of tests and required some new ones. I'm currently in the middle of getting that done, so please stay tuned.

I understand that this PR is more than a "tweak" of external-dns. I want to use external-dns to maintain the DNS of a live and running organization with 100+ zones and as many applications. You implemented the DNSEndpoint CRD and I started to use it; it did not seem to work, so I made it work for me.

If you want to move external-dns beyond today's main purpose of exposing Ingresses and Services to DNS, I think what I built might be a good starting point. But if the focus does not shift, my changes are too much, and it might be better that I fork external-dns, reuse your provider work, and go from there, or that you bump the major version number. What I built definitely breaks existing behavior in favor of a more generalized use case of which today's behavior is only a subset. With the providers (Cloudflare) I discovered some really ugly behavior: they unify DNS names and modify targets per record type between write and read. So all providers need tests which ensure that what you write you get back in the same format. For example, Cloudflare changes an SRV from "prio weight port target." to "weight\tport\ttarget", which disrupts any plan calculation. The Cloudflare provider therefore has to convert the read value back to the "written" format, which is somewhat tricky depending on whether there was a "." at the end of the target.
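To make the round-trip problem concrete, here is a purely illustrative sketch of such a read-side conversion; normalizeSRVTarget, the priority argument and the trailing-dot flag are assumptions, not the actual provider code:

import (
	"fmt"
	"strings"
)

// normalizeSRVTarget rebuilds the "written" form of an SRV target from what
// the provider hands back: the tab-separated "weight\tport\ttarget" content
// is re-joined with spaces, the priority is prepended again, and the trailing
// dot is restored if the original target carried one.
func normalizeSRVTarget(read string, priority int, wroteTrailingDot bool) string {
	parts := strings.Fields(read) // Fields splits on any whitespace, tabs included
	target := strings.Join(parts, " ")
	if wroteTrailingDot && !strings.HasSuffix(target, ".") {
		target += "."
	}
	return fmt.Sprintf("%d %s", priority, target)
}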

thx again

meno

@johngmyers
Contributor

I think it would help to spend more time in the requirements and design phase. What are the specific behaviors you want to change and how do you want to change them?

It is an invariant of the provider/registry interface that there is at most one Endpoint per <DNSName, RecordType, SetIdentifier> tuple. Currently the resolver chooses a single Endpoint from the candidates proposed from sources. Perhaps you want some sort of merging behavior? Can you propose a good rule for when the resolver should merge versus choose a winner?
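For the sake of discussion, one hypothetical rule of that kind could look like this; planKey, mergeOrChoose and chooseWinner are made-up names, while endpoint.Endpoint is the existing type:

// planKey mirrors the <DNSName, RecordType, SetIdentifier> tuple.
type planKey struct {
	dnsName, recordType, setIdentifier string
}

// mergeOrChoose: record types that allow several targets get their candidates'
// targets merged and de-duplicated; single-target types (CNAME, PTR) fall back
// to picking one winning candidate. Winner selection itself is left out here.
func mergeOrChoose(key planKey, candidates []*endpoint.Endpoint, chooseWinner func([]*endpoint.Endpoint) *endpoint.Endpoint) *endpoint.Endpoint {
	if key.recordType == "CNAME" || key.recordType == "PTR" {
		return chooseWinner(candidates)
	}
	merged := &endpoint.Endpoint{
		DNSName:       key.dnsName,
		RecordType:    key.recordType,
		SetIdentifier: key.setIdentifier,
	}
	seen := map[string]bool{}
	for _, c := range candidates {
		for _, t := range c.Targets {
			if !seen[t] {
				seen[t] = true
				merged.Targets = append(merged.Targets, t)
			}
		}
	}
	return merged
}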

@johngmyers johngmyers mentioned this pull request Jun 7, 2023
2 tasks
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@johngmyers johngmyers mentioned this pull request Sep 13, 2023
2 tasks
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
