Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle migration to the new TXT format: use ForceUpdate strategy #3635

Merged

Conversation

mcharriere
Copy link
Contributor

Description

This PR changes the way that missing records are handled from a specific logic to a force update strategy.
The idea basically is that instead of having a dedicated plan and a especial logic inside the planner to handle missing TXT records, external-dns can force the update of the records avoiding conflicts.

The base for this PR was introduced in #2811 and #2913

Fixes #3217

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot
Copy link
Contributor

Welcome @mcharriere!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 25, 2023
@szuecs
Copy link
Contributor

szuecs commented May 26, 2023

/ok-to-test
/approve

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 26, 2023
@szuecs
Copy link
Contributor

szuecs commented May 26, 2023

@mcharriere thanks makes sense and simplified it 😊

@mcharriere
Copy link
Contributor Author

mcharriere commented May 29, 2023

@szuecs thank you for the quick review 😄. the pr is not merging because it's missing the lgtm label. could you check that? thanks again!

@phajduk
Copy link

phajduk commented May 31, 2023

@mcharriere if I understand this correctly, using this feature brings us an ability to manually clear up all TXT entries, created by external-dns, and such will get recreated?

@mcharriere
Copy link
Contributor Author

if I understand this correctly, using this feature brings us an ability to manually clear up all TXT entries, created by external-dns, and such will get recreated?

hi @phajduk. no, that's not correct. it still needs the at least 1 TXT record to be present in the registry (either the old format or the new one). This PR doesn't change the behavior, it only uses a different strategy.

@johngmyers
Copy link
Contributor

The model here is slightly wrong, leading to test cases for situations that will never occur.

The new field will only ever be set on current Endpoints. Sources will not set it in the candidates. The new bit means "something about how this Endpoint is currently stored in the provider/registry is inconsistent". The only thing that has to happen is for the planner's Calculate() to schedule an update when the bit needs to change from set to unset.

We do not need tests for having ForceUpdate set in the candidates.

Instead of being a new field, this could go into ProviderSpecific.

@mcharriere
Copy link
Contributor Author

The model here is slightly wrong, leading to test cases for situations that will never occur.

The new field will only ever be set on current Endpoints. Sources will not set it in the candidates. The new bit means "something about how this Endpoint is currently stored in the provider/registry is inconsistent". The only thing that has to happen is for the planner's Calculate() to schedule an update when the bit needs to change from set to unset.

That's the point of having a ForceUpdate, so the planner can queue endpoints for changes, independently of their real state. That's useful to solve registry inconsistencies as you said, but that's just one use case.
I know it's not ideal, but having registry specific logic (as before) on different places makes the migration to the new format really hard.

We do not need tests for having ForceUpdate set in the candidates.

Instead of being a new field, this could go into ProviderSpecific.

I don't think that's the appropriate place for it, because it's definitively not provider specific and on the other hand we don't need to keep track of the forced update.

The main reason behind this PR is to simplify the migration process. Currently upgrading from 0.11 to a newer version when working with AWS in a dual zone setup (public+private). I want also to create a follow up PR based on this one to remove the old format DNS record to finalize the migration.

@johngmyers
Copy link
Contributor

Once you remove MissingRecords(), Provider and Registry are the same interface. ProviderSpecific can hold registry-specific data. The inconsistency and resulting need for an update is specific to the txt registry. The txt registry's PropertyValuesEqual() can use such registry-specific data in ProviderSpecific to cause the planner to send an update.

So I would say this definitely belongs in ProviderSpecific. ProviderSpecific (and PropertyValuesEqual) already provide the necessary mechanisms to cause the planner to send an update; there is no need for a new field.

Separately from this, there is no need to have tests where ForceUpdate is set in candidates. There isn't a use case for that.

This change tries to remove part of the logic added in 50f196c.
The forceUpdate strategy relies on existing code of the planner to migrate TXT records to the new format,
being the main goal to avoid conflicts during apply.

Signed-off-by: Matias Charriere <matias@giantswarm.io>
the entire logic of missing endpoints was removed from the controller
taking leverage on the planner

Signed-off-by: Matias Charriere <matias@giantswarm.io>
Signed-off-by: Matias Charriere <matias@giantswarm.io>
Signed-off-by: Matias Charriere <matias@giantswarm.io>
Signed-off-by: Matias Charriere <matias@giantswarm.io>
Signed-off-by: Matias Charriere <matias@giantswarm.io>
@mcharriere
Copy link
Contributor Author

hi there. as per @johngmyers suggestion, I reimplemented the PR using a ProviderSpecific property to force the update of the records. This is a bit cleaner than before, as it removes any logic inside the planner and it's located only on the TXT registry.

Thank you @johngmyers for your patience :)

@szuecs @johngmyers I'd appreciate another review. thank you!

@szuecs
Copy link
Contributor

szuecs commented Jun 15, 2023

/approve cancel
Have to read it again, thanks @johngmyers

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2023
@szuecs
Copy link
Contributor

szuecs commented Jun 15, 2023

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2023
Copy link
Contributor

@johngmyers johngmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, thanks!
The one comment is non-blocking. I'm separately removing all special-case logic from PropertyValuesEqual and it'd be simpler if we don't add in something I'll just remove later. Cancel the hold if you disagree.

/lgtm
/hold to give author chance to address non-blocking comment

registry/txt.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 15, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, mcharriere, szuecs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Matias Charriere <matias@giantswarm.io>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2023
@johngmyers
Copy link
Contributor

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit 930061a into kubernetes-sigs:master Jun 18, 2023
@mcharriere mcharriere deleted the handle-missing-txt-records branch June 19, 2023 06:47
@johngmyers johngmyers mentioned this pull request Sep 13, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[External DNS] InvalidChangeBatch error in v0.12.1
5 participants