Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow AWS provider to change record types #1867

Conversation

jsravn
Copy link
Contributor

@jsravn jsravn commented Nov 19, 2020

Description

Currently the AWS provider cannot handle record type changes. It always
attempts to UPSERT such updates, which will fail the entire zone batch
of changes. As a result, a single resource change can break all the
updates for the entire zone.

This change modifies the AWS behavior to correctly identify when the
record type changes and perform a batched DELETE and CREATE to update
the record successfully.

Special logic is required to handle ALIAS records which are not directly
encoded by the generic external-dns code, and relies on
convention (using a CNAME record type internally). I'm not sure this is
ideal as it's fairly error prone, and would prefer to see direct support
for such ALIAS types, but I've left it alone in this change.

Fixes #1852

Checklist

  • Unit tests updated
  • End user documentation updated

Currently the AWS provider cannot handle record type changes. It always
attempts to UPSERT such updates, which will fail the entire zone batch
of changes. As a result, a single resource change can break all the
updates for the entire zone.

This change modifies the AWS behavior to correctly identify when the
record type changes and perform a batched DELETE and CREATE to update
the record successfully.

Special logic is required to handle ALIAS records which are not directly
encoded by the generic external-dns code, and relies on
convention (using a CNAME record type internally). I'm not sure this is
ideal as it's fairly error prone, and would prefer to see direct support
for such ALIAS types, but I've left it alone in this change.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 19, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @jsravn!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 19, 2020
@jsravn
Copy link
Contributor Author

jsravn commented Nov 20, 2020

/assign @Raffo

@jsravn
Copy link
Contributor Author

jsravn commented Nov 23, 2020

ping @Raffo , mind having a look?

Comment on lines +403 to +416
// UpdateRecords updates a given set of old records to a new set of records in a given hosted zone.
func (p *AWSProvider) UpdateRecords(ctx context.Context, updates, current []*endpoint.Endpoint) error {
zones, err := p.Zones(ctx)
if err != nil {
return errors.Wrapf(err, "failed to list zones, aborting UpdateRecords")
}

records, err := p.records(ctx, zones)
if err != nil {
log.Errorf("failed to list records while preparing UpdateRecords: %s", err)
}

return p.submitChanges(ctx, p.createUpdateChanges(updates, current, records, zones), zones)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this exactly the same as doRecords? If so, why are we duplicating it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite the same, since we need to call createUpdateChanges to handle the special case of ALIAS records. The duplication is just getting the zones and records. I suppose I could remove the duplication by passing a function as an argument to doRecords.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I missed that. I think we can keep it as is then.

Comment on lines +425 to +435
if new.RecordType != old.RecordType ||
// Handle the case where an AWS ALIAS record is changing to/from a CNAME.
(old.RecordType == endpoint.RecordTypeCNAME && useAlias(old, p.preferCNAME) != useAlias(new, p.preferCNAME)) {
// The record type changed, so UPSERT will fail. Instead perform a DELETE followed by a CREATE.
deletes = append(deletes, old)
creates = append(creates, new)
} else {
// Safe to perform an UPSERT.
updates = append(updates, new)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this into UpdateRecords ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be called by ApplyChanges as well. I'm not quite sure why ApplyChanges is duplicated w/ these individual methods (Create/Update/Delete), but that's they way it is at the moment. All these methods could be refactored to remove the duplication possibly.

@Raffo
Copy link
Contributor

Raffo commented Dec 2, 2020

All right, changes look good to me. Did you have the chance to build an image and try this out @jsravn ? In case you didn't can I ask you to test the change once we get a master image before we will make a release?

@jsravn
Copy link
Contributor Author

jsravn commented Dec 3, 2020

Hi @Raffo, I've tested it in a real cluster deployed to AWS and it seems to work well. I'll get better coverage when I roll it out for production usage over the next couple weeks.

With this change, attempting to set target to a CNAME when it is currently an ALIAS.

{"level":"info","msg":"Desired change: UPSERT my.example.com TXT [Id: /hostedzone/MYZONEID]","time":"2020-12-03T11:13:54Z"}
{"level":"info","msg":"Desired change: DELETE my.example.com A [Id: /hostedzone/MYZONEID]","time":"2020-12-03T11:13:54Z"}
{"level":"info","msg":"Desired change: CREATE my.example.com CNAME [Id: /hostedzone/MYZONEID]","time":"2020-12-03T11:13:54Z"}
{"level":"info","msg":"3 record(s) in zone example.com. [Id: /hostedzone/MYZONEID] were successfully updated","time":"2020-12-03T11:13:54Z"}
{"level":"info","msg":"All records are already up to date","time":"2020-12-03T11:14:25Z"}

Without it:

{"level":"info","msg":"Desired change: UPSERT my.example.com CNAME","time":"2020-12-03T11:27:22Z"}
{"level":"info","msg":"Desired change: UPSERT external-dns_my.example.com TXT","time":"2020-12-03T11:27:22Z"}
{"level":"error","msg":"InvalidChangeBatch: [RRSet of type CNAME with DNS name my.example.com. is not permitted as it conflicts with other records with the same DNS name in zone example.com.]\n\tstatus code: 400, request id: bb65d964-f512-4352-b05b-de9379da7ccf","time":"2020-12-03T11:27:22Z"}
{"level":"error","msg":"Failed to submit all changes for the following zones: [/hostedzone/MYZONEID]","time":"2020-12-03T11:27:22Z"}

@Raffo
Copy link
Contributor

Raffo commented Dec 9, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsravn, Raffo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2020
@k8s-ci-robot k8s-ci-robot merged commit 9f8a974 into kubernetes-sigs:master Dec 9, 2020
@jsravn jsravn deleted the correctly-update-aws-records-when-type-changes branch December 9, 2020 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AWS Provider] Changes that require a record type change (for example A to CNAME) fail
3 participants