ExternalDNS: TXT record issue #715

Closed
cyril94440 opened this issue Jun 6, 2023 · 7 comments
Labels: bug (Something isn't working)

cyril94440 commented Jun 6, 2023

Describe the bug

While using ExternalDNS with EKS-Blueprint, the records are created successfully during the first iteration. However, during the second iteration, most of the records are deleted and ExternalDNS gets stuck, throwing an InvalidChangeBatch error. This is observed in the logs.

Expected Behavior

ExternalDNS should maintain all the DNS records across multiple iterations without deleting them unnecessarily. In this case, we expect that the six DNS records created during the first iteration remain intact during the second iteration and beyond.

Current Behavior

After the first iteration, six DNS records are created successfully. However, during the second iteration, ExternalDNS attempts to create the same records again, resulting in an InvalidChangeBatch error due to duplication. Subsequently, all records except two (cname-k8s.acme.ai and cname-nginx.k8s.acme.ai) are deleted.

Please find the logs below:

First iteration ExternalDNS logs:

time="2023-06-06T12:15:41Z" level=info msg="Applying provider record filter for domains: [acme.ai. .acme.ai.]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE cname-k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE cname-nginx.k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE k8s.acme.ai A [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE nginx.k8s.acme.ai A [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="Desired change: CREATE nginx.k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:15:41Z" level=info msg="6 record(s) in zone acme.ai. [Id: /hostedzone/ZXXXXX] were successfully updated"

Records Created:

k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/kube-system/blueprints-addon-nginx-nginx-ingress-controller"
A k8s-kubesyst-blueprin-f74ac9cf07-111111.elb.eu-west-3.amazonaws.com.

cname-k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/kube-system/blueprints-addon-nginx-nginx-ingress-controller"


cname-nginx.k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/team-dev-preprod/test-ng"

nginx.k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/team-dev-preprod/test-ng"
A k8s-kubesyst-blueprin-f74ac9cf07-111111.elb.eu-west-3.amazonaws.com.

Second iteration ExternalDNS logs:

time="2023-06-06T12:16:42Z" level=info msg="Applying provider record filter for domains: [acme.ai. .acme.ai.]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE cname-k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE cname-nginx.k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE k8s.acme.ai A [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE nginx.k8s.acme.ai A [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=info msg="Desired change: CREATE nginx.k8s.acme.ai TXT [Id: /hostedzone/ZXXXXX]"
time="2023-06-06T12:16:42Z" level=error msg="Failure in zone acme.ai. [Id: /hostedzone/ZXXXXX] when submitting change batch: InvalidChangeBatch: [Tried to create resource record set [name='cname-k8s.acme.ai.', type='TXT'] but it already exists, Tried to create resource record set [name='cname-nginx.k8s.acme.ai.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: XXXXX"
time="2023-06-06T12:16:43Z" level=error msg="Failed submitting change (error: InvalidChangeBatch: [Tried to create resource record set [name='cname-k8s.acme.ai.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: XXXXX), it will be retried in a separate change batch in the next iteration"
time="2023-06-06T12:16:43Z" level=error msg="Failed submitting change (error: InvalidChangeBatch: [Tried to create resource record set [name='cname-nginx.k8s.acme.ai.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: XXXXX), it will be retried in a separate change batch in the next iteration"
time="2023-06-06T12:16:44Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/ZXXXXX]"

Records REMAINING:

cname-k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=service/kube-system/blueprints-addon-nginx-nginx-ingress-controller"


cname-nginx.k8s.acme.ai
TXT "heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/team-dev-preprod/test-ng"
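
For anyone triaging similar logs: the record sets that Route 53 rejected can be pulled out of the InvalidChangeBatch message. This is a minimal sketch, assuming the message format shown in the second-iteration log above; `conflictingRecordSets` is a hypothetical helper, not part of ExternalDNS or the AWS SDK.

```typescript
// Hypothetical helper: extracts (name, type) pairs from an InvalidChangeBatch
// log message of the form shown in the second-iteration log.
interface RecordSetRef {
  name: string;
  type: string;
}

function conflictingRecordSets(logLine: string): RecordSetRef[] {
  const pattern = /name='([^']+)', type='([^']+)'/g;
  const found: RecordSetRef[] = [];
  let match: RegExpExecArray | null;
  // exec() with the g flag advances through every match in the line.
  while ((match = pattern.exec(logLine)) !== null) {
    found.push({ name: match[1], type: match[2] });
  }
  return found;
}

// Message text copied from the error log in this issue.
const sample =
  "InvalidChangeBatch: [Tried to create resource record set " +
  "[name='cname-k8s.acme.ai.', type='TXT'] but it already exists, " +
  "Tried to create resource record set " +
  "[name='cname-nginx.k8s.acme.ai.', type='TXT'] but it already exists]";

console.log(conflictingRecordSets(sample));
```

Run against the logged message, this returns the two `cname-` TXT records, which are the same two records that remain in the zone.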

Reproduction Steps

  1. Set up ExternalDNS with EKS-Blueprint.
  2. Observe the successful creation of DNS records during the first iteration.
  3. On the second iteration, observe the InvalidChangeBatch error and the deletion of most records.

Here is my Ingress manifest:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
  labels:
    argocd.argoproj.io/instance: test-ng
    name: test-ng
  name: test-ng
  namespace: team-dev-preprod
spec:
  rules:
    - host: nginx.k8s.acme.ai
      http:
        paths:
          - backend:
              service:
                name: test-ng
                port:
                  number: 80
            path: /
            pathType: Prefix

and CDK code:

#!/usr/bin/env node
import "source-map-support/register";
import * as cdk from "aws-cdk-lib";
import * as blueprints from "@aws-quickstart/eks-blueprints";
import {
  ArgoCDAddOnProps,
  KarpenterAddOn
} from "@aws-quickstart/eks-blueprints";
import { CapacityType, KubernetesVersion, Nodegroup, NodegroupAmiType } from "aws-cdk-lib/aws-eks";
import { TeamDatasciencePreprod, TeamDatascienceProd } from "../lib/teams";
import { InstanceType } from "aws-cdk-lib/aws-ec2";
import { RabbitMQApp } from "../lib/rabbitMQ";
import { DatabaseAurora } from "../lib/databaseAurora";

//Secrets required in Secrets Manager : "argocd/repo/ssh_key" "argocd/admin_password"

const app = new cdk.App();

// ---- CONSTANTS ----
const clusterName = "CLUSTER";
export const account = "00000000";
const region = "eu-west-3";
const domainAWSID = "Z00000000";
const subdomain = "k8s.acme.ai";

const boostrapRepoUrl =
  "git@github.com:ACME/repo.git";

// ---- KARPENTER ----
type KarpenterAddOnProps = ConstructorParameters<typeof KarpenterAddOn>[0];
const karpenterAddonProps:KarpenterAddOnProps = {
  requirements: [
    { key: "kubernetes.io/arch", op: "In", vals: ["arm64"] },
    {
      key: "karpenter.sh/capacity-type",
      op: "In",
      vals: ["spot", "on-demand"],
    },
    {
      key: 'karpenter.k8s.aws/instance-category',
      op: "In",
      vals: ["t", "a", "c", "m", "r"]
    }
  ],
  subnetTags: {
    Name: `${clusterName}/${clusterName}-vpc/PrivateSubnet1`
  },
  securityGroupTags: {
    [`kubernetes.io/cluster/${clusterName}`]: "owned",
  },
  interruptionHandling: true,
  consolidation: { enabled: true },
  name: "default"
};

// ---- ArgoCD ----
const argoCDAddonProps: ArgoCDAddOnProps = {
  version: '5.29.1',
  bootstrapRepo: {
    repoUrl: boostrapRepoUrl,
    credentialsSecretName: "argocd-repo-ssh_key", //Not working :(
    credentialsType: "SSH",
    path: "main",
  },
  adminPasswordSecretName: "argocd-admin_password",
};

// ---- Default Instance ----
const clusterProvider = new blueprints.MngClusterProvider({
  version: KubernetesVersion.V1_26,
  desiredSize: 2,
  instanceTypes: [new InstanceType("t4g.medium")],
  diskSize: 50,
  amiType: NodegroupAmiType.BOTTLEROCKET_ARM_64
});


// ---- Teams ----
const teams: Array<blueprints.Team> = [
  new TeamDatasciencePreprod(app),
  new TeamDatascienceProd(app),
];

// ---- Full EKS cluster ----
const addOns: Array<blueprints.ClusterAddOn> = [
  new blueprints.addons.SecretsStoreAddOn(),
  new blueprints.addons.AwsLoadBalancerControllerAddOn(),
  new blueprints.addons.ArgoCDAddOn(argoCDAddonProps),
  new blueprints.addons.MetricsServerAddOn(),
  new blueprints.addons.VpcCniAddOn(),
  new blueprints.addons.KarpenterAddOn(karpenterAddonProps),
  new blueprints.addons.ExternalDnsAddOn({
    hostedZoneResources: [blueprints.GlobalResources.HostedZone],
  }),
  new blueprints.addons.NginxAddOn({
    externalDnsHostname: subdomain,
    certificateResourceName: blueprints.GlobalResources.Certificate,
    internetFacing: true,
    backendProtocol: "tcp",
    crossZoneEnabled: false,
  }),
  // new blueprints.addons.CloudWatchAdotAddOn(),
  // new blueprints.addons.XrayAddOn(),
  //add Datadog
];

const eksBlueprint = blueprints.EksBlueprint.builder()
  .account(account)
  .region(region)
  .resourceProvider(
    blueprints.GlobalResources.HostedZone,
    new blueprints.ImportHostedZoneProvider(domainAWSID, subdomain),
  )
  .resourceProvider(
    blueprints.GlobalResources.Certificate,
    new blueprints.CreateCertificateProvider(
      "wildcard-cert",
      `*.${subdomain}`,
      blueprints.GlobalResources.HostedZone
    )
  )
  .addOns(...addOns)
  .clusterProvider(clusterProvider)
  .teams(...teams)
  .build(app, clusterName);

const eksVpc = eksBlueprint.getClusterInfo().cluster.vpc
// const eksSubnets = eksBlueprint

new Nodegroup(eksBlueprint, 'testGpuNodeGroup', {
  cluster: eksBlueprint.getClusterInfo().cluster,
  nodegroupName: "testGpuNodeGroup",
  desiredSize: 1,
  instanceTypes: [new InstanceType("g4dn.xlarge")], 
  diskSize: 50, 
  amiType: NodegroupAmiType.AL2_X86_64_GPU,
  capacityType: CapacityType.SPOT,
});

Possible Solution

There might be a conflict somewhere, since TXT records are created both with and without a prefix for the same hostname.
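
To make the naming easier to see, here is a rough model of the two ownership TXT names that show up in the logs for each hostname. This is only an illustration: `ownershipTxtNames` is a hypothetical function, the no-prefix output is taken from the logged names, and the way a prefix is applied (prepended to the first label) is an assumption rather than something confirmed from the ExternalDNS source.

```typescript
// Illustrative model only, not ExternalDNS code. It reproduces the TXT names
// seen in the logs and shows what a prefix would do under the stated assumption.
function ownershipTxtNames(
  host: string,
  typeLabel: string, // e.g. "cname", the label that appears in the logged names
  txtPrefix = "", // assumed to be prepended to the first DNS label
): string[] {
  const [first, ...rest] = host.split(".");
  const parent = rest.join(".");
  const join = (label: string): string =>
    parent ? `${label}.${parent}` : label;
  return [
    join(`${txtPrefix}${first}`), // name without the type label
    join(`${txtPrefix}${typeLabel}-${first}`), // name with the type label
  ];
}

// Without a prefix, the first TXT name is identical to the hostname itself.
console.log(ownershipTxtNames("nginx.k8s.acme.ai", "cname"));

// With a prefix (assumed behavior), both TXT names move away from the hostname.
console.log(ownershipTxtNames("nginx.k8s.acme.ai", "cname", "xdnstest-"));
```

With an empty prefix the model yields `nginx.k8s.acme.ai` and `cname-nginx.k8s.acme.ai`, matching the TXT records created in the first iteration.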

Additional Information/Context

No response

CDK CLI Version

2.81.0 (build bd920f2)

EKS Blueprints Version

1.8.1

Node.js Version

v18.14.0

Environment details (OS name and version, etc.)

Mac OS 13.2 (22D49)

Other information

No response

cyril94440 added the bug label Jun 6, 2023
@cyril94440
Author

This seems to solve the issue kubernetes-sigs/external-dns#262

Is it possible to pass the args in CDK code?

@shapirov103
Collaborator

> This seems to solve the issue kubernetes-sigs/external-dns#262
>
> Is it possible to pass the args in CDK code?

You can surely pass txtPrefix in the values for the add-on. E.g.

new blueprints.ExternalDnsAddOn({
    hostedZoneResources: [blueprints.GlobalResources.HostedZone], // you can add more if you register resource providers,
    values: { txtPrefix: "value"}
})

The list of values to pass to the external DNS add-on as extra helm parameters is here.

Please let me know if it works. If there is something we need to include in the framework out of the box, I'm happy to discuss and include the change.

@cyril94440
Author

cyril94440 commented Jun 8, 2023

@shapirov103 Thanks! I think it would be a great idea to add a default value for txtPrefix, because without it ExternalDNS doesn't work properly on Route 53.

@cyril94440
Author

@shapirov103 I did that in my CDK code but the value is not applied, any thoughts?

 new blueprints.addons.ExternalDnsAddOn({
    hostedZoneResources: [blueprints.GlobalResources.HostedZone],
    values:{
      'txt-prefix':'xdnstest-'
    }
  }),

@shapirov103
Collaborator

@cyril94440 it is a bit unclear why you used txt-prefix rather than txtPrefix, which is what I used in the example. Please see the list of values here: https://github.com/kubernetes-sigs/external-dns/blob/master/charts/external-dns/values.yaml
Case matters.
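
A quick way to catch this kind of mix-up before deploying is to scan the values object for keys that look like kebab-case CLI flags instead of camelCase chart values. This is only a sketch: `kebabCaseKeys` is a hypothetical helper, not part of EKS Blueprints or the chart, and it assumes the chart's top-level value names contain no hyphens.

```typescript
// Hypothetical helper: returns the keys of a Helm values object that contain
// a hyphen, which suggests a CLI-flag spelling rather than a chart value name.
function kebabCaseKeys(values: Record<string, unknown>): string[] {
  return Object.keys(values).filter((key) => key.includes("-"));
}

// The spelling from the failing snippet in this thread.
const suspicious = kebabCaseKeys({ "txt-prefix": "xdnstest-" });

// The spelling from the working example.
const clean = kebabCaseKeys({ txtPrefix: "xdnstest-" });

console.log(suspicious); // contains "txt-prefix"
console.log(clean); // empty
```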

@cyril94440
Author

@shapirov103 Right, why would I do that... Sorry, stupid mistake. It works fine now!

We can either leave this open to force AWS Route 53 users to add a txtPrefix, or update the documentation, or... close it if you feel it is the responsibility of the user.
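
If a default were ever added, one possible shape is a small wrapper that fills in txtPrefix only when the caller has not set it. This is a sketch under assumptions: `ExternalDnsPropsSketch` is a local stand-in for the add-on props, and `withDefaultTxtPrefix` and the `"xdns-"` default are hypothetical, not something the framework provides.

```typescript
// Local stand-in for the add-on props used in this thread; the real type
// lives in @aws-quickstart/eks-blueprints and is not imported here.
interface ExternalDnsPropsSketch {
  hostedZoneResources: string[];
  values?: Record<string, unknown>;
}

// Hypothetical helper: applies a default txtPrefix, letting explicit values win.
function withDefaultTxtPrefix(
  props: ExternalDnsPropsSketch,
  defaultPrefix = "xdns-", // assumed default, chosen only for illustration
): ExternalDnsPropsSketch {
  return {
    ...props,
    values: { txtPrefix: defaultPrefix, ...(props.values ?? {}) },
  };
}

const defaulted = withDefaultTxtPrefix({ hostedZoneResources: ["hosted-zone"] });
const explicit = withDefaultTxtPrefix({
  hostedZoneResources: ["hosted-zone"],
  values: { txtPrefix: "xdnstest-" },
});

console.log(defaulted.values, explicit.values);
```

The spread order means a user-supplied txtPrefix always overrides the default, so existing configurations would keep their behavior.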

@elamaran11
Collaborator

Closing this issue.
