This repository has been archived by the owner on Sep 16, 2019. It is now read-only.

Canonical Zone ID for endpoint not found #77

Closed
naphthalene opened this issue Jan 30, 2017 · 11 comments

Comments

@naphthalene

Hello,

I'm running into an issue getting mate to work. I have a kubernetes cluster running that I proxy to localhost:8001. I started mate as follows:

$ mate \
--producer=kubernetes \
--kubernetes-format="{{.Name}}-{{.Namespace}}.mydomain.com" \
--consumer=aws \
--aws-record-group-id=k8smate \
--sync-only \
--kubernetes-server=http://127.0.0.1:8001

Error I'm seeing is:

ERRO[0363] Canonical Zone ID for endpoint:  is not found 
ERRO[0363] Canonical Zone ID for endpoint: <my-elb>.us-east-2.elb.amazonaws.com is not found

The problem seems to be in the endpointToAlias function. It stems from the fact that an ELB is created in a default hosted zone for ELBs in the region/AZ. As a result, mate looks up the hosted zone ID for the load balancer, sees one that doesn't belong to the user, and reports that it's not found.

What is the correct solution - how can I either make kubernetes use my hosted zone or have mate resolve the right hosted zone id for the domain I specify?

@ideahitme
Contributor

ideahitme commented Jan 30, 2017

@naphthalene Hello, it seems you are running mate locally. Can you verify that you are able to perform the describe-load-balancers action?

The flow is the following:

  1. It looks up all classic and application load balancers in your account (regardless of hosted zone) and builds a map from each load balancer's DNS name to its canonical hosted zone.
  2. Then it verifies that the load balancer created for your service by kubernetes is in this map.

So it should work as long as your local setup and the Kubernetes cluster use the same AWS account.
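The two steps above can be sketched roughly as follows. This is a minimal illustration, not mate's actual code: the `loadBalancer` type, `buildZoneMap`, and `canonicalZoneID` are hypothetical names, and the ELB DNS name is a placeholder.

```go
package main

import "fmt"

// loadBalancer is a hypothetical, trimmed-down stand-in for the fields
// that describe-load-balancers returns for each load balancer.
type loadBalancer struct {
	DNSName               string
	CanonicalHostedZoneID string
}

// buildZoneMap indexes every load balancer in the account by DNS name (step 1).
func buildZoneMap(lbs []loadBalancer) map[string]string {
	zones := make(map[string]string, len(lbs))
	for _, lb := range lbs {
		zones[lb.DNSName] = lb.CanonicalHostedZoneID
	}
	return zones
}

// canonicalZoneID looks up the zone ID for a service endpoint (step 2).
// It returns an error when the endpoint is not among the known load balancers.
func canonicalZoneID(zones map[string]string, endpoint string) (string, error) {
	id, ok := zones[endpoint]
	if !ok {
		return "", fmt.Errorf("canonical zone ID for endpoint %q not found", endpoint)
	}
	return id, nil
}

func main() {
	zones := buildZoneMap([]loadBalancer{
		{DNSName: "my-elb.us-east-2.elb.amazonaws.com", CanonicalHostedZoneID: "Z3AADJGX6KTTL2"},
	})
	// The second, empty endpoint mimics a service without a load balancer ingress.
	for _, endpoint := range []string{"my-elb.us-east-2.elb.amazonaws.com", ""} {
		id, err := canonicalZoneID(zones, endpoint)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(endpoint, "->", id)
	}
}
```

If the list of load balancers cannot be fetched at all, the map stays empty and every lookup fails, regardless of which hosted zones the account owns.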

Also, could you please enable the --debug flag?

@naphthalene
Author

Same account is being used. I am running locally.

Relevant output of describe-load-balancers:

            "CanonicalHostedZoneNameID": "Z3AADJGX6KTTL2", 
            "CanonicalHostedZoneName": "<snip>.us-east-2.elb.amazonaws.com"

Note the hosted zone ID

Looking at my hosted zones, I see a different zone ID:

{
    "HostedZones": [
        {
            "ResourceRecordSetCount": 13, 
            "CallerReference": "<snip>", 
            "Config": {
                "PrivateZone": false
            }, 
            "Id": "/hostedzone/Z30D7Q3FJN2V1W", 
            "Name": "mydomain.com."
        }
    ]
}

@naphthalene
Author

Oddly enough, when I ran with Debug enabled, it worked! Very strange

@ideahitme
Contributor

ideahitme commented Jan 30, 2017

@naphthalene the canonical hosted zone ID refers not to your hosted zone, but to the hosted zone where the load balancer is hosted. In your case Z3AADJGX6KTTL2 is the correct output, as it is the hosted zone ID for *.us-east-2.elb.amazonaws.com.
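To make the distinction concrete, here is a small sketch of how the two zone IDs from this thread would fit into an alias record. The types and the `newAliasChange` helper are hypothetical illustrations (not mate's or the AWS SDK's types), and the record name and ELB DNS name are placeholders.

```go
package main

import "fmt"

// aliasTarget describes where the alias points.
type aliasTarget struct {
	DNSName      string // DNS name of the load balancer
	HostedZoneID string // zone that hosts the load balancer (canonical hosted zone ID)
}

// aliasChange describes one alias record to be written.
type aliasChange struct {
	HostedZoneID string // zone the record is written into (your own domain's zone)
	RecordName   string
	Target       aliasTarget
}

// newAliasChange keeps the two zone IDs apart: the user's zone receives the
// record, while the load balancer's canonical zone goes into the alias target.
func newAliasChange(userZoneID, recordName, lbDNSName, lbCanonicalZoneID string) aliasChange {
	return aliasChange{
		HostedZoneID: userZoneID,
		RecordName:   recordName,
		Target: aliasTarget{
			DNSName:      lbDNSName,
			HostedZoneID: lbCanonicalZoneID,
		},
	}
}

func main() {
	change := newAliasChange(
		"Z30D7Q3FJN2V1W",             // hosted zone for mydomain.com.
		"myservice-default.mydomain.com.", // placeholder record name
		"my-elb.us-east-2.elb.amazonaws.com", // placeholder ELB DNS name
		"Z3AADJGX6KTTL2",             // canonical hosted zone ID of the ELB
	)
	fmt.Printf("%+v\n", change)
}
```

Seeing two different zone IDs is therefore expected and is not by itself a sign of misconfiguration.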

Running with --debug is unlikely to have fixed the problem :) but it would help if you pasted the canonical hosted zone ID of the load balancer that fails, along with the relevant parts of the describe-load-balancers output (only the canonical hosted zone IDs matter).

A few more questions: which mate version are you running, and how many ELBs are registered in your account? I suspect it is an old version of mate that does not retrieve all load balancers, so the ELB created by Kubernetes is simply not found (this happens when you have more than 100 load balancers). The bug is fixed in the current release (and a few prior ones too).
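For illustration, the kind of bug described here usually comes from reading only the first page of a paginated listing. The sketch below shows a marker-style loop that keeps fetching until no next marker is returned. It is an assumption about the general shape of the fix, not mate's actual code: `page`, `fetchPage`, `listAll`, and `fakeFetcher` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// page is a hypothetical single response from a paginated listing call.
type page struct {
	DNSNames   []string
	NextMarker string // empty on the last page
}

// fetchPage retrieves one page, starting at the given marker ("" for the first page).
type fetchPage func(marker string) (page, error)

// listAll follows the markers until the listing is exhausted, so load
// balancers beyond the first page are not silently dropped.
func listAll(fetch fetchPage) ([]string, error) {
	var all []string
	marker := ""
	for {
		p, err := fetch(marker)
		if err != nil {
			return nil, err
		}
		all = append(all, p.DNSNames...)
		if p.NextMarker == "" {
			return all, nil
		}
		marker = p.NextMarker
	}
}

// fakeFetcher simulates an account with `total` load balancers served in
// pages of `pageSize`; the marker is simply the next start index.
func fakeFetcher(total, pageSize int) fetchPage {
	return func(marker string) (page, error) {
		start := 0
		if marker != "" {
			n, err := strconv.Atoi(marker)
			if err != nil {
				return page{}, err
			}
			start = n
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		var p page
		for i := start; i < end; i++ {
			p.DNSNames = append(p.DNSNames, fmt.Sprintf("lb-%d.example", i))
		}
		if end < total {
			p.NextMarker = strconv.Itoa(end)
		}
		return p, nil
	}
}

func main() {
	names, err := listAll(fakeFetcher(250, 100))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("load balancers found:", len(names))
}
```

A version that stopped after the first call would see only 100 of the 250 simulated load balancers, which matches the symptom of a Kubernetes-created ELB not being found.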

@naphthalene
Author

I have it working now, so I'm going to close the ticket. I'm not sure what I was doing wrong at first, but I'm not going to waste your time chasing this. If I run into it again I'll reopen the issue!

@tdrozdowski

tdrozdowski commented Jan 30, 2017

I'm facing this problem too - v0.5.1 and 'latest' for mate.

time="2017-01-30T23:19:15Z" level=debug msg="[Synchronize] Sleeping for 1m0s..." 
time="2017-01-30T23:19:15Z" level=info msg="[AWS] Listening for events..." 
time="2017-01-30T23:19:15Z" level=info msg="ADDED: kube-system/kubernetes-dashboard" 
time="2017-01-30T23:19:15Z" level=warning msg="[Service] The load balancer of service 'kube-system/kubernetes-dashboard' does not have any ingress." 
time="2017-01-30T23:19:15Z" level=info msg="ADDED: xcdn-staging/dns-auth" 
time="2017-01-30T23:19:15Z" level=info msg="[AWS] Processing (dns-auth.staging.xcdn.io, , ad50df968e73211e6bfeb023f1e3e2b4-1855914691.us-west-2.elb.amazonaws.com)\n" 
time="2017-01-30T23:19:15Z" level=info msg="ADDED: kube-system/collectd" 
time="2017-01-30T23:19:15Z" level=warning msg="[Service] The load balancer of service 'kube-system/collectd' does not have any ingress." 
time="2017-01-30T23:19:15Z" level=info msg="ADDED: default/kubernetes" 
time="2017-01-30T23:19:15Z" level=warning msg="[Service] The load balancer of service 'default/kubernetes' does not have any ingress." 
time="2017-01-30T23:19:15Z" level=info msg="ADDED: kube-system/kube-dns" 
time="2017-01-30T23:19:15Z" level=warning msg="[Service] The load balancer of service 'kube-system/kube-dns' does not have any ingress." 
time="2017-01-30T23:19:15Z" level=error msg="Error getting LBs: MissingRegion: could not find region configuration. Skipping..." 
time="2017-01-30T23:19:15Z" level=error msg="Error getting LBs: MissingRegion: could not find region configuration. Skipping..." 
time="2017-01-30T23:19:15Z" level=error msg="Canonical Zone ID for endpoint: ad50df968e73211e6bfeb023f1e3e2b4-1855914691.us-west-2.elb.amazonaws.com is not found" 
panic: runtime error: index out of range

goroutine 19 [running]:
panic(0x14d7a00, 0xc420014060)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/zalando-incubator/mate/consumers.(*awsConsumer).Process(0xc42039ed00, 0xc42026f4d0, 0xc420266e78, 0x3)
	/home/master/workspace/teabag_mate_master-5HM4GNJPGTJMNQYAZYN6JFXPZ2TWYP4JPPHRCHUMWCZK6LHVMNQA/mate/_jenkins_build/go/src/github.com/zalando-incubator/mate/consumers/aws.go:190 +0x45a
github.com/zalando-incubator/mate/consumers.(*awsConsumer).Consume(0xc42039ed00, 0xc4203cd080, 0xc4203cd0e0, 0xc4203cd140, 0xc4201a7d50)

@naphthalene
Author

Oh yeah, I forgot to say that if I don't run with --sync-only, the above happens.

@naphthalene naphthalene reopened this Jan 30, 2017
@ideahitme
Contributor

ideahitme commented Jan 31, 2017

@tdrozdowski thanks for reporting. I know where the panic is coming from. The issue is apparently due to a misconfigured AWS config. Could you please check that the region is specified in the ~/.aws/config file (if you are running locally)?

@naphthalene do you also see the following error message?

time="2017-01-30T23:19:15Z" level=error msg="Error getting LBs: MissingRegion: could not find region configuration. Skipping..."
time="2017-01-30T23:19:15Z" level=error msg="Error getting LBs: MissingRegion: could not find region configuration. Skipping..."
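A missing region can be caught before any load balancer lookup is attempted. The sketch below is a hedged illustration of such a preflight check, not part of mate: `resolveRegion` is a hypothetical helper, and the assumption that the region comes from the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables may not match what the AWS SDK version in use actually consults (it can also read ~/.aws/config).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errMissingRegion signals that no region could be determined.
var errMissingRegion = errors.New("no AWS region configured")

// resolveRegion returns the first non-empty region found in the environment.
// The variable names checked here are an assumption for illustration.
func resolveRegion(getenv func(string) string) (string, error) {
	for _, key := range []string{"AWS_REGION", "AWS_DEFAULT_REGION"} {
		if v := getenv(key); v != "" {
			return v, nil
		}
	}
	return "", errMissingRegion
}

func main() {
	region, err := resolveRegion(os.Getenv)
	if err != nil {
		// Failing fast here is clearer than a later "not found" error or panic.
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("using region", region)
}
```

Passing the lookup function as a parameter keeps the check easy to exercise without touching the real environment.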

@naphthalene
Author

That was the problem. Thanks!!

@tdrozdowski

tdrozdowski commented Jan 31, 2017

@ideahitme - I'm not running locally - this is all running on my k8s cluster that is on AWS.

@ideahitme
Contributor

ideahitme commented Jan 31, 2017

@tdrozdowski apparently your problem is different. Could you please create a new issue with the relevant details (log output with the --debug and --sync-only flags enabled)? Thanks :)
