[Cloudflare provider] external-dns should stop processing when zone lookup fails #2610

mateusz-jablonski94 · 2022-02-18T14:54:44Z

What happened:

We have an application, aaa.bbb.com, that is hosted in two datacenters. The domain aaa.bbb.com has an A record pointing to xxx.xxx.xxx.xxx (datacenter 1), and this A record was added manually. Some paths for this application are redirected by Cloudflare Page Rules to the IP yyy.yyy.yyy.yyy (datacenter 2, where the Kubernetes cluster runs with external-dns).

Our monitoring notified us about problems with the service aaa.bbb.com. After debugging, we found a bad A record for the aaa.bbb.com domain with the value yyy.yyy.yyy.yyy and a new TXT record added by external-dns. Next, we checked the Cloudflare Audit Log and saw a huge number of ADD operations in zone ID aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa executed by external-dns. External-dns executed ADD requests for all domains used in ingresses, even for domains that did not have a TXT record marking this external-dns instance as the owner.

In external-dns logs we found this line:

zone aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa lookup failed, Timeout

Below that line, the log showed external-dns getting endpoints from ingresses and (which surprised us) creating A and TXT records for all domains used in ingresses.

What you expected to happen:

When a zone lookup fails, external-dns should return an error and stop processing:

external-dns/provider/cloudflare/cloudflare.go

Lines 170 to 186 in dd870ae

    
    if len(p.zoneIDFilter.ZoneIDs) > 0 && p.zoneIDFilter.ZoneIDs[0] != "" {
        log.Debugln("zoneIDFilter configured. only looking up zone IDs defined")
        for _, zoneID := range p.zoneIDFilter.ZoneIDs {
            log.Debugf("looking up zone %s", zoneID)
            detailResponse, err := p.Client.ZoneDetails(ctx, zoneID)
            if err != nil {
                log.Errorf("zone %s lookup failed, %v", zoneID, err)
                continue
            }
            log.WithFields(log.Fields{
                "zoneName": detailResponse.Name,
                "zoneID":   detailResponse.ID,
            }).Debugln("adding zone for consideration")
            result = append(result, detailResponse)
        }
        return result, nil
    }

How to reproduce it (as minimally and precisely as possible):

Set the configuration as shown in Environment.
Simulate an error (for instance, a timeout) for this request:

external-dns/provider/cloudflare/cloudflare.go

Line 174 in dd870ae

detailResponse, err := p.Client.ZoneDetails(ctx, zoneID)

Anything else we need to know?:

It seems the problem can be solved by a simple change:

from:

detailResponse, err := p.Client.ZoneDetails(ctx, zoneID)
if err != nil {
   log.Errorf("zone %s lookup failed, %v", zoneID, err)
   continue
}

to:

detailResponse, err := p.Client.ZoneDetails(ctx, zoneID)
if err != nil {
   log.Errorf("zone %s lookup failed, %v", zoneID, err)
   return result, err
}

Environment:

External-DNS version: 0.7.6
DNS provider: cloudflare
Others:

--source=ingress
--domain-filter=aaa.bbb.com
--provider=cloudflare
--cloudflare-proxied
--zone-id-filter=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
--interval=60m
--events


mateusz-jablonski94 · 2022-03-23T09:08:51Z

PR with a suggested fix can be found here: #2662

k8s-triage-robot · 2022-06-21T09:24:38Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale


k8s-triage-robot · 2022-07-21T09:54:05Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2022-08-20T10:51:03Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-08-20T10:51:14Z

@k8s-triage-robot: Closing this issue.


mateusz-jablonski94 · 2022-09-05T11:56:31Z

/remove-lifecycle rotten

mateusz-jablonski94 · 2022-09-05T11:56:41Z

/reopen

k8s-ci-robot · 2022-09-05T11:56:44Z

@mateusz-jablonski94: Reopened this issue.



k8s-triage-robot · 2022-12-04T11:58:12Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale


mateusz-jablonski94 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 18, 2022

mateusz-jablonski94 mentioned this issue Mar 23, 2022

stop processing after zone lookup failed #2662

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2022

mateusz-jablonski94 added a commit to mateusz-jablonski94/external-dns that referenced this issue Jul 6, 2022

Merge branch 'kubernetes-sigs:master' into kubernetes-sigs#2610-handl…

5372829

…e-zone-lookup

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 21, 2022

k8s-ci-robot closed this as completed Aug 20, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 5, 2022

k8s-ci-robot reopened this Sep 5, 2022

mateusz-jablonski94 added a commit to mateusz-jablonski94/external-dns that referenced this issue Sep 5, 2022

Merge branch 'kubernetes-sigs:master' into kubernetes-sigs#2610-handl…

6c67358

…e-zone-lookup

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 4, 2022

k8s-ci-robot closed this as completed in #2662 Jan 2, 2023

k8s-ci-robot added a commit that referenced this issue Jan 2, 2023

Merge pull request #2662 from mateusz-jablonski94/#2610-handle-zone-l…

bfcd764

…ookup stop processing after zone lookup failed
