
Different DNS resolution between startup and actual runtime #1104

Closed · noamelf opened this issue Mar 28, 2016 · 36 comments

Labels: task/needs-investigation (Requires investigation and reproduction before classifying it as a bug or not.)

@noamelf commented Mar 28, 2016

Problem

It seems that Kong's startup/migration uses a different DNS resolution method than the actual runtime (or the Lua DNS resolution; I'm not sure which component does the resolving). Example: I have an AWS internal hosted zone by the name of qa2. If the Kong config doesn't have the hosted zone suffix, e.g.:

contact_points:
    - "cassandra-01:9042"

Kong will start without any errors, but fails when asked to answer a simple request:

$ curl http://localhost:8001/ 
"{message:An unexpected error occurred}"

The error log will show that it can't find the DB instance:

2016/03/28 11:18:47 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-04 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

Nevertheless, if I use a "proper" DNS name, everything works:

contact_points:
    - "cassandra-01.qa2:9042"

As mentioned above, it seems that different parts of Kong use different DNS resolution methods, which produces different behaviour. This is fairly confusing, since you would expect Kong to fail right at startup if the Cassandra DNS name is bad.

Versions

Kong on docker: v0.7.0
Cassandra: v2.2.4

@thibaultcha (Member)

Without further investigation, the reason that seems obvious at first sight would be dnsmasq. Kong manages dnsmasq alongside Nginx. In the CLI, dnsmasq is not started, but once Kong is running, dnsmasq is used as the resolver. From there, I suggest two approaches to try to fix this:

  • configure dnsmasq so that it can also resolve your AWS zone
  • disable dnsmasq and enter your resolver's address in kong.yml (probably easier; a config sketch follows this list)
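A minimal kong.yml sketch of the second option, assuming the 0.7.x dns_resolver / dns_resolvers_available keys that appear later in this thread; the resolver address is a placeholder, and the Cassandra contact point is kept fully qualified as in the report above:

dns_resolver: server            # use a plain DNS server instead of the bundled dnsmasq
dns_resolvers_available:
  server:
    address: "10.0.0.2:53"      # placeholder: your VPC / internal resolver
cassandra:
  contact_points:
    - "cassandra-01.qa2:9042"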

@subnetmarco (Member)

To better investigate this, you can use the following command to check how the address is being resolved by a DNS server:

$ nslookup -port={dns_server_port} {address_to_resolve} {dns_server_address}

like

$ nslookup -port=53 google.com 8.8.8.8

Now you can start Kong/dnsmasq and check the output of:

$ nslookup -port=8053 cassandra-01 127.0.0.1

Port 8053 is the default port that Kong uses for dnsmasq. Dnsmasq should follow the local resolution settings, so the output of these two commands should be the same:

$ nslookup -port=8053 cassandra-01 127.0.0.1 # Using dnsmasq
$ nslookup cassandra-01 # Using system settings

It seems like in your environment this is not the case, so let's see the output of these commands and take it from there.

@noamelf (Author) commented Mar 30, 2016

OK, so I'm running the following commands from within the Kong Docker container, and surprisingly they both show the correct address:

$ yum install bind-utils
....

$ nslookup cassandra-01
Server:         10.0.0.2
Address:        10.0.0.2#53

Non-authoritative answer:
Name:   cassandra-01.qa2
Address: 10.0.1.56

$ nslookup -port=8053 cassandra-01 127.0.0.1
Server:         127.0.0.1
Address:        127.0.0.1#8053

Non-authoritative answer:
Name:   cassandra-01.qa2
Address: 10.0.1.56

But Kong still doesn't resolve it correctly. error.log shows:

2016/03/30 12:55:09 [error] 115#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

@subnetmarco (Member)

So two notes:

  • The DNS resolution must happen somewhere else, or something weird is happening, especially if both dnsmasq and the system DNS resolutions are the same.
  • Re-reading your issue, we have two distinct problems:
    • A 500 error when loading the Admin API on 127.0.0.1:8001.
    • A problem in the async job cluster.lua when pinging Cassandra, which shouldn't affect loading 127.0.0.1:8001, unless the problem is the same.

Can you see if in the error.log file there are more errors after invoking http://127.0.0.1:8001/?
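For example, something like this from inside the container should show whether the Admin API request adds new entries (paths as used elsewhere in this thread):

$ curl -i http://127.0.0.1:8001/
$ tail -n 50 /usr/local/kong/logs/error.log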

@subnetmarco self-assigned this Mar 30, 2016
@noamelf (Author) commented Mar 31, 2016

Sure.
Just launching the container, without invoking any request (but waiting a few minutes) shows the following errors:

$ cat /usr/local/kong/logs/error.log
2016/03/31 20:50:35 [error] 114#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/03/31 20:50:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:50:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:50:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:53:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

After making a request to http://127.0.0.1:8001/, the failed request also appears:

2016/03/31 20:56:41 [error] 114#0: *3 [lua] responses.lua:97: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "127.0.0.1:8001"

See this gist for the full error.log.

@subnetmarco (Member)

What happens if you make a request to http://127.0.0.1:8001/apis ?

@noamelf (Author) commented Apr 4, 2016

I receive an error message, and the log shows:

2016/04/04 08:37:15 [error] 114#0: *6 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 172.17.0.1, server: , request: "GET /apis HTTP/1.1", host: "localhost:8001"

@rileylark

I am seeing the exact same behavior, also running mashape/kong:0.7.0

From inside the container, I see

sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.247.225

Kong starts up fine, correctly finding cassandra, but then the error log is full of errors similar to @noamelf's:

2016/04/05 21:09:21 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/05 21:09:23 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/05 21:09:31 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:39 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:47 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:51 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:09:53 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:10:01 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer

@rileylark

  • configure dnsmasq so that it can also resolve your AWS zone
  • disable dnsmasq and enter your resolver's address in kong.yml (probably easier)

I should note that I am trying to do the second thing, with the following config (full kong.yml at https://gist.github.com/rileylark/a22e2146b85dec3faa9a7c82162b3b29 )

dns_resolver: server
dns_resolvers_available:
  server:
    address: "10.0.0.10:53"
cassandra:
  contact_points:
    - "cassandra:9042"

With these settings, Kong can find Cassandra while it's joining the node, but cluster.lua and hooks.lua cannot find Cassandra at runtime.

@subnetmarco (Member)

@rileylark in your case, what's the output of:

$ nslookup -port=8053 cassandra 127.0.0.1

and is it the same as:

$ nslookup cassandra

?

@rileylark

@thefosk Ah, of course I should have thought of that before. Sorry! So my case is different after all:

sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
^C


sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# 

So it feels like the kong.yml file is being respected at some levels of Kong, but then deep down somewhere Kong is still trying to resolve names at 127.0.0.1:8053.

So, maybe to progress at this point, I should focus on configuring the local dnsmasq instead of trying to disable it and provide my own nameserver?

@rileylark

Ok, I tried NOT providing a custom nameserver, and instead letting dnsmasq do the trick, and now I get results exactly like @noamelf.

kong.yml is an exact copy/paste from https://github.com/Mashape/docker-kong/blob/master/config.docker/kong.yml :

    cassandra:
      contact_points:
        - "cassandra:9042"
    nginx: |
      [...snip...]

Docker container output:

[INFO] Kong 0.7.0
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] Setting working directory to /usr/local/kong
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 contact_points=cassandra:9042 replication_strategy=SimpleStrategy timeout=5000 data_centers=
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=0.0.0.0:7946 -node=kong-3591152686-oc2ik_0.0.0.0:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[INFO] Successfully auto-joined 10.0.1.4:7946
sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
Server:     127.0.0.1
Address:    127.0.0.1#8053

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# 
sh-4.2# cat /usr/local/kong/logs/error.log 
2016/04/06 17:55:00 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:55:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:55:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:10 [error] 113#0: *3 [lua] hooks.lua:75: member_update(): Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:56:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:54 [error] 113#0: *4 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 10.0.2.4, server: , request: "GET /apis HTTP/1.1", host: "kong:8001"
2016/04/06 17:56:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
sh-4.2# 

@subnetmarco (Member)

This is weird; I will need more time to replicate the issue and try to figure it out.

@rileylark

@thefosk Yesterday I got a vanilla nginx setup working in the exact same kubernetes cluster, which I mention just to confirm that nginx can work in this environment w/o any extra config.

If you have a Kubernetes cluster running already (easy to set up on Google Container Engine), you can get an exact repro of my situation from my yaml files at https://github.com/peardeck/kubernetes-boilerplate/tree/more-kong/kong

First, kubectl apply -f cassandra.yaml, which will start a cassandra db and expose it at cassandra:9042. Wait for that to start successfully (~1 minute for me) and then run kubectl apply -f kong.yaml. From there I think you'll be able to see the behavior that Noam and I are seeing.

I'm happy to help however I can - let me know if you can think of ways for me to provide more info or debugging context!

@minhchuduc commented Jun 6, 2016

I got the same problem when running Kong with the Postgres backend on Kubernetes.
My DNS resolution is OK:

[root@kong-n6t3q /]# nslookup -port=8053 kong-database 127.0.0.1  
Server:         127.0.0.1
Address:        127.0.0.1#8053

Name:   kong-database.mailship.svc.cluster.local
Address: 10.233.30.140

[root@kong-n6t3q /]# nslookup kong-database                            
Server:         10.233.0.2
Address:        10.233.0.2#53

Non-authoritative answer:
Name:   kong-database.mailship.svc.cluster.local
Address: 10.233.30.140

but I still get errors:

[root@kong-n6t3q /]# tail -f /usr/local/kong/logs/error.log      
2016/06/06 16:22:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:24:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
...
2016/06/06 16:28:59 [error] 270#0: *10 [lua] responses.lua:99: execute(): kong-database could not be resolved (3: Host not found), client: 10.3.18.104, server: _, request: "GET / HTTP/1.1", host: "10.3.18.104:32723"

Version:

Kong: v0.8.0
Postgres: v9.4

rca added a commit to zymbit/docker-kong that referenced this issue Jul 15, 2016
Set the database host if passed in the environment.  This is useful for
deploying this container out to Kubernetes where there is some issue
between dnsmasq and the Kubernetes resolver when trying to access a
service.

Kong/kong#1104
@rca commented Jul 15, 2016

I ran into this problem as well trying to deploy Kong's official docker image onto a Kubernetes cluster. I work around this issue by using the fully-qualified Kubernetes service name in /etc/kong/kong.yml. For example, I set the environment in my Kubernetes deployment yml:

          env:
            - name: DATABASE
              value: postgres
            - name: DATABASE_HOST
              value: kong-database.zymbit.svc.cluster.local

And I have modified setup.sh to update kong.yml:

zymbit/docker-kong@514a0a2
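Roughly, the idea is a substitution at container start. A minimal sketch of what such a setup.sh change could look like (hypothetical; the real change is in the commit above, and it assumes kong.yml's database section uses a host: key, with DATABASE_HOST set as in the deployment yml above):

if [ -n "$DATABASE_HOST" ]; then
  # hypothetical: rewrite the database host line in kong.yml before starting Kong
  sed -i "s|^\(  host: \).*|\1\"$DATABASE_HOST\"|" /etc/kong/kong.yml
fi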

The Automated Build image with this change is at:

https://hub.docker.com/r/zymbit/kong/

Please let me know if this is useful and I will open a PR.

Thanks!

@rca commented Jul 15, 2016

By the way, there is an issue specific to Ruby that is exposed when resolv.conf sets ndots > 1.

Kubernetes sets options ndots:5

Perhaps this is going on in the bowels of dnsmasq / lua?

https://bugzilla.redhat.com/show_bug.cgi?id=1132704
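For reference, a typical resolv.conf that kubelet writes into a pod looks roughly like this (nameserver and search values are illustrative); short service names like kong-database only resolve because the stub resolver expands them using these search domains and the ndots option:

nameserver 10.3.240.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5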

@tvschuer commented Aug 2, 2016

Thanks, rca. I can confirm that your changes fix the issue we were having on Kubernetes.

@thenayr commented Sep 1, 2016

Encountered this issue in my Kubernetes clusters today as well.

After running in circles chasing dnsmasq, I finally gave up on trying to use the shorter DNS entry for my service, my-service, and just ended up using the full name, my-service.default.svc.cluster.local.

Nothing else I tried worked, including pointing the resolver at kube-dns.kube-system.svc.cluster.local and a bunch of other things that I can't remember ;)

It's worth mentioning that Kong was running in the same namespace as the service I was trying to proxy to as well (default).
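Concretely, "using the full name" here just means registering the API against the fully-qualified upstream, e.g. (hypothetical names and paths, Kong 0.9 admin API):

$ curl -i -X POST http://localhost:8001/apis/ \
    -d 'name=my-service' \
    -d 'request_path=/my-service' \
    -d 'upstream_url=http://my-service.default.svc.cluster.local:80'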

@gavinzhou

@thenayr
I switched to go-dnsmasq in k8s; maybe the nginx resolver is the problem.

You can try it:

    spec:
      containers:
      ### https://github.com/kubernetes/kubernetes/issues/26309
      ### use go-dnsmasq in pod
      - name: go-dnsmasq
        image: "janeczku/go-dnsmasq:release-1.0.5"
        args:
          - --listen
          - "127.0.0.1:53"
          - --default-resolver
          - --append-search-domains
          - --hostsfile=/etc/hosts

@thenayr commented Sep 1, 2016

I'm running another Nginx proxy (with OpenResty / Lua modules etc.) and have zero issues with Nginx DNS resolution inside Kubernetes. All of my proxy_pass definitions use the short DNS names. It's definitely specific to something in Kong.

@thibaultcha added the task/needs-investigation label Sep 1, 2016
@nshttpd commented Sep 16, 2016

Just came across this while looking into Kong as a possible solution for something. This sounds oddly familiar to a problem we had running an OpenResty-based api-proxy in a Kubernetes cluster. We ended up solving it by making nginx.conf a jinja2 template that was rendered at start time (ugly, but it solved other issues as well), containing:

    resolver {{ nameserver }} valid=300s ipv6=off;
    resolver_timeout 5s;

in the http block, and the Python start script that launches everything did:

# read the container's resolv.conf to discover the cluster nameserver
f = open('/etc/resolv.conf', 'r')
resolv = f.read()
f.close()

nameserver = None

# take the first "nameserver" entry
for r in resolv.split('\n'):
  if r.startswith('nameserver'):
    nameserver = r.split(' ')[1]
    break

to set the nameserver in the config (along with a bunch of other environment-specific vars), then used the output as the live nginx.conf for runtime. This made short-name lookups like http://appname work fine, with the OpenResty api-proxy running in different namespaces and talking to the various microservice pods in its own namespace via the short name.
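A minimal sketch of that render step, assuming jinja2; the template and output file names are hypothetical, not the actual files from that setup:

from jinja2 import Template

# nameserver comes from the resolv.conf parsing loop above
with open('nginx.conf.j2') as f:
    template = Template(f.read())

with open('/usr/local/openresty/nginx/conf/nginx.conf', 'w') as f:
    f.write(template.render(nameserver=nameserver))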

Not sure if that will help at all, but it might be a path to head down.

@Tieske (Member) commented Sep 16, 2016

In 0.10 we'll have our own internal DNS resolution, see #1587, used both from the CLI and in nginx, so the inconsistencies should be resolved.

That branch is just up for review now and all tests pass. So if some of you are willing to give it a try, please do.

NOTE: see this code; the config read from resolv.conf covers the nameservers, attempts, and timeout options. So short names relying on the ndots option are not supported for now.

Really looking for feedback on this.

PS: to run the branch you'd need to manually install dns.lua, which hasn't been published yet. See https://github.com/Mashape/dns.lua
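If it helps, one possible way to install it from source (assuming the repo ships a rockspec; this is a guess, not an official instruction):

$ git clone https://github.com/Mashape/dns.lua
$ cd dns.lua
$ luarocks make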

@rca commented Sep 16, 2016

@Tieske, this sounds great. Any chance you can roll this up into a test image that I can docker pull?

@subnetmarco (Member)

@rca I just pushed it at docker pull mashape/kong:dns

@tvschuer

Anyone else having an issue running the kong:dns branch?
Just tried it on docker-cloud, and got:

2016-10-18T09:33:28.296046329Z /docker-entrypoint.sh: line 9: exec: kong: not found
2016-10-18T09:35:53.260789876Z /docker-entrypoint.sh: line 9: exec: kong: not found

@subnetmarco (Member)

It's mashape/kong:dns.

We use mashape/kong for our tests, while keeping the official kong image clean.

@tvschuer

Am I missing something? This is the command I run:

$ docker run mashape/kong:dns
Unable to find image 'mashape/kong:dns' locally
dns: Pulling from mashape/kong
8d30e94188e7: Already exists 
73cc12610219: Already exists 
77d6169dca74: Already exists 
587d89d67788: Already exists 
Digest: sha256:0dea70bb87e4c51581874a4659f397c9cda2a0e4e4214f9e202ea6fbd7d90031
Status: Downloaded newer image for mashape/kong:dns
/docker-entrypoint.sh: line 9: exec: kong: not found

@kendavis2

I am experiencing that same issue running it with docker-compose locally.

docker-compose up
Recreating dev_kong_1
Attaching to dev_kong_1
kong_1  | /docker-entrypoint.sh: line 9: exec: kong: not found
dev_kong_1 exited with code 127

@derrley commented Oct 28, 2016

Same issue, running mashape/kong:dns in kubernetes:


2016-10-28T00:43:45.958914991Z /docker-entrypoint.sh: line 9: exec: kong: not found

@derrley commented Oct 28, 2016

Pulling and running mashape/kong seems to have solved this problem.

@derrley commented Oct 28, 2016

... almost.

Now I've got no problem connecting to the postgres database, but upstream forwarding doesn't work.

My client:

$ http http://192.168.99.100:32057/sbex1/v1/
HTTP/1.1 502 Bad Gateway
Connection: keep-alive
Content-Type: text/plain; charset=UTF-8
Date: Fri, 28 Oct 2016 00:56:04 GMT
Server: kong/0.9.0
Transfer-Encoding: chunked

An invalid response was received from the upstream server

In the Kong logs:

==> logs/error.log <==
2016/10/28 00:56:04 [error] 87#0: *43 sbex1-app could not be resolved (2: Server failure), client: 172.17.0.1, server: kong, request: "GET /sbex1/v1/ HTTP/1.1", host: "192.168.99.100:32057"
2016/10/28 00:56:04 [info] 87#0: *43 client 172.17.0.1 closed keepalive connection

But dnsmasq can see it --

[dwlo kong]# nslookup -port=8053 sbex1-app localhost
Server:     localhost
Address:    127.0.0.1#8053

Name:   sbex1-app.test.svc.cluster.local
Address: 10.0.0.78

I suspect this one's harder to get around -- does nginx, itself, need to be told to use dnsmasq?
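For what it's worth, the directive nginx itself uses for runtime lookups is resolver; a sketch of what pointing it at Kong's dnsmasq would look like in the http block (whether Kong's generated config exposes a hook for this in 0.9 is not established in this thread):

    resolver 127.0.0.1:8053 valid=300s ipv6=off;
    resolver_timeout 5s;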

@subnetmarco (Member)

We have updated how we resolve DNS hostnames in 0.10.x (and removed the dnsmasq dependency). Can you try to replicate with 0.10.x and see if the problem has been fixed?

@p0pr0ck5 (Contributor) commented Jun 5, 2017

Any updates here, following the changes in 0.10.x?

@derrley commented Jun 5, 2017 via email

@subnetmarco (Member)

@derrley understood - if you want to give it another try, please let us know the outcome. Closing this issue for now, but happy to re-open it if Kong >= 0.10 still doesn't work.
