
Different DNS resolution between startup and actual runtime #1104

Closed · noamelf opened this issue Mar 28, 2016 · 36 comments

Labels: task/needs-investigation (Requires investigation and reproduction before classifying it as a bug or not.)

@noamelf commented Mar 28, 2016

Problem

It seems that Kong's startup/migration uses a different DNS resolution method than the actual runtime (or the Lua DNS resolution; I'm not sure which component does the resolving). Example: I have an AWS internal hosted zone by the name of qa2. If the Kong config doesn't have the hosted zone suffix, e.g.:

contact_points:
    - "cassandra-01:9042"

Kong will start without any errors, but fails when asked to answer a simple request:

$ curl http://localhost:8001/ 
"{message:An unexpected error occurred}"

The error log will show that it can't find the DB instance:

2016/03/28 11:18:47 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-04 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

Nevertheless, if I use a "proper" DNS name, everything works:

contact_points:
    - "cassandra-01.qa2:9042"

As mentioned above, it seems that different parts of Kong use different DNS resolution methods, which produces different behaviour. This is fairly confusing, since you would expect Kong to fail right at startup if the Cassandra DNS name is bad.

Versions

Kong on docker: v0.7.0
Cassandra: v2.2.4

@thibaultcha (Member)

Without further investigation, the reason that seems obvious at first sight would be dnsmasq. Kong manages dnsmasq alongside Nginx. In the CLI, dnsmasq is not started, but once Kong is running, dnsmasq is used as the resolver. From there, I suggest two approaches to try to fix this:

  • configure dnsmasq so that it can also resolve your AWS zone
  • disable dnsmasq and enter your resolver's address in kong.yml (probably easier; a config sketch follows this list)
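A minimal kong.yml sketch of the second option, assuming the 0.7.x dns_resolver / dns_resolvers_available keys that appear later in this thread; the resolver address is a placeholder, and the Cassandra contact point is kept fully qualified as in the report above:

dns_resolver: server            # use a plain DNS server instead of the bundled dnsmasq
dns_resolvers_available:
  server:
    address: "10.0.0.2:53"      # placeholder: your VPC / internal resolver
cassandra:
  contact_points:
    - "cassandra-01.qa2:9042"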

@subnetmarco (Member)

To better investigate this, you can use the following command to check how the address is being resolved by a DNS server:

$ nslookup -port={dns_server_port} {address_to_resolve} {dns_server_address}

like

$ nslookup -port=53 google.com 8.8.8.8

Now you can start Kong/dnsmasq and check the output of:

$ nslookup -port=8053 cassandra-01 127.0.0.1

Port 8053 is the default port that Kong uses for dnsmasq. Dnsmasq should follow the local resolution settings, so the output of these two commands should be the same:

$ nslookup -port=8053 cassandra-01 127.0.0.1 # Using dnsmasq
$ nslookup cassandra-01 # Using system settings

It seems like in your environment this is not the case, so let's see the output of these commands and take it from there.

@noamelf (Author) commented Mar 30, 2016

OK, so I'm running the following commands from within the Kong Docker container, and surprisingly they both show the correct address:

$ yum install bind-utils
....

$ nslookup cassandra-01
Server:         10.0.0.2
Address:        10.0.0.2#53

Non-authoritative answer:
Name:   cassandra-01.qa2
Address: 10.0.1.56

$ nslookup -port=8053 cassandra-01 127.0.0.1
Server:         127.0.0.1
Address:        127.0.0.1#8053

Non-authoritative answer:
Name:   cassandra-01.qa2
Address: 10.0.1.56

But Kong still doesn't resolve it correctly. error.log shows:

2016/03/30 12:55:09 [error] 115#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

@subnetmarco (Member)

So two notes:

  • The DNS resolution must happen somewhere else, or something weird is happening, especially if both dnsmasq and the system DNS resolutions are the same.
  • Re-reading your issue, we have two distinct problems:
    • A 500 error when loading the Admin API on 127.0.0.1:8001.
    • A problem in the async job cluster.lua when pinging Cassandra, which shouldn't affect loading 127.0.0.1:8001, unless the problem is the same.

Can you see if in the error.log file there are more errors after invoking http://127.0.0.1:8001/?
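For example, something like this from inside the container should show whether the Admin API request adds new entries (paths as used elsewhere in this thread):

$ curl -i http://127.0.0.1:8001/
$ tail -n 50 /usr/local/kong/logs/error.log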

@subnetmarco self-assigned this Mar 30, 2016
@noamelf (Author) commented Mar 31, 2016

Sure.
Just launching the container, without invoking any request (but waiting a few minutes) shows the following errors:

$ cat /usr/local/kong/logs/error.log
2016/03/31 20:50:35 [error] 114#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/03/31 20:50:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:50:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:50:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:53:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer

After making a request to http://127.0.0.1:8001/, the failed request also appears:

2016/03/31 20:56:41 [error] 114#0: *3 [lua] responses.lua:97: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "127.0.0.1:8001"

See this gist for the full error.log.

@subnetmarco (Member)

What happens if you make a request to http://127.0.0.1:8001/apis ?

@noamelf (Author) commented Apr 4, 2016

I receive an error message, and the log shows:

2016/04/04 08:37:15 [error] 114#0: *6 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 172.17.0.1, server: , request: "GET /apis HTTP/1.1", host: "localhost:8001"

@rileylark

I am seeing the exact same behavior, also running mashape/kong:0.7.0

From inside the container, I see

sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.247.225

Kong starts up fine, correctly finding cassandra, but then the error log is full of errors similar to @noamelf's:

2016/04/05 21:09:21 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/05 21:09:23 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/05 21:09:31 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:39 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:47 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:51 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:09:53 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:10:01 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer

@rileylark

  • configure dnsmasq so that it can also resolve your AWS zone
  • disable dnsmasq and enter your resolver's address in kong.yml (probably easier)

I should note that I am trying to do the second thing, with the following config (full kong.yml at https://gist.github.com/rileylark/a22e2146b85dec3faa9a7c82162b3b29 )

dns_resolver: server
dns_resolvers_available:
  server:
    address: "10.0.0.10:53"
cassandra:
  contact_points:
    - "cassandra:9042"

With these settings, Kong can find Cassandra while it's joining the node, but cluster.lua and hooks.lua cannot find Cassandra at runtime.

@subnetmarco (Member)

@rileylark in your case, what's the output of:

$ nslookup -port=8053 cassandra 127.0.0.1

and is it the same as:

$ nslookup cassandra

?

@rileylark

@thefosk Ah, of course I should have thought of that before. Sorry! So my case is different after all:

sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
^C


sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# 

So it feels like the kong.yml file is being respected at some levels of Kong, but then deep down somewhere Kong is still trying to resolve names at 127.0.0.1:8053.

So, maybe to progress at this point, I should focus on configuring the local dnsmasq instead of trying to disable it and provide my own nameserver?

@rileylark

Ok, I tried NOT providing a custom nameserver, and instead letting dnsmasq do the trick, and now I get results exactly like @noamelf.

kong.yml is an exact copy/paste from https://github.com/Mashape/docker-kong/blob/master/config.docker/kong.yml :

    cassandra:
      contact_points:
        - "cassandra:9042"
    nginx: |
      [...snip...]

Docker container output:

[INFO] Kong 0.7.0
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] Setting working directory to /usr/local/kong
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 contact_points=cassandra:9042 replication_strategy=SimpleStrategy timeout=5000 data_centers=
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=0.0.0.0:7946 -node=kong-3591152686-oc2ik_0.0.0.0:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[INFO] Successfully auto-joined 10.0.1.4:7946
sh-4.2# nslookup cassandra
Server:     10.3.240.10
Address:    10.3.240.10#53

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
Server:     127.0.0.1
Address:    127.0.0.1#8053

Name:   cassandra.default.svc.cluster.local
Address: 10.3.249.185

sh-4.2# 
sh-4.2# cat /usr/local/kong/logs/error.log 
2016/04/06 17:55:00 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:55:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:55:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:10 [error] 113#0: *3 [lua] hooks.lua:75: member_update(): Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:56:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:54 [error] 113#0: *4 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 10.0.2.4, server: , request: "GET /apis HTTP/1.1", host: "kong:8001"
2016/04/06 17:56:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
sh-4.2# 

@subnetmarco (Member)

This is weird; I will need more time to replicate the issue and try to figure it out.

@rileylark

@thefosk Yesterday I got a vanilla nginx setup working in the exact same kubernetes cluster, which I mention just to confirm that nginx can work in this environment w/o any extra config.

If you have a Kubernetes cluster running already (easy to set up on Google Container Engine), you can get an exact repro of my situation from my yaml files at https://github.com/peardeck/kubernetes-boilerplate/tree/more-kong/kong

First, kubectl apply -f cassandra.yaml, which will start a cassandra db and expose it at cassandra:9042. Wait for that to start successfully (~1 minute for me) and then run kubectl apply -f kong.yaml. From there I think you'll be able to see the behavior that Noam and I are seeing.

I'm happy to help however I can - let me know if you can think of ways for me to provide more info or debugging context!

@minhchuduc commented Jun 6, 2016

I got the same problem when running Kong with the Postgres backend on Kubernetes.
My DNS resolution is OK:

[root@kong-n6t3q /]# nslookup -port=8053 kong-database 127.0.0.1  
Server:         127.0.0.1
Address:        127.0.0.1#8053

Name:   kong-database.mailship.svc.cluster.local
Address: 10.233.30.140

[root@kong-n6t3q /]# nslookup kong-database                            
Server:         10.233.0.2
Address:        10.233.0.2#53

Non-authoritative answer:
Name:   kong-database.mailship.svc.cluster.local
Address: 10.233.30.140

but I still get errors:

[root@kong-n6t3q /]# tail -f /usr/local/kong/logs/error.log      
2016/06/06 16:22:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:24:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
...
2016/06/06 16:28:59 [error] 270#0: *10 [lua] responses.lua:99: execute(): kong-database could not be resolved (3: Host not found), client: 10.3.18.104, server: _, request: "GET / HTTP/1.1", host: "10.3.18.104:32723"

Version:

Kong: v0.8.0
Postgres: v9.4

rca added a commit to zymbit/docker-kong that referenced this issue Jul 15, 2016
Set the database host if passed in the environment.  This is useful for
deploying this container out to Kubernetes where there is some issue
between dnsmasq and the Kubernetes resolver when trying to access a
service.

Kong/kong#1104
@rca commented Jul 15, 2016

I ran into this problem as well trying to deploy Kong's official docker image onto a Kubernetes cluster. I work around this issue by using the fully-qualified Kubernetes service name in /etc/kong/kong.yml. For example, I set the environment in my Kubernetes deployment yml:

          env:
            - name: DATABASE
              value: postgres
            - name: DATABASE_HOST
              value: kong-database.zymbit.svc.cluster.local

And I have modified setup.sh to update kong.yml:

zymbit/docker-kong@514a0a2
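Roughly, the idea is a substitution at container start. A minimal sketch of what such a setup.sh change could look like (hypothetical; the real change is in the commit above, and it assumes kong.yml's database section uses a host: key, with DATABASE_HOST set as in the deployment yml above):

if [ -n "$DATABASE_HOST" ]; then
  # hypothetical: rewrite the database host line in kong.yml before starting Kong
  sed -i "s|^\(  host: \).*|\1\"$DATABASE_HOST\"|" /etc/kong/kong.yml
fi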

The Automated Build image with this change is at:

https://hub.docker.com/r/zymbit/kong/

Please let me know if this is useful and I will open a PR.

Thanks!

@rca commented Jul 15, 2016

By the way, there is an issue specific to Ruby that is exposed when resolv.conf sets ndots > 1.

Kubernetes sets options ndots:5

Perhaps this is going on in the bowels of dnsmasq / lua?

https://bugzilla.redhat.com/show_bug.cgi?id=1132704
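For reference, a typical resolv.conf that kubelet writes into a pod looks roughly like this (nameserver and search values are illustrative); short service names like kong-database only resolve because the stub resolver expands them using these search domains and the ndots option:

nameserver 10.3.240.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5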

@tvschuer commented Aug 2, 2016

Thanks, rca. I can confirm that your changes fix the issue we were having on Kubernetes.

@thenayr commented Sep 1, 2016

Encountered this issue in my Kubernetes clusters today as well.

After running in circles chasing dnsmasq, I finally gave up on trying to use the shorter DNS entry for my service, my-service, and just ended up using the full name, my-service.default.svc.cluster.local.

Nothing else I tried worked, including pointing the resolver at kube-dns.kube-system.svc.cluster.local and a bunch of other things that I can't remember ;)

It's worth mentioning that Kong was running in the same namespace as the service I was trying to proxy to as well (default).
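Concretely, "using the full name" here just means registering the API against the fully-qualified upstream, e.g. (hypothetical names and paths, Kong 0.9 admin API):

$ curl -i -X POST http://localhost:8001/apis/ \
    -d 'name=my-service' \
    -d 'request_path=/my-service' \
    -d 'upstream_url=http://my-service.default.svc.cluster.local:80'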

@gavinzhou

@thenayr
I switched to go-dnsmasq in k8s; maybe the nginx resolver is the problem.

You can try it:

    spec:
      containers:
      ### https://github.com/kubernetes/kubernetes/issues/26309
      ### use go-dnsmasq in pod
      - name: go-dnsmasq
        image: "janeczku/go-dnsmasq:release-1.0.5"
        args:
          - --listen
          - "127.0.0.1:53"
          - --default-resolver
          - --append-search-domains
          - --hostsfile=/etc/hosts

@thenayr commented Sep 1, 2016

I'm running another Nginx proxy (with OpenResty / Lua modules etc.) and have zero issues with Nginx DNS resolution inside Kubernetes. All of my proxy_pass definitions use the short DNS names. It's definitely specific to something in Kong.

@thibaultcha added the task/needs-investigation label Sep 1, 2016
@nshttpd commented Sep 16, 2016

Just came across this while looking into Kong as a possible solution for something. This sounds oddly familiar to a problem we had running an OpenResty-based api-proxy in a Kubernetes cluster. We ended up solving it by making nginx.conf a jinja2 template that was rendered at start time (ugly, but it solved other issues as well), containing:

    resolver {{ nameserver }} valid=300s ipv6=off;
    resolver_timeout 5s;

in the http block, and the Python start script that launches everything did:

# read the container's resolv.conf to discover the cluster nameserver
f = open('/etc/resolv.conf', 'r')
resolv = f.read()
f.close()

nameserver = None

# take the first "nameserver" entry
for r in resolv.split('\n'):
  if r.startswith('nameserver'):
    nameserver = r.split(' ')[1]
    break

to set the nameserver in the config (along with a bunch of other environment-specific vars), then used the output as the live nginx.conf for runtime. This made short-name lookups like http://appname work fine, with the OpenResty api-proxy running in different namespaces and talking to the various microservice pods in its own namespace via the short name.
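A minimal sketch of that render step, assuming jinja2; the template and output file names are hypothetical, not the actual files from that setup:

from jinja2 import Template

# nameserver comes from the resolv.conf parsing loop above
with open('nginx.conf.j2') as f:
    template = Template(f.read())

with open('/usr/local/openresty/nginx/conf/nginx.conf', 'w') as f:
    f.write(template.render(nameserver=nameserver))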

Not sure if that will help at all, but it might be a path to head down.

@Tieske (Member) commented Sep 16, 2016

In 0.10 we'll have our own internal DNS resolution, see #1587, used both from the CLI and in nginx, so the inconsistencies should be resolved.

That branch is just up for review now and all tests pass. So if some of you are willing to give it a try, please do.

NOTE: see this code; the config read from resolv.conf covers the nameservers, attempts, and timeout options. So short names relying on the ndots option are not supported for now.

Really looking for feedback on this.

PS: to run the branch you'd need to manually install dns.lua, which hasn't been published yet. See https://github.com/Mashape/dns.lua
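If it helps, one possible way to install it from source (assuming the repo ships a rockspec; this is a guess, not an official instruction):

$ git clone https://github.com/Mashape/dns.lua
$ cd dns.lua
$ luarocks make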

@rca commented Sep 16, 2016

@Tieske, this sounds great. Any chance you can roll this up into a test image that I can docker pull?

@subnetmarco (Member)

@rca I just pushed it at docker pull mashape/kong:dns

@tvschuer

Anyone else having an issue running the kong:dns branch?
Just tried it on docker-cloud, and got:

2016-10-18T09:33:28.296046329Z /docker-entrypoint.sh: line 9: exec: kong: not found
2016-10-18T09:35:53.260789876Z /docker-entrypoint.sh: line 9: exec: kong: not found

@subnetmarco (Member)

It's mashape/kong:dns.

We use mashape/kong for our tests, while keeping the official kong image clean.

@tvschuer

Am I missing something? This is the command I run:

$ docker run mashape/kong:dns
Unable to find image 'mashape/kong:dns' locally
dns: Pulling from mashape/kong
8d30e94188e7: Already exists 
73cc12610219: Already exists 
77d6169dca74: Already exists 
587d89d67788: Already exists 
Digest: sha256:0dea70bb87e4c51581874a4659f397c9cda2a0e4e4214f9e202ea6fbd7d90031
Status: Downloaded newer image for mashape/kong:dns
/docker-entrypoint.sh: line 9: exec: kong: not found

@kendavis2

I am experiencing that same issue running it with docker-compose locally.

docker-compose up
Recreating dev_kong_1
Attaching to dev_kong_1
kong_1  | /docker-entrypoint.sh: line 9: exec: kong: not found
dev_kong_1 exited with code 127

@derrley commented Oct 28, 2016

Same issue, running mashape/kong:dns in kubernetes:


2016-10-28T00:43:45.958914991Z /docker-entrypoint.sh: line 9: exec: kong: not found

@derrley commented Oct 28, 2016

Pulling and running mashape/kong seems to have solved this problem.

@derrley commented Oct 28, 2016

... almost.

Now I've got no problem connecting to the postgres database, but upstream forwarding doesn't work.

My client:

$ http http://192.168.99.100:32057/sbex1/v1/
HTTP/1.1 502 Bad Gateway
Connection: keep-alive
Content-Type: text/plain; charset=UTF-8
Date: Fri, 28 Oct 2016 00:56:04 GMT
Server: kong/0.9.0
Transfer-Encoding: chunked

An invalid response was received from the upstream server

In the Kong logs:

==> logs/error.log <==
2016/10/28 00:56:04 [error] 87#0: *43 sbex1-app could not be resolved (2: Server failure), client: 172.17.0.1, server: kong, request: "GET /sbex1/v1/ HTTP/1.1", host: "192.168.99.100:32057"
2016/10/28 00:56:04 [info] 87#0: *43 client 172.17.0.1 closed keepalive connection

But dnsmasq can see it --

[dwlo kong]# nslookup -port=8053 sbex1-app localhost
Server:     localhost
Address:    127.0.0.1#8053

Name:   sbex1-app.test.svc.cluster.local
Address: 10.0.0.78

I suspect this one's harder to get around -- does nginx, itself, need to be told to use dnsmasq?
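For what it's worth, the directive nginx itself uses for runtime lookups is resolver; a sketch of what pointing it at Kong's dnsmasq would look like in the http block (whether Kong's generated config exposes a hook for this in 0.9 is not established in this thread):

    resolver 127.0.0.1:8053 valid=300s ipv6=off;
    resolver_timeout 5s;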

@subnetmarco (Member)

We have updated how we resolve DNS hostnames in 0.10.x (and removed the dnsmasq dependency). Can you try to replicate with 0.10.x and see if the problem has been fixed?

@p0pr0ck5 (Contributor) commented Jun 5, 2017

Any updates here, following the changes in 0.10.x?

@derrley commented Jun 5, 2017 via email

@subnetmarco (Member)

@derrley understood - if you want to give it another try, please let us know the outcome. Closing this issue for now, but happy to re-open it if Kong >= 0.10 still doesn't work.
