Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Connection: close" header leads to unstable instance #6799

Closed
cubic3d opened this issue Nov 26, 2018 · 7 comments · Fixed by #6814
Closed

"Connection: close" header leads to unstable instance #6799

cubic3d opened this issue Nov 26, 2018 · 7 comments · Fixed by #6814
Labels
area/api REST API bug Something isn't working
Milestone

Comments

@cubic3d
Copy link

cubic3d commented Nov 26, 2018

This is a continuation from #6514.

Using the sequential curl requests with "Connection: close" header seems to work, but after firing up our API client which uses threaded job handling results in some errors again. If I don't send the "Connection: close" header, everything is fine (at least for 2.10.2, on some 2.8.x it was the exact opposite, that's why we changed the behaviour).

Here is the debug log:

[2018-11-19 10:10:32 +0100] information/WorkQueue: #36 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:10:33 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:10:33 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 20; Pool utilization: 72.0427%
[2018-11-19 10:10:34 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:34 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:34 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:34 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:39 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:44 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:44 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:44 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:44 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:45 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00714326%
[2018-11-19 10:10:48 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 20; Pool utilization: 75.2167%
[2018-11-19 10:10:49 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53540 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53538 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53532 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53548 (no client certificate)
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53540), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53538), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53532), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53540)
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53548), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53548)
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53548)
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53538)
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: -1
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: -2
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: -2
[2018-11-19 10:10:53 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:53 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:54 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:54 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:10:54 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:54 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:54 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:54 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:55 +0100] notice/ThreadPool: Thread pool; current: 5; adjustment: 1
[2018-11-19 10:10:58 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: 1
[2018-11-19 10:10:59 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:00 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: 1
[2018-11-19 10:11:00 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00333945%
[2018-11-19 10:11:01 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: 1
[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53528): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53526): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53530): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53534): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53536): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53542): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53544): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53546): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53550): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] information/WorkQueue: #41 (HttpServerConnection) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-11-19 10:11:02 +0100] information/WorkQueue: #42 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53532)
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53532)
[2018-11-19 10:11:03 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:11:03 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 33; Pool utilization: 73.1336%
[2018-11-19 10:11:04 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:04 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:04 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:04 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:08 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:11 +0100] notice/ThreadPool: Thread pool; current: 1; adjustment: 2
[2018-11-19 10:11:11 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:11 +0100] notice/ThreadPool: Thread pool; current: 3; adjustment: -1
[2018-11-19 10:11:14 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:14 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:14 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:15 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 11.4771%
[2018-11-19 10:11:16 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:18 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 35; Pool utilization: 70.8839%
[2018-11-19 10:11:21 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55174 (no client certificate)
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55176 (no client certificate)
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55190 (no client certificate)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result?host=[ REMOVED ] (from [127.0.0.1]:55174), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55192 (no client certificate)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55176), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55190), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55190)
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55190)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55192), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55176)
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55192)
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 13; adjustment: -2
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 12; adjustment: -2
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: -2
[2018-11-19 10:11:23 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: -2
[2018-11-19 10:11:24 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:24 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:24 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:26 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:26 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: 1
[2018-11-19 10:11:28 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:28 +0100] notice/ThreadPool: Thread pool; current: 8; adjustment: 1
[2018-11-19 10:11:29 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: 1
[2018-11-19 10:11:30 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.4596%
[2018-11-19 10:11:31 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55172): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55178): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55180): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55182): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55184): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55186): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55188): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55194): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55196): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] information/WorkQueue: #43 (HttpServerConnection) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-11-19 10:11:32 +0100] information/WorkQueue: #45 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:11:33 +0100] notice/ThreadPool: Thread pool; current: 12; adjustment: 1
[2018-11-19 10:11:33 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 46; Pool utilization: 74.4609%
[2018-11-19 10:11:34 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:34 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:34 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:36 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:36 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:41 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:44 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:44 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:44 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:45 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5044%
[2018-11-19 10:11:46 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:48 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 47; Pool utilization: 74.652%
[2018-11-19 10:11:51 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:57286 (no client certificate)
[2018-11-19 10:11:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:57286), user: [ REMOVED ])
[2018-11-19 10:11:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:57286)
[2018-11-19 10:11:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:52 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: -2
[2018-11-19 10:11:52 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: -2
[2018-11-19 10:11:54 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:54 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:54 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:56 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:56 +0100] notice/ThreadPool: Thread pool; current: 14; adjustment: 1
[2018-11-19 10:11:56 +0100] notice/ThreadPool: Thread pool; current: 14; adjustment: 1
[2018-11-19 10:11:59 +0100] notice/ThreadPool: Thread pool; current: 15; adjustment: 1
[2018-11-19 10:11:59 +0100] notice/ThreadPool: Thread pool; current: 15; adjustment: 1
[2018-11-19 10:12:00 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5037%
[2018-11-19 10:12:01 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57278): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57280): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57276): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57282): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57284): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57288): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57290): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57292): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57294): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57296): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57300): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57298): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:12:03 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:12:03 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 60; Pool utilization: 76.3517%
[2018-11-19 10:12:04 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:12:04 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:12:04 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:12:06 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:06 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:06 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:07 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:07 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:07 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:07 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:08 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:08 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:08 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:08 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:09 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:09 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:09 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:09 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:10 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:10 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:10 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:10 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:11 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:11 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:11 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:11 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:11 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:12 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:12 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:12 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:12 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:13 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:13 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:13 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:13 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:14 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:14 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:14 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:14 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:14 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:12:14 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:12:14 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5047%
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:16 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:16 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:16 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:16 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:16 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:17 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:17 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:17 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:17 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 60; Pool utilization: 78.0499%
[2018-11-19 10:12:19 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:19 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:19 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:19 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:20 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:20 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:20 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:20 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
^C[2018-11-19 10:12:21 +0100] information/Application: Received request to shut down.
[2018-11-19 10:12:21 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] information/Application: Shutting down...
[2018-11-19 10:12:21 +0100] information/ApiListener: 'api' stopped.
[2018-11-19 10:12:21 +0100] information/CheckerComponent: 'checker' stopped.
[2018-11-19 10:12:21 +0100] information/ExternalCommandListener: 'command' stopped.
[2018-11-19 10:12:21 +0100] information/IdoPgsqlConnection: 'ido-pgsql' paused.
[2018-11-19 10:12:21 +0100] information/DbConnection: Pausing IDO connection: ido-pgsql
[2018-11-19 10:12:21 +0100] information/DbConnection: 'ido-pgsql' stopped.
[2018-11-19 10:12:21 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:59916 (no client certificate)
[2018-11-19 10:12:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:59932 (no client certificate)
[2018-11-19 10:12:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:59916), user: [ REMOVED ])
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:12:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:59932), user: [ REMOVED ])
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:59916)
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:59932)
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:23 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:23 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:23 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:23 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:23 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:24 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:25 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:26 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:27 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:28 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:29 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
[2018-11-19 10:12:30 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
[2018-11-19 10:12:30 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 3
[2018-11-19 10:12:30 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
^C

Sometimes there are two entries for a disconnect with the same source port:

[2018-11-19 10:29:00 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:43340), user: [ REMOVED ])
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43340)
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43340)
[2018-11-19 10:29:00 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:43344), user: [ REMOVED ])
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43344)
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43344)

Log from client side:

2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I see some strange behaviour there:

  • The thread pool seems to be rising up to 16 and then stalling completely.
  • After first TLS handshake fails, Icinga gets in an unstable state. Every subsequent HTTP request will timeout, doesn't matter if authorised or unauthorised with or without the "Connection: close" header (tested with curl).
  • On a SIGHUP Icinga is not able to shutdown in unstable state (see logs).
  • Looks like a race condition somewhere

Unfortunately the client is not opensource, but the only interaction with Icinga it does is pushing check results and get services from a host by API in worker threads (written in GO using http.Client).

Environment:

icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.2-1)

Copyright (c) 2012-2018 Icinga Development Team (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 16.04.5 LTS (Xenial Xerus)
  Kernel: Linux
  Kernel version: 4.15.0-36-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 5.3.1
  Build host: 409e1113863b

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Enabled features: api checker command ido-pgsql mainlog

[2018-11-19 11:13:29 +0100] information/cli: Icinga application loader (version: r2.10.2-1)
[2018-11-19 11:13:29 +0100] information/cli: Loading configuration file(s).
[2018-11-19 11:13:29 +0100] information/ConfigItem: Committing config item(s).
[2018-11-19 11:13:29 +0100] information/ApiListener: My API identity: [ REMOVED ]
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 48 Services.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 4 Hosts.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 3 Zones.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 Endpoint.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 TimePeriod.
[2018-11-19 11:13:29 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2018-11-19 11:13:29 +0100] information/cli: Finished validating the configuration file(s).
@akqopensystems
Copy link

We are experiencing the same problem with the Icinga Output Plugin for Logstash. After some working API calls, the Log shows "[2018-11-26 11:39:09 +0100] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy". After this, all Client TLS handshakes fail: "Error: Timeout was reached (10) during TLS handshake".

@swegener
Copy link
Contributor

swegener commented Nov 27, 2018

While debugging hanging connection issues on our icinga2 setup, my finding is, that with "Connection: close" it makes a difference if the client sends the headers and body in one go or not.

If the client sends them in one socket operation, everything is fine, because they will be put into the same TCP packet and parsed by icinga at the same time.
If it sends the headers and body in two separate operations, it will probably result in two TCP packets, which depending on latency and icinga reaction time, will be parsed in two steps by icinga.
Icinga seems to read and parse the header packet with the "Connection: close" entry, tries to close the connection, while there is still data, the body packet, pending. This results in a hanging connection and also threads being stuck.

With a fresh started icinga and one hanging connection I did a quick gdb backtrace and I get two suspicious threads with these backtraces:

Thread 23 (Thread 0x7f3f8b9d1700 (LWP 179709)):
#0  0x00007f3fd0fd326d in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f3fd0fcce42 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00000000006ba69f in icinga::HttpServerConnection::DataAvailableHandler() ()
#3  0x00000000006488ee in ?? ()
#4  0x00000000005d4178 in icinga::Stream::SignalDataAvailable() ()
#5  0x000000000062c1c8 in icinga::TlsStream::OnEvent(int) ()
#6  0x00000000005f82cb in icinga::SocketEventEngineEpoll::ThreadProc(int) ()
#7  0x00007f3fd0209c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f3fd0fca6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007f3fd1e9b41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 14 (Thread 0x7f3fcc501700 (LWP 179700)):
#0  0x00007f3fd0fd0709 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000005d89ea in icinga::SocketEventEngine::WakeUpThread(int, bool) ()
#2  0x00000000005d9325 in icinga::SocketEventEngineEpoll::Unregister(icinga::SocketEvents*) ()
#3  0x000000000061cf09 in icinga::TlsStream::CloseInternal(bool) ()
#4  0x00000000006a679a in icinga::HttpServerConnection::Disconnect() ()
#5  0x00000000006ba661 in icinga::HttpServerConnection::DataAvailableHandler() ()
#6  0x00000000006ba9c1 in icinga::HttpServerConnection::Start() ()
#7  0x0000000000679162 in icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) ()
#8  0x000000000067a430 in icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) ()
#9  0x000000000062edb6 in icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) ()
#10 0x00007f3fd197b5d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#11 0x00007f3fd0fca6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f3fd1e9b41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

The first one is blocking to acquire m_DataHandlerMutex in icinga::HttpServerConnection::DataAvailableHandler(), which the second one already acquired in icinga::HttpServerConnection::Disconnect() and is trying to close the connection, looping to wake up a (the other?) thread.

I used the 2.10.2-1.xenial package from packages.icinga.com. Without debug info I cannot be sure whether the threads are in fact blocking on the same m_DataHandlerMutex, but I'm pretty certain they do.

@swegener
Copy link
Contributor

Reverting 13239c3 (which moved the disconnect path to icinga::HttpServerConnection::DataAvailableHandler) fixes our hanging connecion issue.

@stevie-sy
Copy link
Contributor

Situation

In our case the system is stable if there are no timouts (especially TLS handshake) here. But with the begin of the first message in the log files, our cluster is beginning to get unstable. The cluster-, ido- and icinga-checks are getting critical or unknown. But the icinga daemon is still running.

This is an example of the debug-log from our master2:
image

What are the results of this problem?

Deployment with the director:

On our master1 we configured the director for automatic deployment. If there is one in such unstable icinga state, the daemon is not able to reload correctly. It seems that it is getting forked, but it can't stop the old childs.
image

This is an example output while icinga2 is reloading. For us it seems icinga crashes while the step "information/ExternalCommandListener: 'command' stopped.", because it is the last output and systemd write still "reload" as state:

[2018-11-20 20:43:01 +0100] information/WorkQueue: #120 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-20 20:43:15 +0100] information/Checkable: Checking for configured notifications for object 'zmon-master1.ooe.gv.at!cluster'
[2018-11-20 20:43:28 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 1, rate: 32.8667/s (1972/min 10025/5min 29932/15min);
[2018-11-20 20:43:38 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 1, rate: 33.0167/s (1981/min 10025/5min 29932/15min);
[2018-11-20 20:43:56 +0100] information/WorkQueue: #153 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-20 20:44:06 +0100] information/Application: Got reload command: Starting new instance.
[2018-11-20 20:44:17 +0100] information/Application: Reload requested, letting new process take over.
[2018-11-20 20:44:17 +0100] information/ApiListener: 'api' stopped.
[2018-11-20 20:44:17 +0100] information/CheckerComponent: 'checker' stopped.
[2018-11-20 20:44:17 +0100] information/ExternalCommandListener: 'command' stopped.

Sometimes it happens, if the master1 is hanging, some seconds later also the icinga daemon at the master2 get the same problem .

State of the icinga2 daemon

Very often we find lines like this in the systemd-log after a restart/reload:
image

Problems with acknowledgements and setting downtimes in icingaweb2

The acknowledgment via icingaweb2 is not possible. There is a timout error:
Icinga Log:
image
This is the message in icingaweb2:
image
Result: The acknowledgment is not set.

The same situation is with setting a downtime

Icinga Log:
image

Icingaweb2 message:
image

But the diffrence to the acknowledgement the downtime is set.

Logstash-Plugin "logstash-output-icinga"

We also use the Logstash Plugin "logstash-output-icinga ". If logstash send something via API to icinga, the connection is reset by icinga.
This is the icinga log entry from the request:
image.

This are possible return messages in the logstash log:
image
image

I tested this also with on old icinga Setup (2.8.4-1). There icinga can handle the api request.
image

Because of this problem I changed to the logstash-plugin "logstash-output-nagios" for using the command pipe. With this everything is fine and our cluster is still stable. Also we don't have TLS-Errors in the log-files.

Summary

At the end it is the same behaviour what we have if nessus is scanning the server - see
https://github.com/Icinga/icinga2/issues/6559. That properly means for us the icinga logstash output plugin and nessus are sending API request which Icinga with Version 2.10.x can't handle.

@dnsmichi
Copy link
Contributor

Thus far, I'm not able to reproduce it with my previous analysis in #6514 (comment)

If one could provide a simple reproducer script, or a given number of objects/IO required for this tests, much appreciated. I'm still wondering whether the requests need to run in full parallel, or going more into the socket events overloaded route.

@dnsmichi dnsmichi added area/api REST API needs feedback We'll only proceed once we hear from you again labels Nov 30, 2018
@swegener
Copy link
Contributor

swegener commented Nov 30, 2018

What I came up with is this snipet that uses openssl s_client and sends the header and body with a sleep in between. If you comment the sleep, it should work, with the sleep it results in icinga threads blocking. There's no parallelism needed, just one request with the sleep in between.

{
	cat <<-EOF
		GET /v1/objects/hosts HTTP/1.1
		Authorization: Basic $BASE64_ENCODED_USERNAME_PASSWORD
		Content-Type: application/json
		Connection: close
		Accept: application/json
		Content-Length: 80

	EOF

	sleep 1

	cat <<-EOF
		{"filter": "host.state!=0 && host.acknowledgement!=1 && host.downtime_depth==0"}
	EOF
} | openssl s_client -quiet -connect $IP:5665

@dnsmichi
Copy link
Contributor

dnsmichi commented Dec 3, 2018

There's no parallelism needed, just one request with the sleep in between.

Thanks, that was the missing point in my brain.

Test Script

Script with root:icinga base64 encoded.

vim conn_close.sh

#!/bin/bash

BASE64_ENCODED_USERNAME_PASSWORD="cm9vdDppY2luZ2E="
IP="127.0.0.1"
{
        cat <<-EOF
                GET /v1/objects/hosts HTTP/1.1
                Authorization: Basic $BASE64_ENCODED_USERNAME_PASSWORD
                Content-Type: application/json
                Connection: close
                Accept: application/json
                Content-Length: 80

        EOF

        sleep 1

        cat <<-EOF
                {"filter": "host.state!=0 && host.acknowledgement!=1 && host.downtime_depth==0"}
        EOF
} | openssl s_client -quiet -connect $IP:5665

chmod +x conn_close.sh

Current Behaviour

michi@mbmif ~/coding/icinga/icinga2 (feature/development-docs=) $ ./conn_close.sh
depth=1 CN = Icinga CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
[2018-12-03 13:39:38 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53215 (no client certificate)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53215), user: root)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53215)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53215)

The doubled log line is misleading at best, so I've taken the extra mile and debugged this below.

While reverting the suggested patch, there's a certain possibility that the "corked" functionality did hide this dead lock previously.

<<<<<<< HEAD
                /* Request finished, decide whether to explicitly close the connection. */
                if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
                        m_CurrentRequest.Headers->Get("connection") == "close") {
                        m_Stream->Shutdown();
                        close = true;
                }
=======
                m_RequestQueue.Enqueue(std::bind(&Stream::SetCorked, m_Stream, false));
>>>>>>> parent of 13239c317... Ensure that HTTP/1.0 or Connection: close headers are properly disconnecting the client

Patch fixed

michi@mbmif ~/coding/icinga/icinga2 (bugfix/conn-close-stall) $ ./conn_close.sh
depth=1 CN = Icinga CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Server: Icinga/v2.10.2-91-g9afe7b234

206b4
{"results":[{"attrs"

   ...

ta":{},"name":"mbmif","type":"Host"}]}
0

[2018-12-03 13:51:38 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53414 (no client certificate)
[2018-12-03 13:51:38 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53414), user: root)
[2018-12-03 13:51:40 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53414)

Debugging Session

mbmif /usr/local/icinga2 (master *) # lldb -- /usr/local/icinga2/lib/icinga2/sbin/icinga2 daemon
(lldb) target create "/usr/local/icinga2/lib/icinga2/sbin/icinga2"
Current executable set to '/usr/local/icinga2/lib/icinga2/sbin/icinga2' (x86_64).
(lldb) settings set -- target.run-args  "daemon"
(lldb) b icinga::HttpServerConnection::Disconnect()
Breakpoint 1: where = icinga2`icinga::HttpServerConnection::Disconnect() + 19 at httpserverconnection.cpp:91, address = 0x0000000100629653
(lldb) r

2 threads call Disconnect().

[2018-12-03 14:04:52 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53570 (no client certificate)
[2018-12-03 14:04:52 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53570), user: root)
Process 31665 stopped
* thread #12, stop reason = breakpoint 1.1
    frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
   88
   89  	void HttpServerConnection::Disconnect()
   90  	{
-> 91  		boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
   92  		if (!lock.owns_lock()) {
   93  			Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
   94  			return;
  thread #27, stop reason = breakpoint 1.1
    frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
   88
   89  	void HttpServerConnection::Disconnect()
   90  	{
-> 91  		boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
   92  		if (!lock.owns_lock()) {
   93  			Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
   94  			return;
Target 0: (icinga2) stopped.

Thread 12

This thread handles the main connection and calls HttpServerConnection::DataAvailableHandler() once in the Start() method.

(lldb) thr sel 12
* thread #12, stop reason = breakpoint 1.1
    frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
   88
   89  	void HttpServerConnection::Disconnect()
   90  	{
-> 91  		boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
   92  		if (!lock.owns_lock()) {
   93  			Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
   94  			return;
(lldb) bt
* thread #12, stop reason = breakpoint 1.1
  * frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
    frame #1: 0x000000010062955c icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x000000010382a400) at httpserverconnection.cpp:376
    frame #2: 0x00000001006290ff icinga2`icinga::HttpServerConnection::Start(this=0x000000010382a400) at httpserverconnection.cpp:76
    frame #3: 0x00000001004f97fb icinga2`icinga::ApiListener::NewClientHandlerInternal(this=0x0000000104804400, client=0x000000010672e360, hostname=0x000000010672e368, role=RoleServer) at apilistener.cpp:606
    frame #4: 0x00000001004f60fe icinga2`icinga::ApiListener::NewClientHandler(this=0x0000000104804400, client=0x000000010672e360, hostname=0x000000010672e368, role=RoleServer) at apilistener.cpp:441

   .....


    frame #16: 0x000000010034f94e icinga2`boost::detail::thread_data<std::__1::__bind<void (icinga::ThreadPool::WorkerThread::*)(icinga::ThreadPool::Queue&), std::__1::reference_wrapper<icinga::ThreadPool::WorkerThread>, std::__1::reference_wrapper<icinga::ThreadPool::Queue> > >::run(this=0x0000000106655100) at thread.hpp:117
    frame #17: 0x00000001030882c8 libboost_thread-mt.dylib`boost::(anonymous namespace)::thread_proxy(void*) + 136
    frame #18: 0x00007fff53d52661 libsystem_pthread.dylib`_pthread_body + 340
    frame #19: 0x00007fff53d5250d libsystem_pthread.dylib`_pthread_start + 377
    frame #20: 0x00007fff53d51bf9 libsystem_pthread.dylib`thread_start + 13
(lldb)

Thread 27

This thread is doing the socket event polling and gets woken up in TlsStream::OnEvent which then results in DataAvailableHandler() and calling Disconnect().

Actually this thread is correctly closing the connection.

(lldb) thr sel 27
* thread #27, stop reason = breakpoint 1.1
    frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
   88
   89  	void HttpServerConnection::Disconnect()
   90  	{
-> 91  		boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
   92  		if (!lock.owns_lock()) {
   93  			Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
   94  			return;
(lldb) bt
(lldb) bt
* thread #27, stop reason = breakpoint 1.1
  * frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
    frame #1: 0x000000010062955c icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x000000010382a400) at httpserverconnection.cpp:376

....

    frame #21: 0x00000001002e5325 icinga2`icinga::Stream::SignalDataAvailable(this=0x0000000107036cc0) at stream.cpp:55
    frame #22: 0x000000010035f9f3 icinga2`icinga::TlsStream::OnEvent(this=0x0000000107036cc0, revents=1) at tlsstream.cpp:289
    frame #23: 0x00000001002dd39f icinga2`icinga::SocketEventEnginePoll::ThreadProc(this=0x0000000103890600, tid=0) at socketevents-poll.cpp:121

Why would the first thread call Disconnect()?

(lldb) br del
About to delete all breakpoints, do you want to do that?: [Y/n] y
All breakpoints removed. (1 breakpoint)
(lldb) b icinga::HttpServerConnection::DataAvailableHandler
Breakpoint 2: where = icinga2`icinga::HttpServerConnection::DataAvailableHandler() + 33 at httpserverconnection.cpp:351, address = 0x00000001006291b1

[2018-12-03 14:17:44 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53666 (no client certificate)
Process 32506 stopped
* thread #12, stop reason = breakpoint 2.1
    frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
   348
   349 	void HttpServerConnection::DataAvailableHandler()
   350 	{
-> 351 		bool close = false;
   352
   353 		if (!m_Stream->IsEof()) {
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
  thread #26, stop reason = breakpoint 2.1
    frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
   348
   349 	void HttpServerConnection::DataAvailableHandler()
   350 	{
-> 351 		bool close = false;
   352
   353 		if (!m_Stream->IsEof()) {
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
Target 0: (icinga2) stopped.

Dig deeper into the first thread being called just once via DataAvailableHandler():

(lldb) thr sel 12
* thread #12, stop reason = breakpoint 2.1
    frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
   348
   349 	void HttpServerConnection::DataAvailableHandler()
   350 	{
-> 351 		bool close = false;
   352
   353 		if (!m_Stream->IsEof()) {
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
(lldb) n
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x00000001006291b5 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:353
   350 	{
   351 		bool close = false;
   352
-> 353 		if (!m_Stream->IsEof()) {
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
   355
   356 			try {
  thread #26, stop reason = breakpoint 2.1
    frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
   348
   349 	void HttpServerConnection::DataAvailableHandler()
   350 	{
-> 351 		bool close = false;
   352
   353 		if (!m_Stream->IsEof()) {
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x00000001006291dc icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:354
   351 		bool close = false;
   352
   353 		if (!m_Stream->IsEof()) {
-> 354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
   355
   356 			try {
   357 				while (ProcessMessage())
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:10 +0100] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 0, rate: 1182.28/s (70937/min 70937/5min 70937/15min);
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x00000001006291f1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:357
   354 			boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
   355
   356 			try {
-> 357 				while (ProcessMessage())
   358 					; /* empty loop body */
   359 			} catch (const std::exception& ex) {
   360 				Log(LogWarning, "HttpServerConnection")
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:10 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-12-03 14:18:13 +0100] information/Application: We jumped forward in time: 30.2703 seconds
[2018-12-03 14:18:13 +0100] warning/ThreadPool: Event call took 2.97695s
[2018-12-03 14:18:13 +0100] warning/ThreadPool: Event call took 2.97716s
[2018-12-03 14:18:15 +0100] warning/ThreadPool: Event call took 4.50566s
[2018-12-03 14:18:15 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 23.9/s (1434/min 1434/5min 1434/15min);
[2018-12-03 14:18:15 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53666), user: root)
[2018-12-03 14:18:15 +0100] warning/ThreadPool: Event call took 4.58007s
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x000000010062944e icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:359
   356 			try {
   357 				while (ProcessMessage())
   358 					; /* empty loop body */
-> 359 			} catch (const std::exception& ex) {
   360 				Log(LogWarning, "HttpServerConnection")
   361 					<< "Error while reading Http request: " << DiagnosticInformation(ex);
   362
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x000000010062931e icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:367
   364 			}
   365
   366 			/* Request finished, decide whether to explicitly close the connection. */
-> 367 			if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
   368 				m_CurrentRequest.Headers->Get("connection") == "close") {
   369 				m_Stream->Shutdown();
   370 				close = true;
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629345 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:368
   365
   366 			/* Request finished, decide whether to explicitly close the connection. */
   367 			if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
-> 368 				m_CurrentRequest.Headers->Get("connection") == "close") {
   369 				m_Stream->Shutdown();
   370 				close = true;
   371 			}
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x00000001006293c2 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:367
   364 			}
   365
   366 			/* Request finished, decide whether to explicitly close the connection. */
-> 367 			if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
   368 				m_CurrentRequest.Headers->Get("connection") == "close") {
   369 				m_Stream->Shutdown();
   370 				close = true;
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:22 +0100] warning/ThreadPool: Event call took 7.63664s
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629425 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:369
   366 			/* Request finished, decide whether to explicitly close the connection. */
   367 			if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
   368 				m_CurrentRequest.Headers->Get("connection") == "close") {
-> 369 				m_Stream->Shutdown();
   370 				close = true;
   371 			}
   372 		} else
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629445 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:370
   367 			if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
   368 				m_CurrentRequest.Headers->Get("connection") == "close") {
   369 				m_Stream->Shutdown();
-> 370 				close = true;
   371 			}
   372 		} else
   373 			close = true;
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629449 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:371
   368 				m_CurrentRequest.Headers->Get("connection") == "close") {
   369 				m_Stream->Shutdown();
   370 				close = true;
-> 371 			}
   372 		} else
   373 			close = true;
   374
Target 0: (icinga2) stopped.
(lldb) p close
(bool) $0 = true
(lldb) n
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x000000010062952a icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:372
   369 				m_Stream->Shutdown();
   370 				close = true;
   371 			}
-> 372 		} else
   373 			close = true;
   374
   375 		if (close)
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:35 +0100] warning/ThreadPool: Event call took 19.78s
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629546 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:375
   372 		} else
   373 			close = true;
   374
-> 375 		if (close)
   376 			Disconnect();
   377 	}
   378
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
    frame #0: 0x0000000100629550 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:376
   373 			close = true;
   374
   375 		if (close)
-> 376 			Disconnect();
   377 	}
   378
   379 	void HttpServerConnection::CheckLiveness()
Target 0: (icinga2) stopped.

Summary

Actually the request body is not parsed at this point, where @swegener is 100% correct with his analysis.

Therefore closing the request here makes absolutely no sense, and I did not think about this when fixing it for 2.10.0.

Older versions of Icinga likely did not suffer from Connection: close being incorrectly parsed, but the enqueuing of that "Corked" stuff, meaning to say that future packets where not handled properly.

With the revert of the "Corked" stuff in 2.10.2 and #6738 such state machine like connections with wait times in between function properly again.

dnsmichi pushed a commit that referenced this issue Dec 3, 2018
Actually the `corked` functionality caused problems with
not closing connections properly.

Full Analysis: #6799 (comment)

Full credits to @swegener :)

fixes #6799
@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again labels Dec 3, 2018
@dnsmichi dnsmichi added this to the 2.10.3 milestone Dec 3, 2018
dnsmichi pushed a commit that referenced this issue Dec 5, 2018
Actually the `corked` functionality caused problems with
not closing connections properly.

Full Analysis: #6799 (comment)

Full credits to @swegener :)

fixes #6799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants