Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API TLS session connection closed after 2 requests #6635

Closed
rilott opened this issue Sep 25, 2018 · 9 comments · Fixed by #6738
Closed

API TLS session connection closed after 2 requests #6635

rilott opened this issue Sep 25, 2018 · 9 comments · Fixed by #6738
Assignees
Labels
area/api REST API bug Something isn't working
Milestone

Comments

@rilott
Copy link

rilott commented Sep 25, 2018

Expected Behavior

You can use 1 API connection to POST to /v1/actions/process-check-result many times, rather than opening a new TCP connection for each POST

Current Behavior

The first 2 POSTs work, then on the 3rd POST the connection hangs for a few seconds, then is closed by icinga

Possible Solution

Unsure, this only started happening after upgrading to 2.9.1 (we were on 2.4.10)

Steps to Reproduce (for bugs)

  1. Open a connection to the icinga API
  2. POST to /v1/actions/process-check-result 2 times
  3. POST a third time fails

The issue can be replicated with the following python:

#! /usr/bin/env python

import logging
import requests
import sys
import time

from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
console_logger = logging.StreamHandler(sys.stdout)
console_logger.setFormatter(logging.Formatter('%(name)s %(asctime)s %(levelname)s : %(message)s'))
logger.addHandler(console_logger)

with open('icinga_password', 'r') as f:
    password = f.read().rstrip()

hostname = '<insert fqdn>'
username = '<insert API username>'

session = requests.Session()
session.verify = False
session.auth = (username, password)
session.headers = {
    'accept': 'application/json',
    'content-type': 'application/json',
}

while True:
    try:
        result = session.post('https://{}:5665/v1/actions/process-check-result?service=testhost!testservice'.format(hostname), json={'exit_status': 0, 'plugin_output': 'all is ok'})
        print(result.text)
    except Exception as e:
        print(e)
        logger.error('Error')
    time.sleep(1)

Output:

requests.packages.urllib3.connectionpool 2018-09-25 17:18:28,839 DEBUG : Starting new HTTPS connection (1): <fqdn>
requests.packages.urllib3.connectionpool 2018-09-25 17:18:28,899 DEBUG : https://<fqdn>:5665 "POST /v1/actions/process-check-result?service=testhost!testservice HTTP/1.1" 404 None
{"error":404.0,"status":"No objects found."}
requests.packages.urllib3.connectionpool 2018-09-25 17:18:29,905 DEBUG : https://<fqdn>:5665 "POST /v1/actions/process-check-result?service=testhost!testservice HTTP/1.1" 404 None
{"error":404.0,"status":"No objects found."}
('Connection aborted.', BadStatusLine("''",))
root 2018-09-25 17:18:40,908 ERROR : Error

Context

We use this API endpoint to update the status of passive checks. We have many hundreds of them, so using 1 connection to update them all is much more efficient than opening a new connection to the API for each one.

Your Environment

  • Version used (icinga2 --version):
    icinga2 - The Icinga 2 network monitoring daemon (version: r2.9.1-1)

  • Operating System and version:

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-693.11.6.el7.x86_64
  Architecture: x86_64
  • Enabled features (icinga2 feature list):
    Enabled features: api checker command ido-mysql mainlog notification
  • Icinga Web 2 version and modules (System - About):
    2.6.1 (b3e0b5d587c3587a09d82b3634dbe3a5dd315353)
  • Config validation (icinga2 daemon -C):
    Validates fine
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
Object 'global-templates' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 31:1-31:30
  * __name = "global-templates"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 32:3-32:15
  * name = "global-templates"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 31
    * last_column = 30
    * last_line = 31
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "global-templates" ]
    % = modified in '/etc/icinga2/zones.conf', lines 31:1-31:30
  * type = "Zone"
  * zone = ""

Object 'ln' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 27:1-27:16
  * __name = "ln"
  * endpoints = [ "<fqdn1>", "<fqdn2>" ]
    % = modified in '/etc/icinga2/zones.conf', lines 28:3-28:70
  * global = false
  * name = "ln"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 27
    * last_column = 16
    * last_line = 27
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "ln" ]
    % = modified in '/etc/icinga2/zones.conf', lines 27:1-27:16
  * type = "Zone"
  * zone = ""
@dnsmichi dnsmichi added area/api REST API bug Something isn't working labels Sep 26, 2018
@dnsmichi
Copy link
Contributor

Problem persists with 2.9.2, maybe resolved with the snapshot packages. Thanks for the script, this helps a lot - will continue later.

@dnsmichi
Copy link
Contributor

I'll look into this after 2.10, it is nasty but doesn't crash.

@dnsmichi
Copy link
Contributor

It is a timing problem - when you halt in lldb, it sometimes lasts longer. I suspect a mix between ProcessMessage() and setting the stream to corked.

@dnsmichi
Copy link
Contributor

Hm, yeah, temporarily removing that corked bullshit (pardon) solves it.

@dnsmichi
Copy link
Contributor

Corked basically also affects how events are read with epoll in our socket IO handling. I would assume that at some point, future events are not read nor the socket is woken up. This leads to this hard to debug behaviour.

@dnsmichi
Copy link
Contributor

Likely this also affects connection handling in the JSON-RPC class with undefined timeouts being seen in the past weeks.

@dnsmichi dnsmichi added this to the 2.10.2 milestone Oct 29, 2018
dnsmichi pushed a commit that referenced this issue Oct 29, 2018
…nd not stalled

This basically drops the "corked" implementation which just stalled the
TLS IO polling after some requests. If you need sort of rate limiting
for these events, use an external TLS proxy which terminates that in front
of Icinga.

fixes #6635
@dnsmichi
Copy link
Contributor

screen shot 2018-10-29 at 14 15 09

@dnsmichi
Copy link
Contributor

Also tested the Director kickstart CLI command, works again.

@dnsmichi
Copy link
Contributor

==> icinga2: Notice: /Stage[main]/Icingaweb2::Module::Director/Exec[director-migration]/returns: executed successfully
==> icinga2: Notice: /Stage[main]/Icingaweb2::Module::Director/Exec[director-kickstart]/returns: executed successfully

dnsmichi pushed a commit that referenced this issue Nov 12, 2018
…nd not stalled

This basically drops the "corked" implementation which just stalled the
TLS IO polling after some requests. If you need sort of rate limiting
for these events, use an external TLS proxy which terminates that in front
of Icinga.

fixes #6635
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants