Leak of scheduler thread under uncaught errors in plugin run #21

Closed
andsel opened this issue Mar 6, 2020 · 1 comment · Fixed by #28

Comments

andsel (Contributor) commented Mar 6, 2020

When Logstash is not able to catch all errors during the start of a plugin (for example IO errors from the filesystem side), threads or resources leak.

This has been verified with logstash-input-jdbc, using an IO error on the redirected stdout and stderr of the Logstash process.

The problem is that the jdbc input uses the Rufus scheduler, and when the scheduler parses the schedule pattern (* * * * *) at
https://github.com/jmettraux/rufus-scheduler/blob/v3.0.9/lib/rufus/scheduler/cronline.rb#L77 it writes "warning: constant ::Fixnum is deprecated" to stdout. If at this step there is an IO error on stdout, the raised error makes the pipeline crash but does not shut down the Logstash process.
Now, if Logstash uses xpack.management, we end up in a situation where every reload of the pipeline config from the remote leaks one thread.
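
To make the leak concrete, here is a minimal sketch (illustrative only, not the actual logstash-integration-jdbc source) of why an error raised while scheduling leaves the scheduler thread behind:

# Illustrative Ruby sketch; the real plugin code differs, but the shape of the leak is the same.
require "rufus/scheduler"

def run(queue)
  # Creating the scheduler already spawns its work-loop thread.
  @scheduler = Rufus::Scheduler.new

  # cron() parses the schedule; in rufus-scheduler 3.0.9 the parsing code in
  # cronline.rb writes a Fixnum deprecation warning to stdout. If that write
  # fails (e.g. Errno::ENOSPC), the exception escapes run before join is reached
  # and nothing ever calls @scheduler.shutdown, so the thread keeps running.
  @scheduler.cron("* * * * *") { queue << "tick" }

  @scheduler.join
end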

Steps to reproduce

  • create a limited-size partition to which the stdout and stderr of Logstash will be redirected
  • start Docker containers with ES and Kibana, used to configure Logstash
  • configure Logstash to use the remote config and start it, redirecting stdout and stderr to the limited-size partition
  • in Kibana configure the offending pipeline
  • in the Logstash logs observe the pipeline repeatedly fail and get recreated, without Logstash exiting
  • use the jps and jstack tools to verify the growing number of threads
  • cleanup

create a limited-size partition

  • create a file that serves as the image (512 KiB):
dd if=/dev/zero of=/tmp/virtualblock.img bs=512 count=1024
  • attach it to a loopback device:
sudo losetup -fP  /tmp/virtualblock.img
  • create a filesystem in it
mkfs.ext4 /tmp/virtualblock.img
  • find the loopback device to which losetup attached the image file, something like /dev/loop<n>:
sudo losetup -a
  • create a mount point and mount the loop device on it:
mkdir /loopfs
sudo mount -o loop /dev/loop<n> /loopfs
  • create a tmp folder to which the stderr and stdout of the Logstash process will be redirected:
sudo mkdir -p /loopfs/tmp
sudo chmod a+w /loopfs/tmp/

start Docker containers with ES and Kibana
use the attached docker compose file and start it:

docker-compose -f docker-compose-monitoring-vs-management_without_LS.yml up

docker-compose-monitoring-vs-management_without_LS.zip

configure the main pipeline
Log into http://localhost:5601 with elastic/changeme; in the Management tab create a simple main pipeline (input { stdin {} } output { stdout {} })

configure Logstash and run it with stdout/stderr redirected

  • in config/logstash.yml put:
xpack.management.enabled: true
xpack.management.pipeline.id: ["main", "jdbc_input"]
xpack.management.elasticsearch.username: elastic
xpack.management.elasticsearch.password: changeme
xpack.management.elasticsearch.hosts: ["http://localhost:9200"]
  • run Logstash:
bin/logstash 2>/loopfs/tmp/ls.out 1>&2
  • after Logstash is up, consume the remaining space on this fake device:
$>df -h
...
/dev/loop<n>     488K   32K  424K   8% /loopfs
...
dd if=/dev/zero of=/loopfs/tmp/filler bs=1024 count=424

in Kibana configure the offending pipeline

  • in the Kibana UI create a pipeline named jdbc_input with this definition:
input {
  jdbc {
    jdbc_user => "andrea"
    jdbc_password => "password"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_driver_library => "/sql_drivers/mysql-connector-java-8.0.18/mysql-connector-java-8.0.18.jar"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test_logstash"
    jdbc_validate_connection => true
    schedule => "* * * * *"
    statement_filepath => "/tmp/input_script.sql"
    tracking_column => "item_id"
    use_column_value => true
  }
}
output {
  stdout {
    codec => rubydebug
  }  
}

Be sure to have /tmp/input_script.sql and the drivers in place; a running MySQL is not required to trigger the bug.

  • monitor logs/logstash-plain.log to see this:
[2020-03-05T09:51:21,300][ERROR][logstash.javapipeline    ][jdbc_input][05a070f66962b6ebb9b4a5f50b8d3a84871f7cf4f589e7c20e426d61ef97080d] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:jdbc_input
  Plugin: <LogStash::Inputs::Jdbc jdbc_user=>"andrea", tracking_column=>"item_id", use_column_value=>true, schedule=>"* * * * *", jdbc_validate_connection=>true, jdbc_password=><password>, statement_filepath=>"/tmp/input_script.sql", jdbc_driver_library=>"/home/andrea/workspace/sql_drivers/mysql-connector-java-8.0.18/mysql-connector-java-8.0.18.jar", jdbc_connection_string=>"jdbc:mysql://127.0.0.1:3306/test_logstash", id=>"05a070f66962b6ebb9b4a5f50b8d3a84871f7cf4f589e7c20e426d61ef97080d", jdbc_driver_class=>"com.mysql.cj.jdbc.Driver", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_63036b9d-9ad0-4c0d-b068-5f5be3bebecc", enable_metric=>true, charset=>"UTF-8">, jdbc_paging_enabled=>false, jdbc_page_size=>100000, jdbc_validation_timeout=>3600, jdbc_pool_timeout=>5, sql_log_level=>"info", connection_retry_attempts=>1, connection_retry_attempts_wait_time=>0.5, plugin_timezone=>"utc", last_run_metadata_path=>"/home/andrea/.logstash_jdbc_last_run", tracking_column_type=>"numeric", clean_run=>false, record_last_run=>true, lowercase_column_names=>true, use_prepared_statements=>false>
  Error: No space left on device - <STDERR>
  Exception: Errno::ENOSPC
  Stack: org/jruby/RubyIO.java:1477:in `write'
org/jruby/RubyIO.java:1432:in `write'
org/jruby/common/RubyWarnings.java:229:in `warn'
org/jruby/common/RubyWarnings.java:106:in `warn'
org/jruby/common/RubyWarnings.java:143:in `warn'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77:in `block in initialize'
org/jruby/RubyEnumerable.java:695:in `find'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77:in `block in initialize'
org/jruby/RubyArray.java:1814:in `each'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:73:in `initialize'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:604:in `initialize'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler.rb:629:in `do_schedule'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler.rb:244:in `cron'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/logstash-integration-jdbc-5.0.1/lib/logstash/inputs/jdbc.rb:276:in `run'
/home/andrea/workspace/logstash_andsel/logstash-core/lib/logstash/java_pipeline.rb:330:in `inputworker'
/home/andrea/workspace/logstash_andsel/logstash-core/lib/logstash/java_pipeline.rb:321:in `block in start_input'

use the jps and jstack tools to verify the growing number of threads

  • from your JAVA_HOME use jps to find the Logstash pid
  • run jstack <pid> or another monitoring tool to see the number of threads increase

cleanup

sudo umount /loopfs
sudo losetup -d /dev/loop<n>

andsel (Contributor, Author) commented Mar 6, 2020

The problem is that in https://github.com/elastic/logstash/blob/e9c9865f4066b54048f8d708612a72d25e2fe5d9/logstash-core/lib/logstash/java_pipeline.rb#L329-L358, if plugin.run fails there is an invocation of do_close and then a retry. The retry is there because we don't distinguish transient from permanent errors in run, so we keep retrying. But if the plugin doesn't implement do_close to clean up the allocated resources, we get a leak like in this case.
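
For reference, the retry loop has roughly this shape (a simplified paraphrase, not the exact java_pipeline.rb source):

# Simplified paraphrase of the inputworker retry loop; see the linked source for the real code.
def inputworker(plugin)
  begin
    plugin.run(queue)
  rescue => e
    return if plugin.stop?   # pipeline is shutting down, just exit the worker
    logger.error("A plugin had an unrecoverable error. Will restart this plugin.", :exception => e.message)
    sleep(1)
    plugin.do_close          # relies on the plugin to free whatever run allocated
    retry                    # then run the plugin again
  end
end
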
Possible solutions (mutually exclusive):

  • move the creation of the scheduler into register (and don't implement any cleanup in close, otherwise run could never run correctly again after a retry, since register is not re-invoked)
  • implement do_close to clean up the scheduler created in run (see the sketch below)
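
A minimal sketch of the second option, assuming the plugin keeps the scheduler in @scheduler (names are illustrative; the exact implementation may differ):

# Hypothetical sketch of option 2 for the jdbc input.
def run(queue)
  @scheduler = Rufus::Scheduler.new
  @scheduler.cron(@schedule) { execute_query(queue) }
  @scheduler.join
end

def close
  # Invoked through do_close when the pipeline restarts the plugin after an error;
  # shutting the scheduler down here releases its thread and prevents the leak.
  @scheduler.shutdown if @scheduler
end

With this in place, every do_close before a retry tears down the scheduler thread, so repeated pipeline reloads no longer accumulate threads.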

andsel added a commit to andsel/logstash-integration-jdbc that referenced this issue Apr 24, 2020
The close method is used to clean up in case of an error in the run phase of the plugin; implemented to avoid leaking the threads created by the Rufus scheduler.

Fixes logstash-plugins#21
andsel added a commit to andsel/logstash-integration-jdbc that referenced this issue Apr 27, 2020
The close method is used to clean up in case of an error in the run phase of the plugin; implemented to avoid leaking the threads created by the Rufus scheduler.

Fixes logstash-plugins#21
Close logstash-plugins#28