Leak of scheduler thread under uncaught errors in plugin run #21

Closed
andsel opened this issue Mar 6, 2020 · 1 comment · Fixed by #28

Comments

andsel (Contributor) commented Mar 6, 2020

When Logstash is not able to catch all errors during the start of a plugin (for example IO errors from the filesystem side), threads or resources leak.

This has been verified with logstash-input-jdbc, using an IO error on the redirected stdout and stderr of the Logstash process.

The problem is that the jdbc input uses the Rufus scheduler, and when the scheduler parses the schedule pattern (* * * * *) at
https://github.com/jmettraux/rufus-scheduler/blob/v3.0.9/lib/rufus/scheduler/cronline.rb#L77 it writes "warning: constant ::Fixnum is deprecated" to stdout. If at this step there is an IO error on stdout, the raised error makes the pipeline crash but does not shut down the Logstash process.
Now, if Logstash uses xpack.management, we end up in a situation where every reload of the pipeline config from the remote leaks one thread.
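
To make the leak concrete, here is a minimal sketch (illustrative only, not the actual logstash-integration-jdbc source) of why an error raised while scheduling leaves the scheduler thread behind:

# Illustrative Ruby sketch; the real plugin code differs, but the shape of the leak is the same.
require "rufus/scheduler"

def run(queue)
  # Creating the scheduler already spawns its work-loop thread.
  @scheduler = Rufus::Scheduler.new

  # cron() parses the schedule; in rufus-scheduler 3.0.9 the parsing code in
  # cronline.rb writes a Fixnum deprecation warning to stdout. If that write
  # fails (e.g. Errno::ENOSPC), the exception escapes run before join is reached
  # and nothing ever calls @scheduler.shutdown, so the thread keeps running.
  @scheduler.cron("* * * * *") { queue << "tick" }

  @scheduler.join
end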

Steps to reproduce

  • create a limited-size partition to which the stdout and stderr of Logstash will be redirected
  • start Docker containers with ES and Kibana, used to configure Logstash
  • configure Logstash to use the remote config and start it, redirecting stdout and stderr to the limited-size partition
  • in Kibana configure the offending pipeline
  • in the Logstash logs observe the pipeline repeatedly fail and get recreated, without Logstash exiting
  • use the jps and jstack tools to verify the growing number of threads
  • cleanup

create a limited-size partition

  • create a file that serves as the image (512 KiB):
dd if=/dev/zero of=/tmp/virtualblock.img bs=512 count=1024
  • attach it to a loopback device:
sudo losetup -fP  /tmp/virtualblock.img
  • create a filesystem in it
mkfs.ext4 /tmp/virtualblock.img
  • find the loopback device to which losetup attached the image file, something like /dev/loop<n>:
sudo losetup -a
  • create a mount point and mount the loop device on it:
mkdir /loopfs
sudo mount -o loop /dev/loop<n> /loopfs
  • create a tmp folder to which the stderr and stdout of the Logstash process will be redirected:
sudo mkdir -p /loopfs/tmp
sudo chmod a+w /loopfs/tmp/

start Docker containers with ES and Kibana
use the attached docker compose file and start it:

docker-compose -f docker-compose-monitoring-vs-management_without_LS.yml up

docker-compose-monitoring-vs-management_without_LS.zip

configure the main pipeline
Log into http://localhost:5601 with elastic/changeme; in the Management tab create a simple main pipeline (input { stdin {} } output { stdout {} })

configure Logstash and run it with stdout/stderr redirected

  • in config/logstash.yml put:
xpack.management.enabled: true
xpack.management.pipeline.id: ["main", "jdbc_input"]
xpack.management.elasticsearch.username: elastic
xpack.management.elasticsearch.password: changeme
xpack.management.elasticsearch.hosts: ["http://localhost:9200"]
  • run Logstash:
bin/logstash 2>/loopfs/tmp/ls.out 1>&2
  • after Logstash is up, consume the remaining space on this fake device:
$>df -h
...
/dev/loop<n>     488K   32K  424K   8% /loopfs
...
dd if=/dev/zero of=/loopfs/tmp/filler bs=1024 count=424

in Kibana configure the offending pipeline

  • in the Kibana UI create a pipeline named jdbc_input with this definition:
input {
  jdbc {
    jdbc_user => "andrea"
    jdbc_password => "password"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_driver_library => "/sql_drivers/mysql-connector-java-8.0.18/mysql-connector-java-8.0.18.jar"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test_logstash"
    jdbc_validate_connection => true
    schedule => "* * * * *"
    statement_filepath => "/tmp/input_script.sql"
    tracking_column => "item_id"
    use_column_value => true
  }
}
output {
  stdout {
    codec => rubydebug
  }  
}

Be sure to have /tmp/input_script.sql and the drivers in place; a running MySQL is not required to trigger the bug.

  • monitor logs/logstash-plain.log to see this:
[2020-03-05T09:51:21,300][ERROR][logstash.javapipeline    ][jdbc_input][05a070f66962b6ebb9b4a5f50b8d3a84871f7cf4f589e7c20e426d61ef97080d] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:jdbc_input
  Plugin: <LogStash::Inputs::Jdbc jdbc_user=>"andrea", tracking_column=>"item_id", use_column_value=>true, schedule=>"* * * * *", jdbc_validate_connection=>true, jdbc_password=><password>, statement_filepath=>"/tmp/input_script.sql", jdbc_driver_library=>"/home/andrea/workspace/sql_drivers/mysql-connector-java-8.0.18/mysql-connector-java-8.0.18.jar", jdbc_connection_string=>"jdbc:mysql://127.0.0.1:3306/test_logstash", id=>"05a070f66962b6ebb9b4a5f50b8d3a84871f7cf4f589e7c20e426d61ef97080d", jdbc_driver_class=>"com.mysql.cj.jdbc.Driver", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_63036b9d-9ad0-4c0d-b068-5f5be3bebecc", enable_metric=>true, charset=>"UTF-8">, jdbc_paging_enabled=>false, jdbc_page_size=>100000, jdbc_validation_timeout=>3600, jdbc_pool_timeout=>5, sql_log_level=>"info", connection_retry_attempts=>1, connection_retry_attempts_wait_time=>0.5, plugin_timezone=>"utc", last_run_metadata_path=>"/home/andrea/.logstash_jdbc_last_run", tracking_column_type=>"numeric", clean_run=>false, record_last_run=>true, lowercase_column_names=>true, use_prepared_statements=>false>
  Error: No space left on device - <STDERR>
  Exception: Errno::ENOSPC
  Stack: org/jruby/RubyIO.java:1477:in `write'
org/jruby/RubyIO.java:1432:in `write'
org/jruby/common/RubyWarnings.java:229:in `warn'
org/jruby/common/RubyWarnings.java:106:in `warn'
org/jruby/common/RubyWarnings.java:143:in `warn'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77:in `block in initialize'
org/jruby/RubyEnumerable.java:695:in `find'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77:in `block in initialize'
org/jruby/RubyArray.java:1814:in `each'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:73:in `initialize'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:604:in `initialize'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler.rb:629:in `do_schedule'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler.rb:244:in `cron'
/home/andrea/workspace/logstash_andsel/vendor/bundle/jruby/2.5.0/gems/logstash-integration-jdbc-5.0.1/lib/logstash/inputs/jdbc.rb:276:in `run'
/home/andrea/workspace/logstash_andsel/logstash-core/lib/logstash/java_pipeline.rb:330:in `inputworker'
/home/andrea/workspace/logstash_andsel/logstash-core/lib/logstash/java_pipeline.rb:321:in `block in start_input'

use the jps and jstack tools to verify the growing number of threads

  • from your JAVA_HOME use jps to find the Logstash pid
  • run jstack <pid> or another monitoring tool to see the number of threads increase

cleanup

sudo umount /loopfs
sudo losetup -d /dev/loop<n>

andsel (Contributor, Author) commented Mar 6, 2020

The problem is that in https://github.com/elastic/logstash/blob/e9c9865f4066b54048f8d708612a72d25e2fe5d9/logstash-core/lib/logstash/java_pipeline.rb#L329-L358, if plugin.run fails there is an invocation of do_close and then a retry. The retry is there because we don't distinguish transient from permanent errors in run, so we keep retrying. But if the plugin doesn't implement do_close to clean up the allocated resources, we get a leak like in this case.
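
For reference, the retry loop has roughly this shape (a simplified paraphrase, not the exact java_pipeline.rb source):

# Simplified paraphrase of the inputworker retry loop; see the linked source for the real code.
def inputworker(plugin)
  begin
    plugin.run(queue)
  rescue => e
    return if plugin.stop?   # pipeline is shutting down, just exit the worker
    logger.error("A plugin had an unrecoverable error. Will restart this plugin.", :exception => e.message)
    sleep(1)
    plugin.do_close          # relies on the plugin to free whatever run allocated
    retry                    # then run the plugin again
  end
end
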
Possible solutions (mutually exclusive):

  • move the creation of the scheduler into register (and don't implement any cleanup in close, otherwise run could never run correctly again after a retry, since register is not re-invoked)
  • implement do_close to clean up the scheduler created in run (see the sketch below)
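
A minimal sketch of the second option, assuming the plugin keeps the scheduler in @scheduler (names are illustrative; the exact implementation may differ):

# Hypothetical sketch of option 2 for the jdbc input.
def run(queue)
  @scheduler = Rufus::Scheduler.new
  @scheduler.cron(@schedule) { execute_query(queue) }
  @scheduler.join
end

def close
  # Invoked through do_close when the pipeline restarts the plugin after an error;
  # shutting the scheduler down here releases its thread and prevents the leak.
  @scheduler.shutdown if @scheduler
end

With this in place, every do_close before a retry tears down the scheduler thread, so repeated pipeline reloads no longer accumulate threads.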

andsel added a commit to andsel/logstash-integration-jdbc that referenced this issue Apr 24, 2020
The close method is used to clean up in case of an error in the run phase of the plugin; implemented to avoid leaking the threads created by the Rufus scheduler.

Fixes logstash-plugins#21
andsel added a commit to andsel/logstash-integration-jdbc that referenced this issue Apr 27, 2020
The close method is used to clean up in case of an error in the run phase of the plugin; implemented to avoid leaking the threads created by the Rufus scheduler.

Fixes logstash-plugins#21
Close logstash-plugins#28