
Normalize the exception behaviour for inputs, outputs and filters #2386

Closed
wants to merge 7 commits

Conversation

@jsvd (Member) commented Jan 21, 2015

Right now, an exception in an input plugin will cause it to restart indefinitely, while an exception in an output plugin halts Logstash immediately.

[EDITED] Taking @jordansissel's comments into account, this pull request makes all plugins behave the same way:
If a plugin raises a StandardError, the worker catches it and retries the method; any other Exception makes Logstash crash.

This might not be the behaviour we ultimately want, so this PR is more of a WIP aiming at consistent behaviour across plugins.
It also adds pipeline tests for this behaviour.
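Roughly, the intended behaviour can be sketched as below (purely illustrative; the method names and the failure are placeholders, not the actual pipeline code):

```ruby
# Illustrative sketch only, not the actual pipeline code: the worker
# retries its unit of work on StandardError; any other Exception
# propagates and crashes the process.
$attempts = 0

def run_once
  $attempts += 1
  raise IOError, "transient failure" if $attempts < 3  # IOError < StandardError
  puts "work completed after #{$attempts} attempts"
end

def worker_loop
  run_once
rescue StandardError => e
  warn "worker error, retrying: #{e.message}"
  retry                      # re-run the method body on StandardError
end

worker_loop
```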

Issues related to this problem:
#2152
#2130
#1250
#1373

fixes #2477

@@ -114,4 +130,64 @@ class TestPipeline < LogStash::Pipeline
end
end
end

context "when filters raise exceptions" do
Contributor commented:

when filters plugins raise exceptions ?

Member Author replied:

Good catch!

@wiibaa (Contributor) commented Jan 26, 2015

Also related to #1373

@jordansissel (Contributor)

First off, thanks for helping start this discussion more visibly outside our little team :)

> If the plugin raises an exception, the teardown for all plugins will be called and the exception message will be printed.

I'm not in favor of this behavior, for two reasons. The first is the historical behavior and the reasoning behind it. The second pertains to clustering and the future.

First, history. Logstash depends on tons of external libraries of varying stability (both in the libraries themselves and in their integration with Logstash), which means uncaught exceptions would cause Logstash to terminate. My original idea was that you could write simpler plugins and worry less about intermittent failures (via exceptions) by simply restarting the plugin when it failed. This was done most effectively for the input plugins (because they have an obvious "start" behavior), and less so for filters and outputs. This "restart it when it fails" approach is common in process managers like daemontools, upstart, systemd, and runit; it is also a central goal of Erlang's supervisor behaviors.

Second, the future! If we terminate Logstash on any uncaught error, how does this impact things when Logstash is a cluster? I don't think the cluster should terminate because one node is having trouble, right? And what of multitenant cluster deployments, where one user could accidentally push a bad configuration (or plugin!) that causes errors in that user's log flow - we don't want to interrupt flows belonging to other tenants.

My practice in operations for the past few years has always been to run applications in a kind of "do_thing while true" loop, restarting a thing whenever it exits (on error or otherwise, always restart it!). In Logstash's case, this matches my desire to always restart components on failure. For permanent failures, an event could be poisonous; we need to detect these situations and direct such events to some kind of store for later use (dead letter queue, etc.), but the default should be to restart the failed component and only direct any poisonous input (events) to a safe place after several restart attempts have failed.
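A minimal sketch of this "restart with a limit, then dead-letter" idea (the names, the retry limit, and the dead-letter handling are assumptions for illustration, not Logstash code):

```ruby
# Illustrative only: restart a failing component on StandardError, and give
# up (dead-lettering the offending input) after a few failed restarts.
MAX_RESTARTS = 3

def supervise(component)
  restarts = 0
  begin
    component.call
  rescue StandardError => e
    restarts += 1
    if restarts <= MAX_RESTARTS
      warn "restarting after #{e.class}: #{e.message} (attempt #{restarts})"
      retry
    else
      warn "giving up; routing the offending input to a dead letter store"
      # dead_letter_queue << event   # hypothetical dead letter queue
    end
  end
end

# Example: a component that always fails, so it exhausts its restarts.
supervise(-> { raise "boom" })
```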

@jsvd jsvd force-pushed the fix/output_thread_lock branch from eb8bbb0 to 1cfa954 Compare February 2, 2015 16:37
@jsvd jsvd self-assigned this Feb 2, 2015
@jsvd (Member Author) commented Feb 2, 2015

test this please

@@ -302,4 +295,8 @@ def flush_filters_to_output!(options = {})
end
end # flush_filters_to_output!

private
def print_exception_information(exception)
@logger.error("Restarting worker: #{exception} => #{exception.backtrace}")
Contributor suggested:

@logger.error("Exception information", :exception => exception, :backtrace => exception.backtrace)

@lexelby commented Feb 10, 2015

Great stuff here. Thanks, all!

@jsvd (Member Author) commented Feb 26, 2015

TODO: include the concern raised in #1588

@suyograo suyograo modified the milestones: v1.6.0, 1.5.1 Feb 26, 2015
jsvd added 7 commits March 17, 2015 13:22
input, filter and output workers should retry on a StandardError.
This PR adds that behaviour and tests.

It also changes Thread.abort_on_exception to false so that the pipeline.run
method catches exceptions from the other threads through `thread.join`.
Otherwise an Exception on a thread immediately exits the process and Logstash terminates.
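For context, a minimal sketch of how `Thread#join` surfaces a worker thread's exception when `Thread.abort_on_exception` is left false (illustrative only, not the pipeline code):

```ruby
# Illustrative only: with abort_on_exception false (the default), an
# exception raised inside a thread is re-raised in the caller when the
# thread is joined, so supervising code can rescue it and react.
Thread.abort_on_exception = false

worker = Thread.new { raise StandardError, "boom from worker" }

begin
  worker.join                    # the worker's exception is re-raised here
rescue StandardError => e
  puts "caught in supervisor: #{e.message}"
end
```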
@jsvd jsvd force-pushed the fix/output_thread_lock branch from ef560ab to 7ee652d Compare March 17, 2015 20:50
break if event.is_a?(LogStash::ShutdownEvent)
output(event)
end # while true

rescue => e
Contributor suggested:

rescue StandardError => e
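For reference, a bare `rescue => e` in Ruby already catches only StandardError and its subclasses, so the suggested change is about making that intent explicit; a tiny runnable illustration:

```ruby
# A bare `rescue => e` and `rescue StandardError => e` behave the same:
# both catch StandardError subclasses, but not other Exceptions.
def risky
  raise ArgumentError, "bad input"   # ArgumentError < StandardError
end

begin
  risky
rescue => e                          # equivalent to `rescue StandardError => e`
  puts "caught #{e.class}: #{e.message}"
end
```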

@joekiller (Contributor)

I'm going to try to test this as I believe I'm having a problem similar to that described in #2130.

@tbragin tbragin removed this from the v1.5.1 milestone Jun 10, 2015
@guyboertje (Contributor)

See #4064

@guyboertje (Contributor)

Please continue the discussion in #4127

@suyograo (Contributor)

super old stuff. closing

@suyograo suyograo closed this May 13, 2017

Successfully merging this pull request may close these issues.

Shutdown Semantics: Exception handling in the pipeline
10 participants