1.5 performance regression fixes #2869

colinsurprenant · 2015-03-20T22:52:15Z

performance fixes to the pipeline and filters and output code generation

this is also the fix/perf_regression share branch

colinsurprenant · 2015-03-23T18:22:37Z

here's what the generated filter function looks like across versions for the following config:

filter{if [foo] == 1 {clone{}} if [bar] == 1 {clone{}} clone{}}

1.4.2

  @filter_func = lambda do |event, &block|
    extra_events = []
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    if ((event["[foo]"] == 1))
      newevents = []
      extra_events.each do |event|
        @filter_clone_2.filter(event) do |newevent|
          newevents << newevent
        end
      end
      extra_events += newevents
      @filter_clone_2.filter(event) do |newevent|
        extra_events << newevent
      end
      if event.cancelled?
        extra_events.each(&block)
        return
      end
    end

    if ((event["[bar]"] == 1))
      newevents = []
      extra_events.each do |event|
        @filter_clone_3.filter(event) do |newevent|
          newevents << newevent
        end
      end
      extra_events += newevents
      @filter_clone_3.filter(event) do |newevent|
        extra_events << newevent
      end
      if event.cancelled?
        extra_events.each(&block)
        return
      end
    end

    newevents = []
    extra_events.each do |event|
      @filter_clone_4.filter(event) do |newevent|
        newevents << newevent
      end
    end
    extra_events += newevents
    @filter_clone_4.filter(event) do |newevent|
      extra_events << newevent
    end
    if event.cancelled?
      extra_events.each(&block)
      return
    end

    extra_events.each(&block)
  end

1.5.0-rc2

  @filter_func = lambda do |event, &block|
    events = [event]
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    events = events.flat_map do |event|
      events = [event]
      if ((event["[foo]"] == 1)) # if [foo] == 1
        events = events.flat_map do |event|
          next [] if event.cancelled?

          new_events = []
          @filter_clone_2.filter(event){|new_event| new_events << new_event}
          event.cancelled? ? new_events : new_events.unshift(event)
        end
      end
      events
    end

    events = events.flat_map do |event|
      events = [event]
      if ((event["[bar]"] == 1)) # if [bar] == 1
        events = events.flat_map do |event|
          next [] if event.cancelled?

          new_events = []
          @filter_clone_3.filter(event){|new_event| new_events << new_event}
          event.cancelled? ? new_events : new_events.unshift(event)
        end
      end
      events
    end

    events = events.flat_map do |event|
      next [] if event.cancelled?

      new_events = []
      @filter_clone_4.filter(event){|new_event| new_events << new_event}
      event.cancelled? ? new_events : new_events.unshift(event)
    end

    events.flatten.each{|e| block.call(e) }
  end

1.5.0 fix/perf_regression

  def filter_func(event)
    events = [event]
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    if ((event["[foo]"] == 1)) # if [foo] == 1
      events = @filter_clone_2.multi_filter(events)
    end

    if ((event["[bar]"] == 1)) # if [bar] == 1
      events = @filter_clone_3.multi_filter(events)
    end

    events = @filter_clone_4.multi_filter(events)
    events
  end

the new generated code is a lot more straightforward and easier to understand. note the multi_filter method which was introduced in LogStash::Filters::Base.

  def multi_filter(events)
    result = []
    events.each do |event|
      result << event
      filter(event){|new_event| result << new_event}
    end
    result
  end

also the cancelled event handling has been moved up in the LogStash::Pipeline#filterworker with

filter_func(event).each { |e| @filter_to_output.push(e) unless e.cancelled? }

eventually (post 1.5) I'd like to get rid of the filter method block argument as described in #2872

colinsurprenant · 2015-03-23T18:36:52Z

here's what the generated output function looks like across versions for the following config:

output { if [foo] == 1 {stdout{codec => dots}} if [bar] == 1 {stdout{codec => dots}} stdout{codec => dots}}

1.4.2

@output_func = lambda do |event, &block|
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  if ((event["[foo]"] == 1))
    @output_stdout_2.handle(event)
  end

  if ((event["[bar]"] == 1))
    @output_stdout_3.handle(event)
  end

  @output_stdout_4.handle(event)
end

1.5.0-rc2

note the totally useless multiple events handling introduced here.

@output_func = lambda do |event, &block|
  events = [event]
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  events = events.flat_map do |event|
    events = [event]
    if ((event["[foo]"] == 1)) # if [foo] == 1
      @output_stdout_2.handle(event)
    end
    events
  end

  events = events.flat_map do |event|
    events = [event]
    if ((event["[bar]"] == 1)) # if [bar] == 1
      @output_stdout_3.handle(event)
    end
    events
  end

  @output_stdout_4.handle(event)
end

1.5.0 fix/perf_regression

def output_func(event)
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  if ((event["[foo]"] == 1)) # if [foo] == 1
    @output_stdout_2.handle(event)
  end

  if ((event["[bar]"] == 1)) # if [bar] == 1
    @output_stdout_3.handle(event)
  end

  @output_stdout_4.handle(event)
end

colinsurprenant · 2015-03-23T18:39:37Z

I will be adding some specs but I'd appreciate a review anyway @jsvd @ph please?

ph · 2015-03-23T18:46:04Z

lib/logstash/filters/base.rb

+    end
+    result
+  end
+


Can you add a comment on this? I am not sure what it exactly does?

totally! on it.

ph · 2015-03-24T13:30:04Z

I did a small sanity check:

the code is easier to read.
I am a bit worried a about the removal of the flat_map inside the filter code.

@jsvd @suyograo can you look at this?

ph · 2015-03-24T13:32:29Z

I did not run the test and we don't have enough code harness for the pipeline..

colinsurprenant · 2015-03-24T13:33:50Z

see inline comment for flat_map. it is not required anymore.

ph · 2015-03-24T13:42:13Z

@colinsurprenant ty for correctly answering my questions, I have no doubt it is faster than what we have before with the removal of the lambda. But lets get more eyes into it :)

jsvd · 2015-03-26T14:42:29Z

lib/logstash/event.rb

@@ -16,6 +16,13 @@
 class LogStash::ShutdownEvent; end
 class LogStash::FlushEvent; end

+module LogStash


I'd add an extra module level here to add a "subject" example:

module LogStash::Signal FLUSH = LogStash::FlushEvent.new SHUTDOWN = LogStash::ShutdownEvent.new end

Then:

@input_to_filter.push(LogStash::Signal::SHUTDOWN)

or

@input_to_filter.push(Signal::SHUTDOWN) since we're already inside the namespace.

Good suggestion but unfortunately we cannot do this without updating all plugins which rely on LogStash::SHUTDOWN, see #2863 which is exactly what happened. I suggest you create a new issue and we can tackle this in >= 1.6. It could be done at the same time we do #2872

Hum..around 10 plugins use it https://github.com/search?p=1&q=user%3Alogstash-plugins+LogStash%3A%3ASHUTDOWN&ref=searchresults&type=Code&utf8=%E2%9C%93 (I couldn't find a more accurate search for this)

Alright, lets postpone this to > 1.5

jsvd · 2015-03-26T15:32:25Z

wow, I love this refactor @colinsurprenant!
So much simpler: concerns are better separated, generates less code, easier to understand, great job ❤️ ❤️
Pointed out 2 minor (subjective things) but otherwise LGTM (also, tests pass).

colinsurprenant · 2015-03-26T19:45:40Z

I will remove the WIP status and proceed to the merge once @ph and @jsvd do a final review for the last optimization commit 99484fc and the 2 spec commits 4529671 688cf28

talevy · 2015-03-26T20:08:23Z

If I understood the changes correctly, I like it! definitely a clearer filter function is generated, and removing the need to create new shutdown event objects each time is nice. output is a step or two shorter! beyond what @ph and @jsvd mention, I don't see anything that jumps out at me. I have not tested it though. go go performant colin!

colinsurprenant · 2015-03-26T21:35:37Z

I just updated the performance numbers in #2870 to reflect all latests commits.

suyograo · 2015-03-26T22:13:43Z

+1 to merge. nice work @colinsurprenant

jsvd · 2015-03-26T22:15:34Z

LGTM

ph · 2015-03-27T14:04:34Z

LGTM tests are running

rework output function init removed double logger.debug use event constant fix LogStash::SHUTDOWN to actually be the shutdown signal event push in the pipeline fix outputs conditionals handling code gen methods not lambdas, remove intermediate arrays, base multi_filter to simplify code gen remove block from filter_func force_encoding at compile time commented multi_filter method typo DRY & cleanups optimizations: string constants, timestamp initialization, useless logger instance specs for signal events Filters::Base interface and multi_filter method closes #2870

elasticsearch-bot · 2015-03-30T20:07:59Z

Merged sucessfully into 1.5 master!

rework output function init removed double logger.debug use event constant fix LogStash::SHUTDOWN to actually be the shutdown signal event push in the pipeline fix outputs conditionals handling code gen methods not lambdas, remove intermediate arrays, base multi_filter to simplify code gen remove block from filter_func force_encoding at compile time commented multi_filter method typo DRY & cleanups optimizations: string constants, timestamp initialization, useless logger instance specs for signal events Filters::Base interface and multi_filter method closes #2870 Fixes #2869

colinsurprenant added enhancement work in progress v1.5.0 blocker labels Mar 20, 2015

colinsurprenant self-assigned this Mar 20, 2015

This was referenced Mar 20, 2015

Service fails to stop (v1.5.0.rc2) #2796

Closed

wrong shutdown event object handling in plugins #2863

Closed

1.5 performance regressions #2870

Closed

refactor filter plugin filter method contract #2872

Closed

ph reviewed Mar 23, 2015
View reviewed changes

colinsurprenant mentioned this pull request Mar 24, 2015

performance drop since v1.5.0 #2807

Closed

colinsurprenant added the performance regression label Mar 25, 2015

jsvd reviewed Mar 26, 2015
View reviewed changes

colinsurprenant changed the title ~~[WIP] 1.5 performance regression fixes~~ 1.5 performance regression fixes Mar 26, 2015

colinsurprenant mentioned this pull request Mar 27, 2015

simple pipeline config is slower on 1.5rc2 than 1.4.2 #2859

Closed

colinsurprenant force-pushed the fix/perf_regression branch from 688cf28 to 1d7d14c Compare March 30, 2015 14:44

elasticsearch-bot added the 1.5 label Mar 30, 2015

colinsurprenant closed this in dbe4154 Mar 30, 2015

This was referenced Apr 2, 2015

Shutdown Semantics: Exception handling in the pipeline #2477

Open

fix conditionals regression #2943

Closed

suyograo mentioned this pull request Apr 6, 2015

Performance comparison beta1 and rc2 #2824

Closed

suyograo mentioned this pull request Apr 24, 2015

[1.5.0-RC3] filters: undefined local variable or method event #3100

Closed

suyograo deleted the fix/perf_regression branch December 16, 2015 03:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1.5 performance regression fixes #2869

1.5 performance regression fixes #2869

colinsurprenant commented Mar 20, 2015

colinsurprenant commented Mar 23, 2015

colinsurprenant commented Mar 23, 2015

colinsurprenant commented Mar 23, 2015

ph Mar 23, 2015

colinsurprenant Mar 23, 2015

colinsurprenant Mar 23, 2015

ph Mar 24, 2015

ph commented Mar 24, 2015

ph commented Mar 24, 2015

colinsurprenant commented Mar 24, 2015

ph commented Mar 24, 2015

jsvd Mar 26, 2015

colinsurprenant Mar 26, 2015

jsvd Mar 26, 2015

jsvd commented Mar 26, 2015

colinsurprenant commented Mar 26, 2015

talevy commented Mar 26, 2015

colinsurprenant commented Mar 26, 2015

suyograo commented Mar 26, 2015

jsvd commented Mar 26, 2015

ph commented Mar 27, 2015

elasticsearch-bot commented Mar 30, 2015

1.5 performance regression fixes #2869

1.5 performance regression fixes #2869

Conversation

colinsurprenant commented Mar 20, 2015

colinsurprenant commented Mar 23, 2015

1.4.2

1.5.0-rc2

1.5.0 fix/perf_regression

colinsurprenant commented Mar 23, 2015

1.4.2

1.5.0-rc2

1.5.0 fix/perf_regression

colinsurprenant commented Mar 23, 2015

ph Mar 23, 2015

Choose a reason for hiding this comment

colinsurprenant Mar 23, 2015

Choose a reason for hiding this comment

colinsurprenant Mar 23, 2015

Choose a reason for hiding this comment

ph Mar 24, 2015

Choose a reason for hiding this comment

ph commented Mar 24, 2015

ph commented Mar 24, 2015

colinsurprenant commented Mar 24, 2015

ph commented Mar 24, 2015

jsvd Mar 26, 2015

Choose a reason for hiding this comment

colinsurprenant Mar 26, 2015

Choose a reason for hiding this comment

jsvd Mar 26, 2015

Choose a reason for hiding this comment

jsvd commented Mar 26, 2015

colinsurprenant commented Mar 26, 2015

talevy commented Mar 26, 2015

colinsurprenant commented Mar 26, 2015

suyograo commented Mar 26, 2015

jsvd commented Mar 26, 2015

ph commented Mar 27, 2015

elasticsearch-bot commented Mar 30, 2015