Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.5 performance regression fixes #2869

Closed
wants to merge 1 commit into from
Closed

Conversation

colinsurprenant
Copy link
Contributor

performance fixes to the pipeline and filters and output code generation

this is also the fix/perf_regression share branch

@colinsurprenant
Copy link
Contributor Author

here's what the generated filter function looks like across versions for the following config:

filter{if [foo] == 1 {clone{}} if [bar] == 1 {clone{}} clone{}}

1.4.2

  @filter_func = lambda do |event, &block|
    extra_events = []
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    if ((event["[foo]"] == 1))
      newevents = []
      extra_events.each do |event|
        @filter_clone_2.filter(event) do |newevent|
          newevents << newevent
        end
      end
      extra_events += newevents
      @filter_clone_2.filter(event) do |newevent|
        extra_events << newevent
      end
      if event.cancelled?
        extra_events.each(&block)
        return
      end
    end

    if ((event["[bar]"] == 1))
      newevents = []
      extra_events.each do |event|
        @filter_clone_3.filter(event) do |newevent|
          newevents << newevent
        end
      end
      extra_events += newevents
      @filter_clone_3.filter(event) do |newevent|
        extra_events << newevent
      end
      if event.cancelled?
        extra_events.each(&block)
        return
      end
    end

    newevents = []
    extra_events.each do |event|
      @filter_clone_4.filter(event) do |newevent|
        newevents << newevent
      end
    end
    extra_events += newevents
    @filter_clone_4.filter(event) do |newevent|
      extra_events << newevent
    end
    if event.cancelled?
      extra_events.each(&block)
      return
    end

    extra_events.each(&block)
  end

1.5.0-rc2

  @filter_func = lambda do |event, &block|
    events = [event]
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    events = events.flat_map do |event|
      events = [event]
      if ((event["[foo]"] == 1)) # if [foo] == 1
        events = events.flat_map do |event|
          next [] if event.cancelled?

          new_events = []
          @filter_clone_2.filter(event){|new_event| new_events << new_event}
          event.cancelled? ? new_events : new_events.unshift(event)
        end
      end
      events
    end

    events = events.flat_map do |event|
      events = [event]
      if ((event["[bar]"] == 1)) # if [bar] == 1
        events = events.flat_map do |event|
          next [] if event.cancelled?

          new_events = []
          @filter_clone_3.filter(event){|new_event| new_events << new_event}
          event.cancelled? ? new_events : new_events.unshift(event)
        end
      end
      events
    end

    events = events.flat_map do |event|
      next [] if event.cancelled?

      new_events = []
      @filter_clone_4.filter(event){|new_event| new_events << new_event}
      event.cancelled? ? new_events : new_events.unshift(event)
    end

    events.flatten.each{|e| block.call(e) }
  end

1.5.0 fix/perf_regression

  def filter_func(event)
    events = [event]
    @logger.debug? && @logger.debug("filter received", :event => event.to_hash)

    if ((event["[foo]"] == 1)) # if [foo] == 1
      events = @filter_clone_2.multi_filter(events)
    end

    if ((event["[bar]"] == 1)) # if [bar] == 1
      events = @filter_clone_3.multi_filter(events)
    end

    events = @filter_clone_4.multi_filter(events)
    events
  end

the new generated code is a lot more straightforward and easier to understand. note the multi_filter method which was introduced in LogStash::Filters::Base.

  def multi_filter(events)
    result = []
    events.each do |event|
      result << event
      filter(event){|new_event| result << new_event}
    end
    result
  end

also the cancelled event handling has been moved up in the LogStash::Pipeline#filterworker with

filter_func(event).each { |e| @filter_to_output.push(e) unless e.cancelled? }

eventually (post 1.5) I'd like to get rid of the filter method block argument as described in #2872

@colinsurprenant
Copy link
Contributor Author

here's what the generated output function looks like across versions for the following config:

output { if [foo] == 1 {stdout{codec => dots}} if [bar] == 1 {stdout{codec => dots}} stdout{codec => dots}}

1.4.2

@output_func = lambda do |event, &block|
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  if ((event["[foo]"] == 1))
    @output_stdout_2.handle(event)
  end

  if ((event["[bar]"] == 1))
    @output_stdout_3.handle(event)
  end

  @output_stdout_4.handle(event)
end

1.5.0-rc2

note the totally useless multiple events handling introduced here.

@output_func = lambda do |event, &block|
  events = [event]
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  events = events.flat_map do |event|
    events = [event]
    if ((event["[foo]"] == 1)) # if [foo] == 1
      @output_stdout_2.handle(event)
    end
    events
  end

  events = events.flat_map do |event|
    events = [event]
    if ((event["[bar]"] == 1)) # if [bar] == 1
      @output_stdout_3.handle(event)
    end
    events
  end

  @output_stdout_4.handle(event)
end

1.5.0 fix/perf_regression

def output_func(event)
  @logger.debug? && @logger.debug("output received", :event => event.to_hash)

  if ((event["[foo]"] == 1)) # if [foo] == 1
    @output_stdout_2.handle(event)
  end

  if ((event["[bar]"] == 1)) # if [bar] == 1
    @output_stdout_3.handle(event)
  end

  @output_stdout_4.handle(event)
end

@colinsurprenant
Copy link
Contributor Author

I will be adding some specs but I'd appreciate a review anyway @jsvd @ph please?

end
result
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on this? I am not sure what it exactly does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally! on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ph
Copy link
Contributor

ph commented Mar 24, 2015

I did a small sanity check:

  1. the code is easier to read.
  2. I am a bit worried a about the removal of the flat_map inside the filter code.

@jsvd @suyograo can you look at this?

@ph
Copy link
Contributor

ph commented Mar 24, 2015

I did not run the test and we don't have enough code harness for the pipeline..

@colinsurprenant
Copy link
Contributor Author

see inline comment for flat_map. it is not required anymore.

@ph
Copy link
Contributor

ph commented Mar 24, 2015

@colinsurprenant ty for correctly answering my questions, I have no doubt it is faster than what we have before with the removal of the lambda. But lets get more eyes into it :)

@@ -16,6 +16,13 @@
class LogStash::ShutdownEvent; end
class LogStash::FlushEvent; end

module LogStash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add an extra module level here to add a "subject" example:

module LogStash::Signal
 FLUSH = LogStash::FlushEvent.new
 SHUTDOWN = LogStash::ShutdownEvent.new
end

Then:

@input_to_filter.push(LogStash::Signal::SHUTDOWN)

or

@input_to_filter.push(Signal::SHUTDOWN) since we're already inside the namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion but unfortunately we cannot do this without updating all plugins which rely on LogStash::SHUTDOWN, see #2863 which is exactly what happened. I suggest you create a new issue and we can tackle this in >= 1.6. It could be done at the same time we do #2872

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum..around 10 plugins use it https://github.com/search?p=1&q=user%3Alogstash-plugins+LogStash%3A%3ASHUTDOWN&ref=searchresults&type=Code&utf8=%E2%9C%93 (I couldn't find a more accurate search for this)

Alright, lets postpone this to > 1.5

@jsvd
Copy link
Member

jsvd commented Mar 26, 2015

wow, I love this refactor @colinsurprenant!
So much simpler: concerns are better separated, generates less code, easier to understand, great job ❤️ ❤️
Pointed out 2 minor (subjective things) but otherwise LGTM (also, tests pass).

@colinsurprenant
Copy link
Contributor Author

I will remove the WIP status and proceed to the merge once @ph and @jsvd do a final review for the last optimization commit 99484fc and the 2 spec commits 4529671 688cf28

@colinsurprenant colinsurprenant changed the title [WIP] 1.5 performance regression fixes 1.5 performance regression fixes Mar 26, 2015
@talevy
Copy link
Contributor

talevy commented Mar 26, 2015

If I understood the changes correctly, I like it! definitely a clearer filter function is generated, and removing the need to create new shutdown event objects each time is nice. output is a step or two shorter! beyond what @ph and @jsvd mention, I don't see anything that jumps out at me. I have not tested it though. go go performant colin!

@colinsurprenant
Copy link
Contributor Author

I just updated the performance numbers in #2870 to reflect all latests commits.

@suyograo
Copy link
Contributor

+1 to merge. nice work @colinsurprenant

@jsvd
Copy link
Member

jsvd commented Mar 26, 2015

LGTM

@ph
Copy link
Contributor

ph commented Mar 27, 2015

LGTM tests are running :shipit:

rework output function init

removed double logger.debug

use event constant

fix LogStash::SHUTDOWN to actually be the shutdown signal event push in the pipeline

fix outputs conditionals handling

code gen methods not lambdas, remove intermediate arrays, base multi_filter to simplify code gen

remove block from filter_func

force_encoding at compile time

commented multi_filter method

typo

DRY & cleanups

optimizations: string constants, timestamp initialization, useless logger instance

specs for signal events

Filters::Base interface and multi_filter method

closes #2870
@elasticsearch-bot
Copy link

Merged sucessfully into 1.5 master!

colinsurprenant added a commit that referenced this pull request Mar 30, 2015
rework output function init

removed double logger.debug

use event constant

fix LogStash::SHUTDOWN to actually be the shutdown signal event push in the pipeline

fix outputs conditionals handling

code gen methods not lambdas, remove intermediate arrays, base multi_filter to simplify code gen

remove block from filter_func

force_encoding at compile time

commented multi_filter method

typo

DRY & cleanups

optimizations: string constants, timestamp initialization, useless logger instance

specs for signal events

Filters::Base interface and multi_filter method

closes #2870

Fixes #2869
@suyograo suyograo deleted the fix/perf_regression branch December 16, 2015 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants