
Filebeat async publisher support #782

Merged · 1 commit · Jan 25, 2016

Conversation

@urso commented Jan 20, 2016

Add a new config option to run the filebeat publisher pipeline in async mode.
This can benefit load-balancing performance, as the publisher pipeline is
kept busier with new bulk-events to publish.
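For reference, here is a minimal sketch of what enabling the option could look like in filebeat.yml. The option name and its placement under the filebeat section come from the docs diff below; the comment text is paraphrased from this discussion, and sync mode being the default follows from filebeat passing the Sync flag unless this option is set:

filebeat:
  # Experimental: run the publisher pipeline asynchronously, so several
  # bulk-events can be in flight at once. Defaults to false (sync mode).
  publish_async: true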

@@ -307,6 +307,15 @@ filebeat:
-------------------------------------------------------------------------------------


===== publish_async

Experimental!
Contributor

I think there was an asciidoc tag for that? @dedemorton ?

Contributor

I don't know of one, but I'll ask those who are wiser in the ways of asciidoc. @debadair or @palecur Do you know if there is a tag that we could use here for experimental config options? Do we have a convention for documenting experimental options?

@urso By experimental, I assume we mean that customers can try the option, but their mileage may vary (and there's no guarantee that the option will be supported in future versions). If so, maybe we need a sentence that basically conveys "use at your own risk."

Contributor

See the sources from here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html The experimental[] string seems to generate a warning. Not sure if that works well in this context, though.

Author

> By experimental, I assume we mean that customers can try the option, but their mileage may vary

Yup. It's a trade-off between throughput and memory usage when using load-balancing. In future versions I'd consider removing the flag and enabling async mode whenever load-balancing is enabled in output plugins.

@ruflin (Member) commented Jan 20, 2016

LGTM

// collect collects finished bulk-events in order and forwards processed
// batches to the registrar. Reports to the registrar are guaranteed to be
// in the same order as the bulk-events were received from the spooler.
func (p *asyncLogPublisher) collect() {
Member

Any idea how we could test if this works as expected?

Author

By having all filebeat system tests use both modes. I think we can easily craft unit tests for sync and async mode.
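To make the ordering guarantee from the collect() doc comment above concrete, here is a minimal, self-contained Go sketch of in-order completion collection. All names (batch, collector, the registrar callback) are illustrative stand-ins, not the actual filebeat types: batches may finish out of order, but only a contiguous prefix of finished batches is forwarded, so the registrar sees completions in spooler order.

package main

import (
	"container/list"
	"fmt"
	"sync/atomic"
)

// batch is an illustrative stand-in for a bulk of events in flight;
// the real asyncLogPublisher uses its own internal types.
type batch struct {
	id   int
	done int32 // set to 1 by an output worker once publishing finished
}

func (b *batch) finished() bool { return atomic.LoadInt32(&b.done) == 1 }

// collector keeps batches in submission (spooler) order and forwards only
// a contiguous prefix of finished batches, so the registrar always sees
// completions in the same order the spooler produced them.
type collector struct {
	pending *list.List // FIFO of *batch, oldest first
}

func (c *collector) add(b *batch) { c.pending.PushBack(b) }

// collect forwards all leading finished batches to the registrar callback.
func (c *collector) collect(registrar func(*batch)) {
	for e := c.pending.Front(); e != nil; e = c.pending.Front() {
		b := e.Value.(*batch)
		if !b.finished() {
			return // head still in flight: later finished batches must wait
		}
		c.pending.Remove(e)
		registrar(b)
	}
}

func main() {
	c := &collector{pending: list.New()}
	b1, b2 := &batch{id: 1}, &batch{id: 2}
	c.add(b1)
	c.add(b2)

	atomic.StoreInt32(&b2.done, 1) // batch 2 finishes first...
	c.collect(func(b *batch) { fmt.Println("registrar got batch", b.id) })
	// ...but nothing is forwarded until batch 1 also completes.

	atomic.StoreInt32(&b1.done, 1)
	c.collect(func(b *batch) { fmt.Println("registrar got batch", b.id) })
	// prints: registrar got batch 1, then registrar got batch 2
}

A unit test along these lines (publish batches, complete them out of order, assert the registrar callback still sees spooler order) is the kind of test discussed here.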

@tsg (Contributor) commented Jan 20, 2016

LGTM

@ruflin (Member) commented Jan 21, 2016

@urso @tsg I would currently suggest removing this feature from the docs and config entirely, as it is experimental. We can point people who have a specific performance issue directly to this feature.

@tsg (Contributor) commented Jan 21, 2016

IMHO, it depends on when we release this. If we put it in 1.2, then it should be off by default and maybe even undocumented (although I don't particularly like having undocumented options). But if we put it in 2.0-beta1, then we could even make it the default. That's exactly what the betas are for, after all, and we'd lose feedback if we hide the option.

@urso is there a particular reason not to be confident about this?

@ruflin (Member) commented Jan 21, 2016

@tsg Good point. 👍

@urso (Author) commented Jan 21, 2016

If something goes wrong, we've got a very bad memory leak. But from my manual testing, all is fine. I will add some unit tests to check that the filebeat publisher modes work as expected.

There is no actual reason to keep this feature experimental. But users might be unhappy about increased CPU/memory usage if it is enabled by default. I will need to run some more tests with the recent changes in the publisher pipeline, as these might have reduced the overall memory usage of async mode.

@urso force-pushed the enh/filebeat-async-spooler branch from 088976f to 0702590 on January 21, 2016 21:59
@urso (Author) commented Jan 21, 2016

@ruflin @tsg can you review again? Added unit tests for the publisher, also testing that the registrar gets data in the correct order.

-	if ctx.sync {
+func (c *client) getClient(opts []ClientOption) (Context, eventPublisher) {
+	ctx := makeContext(opts)
+	if ctx.Sync {
Member

This code makes it look like Async is the default (which I think it isn't). The code is correct; this was just my first impression.

Author

The code you are looking at is in libbeat. Yes, in libbeat the default is Async. It is filebeat that passes the Sync flag, forcing the publisher to do a blocking sync call.
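For readers unfamiliar with this pattern, here is a minimal Go sketch of how such an options setup can make async the zero-value default. Only the identifiers Context, ClientOption, Sync, and makeContext appear in the snippet above; the field set, bodies, and exact signatures here are assumptions for illustration:

package main

import "fmt"

// Context, ClientOption, Sync, and makeContext mirror names in the snippet
// above; everything else is assumed for the sake of the example.
type Context struct {
	Sync bool
}

type ClientOption func(Context) Context

// Sync is a client option forcing blocking (synchronous) publishing.
func Sync(ctx Context) Context {
	ctx.Sync = true
	return ctx
}

// makeContext folds all options into a zero-value Context; with no options
// given, Sync stays false, which is why async looks like the default here.
func makeContext(opts []ClientOption) Context {
	ctx := Context{}
	for _, opt := range opts {
		ctx = opt(ctx)
	}
	return ctx
}

func main() {
	fmt.Println(makeContext(nil).Sync)                  // false: libbeat defaults to async
	fmt.Println(makeContext([]ClientOption{Sync}).Sync) // true: filebeat passes Sync
}

With this shape, any client that passes no options gets async publishing, which matches the first impression raised above.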

@ruflin (Member) commented Jan 22, 2016

LGTM

- Add a new config option to run the filebeat publisher pipeline in async mode.
  This can benefit load-balancing performance, as the publisher pipeline is
  kept busier with new bulk-events to publish.

- Add tests that the filebeat publisher sync/async modes process all events and
  keep the correct order when forwarding finished events to the registrar.

- Start exposing the publisher Context and message type. We might see some more
  fields exposed in the future, to make it easier to hook into the publisher
  pipeline in libbeat.
@urso force-pushed the enh/filebeat-async-spooler branch from c34fa38 to a6c2dc1 on January 22, 2016 15:04
tsg added a commit that referenced this pull request Jan 25, 2016
@tsg tsg merged commit 7a2aeeb into elastic:master Jan 25, 2016
@urso urso deleted the enh/filebeat-async-spooler branch January 27, 2016 12:48
@cleesmith

Just read about this on the blog:

> This helps with the overall throughput by enabling load balancing between multiple
> output threads, at the cost of memory usage. With the right settings and enough
> memory and CPU power, we’ve seen Filebeat pushing around 45K events/s,
> compared to around 18K before this change.

Maybe this isn't the right place to ask, but how would I recreate such a test with the timings mentioned?
Is there a repeatable process in place? I ask because I would like to do this for unifiedbeat, but
without logstash in between it and elasticsearch. Of course, IDS alerts are nowhere near as voluminous
as syslogs, but it would be nice to be as performant as possible when indexing data.
Thanks.

@urso (Author) commented Jan 27, 2016

No fully-fledged test framework is in place yet. For filebeat testing so far, we've got a set of config files with different output options like console, file, and logstash output. When using console output, we pipe the output to /dev/null. Logstash itself is configured with the beats input only and a null output. As input we're using the NASA HTTP logs.

Timings are collected as described in this post. The difference is that I'm using topbeat and govarbeat to collect CPU/memory/throughput stats to elasticsearch/file/console, for example.
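As a starting point for reproducing this, here is a hedged sketch of a raw-throughput filebeat config along those lines. The log path is hypothetical, and the exact config schema may differ between Filebeat versions, so check the reference docs for your release:

# Run with console output piped to /dev/null, e.g.:
#   ./filebeat -e -c filebeat.yml > /dev/null
filebeat:
  prospectors:
    - paths:
        - /path/to/NASA_access_log*   # hypothetical path to the NASA HTTP logs
  publish_async: true
output:
  console:
    pretty: false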

@cleesmith

Thanks, I will give that a try. I guess /dev/null would indicate the fastest that's possible, if a bit unrealistic (no ACKing/waiting), but still valuable to know.

@urso (Author) commented Jan 27, 2016

Yes, using /dev/null to get an idea of how fast we can get, as outputs always generate some kind of back pressure.
