
Filebeat async publisher support #782

Merged · 1 commit · Jan 25, 2016

Conversation

@urso commented Jan 20, 2016

Add a new config option to run the filebeat publisher pipeline in async mode.
This can benefit load-balancing performance, as the publisher pipeline is
kept busier with new bulk-events to publish.
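For reference, here is a minimal sketch of what enabling the option could look like in filebeat.yml. The option name and its placement under the filebeat section come from the docs diff below; the comment text is paraphrased from this discussion, and sync mode being the default follows from filebeat passing the Sync flag unless this option is set:

filebeat:
  # Experimental: run the publisher pipeline asynchronously, so several
  # bulk-events can be in flight at once. Defaults to false (sync mode).
  publish_async: true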

@@ -307,6 +307,15 @@ filebeat:
-------------------------------------------------------------------------------------


===== publish_async

Experimental!
Contributor

I think there was an asciidoc tag for that? @dedemorton ?

Contributor

I don't know of one, but I'll ask those who are wiser in the ways of asciidoc. @debadair or @palecur Do you know if there is a tag that we could use here for experimental config options? Do we have a convention for documenting experimental options?

@urso By experimental, I assume we mean that customers can try the option, but their mileage may vary (and there's no guarantee that the option will be supported in future versions). If so, maybe we need a sentence that basically conveys "use at your own risk."

Contributor

See the sources from here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html The experimental[] string seems to generate a warning. Not sure if that works well in this context, though.

Author

> By experimental, I assume we mean that customers can try the option, but their mileage may vary

Yup. It's a trade-off between throughput and memory usage when using load-balancing. In future versions I'd consider removing the flag and enabling async mode whenever load-balancing is enabled in output plugins.

@ruflin (Member) commented Jan 20, 2016

LGTM

// collect collects finished bulk-events in order and forwards processed
// batches to the registrar. Reports to the registrar are guaranteed to be
// in the same order as the bulk-events were received from the spooler.
func (p *asyncLogPublisher) collect() {
Member

Any idea how we could test if this works as expected?

Author

By having all filebeat system tests use both modes. I think we can easily craft unit tests for sync and async mode.
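To make the ordering guarantee from the collect() doc comment above concrete, here is a minimal, self-contained Go sketch of in-order completion collection. All names (batch, collector, the registrar callback) are illustrative stand-ins, not the actual filebeat types: batches may finish out of order, but only a contiguous prefix of finished batches is forwarded, so the registrar sees completions in spooler order.

package main

import (
	"container/list"
	"fmt"
	"sync/atomic"
)

// batch is an illustrative stand-in for a bulk of events in flight;
// the real asyncLogPublisher uses its own internal types.
type batch struct {
	id   int
	done int32 // set to 1 by an output worker once publishing finished
}

func (b *batch) finished() bool { return atomic.LoadInt32(&b.done) == 1 }

// collector keeps batches in submission (spooler) order and forwards only
// a contiguous prefix of finished batches, so the registrar always sees
// completions in the same order the spooler produced them.
type collector struct {
	pending *list.List // FIFO of *batch, oldest first
}

func (c *collector) add(b *batch) { c.pending.PushBack(b) }

// collect forwards all leading finished batches to the registrar callback.
func (c *collector) collect(registrar func(*batch)) {
	for e := c.pending.Front(); e != nil; e = c.pending.Front() {
		b := e.Value.(*batch)
		if !b.finished() {
			return // head still in flight: later finished batches must wait
		}
		c.pending.Remove(e)
		registrar(b)
	}
}

func main() {
	c := &collector{pending: list.New()}
	b1, b2 := &batch{id: 1}, &batch{id: 2}
	c.add(b1)
	c.add(b2)

	atomic.StoreInt32(&b2.done, 1) // batch 2 finishes first...
	c.collect(func(b *batch) { fmt.Println("registrar got batch", b.id) })
	// ...but nothing is forwarded until batch 1 also completes.

	atomic.StoreInt32(&b1.done, 1)
	c.collect(func(b *batch) { fmt.Println("registrar got batch", b.id) })
	// prints: registrar got batch 1, then registrar got batch 2
}

A unit test along these lines (publish batches, complete them out of order, assert the registrar callback still sees spooler order) is the kind of test discussed here.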

@tsg (Contributor) commented Jan 20, 2016

LGTM

@ruflin (Member) commented Jan 21, 2016

@urso @tsg I would currently suggest removing this feature from the docs and config entirely, as it is experimental. We can point people who have a specific performance issue directly to this feature.

@tsg (Contributor) commented Jan 21, 2016

IMHO, it depends on when we release this. If we put it in 1.2, then it should be off by default and maybe even undocumented (although I don't particularly like having undocumented options). But if we put it in 2.0-beta1, then we could even make it the default. That's exactly what the betas are for, after all, and we'd lose feedback if we hide the option.

@urso is there a particular reason not to be confident about this?

@ruflin (Member) commented Jan 21, 2016

@tsg Good point. 👍

@urso (Author) commented Jan 21, 2016

If something goes wrong, we've got a very bad memory leak. But from my manual testing, all is fine. I will add some unit tests to check that the filebeat publisher modes work as expected.

There is no actual reason to keep this feature experimental. But users might be unhappy about increased CPU/memory usage if it is enabled by default. I will need to run some more tests with the recent changes in the publisher pipeline, as these might have reduced the overall memory usage of async mode.

@urso force-pushed the enh/filebeat-async-spooler branch from 088976f to 0702590 on January 21, 2016 21:59
@urso (Author) commented Jan 21, 2016

@ruflin @tsg can you review again? Added unit tests for the publisher, also testing that the registrar gets data in the correct order.

-	if ctx.sync {
+func (c *client) getClient(opts []ClientOption) (Context, eventPublisher) {
+	ctx := makeContext(opts)
+	if ctx.Sync {
Member

This code makes it look like Async is the default (which I think it isn't). The code is correct; this was just my first impression.

Author

The code you are looking at is in libbeat. Yes, in libbeat the default is Async. It is filebeat that passes the Sync flag, forcing the publisher to do a blocking sync call.
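For readers unfamiliar with this pattern, here is a minimal Go sketch of how such an options setup can make async the zero-value default. Only the identifiers Context, ClientOption, Sync, and makeContext appear in the snippet above; the field set, bodies, and exact signatures here are assumptions for illustration:

package main

import "fmt"

// Context, ClientOption, Sync, and makeContext mirror names in the snippet
// above; everything else is assumed for the sake of the example.
type Context struct {
	Sync bool
}

type ClientOption func(Context) Context

// Sync is a client option forcing blocking (synchronous) publishing.
func Sync(ctx Context) Context {
	ctx.Sync = true
	return ctx
}

// makeContext folds all options into a zero-value Context; with no options
// given, Sync stays false, which is why async looks like the default here.
func makeContext(opts []ClientOption) Context {
	ctx := Context{}
	for _, opt := range opts {
		ctx = opt(ctx)
	}
	return ctx
}

func main() {
	fmt.Println(makeContext(nil).Sync)                  // false: libbeat defaults to async
	fmt.Println(makeContext([]ClientOption{Sync}).Sync) // true: filebeat passes Sync
}

With this shape, any client that passes no options gets async publishing, which matches the first impression raised above.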

@ruflin (Member) commented Jan 22, 2016

LGTM

- Add a new config option to run the filebeat publisher pipeline in async mode.
  This can benefit load-balancing performance, as the publisher pipeline is
  kept busier with new bulk-events to publish.

- Add tests that the filebeat publisher sync/async modes process all events and
  keep the correct order when forwarding finished events to the registrar.

- Start exposing the publisher Context and message type. We might see some more
  fields exposed in the future, to make it easier to hook into the publisher
  pipeline in libbeat.
@urso force-pushed the enh/filebeat-async-spooler branch from c34fa38 to a6c2dc1 on January 22, 2016 15:04
tsg added a commit that referenced this pull request Jan 25, 2016
@tsg tsg merged commit 7a2aeeb into elastic:master Jan 25, 2016
@urso urso deleted the enh/filebeat-async-spooler branch January 27, 2016 12:48
@cleesmith

Just read about this on the blog:

> This helps with the overall throughput by enabling load balancing between multiple
> output threads, at the cost of memory usage. With the right settings and enough
> memory and CPU power, we’ve seen Filebeat pushing around 45K events/s,
> compared to around 18K before this change.

Maybe this isn't the right place to ask, but how would I recreate such a test with the timings mentioned?
Is there a repeatable process in place? I ask because I would like to do this for unifiedbeat, but
without logstash in between it and elasticsearch. Of course, IDS alerts are nowhere near as voluminous
as syslogs, but it would be nice to be as performant as possible when indexing data.
Thanks.

@urso (Author) commented Jan 27, 2016

No fully-fledged test framework is in place yet. For filebeat testing so far, we've got a set of config files with different output options like console, file, and logstash output. When using console output, we pipe the output to /dev/null. Logstash itself is configured with the beats input only and a null output. As input we're using the NASA HTTP logs.

Timings are collected as described in this post. The difference is that I'm using topbeat and govarbeat to collect CPU/memory/throughput stats to elasticsearch/file/console, for example.
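As a starting point for reproducing this, here is a hedged sketch of a raw-throughput filebeat config along those lines. The log path is hypothetical, and the exact config schema may differ between Filebeat versions, so check the reference docs for your release:

# Run with console output piped to /dev/null, e.g.:
#   ./filebeat -e -c filebeat.yml > /dev/null
filebeat:
  prospectors:
    - paths:
        - /path/to/NASA_access_log*   # hypothetical path to the NASA HTTP logs
  publish_async: true
output:
  console:
    pretty: false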

@cleesmith

Thanks, I will give that a try. I guess /dev/null would indicate the fastest that's possible, if a bit unrealistic (no ACKing/waiting), but still valuable to know.

@urso (Author) commented Jan 27, 2016

Yes, using /dev/null to get an idea of how fast we can get, as outputs always generate some kind of back pressure.
