Add support for clustering Logstash instances #2632

suyograo · 2015-02-17T21:47:43Z

Today, each Logstash instance is a full pipeline with input, filter, and output stages. In large-scale Logstash deployments, users run multiple instances of Logstash in order to horizontally scale event processing. This requires manual management of individual configuration files, or custom/3rd party configuration automation tools such as Puppet or Chef.

We plan to introduce a concept of a Logstash cluster, where instances can be controlled as a whole (on a cluster level), instead of being separate parts. This would entail the following features:

1. Provide an option to centrally store a Logstash config, which is shared across all the instances in the cluster. This would be the single source of truth for all the instances.
2. Provide APIs to control the cluster dynamically, for example to change configuration. See "Provide APIs to manage pipeline" #2612.
3. Provide APIs to monitor instances at the cluster level. See "Provide APIs to monitor pipeline" #2611.

Logstash can still be started in a single-instance, non-clustered mode; file-based configuration will continue to work.

Clustering instances will also provide the necessary groundwork for potential long-term enhancements like automatic load balancing, failover, running multiple pipelines and so on.

bitsofinfo · 2015-03-12T19:31:37Z

For item 1 above, maybe design it so that everything talks through a "ConfigurationStore" abstraction which is implemented via plugins? I.e. so it could be extensible, supporting different implementations of where the "gold copy" of the configuration is actually persisted and how changes are propagated to/from it. Have different impls (e.g. ES itself, ZooKeeper etc.)
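
A minimal sketch of what such a pluggable abstraction could look like. Everything here is hypothetical: `ConfigurationStore`, `register_store`, `create_store`, and the `memory` backend are illustrative names, not Logstash APIs, and the in-memory class only stands in for a real backend such as ES or ZooKeeper.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class ConfigurationStore(ABC):
    """Hypothetical interface for the "gold copy" of the config."""

    @abstractmethod
    def load(self) -> str:
        """Return the current pipeline configuration text."""

    @abstractmethod
    def save(self, config: str) -> None:
        """Persist a new configuration and notify subscribers."""

    @abstractmethod
    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a callback invoked with each new configuration."""


# Registry mapping a backend name to its implementation class.
_STORES: Dict[str, Type[ConfigurationStore]] = {}


def register_store(name: str):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls):
        _STORES[name] = cls
        return cls
    return decorator


@register_store("memory")
class InMemoryStore(ConfigurationStore):
    """Stand-in backend used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._config = ""
        self._subscribers: List[Callable[[str], None]] = []

    def load(self) -> str:
        return self._config

    def save(self, config: str) -> None:
        self._config = config
        # Propagate the change to every instance that subscribed.
        for notify in self._subscribers:
            notify(config)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)


def create_store(name: str) -> ConfigurationStore:
    """Instantiate the backend registered under `name`."""
    return _STORES[name]()
```

With this shape, each instance would call `create_store(...)` once, `subscribe` a reload callback, and never need to know which backend holds the configuration.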

wiibaa · 2015-03-19T12:24:18Z

@suyograo should this ticket also mention the plan for encryption support, or should that be done separately?
Some background in:
https://logstash.jira.com/browse/LOGSTASH-428
https://logstash.jira.com/browse/LOGSTASH-918

webmstr · 2015-07-07T17:55:20Z

What about items that are currently cached in an LS instance, like the data for the elapsed{} filter?

suyograo · 2015-08-04T22:40:38Z

@bitsofinfo that's exactly our thinking... we'll make it pluggable, so it's easy to add an alternate implementation of the config store. The first implementation will use ES as the config store.

splashx · 2015-08-13T07:42:10Z

What's the recommended workaround now, especially if you need several logstash instances to monitor one folder (with several files)? As sincedb files are not shared among the instances, it's a pain in the ass to edit each .conf manually using excludes. Plus I also believe sharing sincedb files (one file for all) is not recommended ATM, as there is no concept of exclusive read/write access. Hints?

blysik · 2015-08-13T18:37:03Z

I would have a single machine which mounted and sent the files via logstash-forwarder or beaver, to a load-balanced group of logstash machines.

gh-amistry · 2015-08-14T22:37:19Z

Hey @suyograo, thanks for working on supporting clustering for Logstash. The current documentation already seems to imply that it's already available:

Alternately, increase the Elasticsearch cluster’s rate of data consumption by adding more Logstash indexing instances.

https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html

Does this refer to the same issue? Thanks!

magnusbaeck · 2015-08-15T08:12:44Z

@gh-amistry: The documentation is not meant to imply the availability of clustered Logstash instances. The preceding paragraphs in that text describe a setup where multiple Logstash instances can pull messages from a message queue. That is already available, but each Logstash instance in such a setup is independent and doesn't share any state or configuration with other instances.

splashx · 2015-08-15T15:34:15Z

@blysik that's a good workaround, until beaver/logstash-forwarder becomes the bottleneck - you'll have to stop the process, launch a second instance, create excludes and split the load. VERY not friendly.

gh-amistry · 2015-08-17T18:06:54Z

Thanks @magnusbaeck for the clarification. Our goal is to have Logstash instances pulling from different topics in Kafka (topics may have different input formats), then have the outputs go to the same ElasticSearch cluster. Will this issue address this type of Logstash scalability?

magnusbaeck · 2015-08-18T03:31:11Z

> Our goal is to have Logstash instances pulling from different topics in Kafka (topics may have different input formats), then have the outputs go to the same ElasticSearch cluster. Will this issue address this type of Logstash scalability?

This is possible already. I don't see how Logstash clustering support would help, really.

elvarb · 2015-10-02T22:13:34Z

The metric filter could become a problem in a cluster since it only counts within its own context. I can see three solutions to the problem, there are probably many more solutions I'm overlooking.

1. The cluster stores and replicates all metric values within the cluster.
2. The cluster makes sure that if a metric filter is used in a pipeline it will only start one instance of it.
3. Warn users that the metric filter does not work in a cluster.

jordansissel · 2015-10-02T22:42:47Z

> The metric filter could become a problem in a cluster since it only counts within its own context

I do not anticipate this being a problem. The current designs of logstash cluster work will not have this problem because filter state (metrics and multiline filters, for example) is not shared among nodes.

elvarb · 2015-10-02T23:03:00Z

If we have two logstash instances processing http logs for a single http application, we will have two different metric results for the response codes. Or am I misunderstanding this?
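
To make the concern concrete, here is a small sketch, assuming events are dicts with a hypothetical `response` field and that a load balancer splits traffic between two instances. Each instance only sees its own share, so an application-wide figure needs an explicit merge step somewhere.

```python
from collections import Counter
from typing import Iterable, List


def count_response_codes(events: Iterable[dict]) -> Counter:
    """Count HTTP response codes seen by ONE instance (its local view only)."""
    return Counter(event["response"] for event in events)


def merge_counts(per_instance: List[Counter]) -> Counter:
    """Combine per-instance counters into an application-wide total."""
    total = Counter()
    for counts in per_instance:
        total.update(counts)  # Counter.update adds counts together
    return total


# Hypothetical events for one application, split across two instances.
instance_a = count_response_codes([{"response": 200}, {"response": 500}])
instance_b = count_response_codes([{"response": 200}, {"response": 200}])

# Neither instance alone knows there were three 200 responses in total.
combined = merge_counts([instance_a, instance_b])
```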

salyh · 2015-10-08T08:02:05Z

Are there plans to resolve this issue with logstash 2.x? What does the roadmap look like?

gokhancamas · 2016-10-18T10:43:54Z

Is there any progress about this issue in logstash 5.x?

untergeek · 2016-10-18T15:30:27Z

@gokhancamas This feature is unlikely to be added to Logstash before 6.0

rammulay · 2017-05-09T19:44:37Z

I am trying to understand how metrics filter will work in a logstash cluster. We are trying to decide whether we can have multiple logstash instances (as part of a cluster) for our application that is running in multiple pods or do we have to use just one logstash instance for metrics filter to work properly on log data from all app instances. If we use one logstash instance, scaling and availability becomes an issue.

elvarb · 2017-05-09T19:53:28Z

I would output the metrics from each logstash instance to statsd to combine them.
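
A rough sketch of why this works, using the statsd counter line format `<name>:<value>|c`. The metric prefix and the `aggregate` helper are made up for illustration; a real statsd server does the summing itself, and nothing is sent over the network here.

```python
from collections import defaultdict
from typing import Dict, List


def to_statsd_lines(prefix: str, counts: Dict[str, int]) -> List[str]:
    """Render counters in the statsd counter format '<name>:<value>|c'."""
    return [f"{prefix}.{name}:{value}|c" for name, value in sorted(counts.items())]


def aggregate(lines: List[str]) -> Dict[str, int]:
    """Sum counter lines roughly the way a statsd-style collector would."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "c":  # only counters are handled in this sketch
            totals[name] += int(value)
    return dict(totals)


# Two instances report their local counts under the same metric names.
lines = to_statsd_lines("http.response", {"200": 1, "500": 1})
lines += to_statsd_lines("http.response", {"200": 2})
totals = aggregate(lines)
```

Because counters with the same name are added together on the collector side, the instances do not need to know about each other.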

jordansissel · 2017-05-09T20:06:10Z

@rammulay it is unclear if the metrics filter is even the right solution for measuring things going through Logstash. The metrics filter may not be necessary anymore now that we have stats APIs in Logstash.

rammulay · 2017-05-09T20:25:40Z

@elvarb thanks for your suggestion. I think it is safe to say that the metrics filter will not work across multiple logstash instances.
@jordansissel are you referring to the node-stats-api? I am not sure how that is going to help me with aggregating and alerting based on application logs.

jordansissel · 2017-05-09T20:57:49Z

@rammulay ahh, that's a good question. We had an offline discussion a few days ago about the future of the metrics filter (or rather, the use case: aggregating/alerting on log data), and we had some consensus that the right place to do this was with Elasticsearch aggregations, at least for the time being. We have some ideas that may enable stream aggregations that work across logstash instances, but nothing is designed yet.

rammulay · 2017-05-10T00:29:31Z

@jordansissel you mean something like ElastAlert? The team that maintains our elasticsearch instance asked us not to use this for performance reasons (it is a shared instance)... hence looking at moving this upstream into logstash. We maintain our own logstash server(s).

elvarb · 2017-05-17T10:56:19Z

@rammulay another workaround would be to have one dedicated logstash input on one server that all the other logstash instances send their metrics to. That way, the single logstash input would act as a statsd server and combine them.

Sadly though the metrics filter itself does not support sum
https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html

So you would have to resort to writing your own plugin or using the ruby filter.

timothy-spencer · 2017-08-31T23:46:35Z

Is this feature going to address the use case talked about here?

https://discuss.elastic.co/t/multiple-logstash-docker-containers-sharing-an-s3-input/36077

I'm hoping that this feature is going to create some sort of shared sincedb functionality (perhaps stored in ES) that would let us spread our S3 based inputs out horizontally. Or am I misinterpreting the focus of the issue here?

demisx · 2019-03-07T23:19:19Z

Any update on this? At this point, it seems impossible to have more than one Logstash node running in parallel with the same JDBC input query against a given DB cluster. Seems like HA multi-node setup will be sending duplicate data downstream.

rwaweber · 2019-06-23T04:14:52Z

It might be kind of neat to be able to use a clustered logstash as a means of doing pipeline distribution/scheduling, where the unit of work to distribute is a whole pipeline. We could also have the logstash cluster determine the optimal set of resources to dedicate to a given pipeline. Or maybe the user could provide that, similar to kubernetes resource constraints? (It would be sweet if logstash could know the resource quantity available for a given server and deny/refuse to schedule pipelines that would exceed that quantity).
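
A toy sketch of that scheduling idea, assuming CPU cores are the only resource and that each pipeline declares a request up front. `Node`, `schedule`, and the numbers are all hypothetical; nothing like this exists in Logstash as far as this thread shows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float  # hypothetical unit: CPU cores
    pipelines: List[str] = field(default_factory=list)
    cpu_used: float = 0.0

    def can_fit(self, cpu_request: float) -> bool:
        """True if the request fits in the node's remaining capacity."""
        return self.cpu_used + cpu_request <= self.cpu_capacity


def schedule(nodes: List[Node], pipeline: str, cpu_request: float) -> Optional[str]:
    """Place a pipeline on the least-loaded node that can fit it, or refuse."""
    candidates = [node for node in nodes if node.can_fit(cpu_request)]
    if not candidates:
        return None  # refuse rather than overcommit any node
    target = min(candidates, key=lambda node: node.cpu_used)
    target.pipelines.append(pipeline)
    target.cpu_used += cpu_request
    return target.name
```

For example, with `nodes = [Node("a", 2.0), Node("b", 1.0)]`, a 1.5-core pipeline can only land on `a`, a following 1.0-core pipeline lands on `b`, and a third 1.0-core request is refused.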

As a separate idea, this might also result in some form of loadbalancing for network inputs and a distributed offset for things like database(JDBC) or datastore(s3) inputs.

I think the latter may be a bit simpler to implement than the former, since network loadbalancing would assume either logstash would know how to distribute traffic or a client would, whereas a distributed sincedb(offset) could theoretically be stored in elasticsearch like what @timothy-spencer suggested, and something like this might be able to get close to atomic-ish writes in elasticsearch for said sincedb?
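
The "atomic-ish" part could be approached with a compare-and-set on a versioned offset entry. The sketch below uses an in-memory store as a stand-in; `OffsetStore` and `claim_range` are hypothetical names, and how this would map onto Elasticsearch writes is an assumption left open here.

```python
import threading
from typing import Dict, Optional, Tuple


class OffsetStore:
    """In-memory stand-in for a shared offset ("sincedb") store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> (version, offset); a missing key reads as (0, 0)
        self._entries: Dict[str, Tuple[int, int]] = {}

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._entries.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, new_offset: int) -> bool:
        """Write only if nobody else updated the entry since it was read."""
        with self._lock:
            version, _ = self._entries.get(key, (0, 0))
            if version != expected_version:
                return False  # lost the race; the caller should re-read
            self._entries[key] = (version + 1, new_offset)
            return True


def claim_range(store: OffsetStore, key: str, batch: int) -> Optional[Tuple[int, int]]:
    """Try to claim the next `batch` records; None means another instance won."""
    version, offset = store.read(key)
    if store.compare_and_set(key, version, offset + batch):
        return (offset, offset + batch)
    return None
```

An instance that gets `None` back simply re-reads and tries again, so two instances never process the same range even without a lock held across the network.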

It might be neat to model some of the scheduling/distribution ideas around some pre-existing systems in this realm like:

jputman08 · 2020-09-18T17:29:00Z

Any new info on this request? Thanks!

insist93 · 2022-08-11T02:29:45Z

Any new info on this request? Thanks!

Matthew-Jenkins · 2023-03-30T18:10:53Z

> Any update on this? At this point, it seems impossible to have more than one Logstash node running in parallel with the same JDBC input query against a given DB cluster. Seems like HA multi-node setup will be sending duplicate data downstream.

I'm not seeing a documented solution to this either

suyograo added enhancement feature v2.0.0 labels Feb 17, 2015

suyograo mentioned this issue Feb 17, 2015

Provide load balancing and high availability features to Logstash cluster #2633

Open

suyograo added roadmap and removed enhancement labels Feb 17, 2015

suyograo added enhancement and removed feature labels Apr 14, 2015

This was referenced May 20, 2015

Feature - Setup an HA Path from Logstash Forwarder to Elasticsearch #2580

Closed

Feature Request: Setting up a Highly Available (HA) pipeline with Logstash Nodes #2579

Open

suyograo removed the v2.0.0 label Sep 1, 2015

suyograo mentioned this issue Nov 17, 2015

Coordinating Support on Multiple Logstash Instances #4187

Closed

ppf2 mentioned this issue Feb 29, 2016

Allow externalization of urls configuration logstash-plugins/logstash-input-http_poller#46

Open

suyograo removed the manageability label Sep 14, 2017
