Add support for clustering Logstash instances #2632
Comments
For item 1 above, maybe design it so that everything talks through a "ConfigurationStore" abstraction which is implemented via plugins? That way it would be extensible, supporting different implementations of where the "gold copy" of the configuration is actually persisted and how changes are propagated to and from it. There could be different implementations (e.g. ES itself, ZooKeeper, etc.) |
@suyograo should this ticket also mention plans for encryption support, or should that be done separately? |
What about items that are cached in LS instance now, like the data for the elapsed{} filter? |
@bitsofinfo that's exactly our thinking...we'll make it pluggable, so it's easy to add an alternate implementation for a config store. The first implementation will use ES as a config store. |
What's the recommended workaround now, especially if you need several Logstash instances to monitor one folder (with several files)? As sincedb files are not shared among the instances, it's a pain in the ass to edit each .conf manually using excludes. Plus, I also believe sharing sincedb files (one file for all instances) is not recommended ATM, as there is no concept of exclusive read/write access. Hints? |
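A minimal sketch of what such a pluggable config store could look like. Everything here is hypothetical: the `ConfigurationStore` method names (`get`, `put`, `subscribe`) and the in-memory backend are illustrative assumptions, not an existing Logstash API. An ES- or ZooKeeper-backed implementation would implement the same three methods.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional

# Callback signature: callback(pipeline_id, config_text)
ChangeCallback = Callable[[str, str], None]


class ConfigurationStore(ABC):
    """Hypothetical interface; the names are illustrative, not Logstash APIs."""

    @abstractmethod
    def get(self, pipeline_id: str) -> Optional[str]:
        """Return the stored config text for a pipeline, or None if absent."""

    @abstractmethod
    def put(self, pipeline_id: str, config: str) -> None:
        """Persist the config and notify subscribers of the change."""

    @abstractmethod
    def subscribe(self, callback: ChangeCallback) -> None:
        """Register a callback that is invoked on every put()."""


class InMemoryConfigurationStore(ConfigurationStore):
    """Stand-in backend that keeps the "gold copy" in a dict."""

    def __init__(self) -> None:
        self._configs: Dict[str, str] = {}
        self._subscribers: List[ChangeCallback] = []

    def get(self, pipeline_id: str) -> Optional[str]:
        return self._configs.get(pipeline_id)

    def put(self, pipeline_id: str, config: str) -> None:
        self._configs[pipeline_id] = config
        # Propagate the change to every registered listener.
        for callback in self._subscribers:
            callback(pipeline_id, config)

    def subscribe(self, callback: ChangeCallback) -> None:
        self._subscribers.append(callback)
```

An instance would subscribe once at startup and reload a pipeline whenever its callback fires; which backend holds the data is hidden behind the interface.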
I would have a single machine which mounted and sent the files via logstash-forwarder or beaver, to a load-balanced group of logstash machines. |
Hey @suyograo, thanks for working on supporting clustering for Logstash. The current documentation seems to imply that it's already available:
https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html Does this refer to the same issue? Thanks! |
@gh-amistry: The documentation is not meant to imply the availability of clustered Logstash instances. The preceding paragraphs in that text describe a setup where multiple Logstash instances can pull messages from a message queue. That is already available, but each Logstash instance in such a setup is independent and doesn't share any state or configuration with other instances. |
@blysik that's a good workaround, until beaver/logstash-forwarder becomes the bottleneck - you'll have to stop the process, launch a second instance, create excludes and split the load. VERY not friendly. |
Thanks @magnusbaeck for the clarification. Our goal is to have Logstash instances pulling from different topics in Kafka (topics may have different input formats), then have the outputs go to the same Elasticsearch cluster. Will this issue address this type of Logstash scalability? |
This is possible already. I don't see how Logstash clustering support would help, really. |
The metrics filter could become a problem in a cluster since it only counts within its own context. I can see three solutions to the problem; there are probably many more that I'm overlooking.
|
I do not anticipate this being a problem. The current designs of logstash cluster work will not have this problem because filter state (metrics and multiline filters, for example) is not shared among nodes. |
If we have two logstash instances processing http logs for a single http application, we will have two different metric results for the response codes. Or am I misunderstanding this? |
Are there plans to resolve this issue with logstash 2.x? What does the roadmap look like? |
Is there any progress on this issue in Logstash 5.x? |
@gokhancamas This feature is unlikely to be added to Logstash before 6.0 |
I am trying to understand how the metrics filter will work in a Logstash cluster. We are trying to decide whether we can have multiple Logstash instances (as part of a cluster) for our application, which is running in multiple pods, or whether we have to use just one Logstash instance for the metrics filter to work properly on log data from all app instances. If we use one Logstash instance, scaling and availability become an issue. |
I would output the metrics from each logstash instance to statsd to combine them. |
@rammulay it is unclear if the metrics filter is even the right solution for measuring things going through Logstash. The metrics filter may not be necessary anymore now that we have stats APIs in Logstash. |
@elvarb thanks for your suggestion. I think it is safe to say that the metrics filter will not work across multiple logstash instances. |
@rammulay ahh, that's a good question. We had an offline discussion a few days ago about the future of the metrics filter (or rather, the use case: aggregating/alerting on log data), and we had some consensus that the right place to do this is with Elasticsearch aggregations, at least for a while. We have some ideas that may enable stream aggregations that work across Logstash instances, but nothing is designed yet. |
@jordansissel you mean something like ElastAlert? The team that maintains our Elasticsearch instance asked us not to use this for performance reasons (it is shared)...hence looking at moving this upstream into Logstash. We maintain our own Logstash server(s). |
@rammulay another workaround would be to have one dedicated Logstash input on one server that all the other Logstash instances send their metrics to. That way, that single Logstash input would act as a statsd server and combine them. Sadly, though, the metrics filter itself does not support sum, so you would have to resort to writing your own plugin or using the ruby filter. |
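To make the "combine them" step concrete, here is a minimal sketch of what an aggregation point (statsd, or a dedicated Logstash instance) has to do with per-instance counts. It is an illustration only: the function name `combine_counts` and the sample data are made up, and this is not how the metrics filter or statsd is implemented.

```python
from collections import Counter
from typing import Iterable, Mapping


def combine_counts(per_instance: Iterable[Mapping[str, int]]) -> Counter:
    """Sum counters reported by independent instances into one total."""
    total: Counter = Counter()
    for counts in per_instance:
        # Counter.update adds counts key by key instead of replacing them.
        total.update(counts)
    return total


# Hypothetical response-code counts from two instances handling the same app.
instance_a = {"200": 120, "500": 3}
instance_b = {"200": 80, "404": 5}
combined = combine_counts([instance_a, instance_b])
```

Each instance only sees its own share of the traffic, so neither `instance_a` nor `instance_b` alone gives the true count of `200` responses; only the summed result does.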
Is this feature going to address the use case talked about here? https://discuss.elastic.co/t/multiple-logstash-docker-containers-sharing-an-s3-input/36077 I'm hoping that this feature is going to create some sort of shared sincedb functionality (perhaps stored in ES) that would let us spread our S3 based inputs out horizontally. Or am I misinterpreting the focus of the issue here? |
Any update on this? At this point, it seems impossible to have more than one Logstash node running in parallel with the same JDBC input query against a given DB cluster. Seems like HA multi-node setup will be sending duplicate data downstream. |
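A minimal sketch of why a shared offset with a conditional write would avoid the duplicates described above. The `SharedOffsetStore` class and `claim_batch` helper are hypothetical and in-memory only; a real setup would need a shared backend that offers an equivalent compare-and-set operation, and nothing here reflects an existing Logstash feature.

```python
import threading
from typing import Dict, Optional, Tuple


class SharedOffsetStore:
    """Hypothetical in-memory stand-in for an offset store shared by all nodes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> (offset, version); version increases on every successful write
        self._state: Dict[str, Tuple[int, int]] = {}

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._state.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, new_offset: int) -> bool:
        """Write new_offset only if nobody else has written since we read."""
        with self._lock:
            _, version = self._state.get(key, (0, 0))
            if version != expected_version:
                return False
            self._state[key] = (new_offset, version + 1)
            return True


def claim_batch(store: SharedOffsetStore, key: str, batch_size: int) -> Optional[Tuple[int, int]]:
    """Try to claim rows [start, end); return None if another node won the race."""
    offset, version = store.read(key)
    if store.compare_and_set(key, version, offset + batch_size):
        return (offset, offset + batch_size)
    return None
```

Two nodes running the same query would each call `claim_batch` before reading; a node that gets `None` simply retries, so no range of rows is handed out twice.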
It might be kind of neat to be able to use a clustered Logstash as a means of doing pipeline distribution/scheduling, where the unit of work to distribute is a whole pipeline. We could also have the Logstash cluster determine the optimal set of resources to dedicate to a given pipeline. Or maybe the user could provide that, similar to Kubernetes resource constraints? (It would be sweet if Logstash could know the resource quantity available for a given server and refuse to schedule pipelines that would exceed that quantity.) As a separate idea, this might also result in some form of load balancing for network inputs and a distributed offset for things like database (JDBC) or datastore (S3) inputs. I think the latter may be a bit simpler to implement than the former, since network load balancing would assume that either Logstash or a client knows how to distribute traffic, whereas a distributed sincedb (offset) could theoretically be stored in Elasticsearch, like what @timothy-spencer suggested, and something like this might be able to get close to atomic-ish writes in Elasticsearch for said sincedb. It might be neat to model some of the scheduling/distribution ideas around some pre-existing systems in this realm like: |
Any new info on this request? Thanks! |
I'm not seeing a documented solution to this either |
Today, each Logstash instance is a full pipeline -- input, filter, and output stages. In large-scale Logstash deployments, users run multiple instances of Logstash in order to horizontally scale event processing. This requires manual management of individual configuration files, or custom/third-party configuration automation tools such as Puppet or Chef.
We plan to introduce a concept of a Logstash cluster, where instances can be controlled as a whole (on a cluster level), instead of being separate parts. This would entail the following features:
Logstash can still be started in a single-instance, non-clustered mode; file based configuration will continue to work.
Clustering instances will also provide the necessary groundwork for potential long-term enhancements like automatic load balancing, failover, running multiple pipelines and so on.