Auto creation of template #639

ruflin · 2016-01-06T10:25:14Z

Problem

Each beat loads structured documents into elasticsearch. To make sure every document has the correct mapping, the predefined mapping should be loaded before a beat is started. Currently this is a manual step and is often forgotten. This can lead to problems, because without a template elasticsearch infers the field types automatically, and the mapping of an existing field can't be changed at a later stage.

See also https://github.com/elastic/libbeat/issues/62

Versioning

An additional problem is that the templates can change between versions of a beat. This means that having data from 2 different beat versions can lead to problems.

Logstash Template

Logstash provides its own template to elasticsearch as part of logstash-output-elasticsearch. This is a generic template and does not necessarily cover all cases from the beats. The template applies to all indices starting with logstash-. By default, the beats templates apply to all indices starting with beatname-.

Proposed solution

To avoid having to implement all possibilities in the first version, and to better understand how the feature will be used, I suggest splitting it up into two phases:

Phase 1 - Manual phase / opt in

Phase 1 will be fully backward compatible. It allows users to load the template if they configure it. By default, the template will not be loaded.

Manually configure template in config file
No default behaviour
Getting started guide on how to do it

Phase 2 - Automated / versioning

Phase 2 always loads the template on startup and adds versioning for each template, so no conflicts between the different template versions happen.

Automatic loading of templates
Versioning of templates
Users who do not want this behaviour need a way to disable it

Configuration

The configuration will look as follows. It is part of the elasticsearch output, as a direct connection to elasticsearch is needed to apply the template. People who send data through Logstash, for example with filebeat, must apply the template manually or use logstash-output-elasticsearch.

output:
  elasticsearch:

    # A template is used to set the mapping in elasticsearch
    # By default in phase 1, if this is commented out, no template is loaded
    # These settings can be adjusted to load your own template or overwrite existing ones
    template:

      # Template name. By default the template name is the same as the beatname
      name: "beatname"

      # Path to template file
      # TODO: make this platform specific?
      path:

      # Overwrite existing template
      overwrite: false

This configuration is taken from Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template
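
As a rough sketch of how these settings could be represented on the Go side: the field names follow the config.Template.Name and config.Template.Overwrite references in the diffs of this PR, while the Path field, the withDefaults helper and the example path are assumptions made for illustration only.

```go
package main

import "fmt"

// Template mirrors the proposed template settings of the elasticsearch output.
// Name and Overwrite appear in the PR diffs; Path is assumed from the config proposal.
type Template struct {
	Name      string
	Path      string
	Overwrite bool
}

// withDefaults is a hypothetical helper: it falls back to the beat name
// when no template name is configured, as described in the config comments.
func (t Template) withDefaults(beatName string) Template {
	if t.Name == "" {
		t.Name = beatName
	}
	return t
}

func main() {
	// "/path/to/template.json" is a placeholder, not a real default path.
	cfg := Template{Path: "/path/to/template.json"}
	cfg = cfg.withDefaults("filebeat")
	fmt.Printf("name=%s path=%s overwrite=%v\n", cfg.Name, cfg.Path, cfg.Overwrite)
}
```

Leaving Overwrite at its zero value keeps the safe behaviour: an existing template is not replaced unless the user asks for it.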

Versioning

To support the different versions of templates, each template must be versioned. In the first version, templates would not be applied by default. In a second version, when versioning of templates is in place, templates should be loaded automatically. Versioning is required for the automated behaviour, as otherwise this would very soon lead to conflicts. With versioning in place, it is possible to keep older beats running and writing to indices with the old mapping while a newer beat runs at the same time, writing to new indices with the new mapping.

To solve this problem, it is probably not only required to version the templates but also the indices, so the correct templates can be applied. We should check how Kibana / Marvel / Logstash solve this problem.
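
To make the idea of versioning both templates and indices more concrete, here is a minimal sketch. The "<beatname>-<version>" naming scheme and the versionedNames function are assumptions for illustration; no naming scheme has been decided in this PR.

```go
package main

import "fmt"

// versionedNames returns a template name and the index pattern it would apply to.
// The "<beat>-<version>" scheme is hypothetical and only illustrates the idea
// that each beat version writes to indices matched by its own template.
func versionedNames(beatName, version string) (templateName, indexPattern string) {
	templateName = fmt.Sprintf("%s-%s", beatName, version)
	indexPattern = templateName + "-*"
	return templateName, indexPattern
}

func main() {
	// Two beat versions running side by side get separate templates and indices.
	for _, v := range []string{"1.0.1", "1.1.0"} {
		name, pattern := versionedNames("filebeat", v)
		fmt.Printf("template %q applies to indices matching %q\n", name, pattern)
	}
}
```

With a scheme like this, an older beat keeps writing to indices covered by its own template, and a newer beat never has to overwrite it.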

Logstash

Logstash has its own plugin, logstash-output-elasticsearch, which already applies a template to elasticsearch. This template is a general template and is intended for all indices starting with logstash-. Filebeat in particular is normally connected to LS and sends data to ES through the output plugin. Depending on the LS configuration, either the index pattern logstash- or filebeat- is used. The question is which mapping should be used when sending data through LS.

Option 1

Only use default logstash mapping
Use the logstash- index
Add filebeat specific offset field to ls mapping

The disadvantage of option 1 is that it only applies to filebeat and not to the other beats. The assumption is that the other beats send data directly to ES.

Option 2

Apply Filebeat Mapping manually
Configure filebeat- index pattern in LS

The main disadvantage here is that the mapping has to be applied manually. People updating a beat will probably not apply the updated (versioned) mapping. The advantage is that it works with all beats.

Suggestion

I would suggest going with option 2. Filebeat has one of the simplest patterns of all beats, which means BC breaks will be very rare, and in the long term the ingest node could also be used for filebeat.

Note

Similar to logstash, there should be a generic base pattern provided by libbeat which applies to all beats.

tsg · 2016-01-06T10:28:47Z

libbeat/beat/template.go

+
+func (beat *Beat) loadTemplate() error {
+
+	// TODO: Fin a way to check if flag was even set. If it was set but no path, default path should be used.


isn't that just template != nil?

It seems that doesn't work for string, ints etc. because of the default which is a string "". I don't think I can set nil as default value?

Ah, right, I was hoping it sets the pointer to nil if the flag is not used, but that doesn't seem to be the case. Maybe we'd be better with having the setting in the configuration file? I find that more friendly to packaging, config management, etc.

I like the idea. In general I also prefer to have all the settings in the config file. The only "issue" here is that this should be done only once. But as it does not have any effect if it is applied multiple times, this should not be an issue either.

As this config belongs to the elasticsearch output, I suggest to also place it there. Something like

output:
  elasticsearch:
    #template: path

If template is not configured, the default path is taken. If template is uncommented and no path is provided, the template will not be loaded.

@tsg One thing that came to my mind is that the template will be loaded multiple times in most cases anyways, as there are multiple beats on different machines ...

@andrewkroh That's a good idea. I actually have to check if this could be done with one call. If, let's say, 20 beats start up and every beat sends the template, it is quite likely that between one beat's HEAD request and its sending of the template, another one has already done it.

Also the idea came up to use versioning for the templates.

If I get it right, this is also what LS does: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template We might want to check the source code to make sure.

What happens if the beats that are sending data to the same Elasticsearch have different versions and different templates? If only one beat is loading the template, it might be that an older beat is expecting an older template.

We have to make sure all the beats are running the same version. Otherwise raise an exception.

@monicasarbu The idea here was to version the templates. But I still need to create our concept here.

ruflin · 2016-01-12T09:51:13Z

I had a look at the template for logstash-output-elasticsearch and there would be two things which would have to be added:

offset type long
support for also filebeat-* templates

Currently it is assumed that all templates start with logstash-*. We could recommend that people who use filebeat with logstash use the ls index to solve this issue, as I think it is not possible to set 2 template patterns.

urso · 2016-01-12T11:56:23Z

libbeat/outputs/elasticsearch/client.go

+		reader := bytes.NewReader(content)
+
+
+		client.LoadTemplate(config.Template.Name, reader)


instead of just forwarding as is, do we want to do some validation?
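
One light form of validation would be to check that the file is a JSON object and carries an index pattern before it is sent. The sketch below is only an assumption of what such a check could look like: validateTemplate is a hypothetical helper, and it assumes the top-level "template" field that Elasticsearch index templates use for the index pattern.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validateTemplate is a hypothetical pre-flight check: the content must be a
// JSON object with a non-empty string "template" field (the index pattern).
func validateTemplate(content []byte) error {
	var doc map[string]interface{}
	if err := json.Unmarshal(content, &doc); err != nil {
		return fmt.Errorf("template is not a valid JSON object: %v", err)
	}
	pattern, ok := doc["template"].(string)
	if !ok || pattern == "" {
		return errors.New(`template has no "template" index pattern`)
	}
	return nil
}

func main() {
	good := []byte(`{"template": "filebeat-*", "mappings": {}}`)
	bad := []byte(`{"mappings": {}}`)

	fmt.Println("good:", validateTemplate(good))
	fmt.Println("bad: ", validateTemplate(bad))
}
```

Anything deeper, such as checking the mappings themselves, is probably better left to Elasticsearch, which rejects an invalid template anyway.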

tsg · 2016-01-12T13:35:53Z

@ruflin I was thinking we'll just tell people to configure the Beats templates in the logstash-elasticsearch-output. When using the beats with logstash, you need a pretty special logstash configuration anyway.

ruflin · 2016-01-12T13:37:29Z

@tsg If we go this route it would probably make sense to move the more generic stuff from the logstash template also into the filebeat template. I assume people using LS will do some transformations etc., which means having fields that are not part of our template yet.

ruflin · 2016-01-12T13:50:13Z

topbeat/etc/topbeat.yml

@@ -40,7 +40,22 @@ output:
    # Scheme and port can be left out and will be set to the default (http and 9200)
    # In case you specify and additional path, the scheme is required: http://localhost:9200/path
    # IPv6 addresses should always be defined as: https://[2001:db8::1]:9200
-    hosts: ["localhost:9200"]
+    hosts: ["192.168.99.100:9200"]


This change will be removed again. Only for testing.

tsg · 2016-01-12T13:56:30Z

@ruflin right, we might want to integrate pretty much all of LS template into Filebeat's template, also in preparation for the ingest node. This would be worth a new discuss ticket?

andrewkroh · 2016-01-13T01:11:03Z

libbeat/outputs/elasticsearch/output.go

+		logp.Debug("test", "Test: %v", config.Template)
+
+		// Check if template already exist or should be overwritten
+		if !esClient.CheckTemplate(config.Template.Name) && config.Template.Overwrite {


There is a race condition with CheckTemplate and LoadTemplate when using multiple Beats (potentially of different versions) but I don't know of any way to work-around it.

I also think there is a potential conflict. For different versions the solution is probably to version the templates and indices. I recently updated the issue and added a note about the version problem: #639 The initial solution would be to not make it automatic.
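
To illustrate the check-then-load sequence discussed here, the following is a minimal sketch of one possible reading of the intended behaviour: load the template when it is missing or when overwrite is set. The templateStore interface, the in-memory memStore and ensureTemplate are hypothetical stand-ins, not the client code of this PR.

```go
package main

import "fmt"

// templateStore is a hypothetical stand-in for the Elasticsearch client.
type templateStore interface {
	CheckTemplate(name string) bool
	LoadTemplate(name string, body string) error
}

// memStore is an in-memory fake used only to make the sketch runnable.
type memStore map[string]string

func (m memStore) CheckTemplate(name string) bool {
	_, ok := m[name]
	return ok
}

func (m memStore) LoadTemplate(name string, body string) error {
	m[name] = body
	return nil
}

// ensureTemplate loads the template if it does not exist yet or if overwrite
// is set. It reports whether a load was performed.
// Check and load are two separate calls, so another beat can still load the
// template in between; the race described above is not removed by this.
func ensureTemplate(s templateStore, name, body string, overwrite bool) (bool, error) {
	if s.CheckTemplate(name) && !overwrite {
		return false, nil
	}
	if err := s.LoadTemplate(name, body); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	store := memStore{}
	for _, overwrite := range []bool{false, false, true} {
		loaded, err := ensureTemplate(store, "filebeat", "{}", overwrite)
		fmt.Printf("overwrite=%v loaded=%v err=%v\n", overwrite, loaded, err)
	}
}
```

As long as all beats send the same template, a duplicate load caused by the race only rewrites identical content; it becomes a real conflict once beats of different versions carry different templates.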

tsg · 2016-01-13T08:17:54Z

Regarding the versioning issue, it sounds to me like the only upgrade-safe way to do it is to encode the "schema" version in the index names, like Marvel does.

I wonder if in Phase 1 we shouldn't load the template by default with overwrite: false. This has the benefit of simplifying the first experience, while still being safe enough against user provided templates.

Regarding the Logstash setup, how about option 3: Filebeat manages the template directly with Elasticsearch. Architecture wise, we said before that it's fine if, in a *beat -> Logstash -> Elasticsearch setup, *beat queries Elasticsearch directly for configuration, topology information, own metrics, etc. I see the template management as similar to that. A challenge with this option is how to organize the configuration file.

ruflin · 2016-01-13T08:35:34Z

I think it is a good idea to go in Phase 1 with overwrite: false but with a warning / error message in the log.

The challenge with option 3 is, as you mentioned, the config file: it cannot depend on the elasticsearch output, as it would then also send events to it. But as we have to think about this anyway for configuration and metrics, let's add this to it. I like the idea and will check if I can come up with a good solution.

ruflin · 2016-01-25T14:08:33Z

We have decided to go with Phase 1 first and overwrite: false by default. PR will be updated.

ruflin · 2016-01-27T08:53:06Z

For changing the documentation I created a follow up issue here: #862

Closes elastic/libbeat#62

dedemorton · 2016-01-27T18:18:41Z

CHANGELOG.asciidoc

@@ -58,6 +58,7 @@ https://github.com/elastic/beats/compare/v1.1.0...master[Check the HEAD diff]
 - Update builds to Golang version 1.5.3
 - Add ability to override configuration settings using environment variables {issue}114[114]
 - Libbeat now always exits through a single exit method for proper cleanup and control {pull}736[736]
+- Possibility to create elasticsearch mapping on startup {pull}639[639]


Suggest changing to: Add ability to create Elasticsearch mapping on startup

ruflin · 2016-01-28T13:53:52Z

@dedemorton Doc adjustments done.

monicasarbu · 2016-01-28T17:03:11Z

filebeat/filebeat.yml

@@ -151,6 +151,9 @@ filebeat:
  # Event count spool threshold - forces network flush if exceeded
  #spool_size: 2048

+  # Enable async publisher pipeline in filebeat (Experimental!)
+  #publish_async: false


is this option here by mistake?

It was automatically generated, so I think it went missing in one of the previous updates.

monicasarbu · 2016-01-28T17:17:25Z

LGTM.

Auto creation of template

andrewkroh · 2016-01-28T21:03:44Z

LGTM. The organization of a few of the imports doesn't match how goimports organizes them.

ruflin · 2016-01-29T07:40:24Z

@andrewkroh Will clean that up in a later PR.

tsg reviewed Jan 6, 2016

ruflin force-pushed the load-template branch from feeb538 to 3484ff9 Compare January 12, 2016 09:51

urso reviewed Jan 12, 2016

ruflin reviewed Jan 12, 2016

ruflin force-pushed the load-template branch from 4101d49 to 3a2a9f8 Compare January 12, 2016 15:54

andrewkroh reviewed Jan 13, 2016

ruflin added the in progress Pull request is currently in progress. label Jan 25, 2016

ruflin force-pushed the load-template branch 2 times, most recently from 689f236 to 37dcc78 Compare January 27, 2016 08:40

ruflin changed the title ~~Prototype of auto creation of template.~~ Auto creation of template. Jan 27, 2016

ruflin changed the title ~~Auto creation of template.~~ Auto creation of template Jan 27, 2016

ruflin force-pushed the load-template branch from 37dcc78 to 5f78821 Compare January 27, 2016 08:49

ruflin mentioned this pull request Jan 27, 2016

Create Documentation for Template Creation #862

Closed

2 tasks

ruflin added review v2.0.0 libbeat and removed in progress Pull request is currently in progress. labels Jan 27, 2016

ruflin force-pushed the load-template branch from 5f78821 to 760ccee Compare January 27, 2016 09:17

Possibility to create elasticsearch mapping on startup

2f26ec1

Closes elastic/libbeat#62

ruflin force-pushed the load-template branch from 760ccee to 2f26ec1 Compare January 27, 2016 09:17

dedemorton reviewed Jan 27, 2016

Update docs

e64ad83

monicasarbu reviewed Jan 28, 2016

monicasarbu added a commit that referenced this pull request Jan 28, 2016

Merge pull request #639 from ruflin/load-template

2e9b161

Auto creation of template

monicasarbu merged commit 2e9b161 into elastic:master Jan 28, 2016

ruflin deleted the load-template branch January 29, 2016 07:40

monicasarbu mentioned this pull request Mar 7, 2016

Topbeat: can't reliably retrieve proc.cpu.total_p value from ES #1009

Closed

ruflin mentioned this pull request Mar 11, 2016

Backport template loading #1137

Merged
