Cherry-pick #8914 to 6.x: Accept multiple ingest pipelines in Filebeat #9811

ycombinator · 2018-12-27T19:35:38Z

Cherry-pick of PR #8914 to 6.x branch. Original message:

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:

- run sub-pipelines via the pipeline processor, and
- conditionally run processors via an if field.

Combined, these abilities let a fileset ingest the same logical information presented in different formats, e.g. plaintext vs. JSON versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then, depending on that format, conditionally delegates further processing of the entry to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the ingest_pipeline property in their manifest.yml. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

Example with multiple pipelines

ingest_pipeline:
  - pipeline-ze-boss.json 
  - pipeline-plain.json
  - pipeline-json.json

Example with a single pipeline

This is just to show that the existing functionality will continue to work as-is.

ingest_pipeline: pipeline.json
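Since `ingest_pipeline` can now be either a single string or a list, the loader has to normalize both shapes. A minimal sketch of that normalization follows, assuming the YAML has already been decoded into an `interface{}`; `pipelinePaths` is a hypothetical helper, not the function used in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// pipelinePaths normalizes a decoded `ingest_pipeline` value into a slice.
// The first element is treated as the entry point pipeline.
func pipelinePaths(v interface{}) ([]string, error) {
	switch val := v.(type) {
	case string:
		// Single-pipeline form: ingest_pipeline: pipeline.json
		return []string{val}, nil
	case []interface{}:
		// List form: the order in the manifest is preserved.
		if len(val) == 0 {
			return nil, errors.New("ingest_pipeline is empty")
		}
		paths := make([]string, 0, len(val))
		for i, item := range val {
			s, ok := item.(string)
			if !ok {
				return nil, fmt.Errorf("ingest_pipeline[%d] is not a string", i)
			}
			paths = append(paths, s)
		}
		return paths, nil
	default:
		return nil, fmt.Errorf("unsupported ingest_pipeline type %T", v)
	}
}

func main() {
	single, _ := pipelinePaths("pipeline.json")
	multi, _ := pipelinePaths([]interface{}{
		"pipeline-ze-boss.json", "pipeline-plain.json", "pipeline-json.json",
	})
	fmt.Println(single, multi[0])
}
```

Either way the caller gets a slice back, so existing single-pipeline manifests need no changes.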

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a pipeline processor to do so. This processor's name field will need to reference the other pipeline by its name. To ensure correct referencing, the name field must be specified as follows:

{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}

This ensures that the specified name is correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes its "raw" ingest pipeline names with filebeat-<version>-<module>-<fileset>- when loading them into Elasticsearch.
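A minimal sketch of how such a template function could resolve names, assuming Go's text/template with custom `{<` and `>}` delimiters. `pipelineID` and `renderPipeline` are hypothetical helpers for illustration, not the functions in this PR, and the version, module, and fileset values are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
	"text/template"
)

// pipelineID builds filebeat-<version>-<module>-<fileset>-<name>,
// dropping any directory and file extension from the raw name.
func pipelineID(version, module, fileset, file string) string {
	base := filepath.Base(file)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return fmt.Sprintf("filebeat-%s-%s-%s-%s", version, module, fileset, name)
}

// renderPipeline expands {< IngestPipeline "..." >} calls in raw pipeline JSON.
func renderPipeline(raw, version, module, fileset string) (string, error) {
	tmpl, err := template.New("pipeline").
		Delims("{<", ">}").
		Funcs(template.FuncMap{
			"IngestPipeline": func(name string) string {
				return pipelineID(version, module, fileset, name)
			},
		}).
		Parse(raw)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	raw := `{"pipeline": {"name": "{< IngestPipeline "pipeline-plain" >}"}}`
	out, err := renderPipeline(raw, "6.6.0", "nginx", "access")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"pipeline": {"name": "filebeat-6.6.0-nginx-access-pipeline-plain"}}
}
```

Because the same function names both the loaded pipelines and the references between them, the sub-pipeline name in the root pipeline always matches what ends up in Elasticsearch.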

houndci-bot · 2018-12-27T19:35:45Z

filebeat/fileset/fileset.go

-func (fs *Fileset) GetPipeline(esVersion string) (pipelineID string, content map[string]interface{}, err error) {
-	path, err := applyTemplate(fs.vars, fs.manifest.IngestPipeline, false)
+// GetPipelines returns the JSON content of the Ingest Node pipeline that parses the logs.
+func (fs *Fileset) GetPipelines(esVersion common.Version) (pipelines []pipeline, err error) {


exported method GetPipelines returns unexported type []fileset.pipeline, which can be annoying to use

hound has a point here ;-)

ycombinator · 2018-12-27T20:56:47Z

This PR depends on #9813 to be merged first. Then this PR should be rebased on 6.x.

Motivated by #8852 (comment). (cherry picked from commit 5ba1f11)

ycombinator · 2018-12-28T06:21:06Z

jenkins, test this

ycombinator added backport review labels Dec 27, 2018

houndci-bot reviewed Dec 27, 2018

ycombinator force-pushed the backport_8914_6.x branch from f0b8bda to 5952834 Compare December 28, 2018 01:42

ycombinator force-pushed the backport_8914_6.x branch from 5952834 to 4e8f855 Compare December 28, 2018 02:32

ycombinator requested review from ruflin and urso December 28, 2018 07:14

ruflin approved these changes Dec 28, 2018

ycombinator merged commit 7e38917 into elastic:6.x Dec 28, 2018

ycombinator deleted the backport_8914_6.x branch December 25, 2019 11:14
