Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #8914 to 6.x: Accept multiple ingest pipelines in Filebeat #9811

Merged
merged 1 commit into from
Dec 28, 2018

Conversation

ycombinator
Copy link
Contributor

Cherry-pick of PR #8914 to 6.x branch. Original message:

Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:

These abilities combined present the opportunity for a fileset to ingest the same logical information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the ingest_pipeline property in their manifest.yml. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

Example with multiple pipelines

ingest_pipeline:
  - pipeline-ze-boss.json 
  - pipeline-plain.json
  - pipeline-json.json

Example with a single pipeline

This is just to show that the existing functionality will continue to work as-is.

ingest_pipeline: pipeline.json

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a pipeline processor to do so. This processor's name field will need to reference the other pipeline by its name. To ensure correct referencing, the name field must be specified as follows:

{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes it's "raw" Ingest pipeline names with filebeat-<version>-<module>-<fileset>- when loading them into Elasticsearch.

func (fs *Fileset) GetPipeline(esVersion string) (pipelineID string, content map[string]interface{}, err error) {
path, err := applyTemplate(fs.vars, fs.manifest.IngestPipeline, false)
// GetPipelines returns the JSON content of the Ingest Node pipeline that parses the logs.
func (fs *Fileset) GetPipelines(esVersion common.Version) (pipelines []pipeline, err error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method GetPipelines returns unexported type []fileset.pipeline, which can be annoying to use

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hound has a point here ;-)

@ycombinator
Copy link
Contributor Author

This PR depends on #9813 to be merged first. Then this PR should be rebased on 6.x.

Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:
- run sub-pipelines via the [`pipeline` processor](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/pipeline-processor.html), and
- conditionally run processors via an [`if` field](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/ingest-processors.html).

These abilities combined present the opportunity for a fileset to ingest the same _logical_ information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the `ingest_pipeline` property in their `manifest.yml`. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

```yaml
ingest_pipeline:
  - pipeline-ze-boss.json
  - pipeline-plain.json
  - pipeline-json.json
```
_This is just to show that the existing functionality will continue to work as-is._
```yaml
ingest_pipeline: pipeline.json
```

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a `pipeline` processor to do so. This processor's `name` field will need to reference the other pipeline by its name. To ensure correct referencing, the `name` field must be specified as follows:

```json
{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}
```

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes it's "raw" Ingest pipeline names with `filebeat-<version>-<module>-<fileset>-` when loading them into Elasticsearch.

(cherry picked from commit 5ba1f11)
@ycombinator
Copy link
Contributor Author

jenkins, test this

@ycombinator ycombinator merged commit 7e38917 into elastic:6.x Dec 28, 2018
@ycombinator ycombinator deleted the backport_8914_6.x branch December 25, 2019 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants