Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Management] Change indexing from {type}-{namespace}-{dataset} to {type}-{dataset}-{namespace} #56132

Merged
merged 5 commits into from
Feb 12, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 44 additions & 2 deletions docs/epm/index.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ A package contains all the assets for the Elastic Stack. A more detailed definit
Ingest Management enforces an indexing strategy to allow the system to automically detect indices and run queries on it. In short the indexing strategy looks as following:

```
{type}-{namespace}-{dataset}
{type}-{dataset}-{namespace}
```

The `{type}` can be `logs` or `metrics`. The `{namespace}` is the part where the user can use free form. The only two requirement are that it has only characters allowed in an Elasticsearch index name and does NOT contain a `-`. The `dataset` is defined by the data that is indexed. The same requirements as for the namespace apply. It is expected that the fields for type, namespace and dataset are part of each event and are constant keywords.
Expand All @@ -68,6 +68,17 @@ This indexing strategy has a few advantages:
* Having a global metrics and logs template, allows to create new indices on demand which still follow the convention. This is common in the case of k8s as an example.
* Constant keywords allow to narrow down the indices we need to access for querying very efficiently. This is especially relevant in environments which a large number of indices or with indices on slower nodes.

=== Ingest Pipeline

The ingest pipelines for a specific dataset will have the following naming scheme:

```
{type}-{dataset}-{package.version}
```

ruflin marked this conversation as resolved.
Show resolved Hide resolved
As an example, the ingest pipeline for the Nginx access logs is called `logs-nginx.access-3.4.1`. The same ingest pipeline is used for all namespaces. It is possible that a dataset has multiple ingest pipelines in which case a suffix is added to the name.

The version is included in each pipeline to allow upgrades. The pipeline itself is listed in the index template and is automatically applied at ingest time.

=== Templates & ILM Policies

Expand All @@ -77,10 +88,39 @@ The `metrics` and `logs` alias template contain all the basic fields from ECS.

Each type template contains an ILM policy. Modifying this default ILM policy will affect all data covered by the default templates.

The templates for a dataset are called as following:

```
{type}-{dataset}
```

The pattern used inside the index template is `{type}-{dataset}-*` to match all namespaces.

=== Defaults

If the Elastic Agent is used to ingest data and only the type is specified, `default` for the namespace is used and `generic` for the dataset.

=== Data filtering

Filtering for data in queries for example in visualizations or dashboards should always be done on the constant keyword fields. Visualizations needing data for the nginx.access dataset should query on `type:logs AND dataset:nginx.access`. As these are constant keywords the prefiltering is very efficient.

=== Security permissions

Security permissions can be set on different levels. To set special permissions for the access on the prod namespace an index pattern as below can be used:

```
/(logs|metrics)-[^-]+-prod-$/
```

To set specific permissions on the logs index, the following can be used:

```
/^(logs|metrics)-.*/
```

Todo: The above queries need to be tested.



== Package Manager

Expand All @@ -98,4 +138,6 @@ The new ingest pipeline is expected to still work with the data coming from olde

In case of a breaking change in the data structure, the new ingest pipeline is also expected to deal with this change. In case there are breaking changes which cannot be dealt with in an ingest pipeline, a new package has to be created.

Each package lists its minimal required agent version. In case there are agents enrolled with an older version, the user is notified to upgrade these agents as otherwise the new configs cannot be rolled out.
Each package lists its minimal required agent version. In case there are agents enrolled with an older version, the user is notified to upgrade these agents as otherwise the new configs cannot be rolled out.