Document data streams and custom index lifecycle policies #6553

Merged
merged 7 commits into from
Nov 22, 2021
86 changes: 84 additions & 2 deletions docs/data-streams.asciidoc
@@ -1,4 +1,86 @@
[[apm-data-streams]]
== Data streams
=== Data streams

// to do: fill with content. placeholder for external links for now
****
{agent} uses data streams to store append-only time series data across multiple indices
while giving users a single named resource for requests.
Data streams are well-suited for logs, metrics, traces, and other continuously generated data,
and offer a host of benefits over other indexing strategies:

* Reduced number of fields per index
* More granular data control
* Flexible naming scheme
* Fewer ingest permissions required

See the {fleet-guide}/data-streams.html[Fleet and Elastic Agent Guide] to learn more.
****

[discrete]
[[apm-data-streams-naming-scheme]]
=== Data stream naming scheme

APM data follows the `<type>-<dataset>-<namespace>` naming scheme.
The `type` and `dataset` are predefined by the APM integration,
but the `namespace` is your opportunity to customize how different types of data are stored in {es}.
There is no recommendation for what to use as your namespace--it is intentionally flexible.
For example, you might create namespaces for each of your environments,
like `dev`, `prod`, `production`, etc.
Or, you might create namespaces that correspond to strategic business units within your organization.
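
As a rough illustration of the naming scheme, the sketch below joins the three parts into a data stream name. The helper `data_stream_name` is hypothetical and not part of any Elastic library; the `default` namespace is assumed as the fallback.

```python
def data_stream_name(type_: str, dataset: str, namespace: str = "default") -> str:
    """Join the three parts of the <type>-<dataset>-<namespace> scheme."""
    # Hypothetical helper for illustration: the APM integration predefines
    # the type and dataset, so only the namespace is chosen by the user.
    return f"{type_}-{dataset}-{namespace}"


print(data_stream_name("traces", "apm"))              # traces-apm-default
print(data_stream_name("logs", "apm.error", "prod"))  # logs-apm.error-prod
```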

[discrete]
[[apm-data-streams-list]]
=== APM data streams

By type, the APM data streams are:

Traces::

Traces consist of {apm-guide-ref}/data-model.html[spans and transactions].
Traces are stored in the following data streams:

- Application traces: `traces-apm-<namespace>`
- RUM and iOS agent application traces: `traces-apm.rum-<namespace>`

Metrics::

Metrics include application-based metrics and basic system metrics.
Metrics are stored in the following data streams:

- APM internal metrics: `metrics-apm.internal-<namespace>`
- APM profiling metrics: `metrics-apm.profiling-<namespace>`
- Application metrics: `metrics-apm.app.<service.name>-<namespace>`
+
Application metrics include the instrumented service's name--defined in each APM agent's
configuration--in the data stream name.
Service names therefore must follow certain index naming rules.
+
[%collapsible]
.Service name rules
====
* Service names are case-insensitive and must be unique.
For example, you cannot have a service named `Foo` and another named `foo`.
* Special characters will be removed from service names and replaced with underscores (`_`).
Special characters include:
+
[source,text]
----
'\\', '/', '*', '?', '"', '<', '>', '|', ' ', ',', '#', ':', '-'
----
====

Logs::

Logs include application error events and application logs.
Logs are stored in the following data streams:

- APM error/exception logging: `logs-apm.error-<namespace>`
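
The service name rules for application metrics can be sketched as follows. This is an approximation under stated assumptions, not the integration's actual code: the function name `app_metrics_stream` is hypothetical, and lowercasing is only one way to model the case-insensitivity rule.

```python
# The characters listed as "special" in the service name rules above.
SPECIAL_CHARS = set('\\/*?"<>| ,#:-')


def app_metrics_stream(service_name: str, namespace: str) -> str:
    """Sketch of how a service name might map to an application metrics stream."""
    # Assumption: lowercasing models the case-insensitivity rule; the exact
    # normalization performed by the integration is not shown in this document.
    cleaned = "".join(
        "_" if ch in SPECIAL_CHARS else ch for ch in service_name.lower()
    )
    return f"metrics-apm.app.{cleaned}-{namespace}"


print(app_metrics_stream("My Service", "default"))  # metrics-apm.app.my_service-default
```

Under these assumptions, `Foo` and `foo` map to the same data stream name, which is why both cannot coexist as separate services.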

[discrete]
[[apm-data-streams-next]]
=== What's next?

* Data streams define not only how data is stored in {es}, but also how data is retained over time.
See <<ilm-how-to>> to learn how to create your own data retention policies.

* See <<manage-storage>> for information on APM storage and processing costs,
processing and performance, and other index management features.
9 changes: 0 additions & 9 deletions docs/how-to.asciidoc
@@ -4,20 +4,11 @@
Learn how to perform common APM configuration and management tasks.

* <<source-map-how-to>>
* <<ilm-how-to>>
* <<jaeger-integration>>
* <<ingest-pipelines>>
* <<manage-storage>>
* <<apm-tune-elasticsearch>>

include::./source-map-how-to.asciidoc[]

include::./ilm-how-to.asciidoc[]

include::./jaeger-integration.asciidoc[]

include::./ingest-pipelines.asciidoc[]

include::./manage-storage.asciidoc[]

include::./apm-tune-elasticsearch.asciidoc[]
197 changes: 185 additions & 12 deletions docs/ilm-how-to.asciidoc
@@ -1,18 +1,191 @@
[[ilm-how-to]]
=== Index lifecycle management (ILM)
=== Index lifecycle management

// todo: add more context and an example
Index lifecycle policies allow you to automate the
lifecycle of your APM indices as they grow and age.
A default policy is applied to each APM data stream,
but can be customized depending on your business needs.

++++
<titleabbrev>Customize index lifecycle management</titleabbrev>
++++
See {ref}/index-lifecycle-management.html[ILM: Manage the index lifecycle] to learn more.

The index lifecycle management (ILM) feature in {es} allows you to automate the
lifecycle of your APM Server indices as they grow and age.
ILM is enabled by default, and a default policy is applied to all APM indices.
[discrete]
[[index-lifecycle-policies-default]]
=== Default policies

To view and edit these index lifecycle policies in {kib},
select *Stack Management* / *Index Lifecycle Management*.
Search for `apm`.
The table below describes the default index lifecycle policy applied to each APM data stream.
Each policy includes a rollover and delete definition:

See {ref}/getting-started-index-lifecycle-management.html[manage the index lifecycle] for more information.
* **Rollover**: Rolling over to a new index prevents a single index from growing too large and optimizes indexing and search performance. Rollover, i.e. writing to a new index, occurs when either an age or a size threshold is met.
* **Delete**: The delete phase permanently removes the index after a time threshold is met.

[cols="1,1,1",options="header"]
|===
|Data stream
|Rollover after
|Delete after

|`traces-apm`
|30 days / 50 GB
|10 days

|`traces-apm.rum`
|30 days / 50 GB
|90 days

|`metrics-apm.profiling`
|30 days / 50 GB
|10 days

|`metrics-apm.internal`
|30 days / 50 GB
|90 days

|`metrics-apm.app`
|30 days / 50 GB
|90 days

|`logs-apm.error`
|30 days / 50 GB
|10 days

|===

The APM index lifecycle policies can be viewed in {kib}.
Navigate to *Stack Management* / *Index Lifecycle Management*, and search for `apm`.

[discrete]
[[data-streams-custom-policy]]
=== Configure a custom index lifecycle policy

This tutorial explains how to apply a custom index lifecycle policy to the
`traces-apm` data stream with the `default` namespace.

[discrete]
[[data-streams-custom-one]]
=== Step 1: View data streams

The **Data Streams** view in {kib} shows you the data streams,
index templates, and index lifecycle policies associated with a given integration.

. Navigate to **Stack Management** > **Index Management** > **Data Streams**.
. Search for `apm` to see all data streams associated with APM data.
. Select the `traces-apm-default` data stream to view its associated index template and ILM policy.
As you can see, the data stream follows the <<apm-data-streams-naming-scheme>> and starts with its type, `traces-`.
+
[role="screenshot"]
image::images/data-stream-overview.png[Data streams info]

[discrete]
[[data-streams-custom-two]]
=== Step 2: Create an index lifecycle policy

. Navigate to **Stack Management** > **Index Lifecycle Policies**.
. Click **Create policy**.

Name your new policy. This tutorial uses `custom-traces-apm-default-policy`.
Customize the policy to your liking, and when you're done, click **Save policy**.
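
For readers who prefer the API over the {kib} UI, a policy with a rollover and a delete definition could be expressed roughly as below. This is a sketch: the helper `build_ilm_policy` is hypothetical, and the thresholds are illustrative values rather than recommendations. The body follows the shape accepted by the {es} `PUT _ilm/policy/<name>` endpoint.

```python
import json

POLICY_NAME = "custom-traces-apm-default-policy"


def build_ilm_policy(max_age: str, max_size: str, delete_after: str) -> dict:
    """Build a request body with a hot-phase rollover and a delete phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": max_age, "max_size": max_size}
                    }
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


# Illustrative thresholds only; choose values that fit your retention needs.
body = build_ilm_policy(max_age="7d", max_size="50gb", delete_after="30d")
print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(body, indent=2))
```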

[discrete]
[[data-streams-custom-three]]
=== Step 3: Create a component template

To apply your new index lifecycle policy to a data stream,
and to ensure the policy continues to be applied in future versions,
you must create a component template.
The component template name must start with `.`, follow the data stream naming scheme,

Review comment (Member):
Where did you get the . part from? AFAIK they do not have a dot prefix.

Reply (Member Author, @bmorelli25, Nov 9, 2021):
The dot (and these docs) are adapted from a tutorial that @hop-dev wrote. Mark, can you comment on whether the . is required or not?

Reply:
@axw Thanks for pointing this out, I think this is a mistake in my original guide, you're right the component templates don't have a dot prefix.

@joshdover I can't see any record of us discussing having a dot prefix on the namespace specific component templates, I think we may be able to remove it? If so I'll get a PR in for the docs.

and end with `@custom`:

[source,text]
----
.<type>-<dataset>-<namespace>@custom
----

For example, to create custom index settings for the `traces-apm` data stream with a namespace of `default`,
the component template name would be:

[source,text]
----
.traces-apm-default@custom
----

. Navigate to **Stack Management** > **Index Management** > **Component Templates**.
. Click **Create component template**.
. Use the template above to set the name--in this case, `.traces-apm-default@custom`. Click **Next**.
. Under **Index settings**, set the ILM policy name created in the previous step:
+
[source,json]
----
{
"lifecycle": {
"name": "custom-traces-apm-default-policy"
}
}
----
. Continue to **Review** and ensure your request looks similar to the image below.
If it does, click **Create component template**.
+
[role="screenshot"]
image::images/create-component-template.png[Create component template]
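
The component template from this step could also be built programmatically, roughly as sketched below. The helper names are hypothetical. The `prefix` parameter defaults to the `.` used in the text above and can be set to an empty string if the prefix turns out not to be required. The body follows the shape accepted by the {es} `PUT _component_template/<name>` endpoint.

```python
import json


def component_template_name(
    type_: str, dataset: str, namespace: str, prefix: str = "."
) -> str:
    """Name a namespace-specific component template ending in @custom."""
    # Assumption: the leading "." mirrors the text above; pass prefix=""
    # to drop it.
    return f"{prefix}{type_}-{dataset}-{namespace}@custom"


def component_template_body(policy_name: str) -> dict:
    """Index settings that point new backing indices at an ILM policy."""
    return {
        "template": {"settings": {"index": {"lifecycle": {"name": policy_name}}}}
    }


name = component_template_name("traces", "apm", "default")
body = component_template_body("custom-traces-apm-default-policy")
print(f"PUT _component_template/{name}")
print(json.dumps(body, indent=2))
```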

[discrete]
[[data-streams-custom-four]]
=== Step 4: Clone and modify the existing index template

Now that you've created a component template,
you need to create an index template to apply the changes to the correct data stream.
The easiest way to do this is to duplicate and modify the integration's existing index template.

WARNING: When duplicating the index template, do not change or remove any managed properties. This may result in problems when upgrading.

. Navigate to **Stack Management** > **Index Management** > **Index Templates**.
. Find the index template you want to clone. The index template will have the `<type>` and `<dataset>` in its name,
but not the `<namespace>`. In this case, it's `traces-apm`.
. Select **Actions** > **Clone**.
. Set the name of the new index template to `traces-apm-default`.
. Change the index pattern to include a namespace--in this case, `traces-apm-default*`.
This ensures the previously created component template is only applied to the `default` namespace.
. Set the priority to `250`. This ensures that the new index template takes precedence over other index templates that match the index pattern.
. Under **Component templates**, search for and add the component template created in the previous step.
To ensure your namespace-specific settings are applied over other custom settings,
the new template should be added below the existing `@custom` template.
. Create the index template.
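
The fields changed in the steps above can be summarized in a short sketch. It covers only those fields: a real clone also carries over everything else from the managed index template, which is not reproduced here. The helper name and the `traces-apm@custom` entry are hypothetical placeholders for whatever the existing template already lists.

```python
def cloned_index_template(
    base_name: str,
    namespace: str,
    existing_components: list,
    custom_component: str,
    priority: int = 250,
) -> dict:
    """Return the fields that differ in the namespace-specific clone."""
    return {
        # Restrict the clone to a single namespace.
        "index_patterns": [f"{base_name}-{namespace}*"],
        # A higher priority wins over other templates matching the pattern.
        "priority": priority,
        "data_stream": {},
        # Later entries take precedence, so the new template goes last.
        "composed_of": [*existing_components, custom_component],
    }


template = cloned_index_template(
    base_name="traces-apm",
    namespace="default",
    existing_components=["traces-apm@custom"],  # hypothetical placeholder
    custom_component=".traces-apm-default@custom",
)
print(template["index_patterns"], template["composed_of"])
```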

[discrete]
[[data-streams-custom-five]]
=== Step 5: Roll over the data stream (optional)

To confirm that the data stream is now using the new index template and ILM policy,
you can either repeat <<data-streams-custom-one,step one>>, or navigate to **Dev Tools** and run the following:

[source,bash]
----
GET /_data_stream/traces-apm-default <1>
----
<1> The name of the data stream used throughout this tutorial

The result should include the following:

[source,json]
----
{
"data_streams" : [
{
...
"template" : "traces-apm-default", <1>
"ilm_policy" : "custom-traces-apm-default-policy", <2>
...
}
]
}
----
<1> The name of the custom index template created in step four
<2> The name of the ILM policy created in step two and applied to the component template in step three

New ILM policies only take effect when new indices are created,
so you must either wait for a rollover to occur (usually after 30 days or when the index size reaches 50 GB),
or force a rollover using the {ref}/indices-rollover-index.html[{es} rollover API]:

[source,bash]
----
POST /traces-apm-default/_rollover/
----
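
The check on the `GET /_data_stream` response earlier in this step could also be scripted. The sketch below works on an already parsed response; the function name `stream_uses` is hypothetical, and the sample is a trimmed, illustrative stand-in for a real response. It assumes each entry carries a `name` field alongside `template` and `ilm_policy`.

```python
def stream_uses(response: dict, stream: str, template: str, ilm_policy: str) -> bool:
    """Return True if the named data stream reports the expected template and policy."""
    for entry in response.get("data_streams", []):
        if entry.get("name") == stream:
            return (
                entry.get("template") == template
                and entry.get("ilm_policy") == ilm_policy
            )
    # The data stream was not present in the response at all.
    return False


# Trimmed sample shaped like the response shown earlier; values are illustrative.
sample = {
    "data_streams": [
        {
            "name": "traces-apm-default",
            "template": "traces-apm-default",
            "ilm_policy": "custom-traces-apm-default-policy",
        }
    ]
}
print(stream_uses(sample, "traces-apm-default", "traces-apm-default",
                  "custom-traces-apm-default-policy"))
```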
Binary file added docs/images/create-component-template.png
Binary file added docs/images/data-stream-overview.png
4 changes: 2 additions & 2 deletions docs/integrations-index.asciidoc
@@ -28,9 +28,9 @@ include::features.asciidoc[]

include::how-to.asciidoc[]

include::input-apm.asciidoc[]
include::manage-storage.asciidoc[]

include::data-streams.asciidoc[]
include::input-apm.asciidoc[]

include::secure-agent-communication.asciidoc[]

24 changes: 15 additions & 9 deletions docs/manage-storage.asciidoc
@@ -1,13 +1,19 @@
[[manage-storage]]
=== Manage storage
== Manage storage

* <<storage-guide>>
* <<processing-and-performance>>
* <<reduce-apm-storage>>
* <<manage-indices-in-kibana>>
* <<update-data>>
{agent} uses <<apm-data-streams,data streams>> to store time series data across multiple indices while giving you a single named resource for requests.
Each data stream ships with a customizable <<ilm-how-to,index lifecycle policy>> that automates data retention as your indices grow and age.

The <<storage-guide,storage and sizing guide>> attempts to define a "typical" storage reference for Elastic APM,
while the <<processing-and-performance,processing and performance guide>> can better help you understand throughput for various instance sizes.

Finally, there are additional settings you can tweak to <<reduce-apm-storage,reduce storage>>,
or <<apm-tune-elasticsearch,tune data ingestion in Elasticsearch>>.

include::./data-streams.asciidoc[]

include::./ilm-how-to.asciidoc[]

[float]
[[storage-guide]]
=== Storage and sizing guide

@@ -71,7 +77,6 @@ APM data compresses quite well, so the storage cost in Elasticsearch will be con

NOTE: These examples were indexing the same data over and over with minimal variation. Because of that, the compression ratios observed of 80-90% are somewhat optimistic.

[float]
[[processing-and-performance]]
=== Processing and performance

@@ -117,7 +122,6 @@ This means that with a properly sized Elasticsearch instance, APM Server scales

NOTE: RUM deserves special consideration. The RUM agent runs in browsers, and there can be many thousands reporting to an APM Server with very variable network latency.

[float]
[[reduce-apm-storage]]
=== Reduce storage

@@ -212,3 +216,5 @@ POST *-apm-*/_update_by_query?expand_wildcards=all
// CONSOLE

TIP: Remember to also change the service name in the {apm-agents-ref}/index.html[APM agent configuration].

include::./apm-tune-elasticsearch.asciidoc[]