Document data streams and custom index lifecycle policies #6553

Merged 7 commits on Nov 22, 2021
85 changes: 83 additions & 2 deletions docs/data-streams.asciidoc
@@ -1,4 +1,85 @@
[[apm-data-streams]]
== Data streams
=== Data streams

// to do: fill with content. placeholder for external links for now
****
{agent} uses data streams to store append-only time series data across multiple indices.
Data streams are well-suited for logs, metrics, traces, and other continuously generated data,
and offer a host of benefits over other indexing strategies:

* Reduced number of fields per index
* More granular data control
* Flexible naming scheme
* Fewer ingest permissions required

See the {fleet-guide}/data-streams.html[Fleet and Elastic Agent Guide] to learn more.
****

[discrete]
[[apm-data-streams-naming-scheme]]
=== Data stream naming scheme

APM data follows the `<type>-<dataset>-<namespace>` naming scheme.
The `type` and `dataset` are predefined by the APM integration,
but the `namespace` is your opportunity to customize how different types of data are stored in {es}.
There is no recommendation for what to use as your namespace--it is intentionally flexible.
For example, you might create namespaces for each of your environments,
like `dev`, `staging`, and `prod`.
Or, you might create namespaces that correspond to strategic business units within your organization.
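
As a quick sketch of how the pieces compose, the shell snippet below builds a data stream name from assumed example values--`prod` is a hypothetical namespace:

```shell
# type and dataset are predefined by the APM integration;
# the namespace ("prod" here) is a hypothetical user-chosen value.
type="traces"
dataset="apm"
namespace="prod"
data_stream="${type}-${dataset}-${namespace}"
echo "$data_stream"   # traces-apm-prod
```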

[discrete]
[[apm-data-streams-list]]
=== APM data streams

By type, the APM data streams are:

Traces::

Traces consist of {apm-guide-ref}/data-model.html[spans and transactions].
Traces are stored in the following data streams:

- Application traces: `traces-apm-<namespace>`
- RUM and iOS agent application traces: `traces-apm.rum-<namespace>`

Metrics::

Metrics include application-based metrics and basic system metrics.
Metrics are stored in the following data streams:

- APM internal metrics: `metrics-apm.internal-<namespace>`
- APM profiling metrics: `metrics-apm.profiling-<namespace>`
- Application metrics: `metrics-apm.app.<service.name>-<namespace>`
+
Application metrics include the instrumented service's name--defined in each APM agent's
configuration--in the data stream name.
Service names therefore must follow certain index naming rules.
+
[%collapsible]
.Service name rules
====
* Service names are case-insensitive and must be unique.
For example, you cannot have a service named `Foo` and another named `foo`.
* Special characters in service names are replaced with underscores (`_`).
Special characters include:
+
[source,text]
----
'\\', '/', '*', '?', '"', '<', '>', '|', ' ', ',', '#', ':', '-'
----
====
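
As an illustrative approximation only--not APM Server's actual implementation--the rules above can be sketched in the shell; the function name and the lowercasing step are assumptions:

```shell
# Hypothetical helper: lowercase the service name (names are
# case-insensitive) and replace each documented special character
# with an underscore. A sketch, not the server's real code.
sanitize_service_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's%[\\/*?"<>| ,#:-]%_%g'
}
sanitize_service_name 'My-Service #1'   # my_service__1
```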

Logs::

Logs include application error events and application logs.
Logs are stored in the following data streams:

- APM error/exception logging: `logs-apm.error-<namespace>`

[discrete]
[[apm-data-streams-next]]
=== What's next?

* Data streams define not only how data is stored in {es}, but also how data is retained over time.
See <<ilm-how-to>> to learn how to create your own data retention policies.

* See <<manage-storage>> for information on APM storage and processing costs,
processing and performance, and other index management features.
9 changes: 0 additions & 9 deletions docs/how-to.asciidoc
@@ -4,20 +4,11 @@
Learn how to perform common APM configuration and management tasks.

* <<source-map-how-to>>
* <<ilm-how-to>>
* <<jaeger-integration>>
* <<ingest-pipelines>>
* <<manage-storage>>
* <<apm-tune-elasticsearch>>

include::./source-map-how-to.asciidoc[]

include::./ilm-how-to.asciidoc[]

include::./jaeger-integration.asciidoc[]

include::./ingest-pipelines.asciidoc[]

include::./manage-storage.asciidoc[]

include::./apm-tune-elasticsearch.asciidoc[]
170 changes: 158 additions & 12 deletions docs/ilm-how-to.asciidoc
@@ -1,18 +1,164 @@
[[ilm-how-to]]
=== Index lifecycle management (ILM)
=== Index lifecycle management

// todo: add more context and an example
Index lifecycle policies allow you to automate the
lifecycle of your APM indices as they grow and age.
A default policy is applied to each APM data stream,
but can be customized depending on your business needs.

++++
<titleabbrev>Customize index lifecycle management</titleabbrev>
++++
See {ref}/index-lifecycle-management.html[ILM: Manage the index lifecycle] to learn more.

The index lifecycle management (ILM) feature in {es} allows you to automate the
lifecycle of your APM Server indices as they grow and age.
ILM is enabled by default, and a default policy is applied to all APM indices.
[discrete]
[[index-lifecycle-policies-default]]
=== Default policies

To view and edit these index lifecycle policies in {kib},
select *Stack Management* / *Index Lifecycle Management*.
Search for `apm`.
The table below describes the default index lifecycle policy applied to each APM data stream.
Each policy includes a rollover and delete definition:

See {ref}/getting-started-index-lifecycle-management.html[manage the index lifecycle] for more information.
* **Rollover**: Using rollover indices prevents a single index from growing too large and optimizes indexing and search performance. Rollover--writing to a new index--occurs after either an age or size threshold is met.
* **Delete**: The delete phase permanently removes the index after a time threshold is met.

[cols="1,1,1",options="header"]
|===
|Data stream
|Rollover after
|Delete after

|`traces-apm`
|30 days / 50 GB
|10 days

|`traces-apm.rum`
|30 days / 50 GB
|90 days

|`metrics-apm.profiling`
|30 days / 50 GB
|10 days

|`metrics-apm.internal`
|30 days / 50 GB
|90 days

|`metrics-apm.app`
|30 days / 50 GB
|90 days

|`logs-apm.error`
|30 days / 50 GB
|10 days

|===

The APM index lifecycle policies can be viewed in {kib}.
Navigate to *Stack Management* / *Index Lifecycle Management*, and search for `apm`.
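
The same policies can also be retrieved through the {es} ILM API. This request is a sketch for use in **Dev Tools**; it returns all policies, so look for the entries whose names contain `apm` (the exact names depend on your installed package version):

```bash
GET _ilm/policy
```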

[discrete]
[[data-streams-custom-policy]]
=== Configure a custom index lifecycle policy

When the APM package is installed, Fleet creates a default `*@custom` component template for each data stream.
The easiest way to configure a custom index lifecycle policy per data stream is to edit this template.

This tutorial explains how to apply a custom index lifecycle policy to the `traces-apm` data stream.

[discrete]
[[data-streams-custom-one]]
=== Step 1: View data streams

The **Data Streams** view in {kib} shows you the data streams,
index templates, and index lifecycle policies associated with a given integration.

. Navigate to **Stack Management** > **Index Management** > **Data Streams**.
. Search for `traces-apm` to see all data streams associated with APM trace data.
. In this example, there is only one data stream because only the `default` namespace is in use.
You may have more if your setup includes multiple namespaces.
+
[role="screenshot"]
image::images/data-stream-overview.png[Data streams info]

[discrete]
[[data-streams-custom-two]]
=== Step 2: Create an index lifecycle policy

. Navigate to **Stack Management** > **Index Lifecycle Policies**.
. Click **Create policy**.

Name your new policy; for this tutorial, I've chosen `custom-traces-apm-policy`.
Customize the policy to your liking, and when you're done, click **Save policy**.

[discrete]
[[data-streams-custom-three]]
=== Step 3: Apply the index lifecycle policy

To apply your new index lifecycle policy to the `traces-apm-*` data stream,
edit the `<data-stream-name>@custom` component template.

. Click on the **Component Template** tab and search for `traces-apm`.
. Select the `traces-apm@custom` template and click **Manage** > **Edit**.
. Under **Index settings**, set the ILM policy name created in the previous step:
+
[source,json]
----
{
"lifecycle": {
"name": "custom-traces-apm-policy"
}
}
----
. Continue to **Review** and ensure your request looks similar to the image below.
If it does, click **Create component template**.
+
[role="screenshot"]
image::images/create-component-template.png[Create component template]
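
If you prefer the API, the same change can be sketched with the {es} component template API. This request is illustrative: it assumes the policy name from step two, and note that `PUT` replaces the entire template body, so include any other custom settings your template already contains:

```bash
PUT _component_template/traces-apm@custom
{
  "template": {
    "settings": {
      "index.lifecycle.name": "custom-traces-apm-policy"
    }
  }
}
```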

[discrete]
[[data-streams-custom-four]]
=== Step 4: Roll over the data stream (optional)

To confirm that the data stream is now using the new index template and ILM policy,
you can either repeat <<data-streams-custom-one,step one>> or navigate to **Dev Tools** and run the following:

[source,bash]
----
GET /_data_stream/traces-apm-default <1>
----
<1> The name of the data stream, with your namespace appended (`default` in this example)

The result should include the following:

[source,json]
----
{
"data_streams" : [
{
...
"template" : "traces-apm-default", <1>
"ilm_policy" : "custom-traces-apm-policy", <2>
...
}
]
}
----
<1> The name of the index template used by the data stream
<2> The name of the custom ILM policy created in step two and applied in step three

New ILM policies only take effect when new indices are created,
so you must either wait for a rollover to occur (usually after 30 days or when the index size reaches 50 GB),
or force a rollover using the {ref}/indices-rollover-index.html[{es} rollover API]:

[source,bash]
----
POST /traces-apm-default/_rollover/
----

[discrete]
[[data-streams-custom-policy-namespace]]
=== Namespace-level index lifecycle policies

It is also possible to create more granular index lifecycle policies that apply to individual namespaces.
This process is similar to the above tutorial, but involves cloning and modifying the existing index template to use
a new `*@custom` component template.

For more information on this process, see
{fleet-guide}/data-streams.html#data-streams-ilm-tutorial[Tutorial: Customize data retention for integrations].
Binary file added docs/images/create-component-template.png
Binary file added docs/images/data-stream-overview.png
4 changes: 2 additions & 2 deletions docs/integrations-index.asciidoc
@@ -28,9 +28,9 @@ include::features.asciidoc[]

include::how-to.asciidoc[]

include::input-apm.asciidoc[]
include::manage-storage.asciidoc[]

include::data-streams.asciidoc[]
include::input-apm.asciidoc[]

include::secure-agent-communication.asciidoc[]

67 changes: 13 additions & 54 deletions docs/manage-storage.asciidoc
@@ -1,13 +1,17 @@
[[manage-storage]]
=== Manage storage
== Manage storage

* <<storage-guide>>
* <<processing-and-performance>>
* <<reduce-apm-storage>>
* <<manage-indices-in-kibana>>
* <<update-data>>
{agent} uses <<apm-data-streams,data streams>> to store time series data across multiple indices.
Each data stream ships with a customizable <<ilm-how-to,index lifecycle policy>> that automates data retention as your indices grow and age.

The <<storage-guide,storage and sizing guide>> attempts to define a "typical" storage reference for Elastic APM,
and there are additional settings you can tweak to <<reduce-apm-storage,reduce storage>>,
or to <<apm-tune-elasticsearch,tune data ingestion in Elasticsearch>>.

include::./data-streams.asciidoc[]

include::./ilm-how-to.asciidoc[]

[float]
[[storage-guide]]
=== Storage and sizing guide

@@ -71,53 +75,6 @@ APM data compresses quite well, so the storage cost in Elasticsearch will be con

NOTE: These examples were indexing the same data over and over with minimal variation. Because of that, the compression ratios observed of 80-90% are somewhat optimistic.

[float]
[[processing-and-performance]]
=== Processing and performance

APM Server performance depends on a number of factors: memory and CPU available,
network latency, transaction sizes, workload patterns,
agent and server settings, versions, and protocol.

Let's look at a simple example that makes the following assumptions:

* The load is generated in the same region as where APM Server and Elasticsearch are deployed.
* We're using the default settings in cloud.
* A small number of agents are reporting.

This leaves us with relevant variables like payload and instance sizes.
See the table below for approximations.
As a reminder, events are
<<data-model-transactions,transactions>> and
<<data-model-spans,spans>>.

[options="header"]
|=======================================================================
|Transaction/Instance |512 MB Instance |2 GB Instance |8 GB Instance
|Small transactions

_5 spans with 5 stack frames each_ |600 events/second |1200 events/second |4800 events/second
|Medium transactions

_15 spans with 15 stack frames each_ |300 events/second |600 events/second |2400 events/second
|Large transactions

_30 spans with 30 stack frames each_ |150 events/second |300 events/second |1400 events/second
|=======================================================================

In other words, a 512 MB instance can process \~3 MB per second,
while an 8 GB instance can process \~20 MB per second.

APM Server is CPU bound, so it scales better from 2 GB to 8 GB than it does from 512 MB to 2 GB.
This is because larger instance types in Elastic Cloud come with much more computing power.

Don't forget that the APM Server is stateless.
Several instances running do not need to know about each other.
This means that with a properly sized Elasticsearch instance, APM Server scales out linearly.
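
The throughput figures above can be turned into a rough capacity estimate. In this sketch, the event rate comes from the table (small transactions on the smallest instance), while the average event size is purely an assumption:

```shell
# Back-of-the-envelope sketch: estimate daily ingest volume.
# 600 events/s is the small-transaction figure for the smallest instance;
# the 5000-byte average event size is an assumption for illustration.
events_per_second=600
avg_event_bytes=5000
bytes_per_day=$(( events_per_second * avg_event_bytes * 86400 ))
echo "$(( bytes_per_day / 1024 / 1024 / 1024 )) GiB/day"   # 241 GiB/day
```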

NOTE: RUM deserves special consideration. The RUM agent runs in browsers, and there can be many thousands reporting to an APM Server with very variable network latency.

[float]
[[reduce-apm-storage]]
=== Reduce storage

@@ -212,3 +169,5 @@ POST *-apm-*/_update_by_query?expand_wildcards=all
// CONSOLE

TIP: Remember to also change the service name in the {apm-agents-ref}/index.html[APM agent configuration].

include::./apm-tune-elasticsearch.asciidoc[]