docs for the Grafana Agent Operator (#651)

* docs for the Grafana Agent Operator
* fix indentation of nested lists
* Update docs/operator/README.md

  Co-authored-by: Mario <mariorvinas@gmail.com>

* more detail in README
* describe why CRDs
* mirror docs/operator/README.md intro to cmd/agent-operator/README.md

Co-authored-by: Mario <mariorvinas@gmail.com>

Showing 7 changed files with 519 additions and 80 deletions.

@@ -0,0 +1,30 @@

# Grafana Agent Operator

The Grafana Agent Operator is a Kubernetes operator that makes it easier to
deploy the Grafana Agent and easier to collect telemetry data from your pods.

It works by watching for [Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
that specify how you would like to collect telemetry data from your Kubernetes
cluster and where you would like to send it. These resources abstract away
Kubernetes-specific configuration that would be tedious to write by hand. The
Grafana Agent Operator manages the corresponding Grafana Agent deployments in
your cluster by watching for changes to the custom resources.

Metric collection is based on the [Prometheus
Operator](https://github.com/prometheus-operator/prometheus-operator) and
supports the official v1 ServiceMonitor, PodMonitor, and Probe CRDs from that
project. These custom resources represent abstractions for monitoring services,
pods, and ingresses. They are especially useful for Helm users, for whom
manually writing a generic service discovery (SD) configuration that matches all
of their charts can be difficult (or impossible!), and writing a specific SD
configuration for each chart is tedious.
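ServiceMonitors and PodMonitors choose the Services or Pods to scrape with Kubernetes label selectors. As a result, a single resource can cover every chart whose objects share a label, with no hand-written SD configuration per chart. The Go sketch below models only the `matchLabels` rule, in which every key/value pair in the selector must appear on the object. Real selectors also support `matchExpressions`, which this sketch ignores. The Service names and labels are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// matchesLabels reports whether labels satisfy a Kubernetes-style
// matchLabels selector: every key/value pair in the selector must be
// present on the object. An empty selector matches everything.
func matchesLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical Services installed by three different Helm charts.
	services := map[string]map[string]string{
		"default/redis":    {"app.kubernetes.io/name": "redis", "team": "a"},
		"default/postgres": {"app.kubernetes.io/name": "postgres", "team": "a"},
		"default/web":      {"app.kubernetes.io/name": "web", "team": "b"},
	}
	// One selector covers every chart labeled team=a.
	selector := map[string]string{"team": "a"}

	names := make([]string, 0, len(services))
	for name := range services {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output

	for _, name := range names {
		if matchesLabels(selector, services[name]) {
			fmt.Println("selected:", name)
		}
	}
}
```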

## Table of Contents

1. [Getting Started](./getting-started.md)
   1. [Deploying CustomResourceDefinitions](./getting-started.md#deploying-customresourcedefinitions)
   2. [Installing on Kubernetes](./getting-started.md#installing-on-kubernetes)
   3. [Running locally](./getting-started.md#running-locally)
   4. [Deploying GrafanaAgent](./getting-started.md#deploying-grafanaagent)
2. [FAQ](./faq.md)
3. [Architecture](./architecture.md)
4. [Maintainers Guide](./maintainers-guide.md)

@@ -0,0 +1,98 @@

# Architecture

This guide gives a high-level overview of how the Grafana Agent Operator
works. Refer to the [maintainer's guide](./maintainers-guide.md) for
detailed, lower-level information targeted at maintainers.

The Grafana Agent Operator works in two phases:

1. Discover a hierarchy of custom resources
2. Reconcile that hierarchy into a Grafana Agent deployment

## Custom Resource Hierarchy

The root of the custom resource hierarchy is the `GrafanaAgent` resource. It is
the primary resource the Operator looks for, and it is called the "root" because
it discovers many other sub-resources.

The full hierarchy of custom resources is as follows:

1. `GrafanaAgent`
   1. `PrometheusInstance`
      1. `PodMonitor`
      2. `Probe`
      3. `ServiceMonitor`

Most of the resources above can reference a ConfigMap or a Secret. All
referenced ConfigMaps and Secrets are also added to the resource hierarchy.
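The following minimal Go sketch makes the hierarchy concrete. It walks a hypothetical hierarchy and collects every resource the Operator would need to watch, including referenced Secrets and ConfigMaps. The `Resource` type, the `Kind/namespace/name` key format, and all resource names are illustrative assumptions and are not the Operator's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a simplified, hypothetical stand-in for one custom resource in
// the hierarchy (GrafanaAgent -> PrometheusInstance -> monitors).
type Resource struct {
	Kind, Namespace, Name string
	// Refs lists referenced ConfigMaps/Secrets as "Kind/namespace/name".
	Refs     []string
	Children []*Resource
}

// Key identifies a resource as "Kind/namespace/name" (assumed format).
func (r *Resource) Key() string {
	return r.Kind + "/" + r.Namespace + "/" + r.Name
}

// flatten walks the hierarchy depth-first and returns the sorted set of all
// resource keys to watch, including referenced ConfigMaps and Secrets.
func flatten(root *Resource) []string {
	seen := map[string]bool{}
	var walk func(r *Resource)
	walk = func(r *Resource) {
		seen[r.Key()] = true
		for _, ref := range r.Refs {
			seen[ref] = true
		}
		for _, c := range r.Children {
			walk(c)
		}
	}
	walk(root)

	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	probe := &Resource{Kind: "Probe", Namespace: "apps", Name: "blackbox"}
	sm := &Resource{Kind: "ServiceMonitor", Namespace: "apps", Name: "web",
		Refs: []string{"Secret/apps/web-basic-auth"}}
	inst := &Resource{Kind: "PrometheusInstance", Namespace: "monitoring", Name: "primary",
		Refs:     []string{"Secret/monitoring/remote-write-creds"},
		Children: []*Resource{sm, probe}}
	agent := &Resource{Kind: "GrafanaAgent", Namespace: "monitoring", Name: "agent",
		Children: []*Resource{inst}}

	for _, k := range flatten(agent) {
		fmt.Println(k)
	}
}
```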

Once a hierarchy is established, each item in it is watched for changes. A
change to any item triggers a reconcile of the root GrafanaAgent resource, which
creates, modifies, or deletes the corresponding Grafana Agent deployment.

A single resource can belong to multiple hierarchies. For example, if two
GrafanaAgents use the same Probe, modifying that Probe will cause both
GrafanaAgents to be reconciled.
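Because one resource can belong to several hierarchies, a change to that resource has to be mapped back to every root that contains it. Below is a minimal sketch of that reverse lookup. It assumes a simple in-memory index keyed by hypothetical `Kind/namespace/name` strings. The Operator's real bookkeeping may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// index maps a resource key to the set of GrafanaAgent roots whose
// hierarchy contains that resource (hypothetical structure).
type index map[string]map[string]bool

// add records that every key in members belongs to root's hierarchy.
func (ix index) add(root string, members []string) {
	for _, m := range members {
		if ix[m] == nil {
			ix[m] = map[string]bool{}
		}
		ix[m][root] = true
	}
}

// rootsFor returns, sorted, the GrafanaAgent roots that must be reconciled
// when the resource with the given key changes.
func (ix index) rootsFor(key string) []string {
	var roots []string
	for r := range ix[key] {
		roots = append(roots, r)
	}
	sort.Strings(roots)
	return roots
}

func main() {
	ix := index{}
	ix.add("monitoring/agent-a", []string{"Probe/apps/blackbox", "ServiceMonitor/apps/web"})
	ix.add("monitoring/agent-b", []string{"Probe/apps/blackbox"})

	// Modifying the shared Probe triggers a reconcile of both agents.
	fmt.Println(ix.rootsFor("Probe/apps/blackbox")) // [monitoring/agent-a monitoring/agent-b]
	// The ServiceMonitor belongs to only one hierarchy.
	fmt.Println(ix.rootsFor("ServiceMonitor/apps/web")) // [monitoring/agent-a]
}
```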

## Reconcile

When a resource hierarchy is created, updated, or deleted, a reconcile occurs.
When a GrafanaAgent resource is deleted, the corresponding Grafana Agent
deployment is also deleted.

Reconciling creates a few cluster resources:

1. A Secret holding the generated
   [configuration](../configuration-reference.md) of the Grafana Agent.
2. A second Secret holding the contents of all Secrets and ConfigMaps
   referenced from the resource hierarchy. Pods can only mount Secrets from
   their own namespace, so this ensures that Secrets referenced from a custom
   resource in another namespace can still be read.
3. A Service that governs the created StatefulSets.
4. One StatefulSet per Prometheus shard.
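For a GrafanaAgent with N shards, the list above amounts to three shared objects plus N StatefulSets. The following minimal Go sketch builds that desired-state list. The object names are hypothetical and do not reflect the Operator's actual naming scheme.

```go
package main

import "fmt"

// Object is a hypothetical (kind, name) pair for something the Operator
// creates while reconciling a single GrafanaAgent.
type Object struct{ Kind, Name string }

// desiredObjects lists the objects described above for an agent with the
// given number of shards. The names are made up for illustration.
func desiredObjects(agent string, shards int) []Object {
	objs := []Object{
		{"Secret", agent + "-config"},   // generated Grafana Agent configuration
		{"Secret", agent + "-secrets"},  // contents of referenced Secrets/ConfigMaps
		{"Service", agent + "-service"}, // governs the StatefulSets below
	}
	// One StatefulSet per shard.
	for s := 0; s < shards; s++ {
		objs = append(objs, Object{"StatefulSet", fmt.Sprintf("%s-shard-%d", agent, s)})
	}
	return objs
}

func main() {
	for _, o := range desiredObjects("agent", 3) {
		fmt.Println(o.Kind, o.Name)
	}
}
```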

PodMonitors, Probes, and ServiceMonitors are turned into individual scrape
jobs, all of which use Kubernetes service discovery.

## Sharding and Replication

The GrafanaAgent resource can specify a number of shards. Each shard results in
the creation of a StatefulSet whose scrape jobs each include a `hashmod` and a
`keep` relabel rule:
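
```yaml
# Hash each target's address into one of NUM_SHARDS buckets
# (NUM_SHARDS and CURRENT_STATEFULSET_SHARD are placeholders).
- source_labels: [__address__]
  target_label: __tmp_hash
  modulus: NUM_SHARDS
  action: hashmod
# Keep only the targets whose bucket matches this StatefulSet's shard.
- source_labels: [__tmp_hash]
  regex: CURRENT_STATEFULSET_SHARD
  action: keep
```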

This provides horizontal scaling: with N shards, each shard handles roughly
1/N of the total scrape load. Note that this does not use consistent hashing,
so changing the number of shards can move a large share of targets (often most
of them) to a different shard. The sharding mechanism is borrowed from the
Prometheus Operator.
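The two relabel rules can be mirrored in a few lines of Go, which also makes the cost of non-consistent hashing visible. This sketch follows our understanding of Prometheus's `hashmod` action: take the MD5 of the source label value, read its last 8 bytes as a big-endian integer, and reduce that integer modulo the modulus. Treat it as an illustration, not a reference implementation. The target addresses are made up. With a uniform hash, a target keeps its shard when going from N to N+1 shards only if both moduli give the same remainder. That happens for about 1/(N+1) of targets, so roughly 4 in 5 targets should move when going from 4 to 5 shards.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// hashmod mirrors (to our understanding) Prometheus's hashmod relabel action
// for a single source label: MD5 the value, read the last 8 bytes as a
// big-endian uint64, and take it modulo the modulus.
func hashmod(value string, modulus uint64) uint64 {
	sum := md5.Sum([]byte(value))
	return binary.BigEndian.Uint64(sum[8:]) % modulus
}

// keptBy models the keep rule: a shard keeps a target only when the
// target's hash bucket equals the shard's index.
func keptBy(address string, numShards, shard uint64) bool {
	return hashmod(address, numShards) == shard
}

func main() {
	const numShards = 4

	// Hypothetical scrape target addresses.
	targets := make([]string, 1000)
	for i := range targets {
		targets[i] = fmt.Sprintf("10.0.%d.%d:9100", i/256, i%256)
	}

	// Each shard's StatefulSet keeps only its own slice of the targets.
	for shard := uint64(0); shard < numShards; shard++ {
		kept := 0
		for _, t := range targets {
			if keptBy(t, numShards, shard) {
				kept++
			}
		}
		fmt.Printf("shard %d keeps %d targets\n", shard, kept)
	}

	// Resharding from 4 to 5: count the targets whose shard changes.
	moved := 0
	for _, t := range targets {
		if hashmod(t, numShards) != hashmod(t, numShards+1) {
			moved++
		}
	}
	fmt.Printf("moved %d of %d targets\n", moved, len(targets))
}
```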

The number of replicas can be defined in the same way as the number of shards.
Replicas are duplicate copies of each shard, so they must be paired with a
remote_write system that can perform HA deduplication. Grafana Cloud and Cortex
provide this out of the box, and the Grafana Agent Operator's defaults support
these two systems. The total number of metrics pods created is
`numShards * numReplicas`.
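Put differently, every target is scraped once per replica of its shard. The following Go sketch shows the resulting pod count. The `pod` type is a hypothetical stand-in for one metrics pod.

```go
package main

import "fmt"

// pod is a hypothetical identifier for one metrics pod.
type pod struct{ shard, replica int }

// metricsPods enumerates one pod per (shard, replica) pair: every shard is
// duplicated numReplicas times.
func metricsPods(numShards, numReplicas int) []pod {
	pods := make([]pod, 0, numShards*numReplicas)
	for s := 0; s < numShards; s++ {
		for r := 0; r < numReplicas; r++ {
			pods = append(pods, pod{shard: s, replica: r})
		}
	}
	return pods
}

func main() {
	pods := metricsPods(3, 2)
	fmt.Println(len(pods)) // 6 = 3 shards * 2 replicas
	for _, p := range pods {
		fmt.Printf("shard=%d replica=%d\n", p.shard, p.replica)
	}
}
```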

## Labels

Two labels are added by default to every metric:

- `cluster`, representing the `GrafanaAgent` deployment. Holds the value of
  `<GrafanaAgent.metadata.namespace>/<GrafanaAgent.metadata.name>`.
- `__replica__`, representing the replica number of the Agent. This label works
  out of the box with Grafana Cloud and Cortex's
  [HA deduplication](https://cortexmetrics.io/docs/guides/ha-pair-handling/).

The shard number is not added as a label, as sharding is designed to be
transparent on the receiver end.

@@ -0,0 +1,9 @@

# FAQ

## Where do I find information on the supported values for the CustomResourceDefinitions?

Once you've [deployed the CustomResourceDefinitions](./getting-started.md#deploying-customresourcedefinitions)
to your Kubernetes cluster, use `kubectl explain <resource>` to get access to
the documentation for each resource. For example, `kubectl explain GrafanaAgent`
will describe the GrafanaAgent CRD, and `kubectl explain GrafanaAgent.spec` will
give you information on its spec field.