[Extension] Implement ECS Observer #1920

Closed
wants to merge 19 commits into from
1 change: 1 addition & 0 deletions extension/observer/README.md
@@ -8,3 +8,4 @@ Currently the only component that uses observers is the [receiver_creator](../..

* [k8sobserver](k8sobserver/README.md)
* [hostobserver](hostobserver/README.md)
* [ecsobserver](ecsobserver/README.md)
1 change: 1 addition & 0 deletions extension/observer/ecsobserver/Makefile
@@ -0,0 +1 @@
include ../../../Makefile.Common
150 changes: 150 additions & 0 deletions extension/observer/ecsobserver/README.md
@@ -0,0 +1,150 @@
# ECS Observer Extension

**Status: beta**

The `ecs_observer` uses the ECS/EC2 API to discover tasks and containers running in an ECS cluster.

Targets are discovered in one of two modes, selected by the config:
* **Mode 1: Docker label-based:** Add docker labels to the containers to indicate the port and metrics path. The ECS Observer can then be configured to discover targets with the matching docker label and container port.
* **Mode 2: ECS task definition ARN-based:** The ECS Observer discovers a target when its ECS task definition ARN matches the configured regex and its container exposes one of the configured ports.

Both modes can be enabled together; the ECS Observer then de-duplicates the discovered targets by the key *{private_ip}:{port}/{metrics_path}*.
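
The de-duplication step can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `target` type, its fields, and the `dedupe` helper are hypothetical names, and trimming the leading slash from the metrics path before building the key is an assumption made so that the key matches the documented *{private_ip}:{port}/{metrics_path}* shape.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical stand-in for a discovered scrape target.
type target struct {
	PrivateIP   string
	Port        int
	MetricsPath string
}

// key builds the de-duplication key {private_ip}:{port}/{metrics_path}.
// Assumption: a leading "/" on the metrics path is not repeated in the key.
func (t target) key() string {
	return fmt.Sprintf("%s:%d/%s", t.PrivateIP, t.Port, strings.TrimPrefix(t.MetricsPath, "/"))
}

// dedupe keeps the first target seen for each key and preserves input order.
func dedupe(targets []target) []target {
	seen := make(map[string]struct{}, len(targets))
	out := make([]target, 0, len(targets))
	for _, t := range targets {
		k := t.key()
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	targets := []target{
		{"10.6.1.95", 32785, "/metrics"}, // found via docker label
		{"10.6.1.95", 32785, "/metrics"}, // same endpoint found via task definition ARN
		{"10.6.1.95", 32783, "/metrics"},
	}
	for _, t := range dedupe(targets) {
		fmt.Println(t.key())
	}
}
```

Keeping the first occurrence means that, in this sketch, whichever mode reports an endpoint first decides its labels; the PR may resolve such conflicts differently.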

### Configuration

| Name | Required | Description |
|---------------------|-------------|----------------------------------------------------------------|
| cluster_name | Mandatory | target ECS cluster name for service discovery |
| cluster_region | Mandatory | target ECS cluster's AWS region name |
| refresh_interval | Optional | how often to look for changes in endpoints (default: 10s) |
| sd_result_file | Optional | path of the YAML file to which the discovered scrape targets are written. If this is set, the default endpoint notifications are disabled and listeners will not get updates |
| docker_label | Optional | docker label-based service discovery configuration (see [here](#docker-label-based-service-discovery-configuration)). Setting this enables docker label-based SD |
| task_definitions | Optional | list of task definition ARN-based service discovery configurations (see [here](#task-definition-arn-based-service-discovery-configuration)). A non-empty list enables task definition ARN-based SD |

#### Docker Label-Based Service Discovery Configuration

| Name | Required | Description |
|---------------------|-------------|---------------------------------------------------------------|
| port_label | Mandatory | container's docker label name that specifies the metrics port |
| job_name_label | Optional | container's docker label name that specifies the scrape job name. (Default: "") |
| metrics_path_label | Optional | container's docker label name that specifies the metrics path. (Default: "") |
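
As a rough illustration of how these three settings could be applied to a container's docker labels, consider the sketch below. It is written under assumptions: `labelConfig`, `scrapeInfo`, and `fromDockerLabels` are hypothetical names, and skipping containers whose port label is missing or not a non-negative integer is an assumed behaviour, not something this diff confirms.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelConfig mirrors the three docker_label settings from the table above.
type labelConfig struct {
	PortLabel        string
	JobNameLabel     string
	MetricsPathLabel string
}

// scrapeInfo is a hypothetical result type for one container.
type scrapeInfo struct {
	Port        int
	JobName     string
	MetricsPath string
}

// fromDockerLabels reads the scrape settings from a container's docker labels.
// It reports false when the port label is absent or is not a non-negative integer.
func fromDockerLabels(cfg labelConfig, labels map[string]string) (scrapeInfo, bool) {
	raw, ok := labels[cfg.PortLabel]
	if !ok {
		return scrapeInfo{}, false
	}
	port, err := strconv.Atoi(raw)
	if err != nil || port < 0 {
		return scrapeInfo{}, false
	}
	info := scrapeInfo{Port: port}
	// The optional labels default to "" when not configured or not present.
	if cfg.JobNameLabel != "" {
		info.JobName = labels[cfg.JobNameLabel]
	}
	if cfg.MetricsPathLabel != "" {
		info.MetricsPath = labels[cfg.MetricsPathLabel]
	}
	return info, true
}

func main() {
	cfg := labelConfig{
		PortLabel:        "ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A",
		JobNameLabel:     "ECS_PROMETHEUS_JOB_NAME",
		MetricsPathLabel: "ECS_PROMETHEUS_METRICS_PATH",
	}
	labels := map[string]string{
		"ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A": "9404",
		"ECS_PROMETHEUS_METRICS_PATH":           "/metrics",
	}
	info, ok := fromDockerLabels(cfg, labels)
	fmt.Println(info.Port, info.JobName, info.MetricsPath, ok)
}
```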

#### Task Definition ARN-Based Service Discovery Configuration

| Name | Required | Description |
|---------------------|-------------|---------------------------------------------------------------|
| task_definition_arn_pattern | Mandatory | Regex pattern to match against ECS task definition ARN |
| metrics_ports | Mandatory | semicolon-separated container ports (e.g. `9113;9090`). Only containers that expose these ports will be discovered |
Review comment (Contributor):

Why not a list?

| container_name_pattern | Optional | ECS task container name regex pattern |
| metrics_path | Optional | metrics path. (Default: "") |
| job_name | Optional | Scrape job name. (Default: "") |

### Endpoint Variables

Endpoint variables exposed by this observer are as follows.

| Variable | Description |
|-------------|--------------------------------------------------------------------|
| type.task | `true` |
Review comment (Contributor):

Would call this ecstask. The existing ones are kind of generic naming (e.g. port) but there's an issue to make them more type-specific.

| port | port number |
| metricsPath | metrics path |
Review comment (Contributor):

Endpoint variables are usually of the form metrics_path

| labels | labels generated by the ECS Observer. Mainly used with Prometheus. |
Review comment (Contributor):

Are these docker labels? I think they'd probably be used in discovery rules as well like:

type.ecstask && labels["app"] == "redis"

See related comment below.


### AWS Permissions
The following permissions must be granted so that the ECS Observer can query the ECS/EC2 API for task metadata:
- ECS:ListTasks
- ECS:DescribeContainerInstances
- ECS:DescribeTasks
- ECS:DescribeTaskDefinition
- EC2:DescribeInstances

### Writing targets to a static file

There are 2 ways to export discovered endpoints:

1. Notification-based: this is the default method, where another component listens for endpoint updates.
2. Write to a file: if `sd_result_file` is configured, the discovered targets are written to the file path given by `sd_result_file`, and the default notification-based behaviour is disabled. This is mainly used in conjunction with the Prometheus receiver.

The main motivation for this method is that using a single Prometheus receiver to scrape multiple targets is currently more efficient than spawning a separate receiver for each scrape target using the receiver creator. For a more detailed discussion, refer to [this issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/1395).

An example of the contents of the `sd_result_file` looks like this:
```yaml
- targets:
  - 10.6.1.95:32785
  labels:
    __metrics_path__: /metrics
    ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_B: "9406"
    ECS_PROMETHEUS_JOB_NAME: demo-jar-ec2-bridge-subset-b-dynamic
    ECS_PROMETHEUS_METRICS_PATH: /metrics
    InstanceType: t3.medium
    LaunchType: EC2
    SubnetId: subnet-0347624eeea6c5969
    TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port-subset-b
    TaskGroup: family:demo-jar-ec2-bridge-dynamic-port-subset-b
    TaskRevision: "7"
    VpcId: vpc-033b021cd7ecbcedb
    container_name: demo-jar-ec2-bridge-dynamic-port-subset-b
    job: task_def_2
- targets:
  - 10.6.1.95:32783
  labels:
    __metrics_path__: /metrics
    ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_B: "9406"
    ECS_PROMETHEUS_JOB_NAME: demo-jar-ec2-bridge-subset-b-dynamic
    ECS_PROMETHEUS_METRICS_PATH: /metrics
    InstanceType: t3.medium
    LaunchType: EC2
    SubnetId: subnet-0347624eeea6c5969
    TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port-subset-b
    TaskGroup: family:demo-jar-ec2-bridge-dynamic-port-subset-b
    TaskRevision: "7"
    VpcId: vpc-033b021cd7ecbcedb
Comment on lines +96 to +101
Review comment (Contributor):

Seems like these should be individual entries in the Endpoint struct so a user can use them easily in a discovery rule and are documented.

    container_name: demo-jar-ec2-bridge-dynamic-port-subset-b
    job: task_def_2
```
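
A file in this shape could be produced along the lines of the sketch below. It is only an illustration: `targetGroup` and `renderFileSD` are hypothetical names, the output is assembled with the standard library instead of a YAML package, and every key and value is double-quoted for simplicity, so the result is equivalent YAML but not byte-identical to the example above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// targetGroup is a hypothetical in-memory form of one entry in the result file.
type targetGroup struct {
	Targets []string
	Labels  map[string]string
}

// renderFileSD renders target groups as a YAML list of targets and labels.
// Label keys are sorted so that the output is stable between refreshes.
func renderFileSD(groups []targetGroup) string {
	var b strings.Builder
	for _, g := range groups {
		b.WriteString("- targets:\n")
		for _, t := range g.Targets {
			fmt.Fprintf(&b, "  - %s\n", strconv.Quote(t))
		}
		if len(g.Labels) == 0 {
			continue
		}
		b.WriteString("  labels:\n")
		keys := make([]string, 0, len(g.Labels))
		for k := range g.Labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Fprintf(&b, "    %s: %s\n", strconv.Quote(k), strconv.Quote(g.Labels[k]))
		}
	}
	return b.String()
}

func main() {
	groups := []targetGroup{{
		Targets: []string{"10.6.1.95:32785"},
		Labels: map[string]string{
			"__metrics_path__": "/metrics",
			"job":              "task_def_2",
		},
	}}
	fmt.Print(renderFileSD(groups))
}
```

Stable output matters here because the Prometheus receiver watches the file; writing identical bytes when nothing changed avoids needless reloads.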

## Example

```yaml
extensions:
  ecs_observer:
    refresh_interval: 15s
    cluster_name: 'Cluster-1'
    cluster_region: 'us-west-2'
    sd_result_file: '/etc/ecs_sd_targets.yaml'
    docker_label:
      job_name_label: 'ECS_PROMETHEUS_JOB_NAME'
      metrics_path_label: 'ECS_PROMETHEUS_METRICS_PATH'
      port_label: 'ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A'
    task_definitions:
      - job_name: 'task_def_1'
        metrics_path: '/metrics'
        metrics_ports: '9113;9090'
        task_definition_arn_pattern: '.*:task-definition/nginx:[0-9]+'

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "task_def_1"
          file_sd_configs:
            - files:
                - '/etc/ecs_sd_targets.yaml'

processors:
  exampleprocessor:

exporters:
  exampleexporter:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [exampleprocessor]
      exporters: [exampleexporter]
  extensions: [ecs_observer]
```

The full list of settings exposed for this extension is documented [here](./config.go),
with detailed sample configurations [here](./testdata/config.yaml).
93 changes: 93 additions & 0 deletions extension/observer/ecsobserver/config.go
@@ -0,0 +1,93 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package ecsobserver

import (
	"regexp"
	"strconv"
	"strings"
	"time"

	"go.opentelemetry.io/collector/config/configmodels"
	"go.uber.org/zap"
)

const (
	portSeparator = ";"
)

// DockerLabelConfig defines the configuration for docker label-based service discovery.
type DockerLabelConfig struct {
	JobNameLabel     string `mapstructure:"job_name_label"`
	PortLabel        string `mapstructure:"port_label"`
	MetricsPathLabel string `mapstructure:"metrics_path_label"`
}

// TaskDefinitionConfig defines the configuration for task definition-based service discovery.
type TaskDefinitionConfig struct {
	ContainerNamePattern string `mapstructure:"container_name_pattern"`
	JobName              string `mapstructure:"job_name"`
	MetricsPath          string `mapstructure:"metrics_path"`
	MetricsPorts         string `mapstructure:"metrics_ports"`
	TaskDefArnPattern    string `mapstructure:"task_definition_arn_pattern"`

	containerNameRegex *regexp.Regexp
	taskDefRegex       *regexp.Regexp
	metricsPortList    []int
}

// init initializes the task definition config by compiling regex patterns and extracting
// the list of metric ports. Entries that are not non-negative integers are skipped.
// Note that regexp.MustCompile panics if a configured pattern is invalid.
func (t *TaskDefinitionConfig) init() {
	t.taskDefRegex = regexp.MustCompile(t.TaskDefArnPattern)

	if t.ContainerNamePattern != "" {
		t.containerNameRegex = regexp.MustCompile(t.ContainerNamePattern)
	}

	ports := strings.Split(t.MetricsPorts, portSeparator)
	for _, v := range ports {
		port, err := strconv.Atoi(strings.TrimSpace(v))
		if err != nil || port < 0 {
			continue
		}
		t.metricsPortList = append(t.metricsPortList, port)
	}
}

// Config defines the configuration for ECS observers.
type Config struct {
	configmodels.ExtensionSettings `mapstructure:",squash"`

	// RefreshInterval determines the frequency at which the observer
	// polls for changes in endpoints.
	RefreshInterval time.Duration `mapstructure:"refresh_interval"`
	// ClusterName is the target ECS cluster name for service discovery.
	ClusterName string `mapstructure:"cluster_name"`
	// ClusterRegion is the target ECS cluster's AWS region.
	ClusterRegion string `mapstructure:"cluster_region"`
	// ResultFile is the output path of the discovered targets YAML file (optional).
	// This is mainly used in conjunction with the Prometheus receiver.
	ResultFile string `mapstructure:"sd_result_file"`
	// DockerLabel provides the configuration for docker label-based service discovery.
	// If this is not provided, docker label-based service discovery is disabled.
	DockerLabel *DockerLabelConfig `mapstructure:"docker_label"`
	// TaskDefinitions is a list of task definition configurations for task
	// definition-based service discovery (optional). If this is not provided,
	// task definition-based SD is disabled.
	TaskDefinitions []*TaskDefinitionConfig `mapstructure:"task_definitions"`

	logger *zap.Logger
}
104 changes: 104 additions & 0 deletions extension/observer/ecsobserver/config_test.go
@@ -0,0 +1,104 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package ecsobserver

import (
	"path"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/component/componenttest"
	"go.opentelemetry.io/collector/config/configmodels"
	"go.opentelemetry.io/collector/config/configtest"
	"go.uber.org/zap"
)

func TestLoadConfig(t *testing.T) {
	factories, err := componenttest.ExampleComponents()
	require.NoError(t, err)

	factory := NewFactory()
	factories.Extensions[typeStr] = factory
	cfg, err := configtest.LoadConfigFile(t, path.Join(".", "testdata", "config.yaml"), factories)

	require.NoError(t, err)
	require.NotNil(t, cfg)

	require.Len(t, cfg.Extensions, 2)

	// The unnamed instance should be identical to the factory defaults.
	ext0 := cfg.Extensions["ecs_observer"]
	assert.Equal(t, factory.CreateDefaultConfig(), ext0)

	ext1 := cfg.Extensions["ecs_observer/1"]
	assert.Equal(t,
		&Config{
			ExtensionSettings: configmodels.ExtensionSettings{
				TypeVal: "ecs_observer",
				NameVal: "ecs_observer/1",
			},
			RefreshInterval: 15 * time.Second,
			ClusterName:     "EC2-Testing",
			ClusterRegion:   "us-west-2",
			ResultFile:      "/opt/aws/amazon-cloudwatch-agent/etc/ecs_sd_targets.yaml",
			DockerLabel: &DockerLabelConfig{
				JobNameLabel:     "ECS_PROMETHEUS_JOB_NAME",
				MetricsPathLabel: "ECS_PROMETHEUS_METRICS_PATH",
				PortLabel:        "ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A",
			},
			TaskDefinitions: []*TaskDefinitionConfig{
				{
					JobName:           "task_def_1",
					MetricsPath:       "/stats/metrics",
					MetricsPorts:      "9901;9404;9406",
					TaskDefArnPattern: ".*:task-definition/bugbash-java-fargate-awsvpc-task-def-only:[0-9]+",
				},
				{
					ContainerNamePattern: "^bugbash-jar.*$",
					MetricsPorts:         "9902",
					TaskDefArnPattern:    ".*:task-definition/nginx:[0-9]+",
				},
			},
			logger: zap.NewNop(),
		},
		ext1,
	)
}

func TestTaskDefinitionConfigInit(t *testing.T) {
	config := TaskDefinitionConfig{
		JobName:           "test_job_1",
		MetricsPorts:      "11;12; 13 ;a;14 ",
		TaskDefArnPattern: "^task.*$",
	}

	config.init()
	assert.Nil(t, config.containerNameRegex)
	assert.True(t, config.taskDefRegex.MatchString("task12"))
	assert.False(t, config.taskDefRegex.MatchString("atask12"))
	// testify expects the expected value first, then the actual value.
	assert.Equal(t, []int{11, 12, 13, 14}, config.metricsPortList)

	config = TaskDefinitionConfig{
		ContainerNamePattern: "^container.*$",
		MetricsPorts:         "a;b; c ;d;e ",
	}

	config.init()
	assert.NotNil(t, config.containerNameRegex)
	assert.True(t, config.containerNameRegex.MatchString("container12"))
	assert.False(t, config.containerNameRegex.MatchString("acontainer12"))
	assert.Nil(t, config.metricsPortList)
}