[Extension] Implement ECS Observer #1920
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
include ../../../Makefile.Common |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
# ECS Observer Extension | ||
|
||
**Status: beta** | ||
|
||
The `ecs_observer` uses the ECS/EC2 API to discover tasks and containers running in an ECS cluster. | ||
|
||
There are 2 modes to discover the targets based on the config: | ||
* **Mode 1: Docker label-Based:** Add docker labels to the containers to indicate the port and metric path. The ECS Observer can then be configured to discover targets with the matching docker label and container port. | ||
* **Mode 2: ECS Task Definition ARN-based:** The ECS Observer can be configured to discover targets when the target's ECS task definition ARN matches the configured regex and it matches the configured container ports. | ||
|
||
Both modes can be enabled together; the ECS Observer de-duplicates the discovered targets based on: *{private_ip}:{port}/{metrics_path}* | ||
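
The de-duplication rule above can be sketched in Go as follows. This is only an illustration, not the extension's actual implementation: the `target` type, its fields, and the `dedupe` helper are hypothetical names, and stripping the leading slash from the metrics path before building the key is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical stand-in for a discovered scrape target.
type target struct {
	PrivateIP   string
	Port        int
	MetricsPath string
}

// key builds the {private_ip}:{port}/{metrics_path} identity.
// Assumption: a leading "/" in the path is dropped so the key has a single slash.
func (t target) key() string {
	return fmt.Sprintf("%s:%d/%s", t.PrivateIP, t.Port, strings.TrimPrefix(t.MetricsPath, "/"))
}

// dedupe keeps the first target seen for each key and preserves input order.
func dedupe(targets []target) []target {
	seen := make(map[string]struct{}, len(targets))
	out := make([]target, 0, len(targets))
	for _, t := range targets {
		k := t.key()
		if _, ok := seen[k]; ok {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	targets := []target{
		{PrivateIP: "10.6.1.95", Port: 32785, MetricsPath: "/metrics"}, // e.g. found via docker label
		{PrivateIP: "10.6.1.95", Port: 32785, MetricsPath: "/metrics"}, // same endpoint found via task definition ARN
		{PrivateIP: "10.6.1.95", Port: 32783, MetricsPath: "/metrics"},
	}
	for _, t := range dedupe(targets) {
		fmt.Println(t.key())
	}
}
```

With both modes enabled, a container matched by its docker label and by its task definition ARN would then be reported once.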
|
||
### Configuration | ||
|
||
| Name | Required | Description | | ||
|---------------------|-------------|----------------------------------------------------------------| | ||
| cluster_name | Mandatory | target ECS cluster name for service discovery | | ||
| cluster_region | Mandatory | target ECS cluster's AWS region name | | ||
| refresh_interval | Optional | how often to look for changes in endpoints (default: 10s) | | ||
| sd_result_file | Optional | path of YAML file for the scrape target results. If this is enabled, the default endpoint update will be disabled and listeners will not get updates | | ||
| docker_label | Optional | docker label-based service discovery configuration (see [here](#docker-label-based-service-discovery-configuration)). If this is enabled, docker label-based SD will be enabled | | ||
| task_definitions | Optional | list of task definition ARN-based service discovery configurations (see [here](#task-definition-arn-based-service-discovery-configuration)). If this list is non-empty, task definition ARN-based SD will be enabled | | ||
|
||
#### Docker Label-Based Service Discovery Configuration | ||
|
||
| Name | Required | Description | | ||
|---------------------|-------------|---------------------------------------------------------------| | ||
| port_label | Mandatory | container's docker label name that specifies the metrics port | | ||
| job_name_label | Optional | container's docker label name that specifies the scrape job name. (Default: "") | | ||
| metrics_path_label | Optional | container's docker label name that specifies the metrics path. (Default: "") | | ||
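
As a rough illustration of how these three settings could be applied to a container's docker labels, consider the sketch below. It is not taken from the extension: `labelConfig`, `labelTarget` and `fromDockerLabels` are hypothetical names, and treating a missing or non-numeric port label as "not a target" is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelConfig mirrors the three documented docker_label settings.
type labelConfig struct {
	PortLabel        string
	JobNameLabel     string
	MetricsPathLabel string
}

// labelTarget is a hypothetical result type for one discovered container.
type labelTarget struct {
	Port        int
	JobName     string
	MetricsPath string
}

// fromDockerLabels returns a target when the mandatory port label is present
// and holds a non-negative integer. Job name and metrics path default to "".
func fromDockerLabels(cfg labelConfig, labels map[string]string) (labelTarget, bool) {
	raw, ok := labels[cfg.PortLabel]
	if !ok {
		return labelTarget{}, false
	}
	port, err := strconv.Atoi(raw)
	if err != nil || port < 0 {
		return labelTarget{}, false
	}
	t := labelTarget{Port: port}
	if cfg.JobNameLabel != "" {
		t.JobName = labels[cfg.JobNameLabel]
	}
	if cfg.MetricsPathLabel != "" {
		t.MetricsPath = labels[cfg.MetricsPathLabel]
	}
	return t, true
}

func main() {
	cfg := labelConfig{
		PortLabel:        "ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A",
		JobNameLabel:     "ECS_PROMETHEUS_JOB_NAME",
		MetricsPathLabel: "ECS_PROMETHEUS_METRICS_PATH",
	}
	labels := map[string]string{
		"ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A": "9406",
		"ECS_PROMETHEUS_METRICS_PATH":           "/metrics",
	}
	t, ok := fromDockerLabels(cfg, labels)
	fmt.Println(t.Port, t.MetricsPath, ok)
}
```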
|
||
#### Task Definition ARN-Based Service Discovery Configuration | ||
|
||
| Name | Required | Description | | ||
|---------------------|-------------|---------------------------------------------------------------| | ||
| task_definition_arn_pattern | Mandatory | Regex pattern to match against ECS task definition ARN | | ||
| metrics_ports | Mandatory | container ports separated by semicolon. Only containers that expose these ports will be discovered | | ||
| container_name_pattern | Optional | ECS task container name regex pattern | | ||
| metrics_path | Optional | metrics path. (Default: "") | | ||
| job_name | Optional | Scrape job name. (Default: "") | | ||
|
||
### Endpoint Variables | ||
|
||
Endpoint variables exposed by this observer are as follows. | ||
|
||
| Variable | Description | | ||
|-------------|--------------------------------------------------------------------| | ||
| type.task | `true` | | ||
Review comment: Would call this ecstask. The existing ones are kind of generic naming (e.g. port) but there's an issue to make them more type-specific. |
||
| port | port number | | ||
| metricsPath | metrics path | | ||
Review comment: Endpoint variables are usually of the form |
||
| labels | labels generated by the ECS Observer. Mainly used with Prometheus. | | ||
Review comment: Are these docker labels? I think they'd probably be used in discovery rules as well like:
See related comment below. |
||
|
||
### AWS Permissions | ||
The following permissions need to be granted so that the ECS Observer can query the ECS/EC2 API to get the task metadata: | ||
- ECS:ListTasks | ||
- ECS:DescribeContainerInstances | ||
- ECS:DescribeTasks | ||
- ECS:DescribeTaskDefinition | ||
- EC2:DescribeInstances | ||
|
||
### Writing targets to a static file | ||
|
||
There are 2 ways to export discovered endpoints: | ||
|
||
1. Notification-based: this is the default method where another component listens in on endpoint updates. | ||
2. Write to a file: if `sd_result_file` is configured, the scraped targets will be written to the file path given by `sd_result_file`. The default notification-based behaviour will be disabled. This is mainly used in conjunction with the Prometheus receiver. | ||
|
||
The main motivation of this method is that using a single Prometheus receiver to scrape multiple targets is currently more efficient than spawning a separate receiver for each scrape target using the receiver creator. For a more detailed discussion, refer to [this issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/1395). | ||
|
||
An example of the contents of the `sd_result_file` looks like this: | ||
```yaml | ||
- targets: | ||
    - 10.6.1.95:32785 | ||
  labels: | ||
    __metrics_path__: /metrics | ||
    ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_B: "9406" | ||
    ECS_PROMETHEUS_JOB_NAME: demo-jar-ec2-bridge-subset-b-dynamic | ||
    ECS_PROMETHEUS_METRICS_PATH: /metrics | ||
    InstanceType: t3.medium | ||
    LaunchType: EC2 | ||
    SubnetId: subnet-0347624eeea6c5969 | ||
    TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    TaskGroup: family:demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    TaskRevision: "7" | ||
    VpcId: vpc-033b021cd7ecbcedb | ||
    container_name: demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    job: task_def_2 | ||
- targets: | ||
    - 10.6.1.95:32783 | ||
  labels: | ||
    __metrics_path__: /metrics | ||
    ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_B: "9406" | ||
    ECS_PROMETHEUS_JOB_NAME: demo-jar-ec2-bridge-subset-b-dynamic | ||
    ECS_PROMETHEUS_METRICS_PATH: /metrics | ||
    InstanceType: t3.medium | ||
    LaunchType: EC2 | ||
    SubnetId: subnet-0347624eeea6c5969 | ||
    TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    TaskGroup: family:demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    TaskRevision: "7" | ||
    VpcId: vpc-033b021cd7ecbcedb | ||
Review comment on lines +96 to +101: Seems like these should be individual entries in the Endpoint struct so a user can use them easily in a discovery rule and are documented. |
||
    container_name: demo-jar-ec2-bridge-dynamic-port-subset-b | ||
    job: task_def_2 | ||
``` | ||
|
||
## Example | ||
|
||
```yaml | ||
extensions: | ||
  ecs_observer: | ||
    refresh_interval: 15s | ||
    cluster_name: 'Cluster-1' | ||
    cluster_region: 'us-west-2' | ||
    sd_result_file: '/etc/ecs_sd_targets.yaml' | ||
    docker_label: | ||
      job_name_label: 'ECS_PROMETHEUS_JOB_NAME' | ||
      metrics_path_label: 'ECS_PROMETHEUS_METRICS_PATH' | ||
      port_label: 'ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A' | ||
    task_definitions: | ||
      - job_name: 'task_def_1' | ||
        metrics_path: '/metrics' | ||
        metrics_ports: '9113;9090' | ||
        task_definition_arn_pattern: '.*:task-definition/nginx:[0-9]+' | ||
|
||
receivers: | ||
  prometheus: | ||
    config: | ||
      scrape_configs: | ||
        - job_name: "task_def_1" | ||
          file_sd_configs: | ||
            - files: | ||
                - '/etc/ecs_sd_targets.yaml' | ||
|
||
processors: | ||
  exampleprocessor: | ||
|
||
exporters: | ||
  exampleexporter: | ||
|
||
service: | ||
  pipelines: | ||
    metrics: | ||
      receivers: [prometheus] | ||
      processors: [exampleprocessor] | ||
      exporters: [exampleexporter] | ||
  extensions: [ecs_observer] | ||
``` | ||
|
||
The full list of settings exposed for this extension is documented [here](./config.go) | ||
with detailed sample configurations [here](./testdata/config.yaml). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package ecsobserver | ||
|
||
import ( | ||
"regexp" | ||
"strconv" | ||
"strings" | ||
"time" | ||
|
||
"go.opentelemetry.io/collector/config/configmodels" | ||
"go.uber.org/zap" | ||
) | ||
|
||
const ( | ||
portSeparator = ";" | ||
) | ||
|
||
// DockerLabelConfig defines the configuration for docker label-based service discovery. | ||
type DockerLabelConfig struct { | ||
JobNameLabel string `mapstructure:"job_name_label"` | ||
PortLabel string `mapstructure:"port_label"` | ||
MetricsPathLabel string `mapstructure:"metrics_path_label"` | ||
} | ||
|
||
// TaskDefinitionConfig defines the configuration for task definition-based service discovery. | ||
type TaskDefinitionConfig struct { | ||
ContainerNamePattern string `mapstructure:"container_name_pattern"` | ||
JobName string `mapstructure:"job_name"` | ||
MetricsPath string `mapstructure:"metrics_path"` | ||
MetricsPorts string `mapstructure:"metrics_ports"` | ||
TaskDefArnPattern string `mapstructure:"task_definition_arn_pattern"` | ||
|
||
containerNameRegex *regexp.Regexp | ||
taskDefRegex *regexp.Regexp | ||
metricsPortList []int | ||
} | ||
|
||
// init initializes the task definition config by compiling regex patterns and extracting | ||
// the list of metric ports. | ||
func (t *TaskDefinitionConfig) init() { | ||
	t.taskDefRegex = regexp.MustCompile(t.TaskDefArnPattern) | ||
|
||
	if t.ContainerNamePattern != "" { | ||
		t.containerNameRegex = regexp.MustCompile(t.ContainerNamePattern) | ||
	} | ||
|
||
	// Entries that are not non-negative integers are skipped. | ||
	ports := strings.Split(t.MetricsPorts, portSeparator) | ||
	for _, v := range ports { | ||
		port, err := strconv.Atoi(strings.TrimSpace(v)) | ||
		if err != nil || port < 0 { | ||
			continue | ||
		} | ||
		t.metricsPortList = append(t.metricsPortList, port) | ||
	} | ||
} | ||
|
||
// Config defines the configuration for ECS observers. | ||
type Config struct { | ||
configmodels.ExtensionSettings `mapstructure:",squash"` | ||
|
||
// RefreshInterval determines the frequency at which the observer | ||
// polls the ECS/EC2 API for changes in endpoints. | ||
RefreshInterval time.Duration `mapstructure:"refresh_interval"` | ||
// ClusterName is the target ECS cluster name for service discovery. | ||
ClusterName string `mapstructure:"cluster_name"` | ||
// ClusterRegion is the target ECS cluster's AWS region. | ||
ClusterRegion string `mapstructure:"cluster_region"` | ||
// ResultFile is the output path of the discovered targets YAML file (optional). | ||
// This is mainly used in conjunction with the Prometheus receiver. | ||
ResultFile string `mapstructure:"sd_result_file"` | ||
// DockerLabel provides the configuration for docker label-based service discovery. | ||
// If this is not provided, docker label-based service discovery is disabled. | ||
DockerLabel *DockerLabelConfig `mapstructure:"docker_label"` | ||
// TaskDefinitions is a list of task definition configurations for task | ||
// definition-based service discovery (optional). If this is not provided, | ||
// task definition-based SD is disabled. | ||
TaskDefinitions []*TaskDefinitionConfig `mapstructure:"task_definitions"` | ||
|
||
logger *zap.Logger | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package ecsobserver | ||
|
||
import ( | ||
"path" | ||
"testing" | ||
"time" | ||
|
||
"github.com/stretchr/testify/assert" | ||
"github.com/stretchr/testify/require" | ||
"go.opentelemetry.io/collector/component/componenttest" | ||
"go.opentelemetry.io/collector/config/configmodels" | ||
"go.opentelemetry.io/collector/config/configtest" | ||
"go.uber.org/zap" | ||
) | ||
|
||
func TestLoadConfig(t *testing.T) { | ||
factories, err := componenttest.ExampleComponents() | ||
require.NoError(t, err) | ||
|
||
factory := NewFactory() | ||
factories.Extensions[typeStr] = factory | ||
cfg, err := configtest.LoadConfigFile(t, path.Join(".", "testdata", "config.yaml"), factories) | ||
|
||
require.NoError(t, err) | ||
require.NotNil(t, cfg) | ||
|
||
require.Len(t, cfg.Extensions, 2) | ||
|
||
ext0 := cfg.Extensions["ecs_observer"] | ||
assert.Equal(t, factory.CreateDefaultConfig(), ext0) | ||
|
||
ext1 := cfg.Extensions["ecs_observer/1"] | ||
assert.Equal(t, | ||
&Config{ | ||
ExtensionSettings: configmodels.ExtensionSettings{ | ||
TypeVal: "ecs_observer", | ||
NameVal: "ecs_observer/1", | ||
}, | ||
RefreshInterval: 15 * time.Second, | ||
ClusterName: "EC2-Testing", | ||
ClusterRegion: "us-west-2", | ||
ResultFile: "/opt/aws/amazon-cloudwatch-agent/etc/ecs_sd_targets.yaml", | ||
DockerLabel: &DockerLabelConfig{ | ||
JobNameLabel: "ECS_PROMETHEUS_JOB_NAME", | ||
MetricsPathLabel: "ECS_PROMETHEUS_METRICS_PATH", | ||
PortLabel: "ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A", | ||
}, | ||
TaskDefinitions: []*TaskDefinitionConfig{ | ||
{ | ||
JobName: "task_def_1", | ||
MetricsPath: "/stats/metrics", | ||
MetricsPorts: "9901;9404;9406", | ||
TaskDefArnPattern: ".*:task-definition/bugbash-java-fargate-awsvpc-task-def-only:[0-9]+", | ||
}, | ||
{ | ||
ContainerNamePattern: "^bugbash-jar.*$", | ||
MetricsPorts: "9902", | ||
TaskDefArnPattern: ".*:task-definition/nginx:[0-9]+", | ||
}, | ||
}, | ||
logger: zap.NewNop(), | ||
}, | ||
ext1, | ||
) | ||
} | ||
|
||
func TestTaskDefinitionConfigInit(t *testing.T) { | ||
config := TaskDefinitionConfig{ | ||
JobName: "test_job_1", | ||
MetricsPorts: "11;12; 13 ;a;14 ", | ||
TaskDefArnPattern: "^task.*$", | ||
} | ||
|
||
config.init() | ||
assert.Nil(t, config.containerNameRegex) | ||
assert.True(t, config.taskDefRegex.MatchString("task12")) | ||
assert.False(t, config.taskDefRegex.MatchString("atask12")) | ||
assert.Equal(t, []int{11, 12, 13, 14}, config.metricsPortList) | ||
|
||
config = TaskDefinitionConfig{ | ||
ContainerNamePattern: "^container.*$", | ||
MetricsPorts: "a;b; c ;d;e ", | ||
} | ||
|
||
config.init() | ||
assert.NotNil(t, config.containerNameRegex) | ||
assert.True(t, config.containerNameRegex.MatchString("container12")) | ||
assert.False(t, config.containerNameRegex.MatchString("acontainer12")) | ||
assert.Nil(t, config.metricsPortList) | ||
} |
Review comment: Why not a list?