[Extension] Implement ECS Observer #1920
cc: @mxiamxia for feedback
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@jrcamp please review this.
Codecov Report
| Coverage Diff | main | #1920 | +/- |
|---------------|--------|--------|--------|
| Coverage | 89.90% | 89.24% | -0.67% |
| Files | 380 | 389 | +9 |
| Lines | 18335 | 18723 | +388 |
| Hits | 16484 | 16709 | +225 |
| Misses | 1385 | 1535 | +150 |
| Partials | 466 | 479 | +13 |
@theRoughCode sorry for the delayed review.
Can you follow https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#how-to-structure-prs-to-get-expedient-reviews to break this up into more manageable pieces? Overall it's on a good track, though.
I'd suggest starting with a PR that contains only the README.md changes, so we can iterate on the design there if needed and avoid unnecessary code churn.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@theRoughCode friendly ping :)
Hi @bogdandrutu @jrcamp, sorry for not responding to the comments. I am on @theRoughCode's team at AWS; he has finished his internship, and I got this task (Prometheus service discovery using the ECS API) from @mxiamxia. I just want to get some clarification before I start working on it.
@bogdandrutu @jrcamp apologies for not responding earlier, I've been super busy with work the past month and was hoping to get to this once stuff settled. But it looks like @pingleig is taking over, so I'll hand it over to him! @pingleig that's right, from our discussion a month back, we decided not to use the observer/creator framework for now until its performance is up to par with the "one receiver" method. I've added some documentation in this PR already that should help in drafting up the README for the initial PR that @jrcamp proposed. Feel free to ping me if you have any questions!
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@pingleig do you plan to continue working on this?
@pingleig the README looks good overall, just a couple of notes. Can you start breaking it up based on https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#how-to-structure-prs-to-get-expedient-reviews?
| Variable | Description |
|-------------|--------------------------------------------------------------------|
| type.task | `true` |
| port | port number |
| metricsPath | metrics path |
Endpoint variables are usually snake_case, e.g. `metrics_path`.
| Variable | Description |
|-------------|--------------------------------------------------------------------|
| type.task | `true` |
Would call this `ecstask`. The existing variables use fairly generic names (e.g. `port`), but there's an issue to make them more type-specific.
| Variable | Description |
|-------------|--------------------------------------------------------------------|
| type.task | `true` |
| port | port number |
| metricsPath | metrics path |
| labels | labels generated by the ECS Observer. Mainly used with Prometheus. |
Are these docker labels? I think they'd probably be used in discovery rules as well, like: `type.ecstask && labels["app"] == "redis"`. See related comment below.
LaunchType: EC2
SubnetId: subnet-0347624eeea6c5969
TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port-subset-b
TaskGroup: family:demo-jar-ec2-bridge-dynamic-port-subset-b
TaskRevision: "7"
VpcId: vpc-033b021cd7ecbcedb
Seems like these should be individual entries in the Endpoint struct, so that users can easily reference them in discovery rules and so that they are documented.
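To illustrate both review suggestions (individual, documented fields on the endpoint, and docker labels usable in rules such as `type.ecstask && labels["app"] == "redis"`), here is a minimal Go sketch. The `ECSTaskDetails` type, its `Env` method, the snake_case key names, the nested `type` map, and `matchesRedisRule` are all hypothetical; they are not the collector's observer API, and a real rule would be evaluated by receiver_creator's expression engine rather than by hand-written Go.

```go
package main

import "fmt"

// ECSTaskDetails is a hypothetical endpoint-details type. Each piece of task
// metadata from the excerpt above becomes its own documented field instead
// of living in one opaque map. Field names are illustrative only.
type ECSTaskDetails struct {
	LaunchType           string // e.g. "EC2"
	SubnetID             string
	TaskDefinitionFamily string
	TaskGroup            string
	TaskRevision         string
	VpcID                string
	Labels               map[string]string // the container's docker labels
}

// Env flattens the details into a key/value map that a discovery rule could
// be evaluated against. The nested "type" map and the snake_case keys are
// assumptions chosen to mirror the rule syntax discussed in review.
func (d ECSTaskDetails) Env() map[string]any {
	return map[string]any{
		"type":                   map[string]bool{"ecstask": true},
		"launch_type":            d.LaunchType,
		"subnet_id":              d.SubnetID,
		"task_definition_family": d.TaskDefinitionFamily,
		"task_group":             d.TaskGroup,
		"task_revision":          d.TaskRevision,
		"vpc_id":                 d.VpcID,
		"labels":                 d.Labels,
	}
}

// matchesRedisRule hand-codes the example rule
//
//	type.ecstask && labels["app"] == "redis"
//
// in plain Go. Missing or mistyped keys simply fail the match.
func matchesRedisRule(env map[string]any) bool {
	types, _ := env["type"].(map[string]bool)
	labels, _ := env["labels"].(map[string]string)
	return types["ecstask"] && labels["app"] == "redis"
}

func main() {
	task := ECSTaskDetails{
		LaunchType:           "EC2",
		TaskDefinitionFamily: "demo-jar-ec2-bridge-dynamic-port-subset-b",
		TaskRevision:         "7",
		Labels:               map[string]string{"app": "redis"},
	}
	fmt.Println(matchesRedisRule(task.Env()))
}
```

Exposing each field separately means a rule author can write conditions on, say, `task_definition_family` without knowing how the observer stores metadata internally.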
| Name | | Description |
|---------------------|-------------|---------------------------------------------------------------|
| task_definition_arn_pattern | Mandatory | Regex pattern to match against ECS task definition ARN |
| metrics_ports | Mandatory | container ports separated by semicolon. Only containers that expose these ports will be discovered |
Why not a list?
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@theRoughCode I think we can close this one, as the content is being ported into other (smaller) PRs.
@pingleig sounds good! Thanks
Description
This PR implements the ECS Observer that queries the ECS/EC2 API to discover scrape targets within a configured ECS cluster.
There are two modes for discovering targets, depending on the config:
- Docker label-based discovery
- Task definition ARN-based discovery

The two modes can be enabled together, and the ECS Observer will de-duplicate the discovered targets based on `{private_ip}:{port}/{metrics_path}`.
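A minimal Go sketch of the de-duplication step, assuming each discovered target carries its private IP, port, and metrics path. The `Target` type, its `Source` field, and the helper names are hypothetical, and trimming a leading `/` from the metrics path (so `/metrics` and `metrics` produce the same key) is an assumption of this sketch, not something the PR specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is a hypothetical scrape target produced by either discovery mode.
type Target struct {
	PrivateIP   string
	Port        int
	MetricsPath string
	Source      string // which mode found it, e.g. "docker_label"
}

// dedupKey builds the {private_ip}:{port}/{metrics_path} key. Trimming a
// leading "/" from the path is an assumption made for this sketch.
func dedupKey(t Target) string {
	return fmt.Sprintf("%s:%d/%s", t.PrivateIP, t.Port, strings.TrimPrefix(t.MetricsPath, "/"))
}

// dedupTargets keeps the first target seen for each key and preserves input
// order, so a target found by both modes is reported only once.
func dedupTargets(targets []Target) []Target {
	seen := make(map[string]struct{}, len(targets))
	out := make([]Target, 0, len(targets))
	for _, t := range targets {
		key := dedupKey(t)
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	targets := []Target{
		{PrivateIP: "10.0.0.5", Port: 9404, MetricsPath: "/metrics", Source: "docker_label"},
		{PrivateIP: "10.0.0.5", Port: 9404, MetricsPath: "/metrics", Source: "task_definition"},
		{PrivateIP: "10.0.0.5", Port: 9406, MetricsPath: "/metrics", Source: "task_definition"},
	}
	for _, t := range dedupTargets(targets) {
		fmt.Println(dedupKey(t), t.Source)
	}
}
```

Keeping the first occurrence makes the result deterministic for a given input order; which mode "wins" for a duplicate is a design choice the sketch does not claim the PR makes.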
Implementation Overview
`EndpointsListers:ListEndpoints` performs the following steps:

1. `TaskRetrievalProcessor` queries the `ECS:ListTasks` API to retrieve the ARNs of the currently running ECS tasks in the specified ECS cluster, then retrieves each task using `ECS:DescribeTasks`.
2. `TaskDefinitionProcessor` gets the ECS task definition from an LRU cache or, if it is not cached, calls `ECS:DescribeTaskDefinition` and caches the result. The LRU cache size (2000) is based on the ECS service quota.
3. `TaskFilterProcessor` filters out tasks based on whether they match the configured docker label-based SD config or task definition ARN-based SD config.
4. `MetadataProcessor` gets the container instance/EC2 instance info from an LRU cache if the task is running on the EC2 launch type. The LRU cache size (2000) is based on the ECS service quota. If the info is not cached, it calls `ECS:DescribeContainerInstances`/`EC2:DescribeInstances` with a batch size of 100.
5. If `sd_result_file` is configured, the list of discovered targets is written to the configured file path. Otherwise, `EndpointsListers:ListEndpoints` returns the list of discovered endpoints and all subscribers are notified of the updated endpoints.

Note: The main reason for supporting writing to a static file is that the current implementation of the observer/receiver creator framework is less efficient than using a single receiver to scrape from multiple targets. The related discussion can be found in #1395.
Link to tracking Issue: #1395
Testing:
Unit tests were added and ran successfully. A test setup in AWS ECS was also created to verify the implementation.
Note: Functions involving the ECS/EC2 API don't have unit tests. A next step would be to add mocked APIs to test these functions.
Documentation:
A new README doc was added for the
ecs_observer
, and theobserver
/receiver_creator
READMEs were modified.