Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move TCS Client to ecs-agent module #3726

Merged
merged 2 commits into from
Jun 10, 2023
Merged

Move TCS Client to ecs-agent module #3726

merged 2 commits into from
Jun 10, 2023

Conversation

Realmonia
Copy link
Contributor

@Realmonia Realmonia commented May 30, 2023

Summary

Move TCS Client to ecs-agent module, and switch to use wsclient in ecs-agent module.

Implementation details

  • Move agent/tcs/client and agent/tcs/model to be underecs-agent/tcs
  • Update go.mod in agent model accordingly
  • Update go.mod in ecs-agent model for necessary dependency (uuid) for generating uuid when making request to TACS
  • Switch to use ecs-agent/wsclient with updated interface; also remove statsEngine from tcs client creation as they are decoupled now
  • Update unit tests to use new tcs client

Testing

New tests cover the changes: Yes. Updated tests for new interface.

Github workflow passed.

Description for the changelog

Move TCS Client to ecs-agent module

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Realmonia Realmonia marked this pull request as ready for review May 30, 2023 23:03
@Realmonia Realmonia requested a review from a team as a code owner May 30, 2023 23:03
@Realmonia Realmonia changed the title Move TCS Client to ecs-agent module [WIP] Move TCS Client to ecs-agent module May 30, 2023
@Realmonia Realmonia changed the title [WIP] Move TCS Client to ecs-agent module Move TCS Client to ecs-agent module May 31, 2023
@@ -474,9 +474,8 @@ func (engine *DockerStatsEngine) publishMetrics(includeServiceConnectStats bool)
metricsMetadata, taskMetrics, metricsErr := engine.GetInstanceMetrics(includeServiceConnectStats)
if metricsErr == nil {
metricsMessage := ecstcs.TelemetryMessage{
Metadata: metricsMetadata,
TaskMetrics: taskMetrics,
IncludeServiceConnectStats: includeServiceConnectStats,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is includeServiceConnectStats value no more used here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good catch, I think I miss some change, I should have 1) removed this field from the telemetryMessage type as well as the condition here, or 2) keep use this value. I was planning (but failed) to do 1). I want to do 1) because:

  1. ServiceConnect is associated with task. Therefore, having or not having service connect metrics "should" be associated with each "TaskMetric", instead of a "TelemetryMessage", as "TelemetryMessage" contains all tasks on the instance. For Fargate it is a little bit different, since each Fargate instance corresponding to one agent corresponding to one task.
  2. EC2 agent collects service connect metrics once per minute, i.e. every 3 telemetryMessages should have 1 message contains service connect metrics. Therefore, when we call GetInstanceMetrics, we do need to pass in this parameter to actually try to collect service connect metrics. Whether or not a task has service connect metrics is a different story.
  3. On the other side of the channel, TCS client,
    1. when receiving the telemetry message, it calls publishMetrics code when either metrics is not disabled, or there is SC metrics, or both. However, when metrics are disabled, SC metrics are still being collected, which means in the case that metrics are disabled and there is SC metrics, getInstanceMetrics will collect those task metrics of which task has SC enabled, and task metrics will only have SC metrics here, and it will form a telemetryMessage. The only case, that the condition in channel receiver side skip the publish, is when both metrics is disabled and there is no SC metrics, there will not be a telemetry message at all here, so the condition can be removed. In fact, it would not even get to that line - the instance will be an idle instance here.
    2. when sending the metrics, it checks whether each taskMetric object contains with service connect metric by checking this. There is no need for this value. To be more specific, when IncludeServiceConnectStats is set to true, there might be some taskMetric in taskMetrics contains SC stats; when IncludeServiceConnectStats is set to false, there will be no taskMetric in taskMetrics contains SC stats. As a result, checking if SC stats present in taskMetric should be a simpler and direct way to achieve same goal.

By the way, currently we do use this includeServiceConnectStats parameter in TCSClient's metricsToPublishMetricRequests function under /agent, while Fargate's implementation of TCSClient does not contain it. Again it is looping through each taskMetric object and trying to copy its service connect stats, this flag does not provide any value.

Would you prefer 1) removed this field from the telemetryMessage type as well as the condition here, or 2) keep use this value ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer 1 as long as we have a clear change in our tests that reflects the code change.

Copy link
Contributor Author

@Realmonia Realmonia Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we have getInstance test changes verifying there is no empty telemetry message under all cases this and We also have SC integ test to verify cases where metrics are disabled this.

The missed part should no longer have a condition and therefore no specific test for that (we do not have a test for it before either after we moved to channel based stats engine, as the metrics disabled tests are moved to engine_test here and not able to catch the miss). We also have unit tests to test channel send/receive flow - this and this.

Is there any test you can think of that I should add?

@@ -474,9 +474,8 @@ func (engine *DockerStatsEngine) publishMetrics(includeServiceConnectStats bool)
metricsMetadata, taskMetrics, metricsErr := engine.GetInstanceMetrics(includeServiceConnectStats)
if metricsErr == nil {
metricsMessage := ecstcs.TelemetryMessage{
Metadata: metricsMetadata,
TaskMetrics: taskMetrics,
IncludeServiceConnectStats: includeServiceConnectStats,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

ecs-agent/tcs/client/client_test.go Show resolved Hide resolved
@Realmonia Realmonia merged commit ba48cc1 into aws:dev Jun 10, 2023
@prateekchaudhry prateekchaudhry mentioned this pull request Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants