Propagating Container Instance and Task Tags to Task Metadata endpoint #1720

Merged
1 commit merged into aws:dev on Dec 11, 2018

Conversation

@linkar-ec2 (Contributor) commented Dec 5, 2018

Summary

Makes Container Instance and Task Tags available through the Task Metadata v3 endpoint at a separate URL (/taskWithTags).
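For illustration, fetching the tag-enriched metadata from inside a container could look roughly like the sketch below. It assumes the standard v3 mechanism of the ECS_CONTAINER_METADATA_URI environment variable and the response fields discussed in this PR; it is not part of the change itself.

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
)

func main() {
	// The agent injects the per-container v3 metadata base URI into this
	// environment variable; /taskWithTags is the new tag-enriched route.
	base := os.Getenv("ECS_CONTAINER_METADATA_URI")

	resp, err := http.Get(base + "/taskWithTags")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// In addition to the usual task metadata fields, the response carries
	// ContainerInstanceTags and TaskTags when tags can be retrieved.
	fmt.Println(string(body))
}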

Implementation details

  • Updates the ECS API model to include the API calls for Tags
  • Passes the container instance ARN to the metadata server setup so tags can be retrieved through ECS API calls
  • Adds the ECS Client to the Docker Task State for ECS API calls
  • Increases the functional test timeout by 2 minutes
  • Modifies the V3TaskMetadataValidator image to accept an argument for checking Tags in the functional test

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes.

Description for the changelog

Propagating Container Instance and Task Tags to Task Metadata endpoint /taskWithTags

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@linkar-ec2 requested a review from a team December 5, 2018 18:15
@@ -3,36 +3,42 @@
"service": "<p>Amazon Elastic Container Service (Amazon ECS) is a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster. You can host your cluster on a serverless infrastructure that is managed by Amazon ECS by launching your services or tasks using the Fargate launch type. For more control, you can host your tasks on a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances that you manage by using the EC2 launch type. For more information about launch types, see <a href=\"http://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_types.html\">Amazon ECS Launch Types</a>.</p> <p>Amazon ECS lets you launch and stop container-based applications with simple API calls, allows you to get the state of your cluster from a centralized service, and gives you access to many familiar Amazon EC2 features.</p> <p>You can use Amazon ECS to schedule the placement of containers across your cluster based on your resource needs, isolation policies, and availability requirements. Amazon ECS eliminates the need for you to operate your own cluster management and configuration management systems or worry about scaling your management infrastructure.</p>",
"operations": {
"CreateCluster": "<p>Creates a new Amazon ECS cluster. By default, your account receives a <code>default</code> cluster when you launch your first container instance. However, you can create your own cluster with a unique name with the <code>CreateCluster</code> action.</p> <note> <p>When you call the <a>CreateCluster</a> API operation, Amazon ECS attempts to create the service-linked role for your account so that required resources in other AWS services can be managed on your behalf. However, if the IAM user that makes the call does not have permissions to create the service-linked role, it is not created. For more information, see <a href=\"http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using-service-linked-roles.html\">Using Service-Linked Roles for Amazon ECS</a> in the <i>Amazon Elastic Container Service Developer Guide</i>.</p> </note>",
"CreateService": "<p>Runs and maintains a desired number of tasks from a specified task definition. If the number of tasks running in a service drops below <code>desiredCount</code>, Amazon ECS spawns another copy of the task in the specified cluster. To update an existing service, see <a>UpdateService</a>.</p> <p>In addition to maintaining the desired count of tasks in your service, you can optionally run your service behind a load balancer. The load balancer distributes traffic across the tasks that are associated with the service. For more information, see <a href=\"http://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html\">Service Load Balancing</a> in the <i>Amazon Elastic Container Service Developer Guide</i>.</p> <p>You can optionally specify a deployment configuration for your service. During a deployment, the service scheduler uses the <code>minimumHealthyPercent</code> and <code>maximumPercent</code> parameters to determine the deployment strategy. The deployment is triggered by changing the task definition or the desired count of a service with an <a>UpdateService</a> operation.</p> <p>The <code>minimumHealthyPercent</code> represents a lower limit on the number of your service's tasks that must remain in the <code>RUNNING</code> state during a deployment, as a percentage of the <code>desiredCount</code> (rounded up to the nearest integer). This parameter enables you to deploy without using additional cluster capacity. For example, if your service has a <code>desiredCount</code> of four tasks and a <code>minimumHealthyPercent</code> of 50%, the scheduler can stop two existing tasks to free up cluster capacity before starting two new tasks. Tasks for services that <i>do not</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state. Tasks for services that <i>do</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state and the container instance they are hosted on is reported as healthy by the load balancer. The default value for a replica service for <code>minimumHealthyPercent</code> is 50% in the console and 100% for the AWS CLI, the AWS SDKs, and the APIs. The default value for a daemon service for <code>minimumHealthyPercent</code> is 0% for the AWS CLI, the AWS SDKs, and the APIs and 50% for the console.</p> <p>The <code>maximumPercent</code> parameter represents an upper limit on the number of your service's tasks that are allowed in the <code>RUNNING</code> or <code>PENDING</code> state during a deployment, as a percentage of the <code>desiredCount</code> (rounded down to the nearest integer). This parameter enables you to define the deployment batch size. For example, if your replica service has a <code>desiredCount</code> of four tasks and a <code>maximumPercent</code> value of 200%, the scheduler can start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for a replica service for <code>maximumPercent</code> is 200%. 
If you are using a daemon service type, the <code>maximumPercent</code> should remain at 100%, which is the default value.</p> <p>When the service scheduler launches new tasks, it determines task placement in your cluster using the following logic:</p> <ul> <li> <p>Determine which of the container instances in your cluster can support your service's task definition (for example, they have the required CPU, memory, ports, and container instance attributes).</p> </li> <li> <p>By default, the service scheduler attempts to balance tasks across Availability Zones in this manner (although you can choose a different placement strategy) with the <code>placementStrategy</code> parameter):</p> <ul> <li> <p>Sort the valid container instances, giving priority to instances that have the fewest number of running tasks for this service in their respective Availability Zone. For example, if zone A has one running service task and zones B and C each have zero, valid container instances in either zone B or C are considered optimal for placement.</p> </li> <li> <p>Place the new service task on a valid container instance in an optimal Availability Zone (based on the previous steps), favoring container instances with the fewest number of running tasks for this service.</p> </li> </ul> </li> </ul>",
"CreateService": "<p>Runs and maintains a desired number of tasks from a specified task definition. If the number of tasks running in a service drops below <code>desiredCount</code>, Amazon ECS spawns another copy of the task in the specified cluster. To update an existing service, see <a>UpdateService</a>.</p> <p>In addition to maintaining the desired count of tasks in your service, you can optionally run your service behind a load balancer. The load balancer distributes traffic across the tasks that are associated with the service. For more information, see <a href=\"http://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html\">Service Load Balancing</a> in the <i>Amazon Elastic Container Service Developer Guide</i>.</p> <p>You can optionally specify a deployment configuration for your service. The deployment is triggered by changing properties, such as the task definition or the desired count of a service, with an <a>UpdateService</a> operation.</p> <p>If a service is using the <code>ECS</code> deployment controller, the <b>minimum healthy percent</b> represents a lower limit on the number of tasks in a service that must remain in the <code>RUNNING</code> state during a deployment, as a percentage of the desired number of tasks (rounded up to the nearest integer), and while any container instances are in the <code>DRAINING</code> state if the service contains tasks using the EC2 launch type. This parameter enables you to deploy without using additional cluster capacity. For example, if your service has a desired number of four tasks and a minimum healthy percent of 50%, the scheduler may stop two existing tasks to free up cluster capacity before starting two new tasks. Tasks for services that <i>do not</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state; tasks for services that <i>do</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state and they are reported as healthy by the load balancer. The default value for minimum healthy percent is 100%.</p> <p>If a service is using the <code>ECS</code> deployment controller, the <b>maximum percent</b> parameter represents an upper limit on the number of tasks in a service that are allowed in the <code>RUNNING</code> or <code>PENDING</code> state during a deployment, as a percentage of the desired number of tasks (rounded down to the nearest integer), and while any container instances are in the <code>DRAINING</code> state if the service contains tasks using the EC2 launch type. This parameter enables you to define the deployment batch size. For example, if your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.</p> <p>If a service is using the <code>CODE_DEPLOY</code> deployment controller and tasks that use the EC2 launch type, the <b>minimum healthy percent</b> and <b>maximum percent</b> values are only used to define the lower and upper limit on the number of the tasks in the service that remain in the <code>RUNNING</code> state while the container instances are in the <code>DRAINING</code> state. 
If the tasks in the service use the Fargate launch type, the minimum healthy percent and maximum percent values are not used, although they are currently visible when describing your service.</p> <p>Tasks for services that <i>do not</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state. Tasks for services that <i>do</i> use a load balancer are considered healthy if they are in the <code>RUNNING</code> state and the container instance they are hosted on is reported as healthy by the load balancer. The default value for a replica service for <code>minimumHealthyPercent</code> is 100%. The default value for a daemon service for <code>minimumHealthyPercent</code> is 0%.</p> <p>When the service scheduler launches new tasks, it determines task placement in your cluster using the following logic:</p> <ul> <li> <p>Determine which of the container instances in your cluster can support your service's task definition (for example, they have the required CPU, memory, ports, and container instance attributes).</p> </li> <li> <p>By default, the service scheduler attempts to balance tasks across Availability Zones in this manner (although you can choose a different placement strategy) with the <code>placementStrategy</code> parameter):</p> <ul> <li> <p>Sort the valid container instances, giving priority to instances that have the fewest number of running tasks for this service in their respective Availability Zone. For example, if zone A has one running service task and zones B and C each have zero, valid container instances in either zone B or C are considered optimal for placement.</p> </li> <li> <p>Place the new service task on a valid container instance in an optimal Availability Zone (based on the previous steps), favoring container instances with the fewest number of running tasks for this service.</p> </li> </ul> </li> </ul>",
Contributor

This file includes some changes not related to your PR. From my standpoint, if your change needs updates to docs-2.json and api-2.json, you should keep only the updates related to your change.

Contributor Author

I've updated it to include only the Tags-related items needed for the PR.

muxRouter.HandleFunc(v3.ContainerMetadataPath, v3.ContainerMetadataHandler(state))
muxRouter.HandleFunc(v3.TaskMetadataPath, v3.TaskMetadataHandler(state, cluster, availabilityZone))
muxRouter.HandleFunc(v3.TaskMetadataPath, v3.TaskMetadataHandler(state, cluster, availabilityZone, containerInstanceArn, false))
muxRouter.HandleFunc(v3.TaskWithTagsMetadataPath, v3.TaskMetadataHandler(state, cluster, availabilityZone, containerInstanceArn, true))
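For context on what the new trailing boolean gates, here is a self-contained sketch; the taskResponse type and the GetResourceTags helper are assumptions for illustration, not the agent's actual API.

package handlers

import (
	"encoding/json"
	"net/http"
)

// taskResponse is a trimmed stand-in for the agent's task metadata response
// (hypothetical; field names follow the PR discussion).
type taskResponse struct {
	TaskARN               string            `json:"TaskARN"`
	ContainerInstanceTags map[string]string `json:"ContainerInstanceTags,omitempty"`
	TaskTags              map[string]string `json:"TaskTags,omitempty"`
}

// tagFetcher abstracts the ECS ListTagsForResource calls (assumed interface,
// not the agent's real api.ECSClient).
type tagFetcher interface {
	GetResourceTags(arn string) (map[string]string, error)
}

// taskMetadataHandler mirrors the handler shape in the diff above: the
// trailing bool decides whether tags are propagated into the response.
func taskMetadataHandler(client tagFetcher, containerInstanceArn string, propagateTags bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		resp := taskResponse{TaskARN: "arn:aws:ecs:region:account:task/id"} // placeholder
		if propagateTags {
			if tags, err := client.GetResourceTags(containerInstanceArn); err == nil {
				resp.ContainerInstanceTags = tags
			}
			if tags, err := client.GetResourceTags(resp.TaskARN); err == nil {
				resp.TaskTags = tags
			}
		}
		json.NewEncoder(w).Encode(resp)
	}
}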
Contributor

Why do we only expose tags through the v3 metadata endpoint?

Contributor Author

v2 is a subset of v3 but only supports awsvpc mode. I've updated the v2 endpoint to include a /v2/metadataWithTags endpoint.
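For comparison with the v3 sketch above, the v2 endpoint is served on a fixed link-local address rather than a per-container URI, so querying the new route from inside an awsvpc-mode task would look roughly like this sketch:

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Unlike v3, the v2 endpoint lives at a fixed link-local address for
	// tasks using awsvpc network mode; the new route adds "WithTags".
	resp, err := http.Get("http://169.254.170.2/v2/metadataWithTags")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(body))
}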

@@ -474,6 +474,10 @@ func TestV3TaskEndpointHostNetworkMode(t *testing.T) {
testV3TaskEndpoint(t, "v3-task-endpoint-validator", "v3-task-endpoint-validator", "host", "ecs-functional-tests-v3-task-endpoint-validator")
}

func TestV3TaskEndpointTags(t *testing.T) {
Contributor

Since you also added the V2 endpoint, test cases need to be added accordingly.

Contributor Author

I've edited the V2 functional test to check for Tags now.

Contributor

I think there is value in testing v2 metadata without the long ARN format too. I would suggest adding a new test for tags.

Contributor Author
@linkar-ec2 commented Dec 8, 2018

TestTaskMetadataValidator (below) checks the v2 endpoint. Tags are checked in the image itself. For both, we need the long ARN format because Tags require it. Without the long ARN format, the ECS API throws an error, which is reported in the Agent logs.
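For reference, opting a test account into the long ARN format goes through the same PutAccountSetting API used in the functional test snippet further down; a minimal sketch with the AWS SDK for Go (the setting names are the documented account setting names; the rest is illustrative plumbing):

package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ecs"
)

func main() {
	svc := ecs.New(session.Must(session.NewSession()))

	// Tags require the new (long) ARN format, which is an account-level,
	// per-resource-type opt-in; without it, ListTagsForResource fails.
	for _, name := range []string{"taskLongArnFormat", "containerInstanceLongArnFormat"} {
		if _, err := svc.PutAccountSetting(&ecs.PutAccountSettingInput{
			Name:  aws.String(name),
			Value: aws.String("enabled"),
		}); err != nil {
			panic(err)
		}
	}
}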

@linkar-ec2 changed the title from "Progagating Container Instance and Task Tags to Task Metadata endpoint" to "Propagating Container Instance and Task Tags to Task Metadata endpoint" Dec 6, 2018
@@ -556,6 +556,7 @@ func (agent *ecsAgent) startAsyncRoutines(
statsEngine := stats.NewDockerStatsEngine(agent.cfg, agent.dockerClient, containerChangeEventStream)

// Start serving the endpoint to fetch IAM Role credentials and other task metadata
state.SetECSClient(client)
Contributor

This should be added before the comment "// Start serving the endpoint to fetch IAM Role credentials and other task metadata".

Contributor Author

Changed.

networkMode := "host"
awslogsPrefix := "ecs-functional-tests-v3-task-endpoint-validator"
agentOptions := &AgentOptions{
EnableTaskENI: true,
Contributor

Why does task ENI need to be enabled?

Contributor Author

I don't think it does. Removed.

@@ -181,7 +181,7 @@ test-in-docker:
docker run --net=none -v "$(PWD):/go/src/github.com/aws/amazon-ecs-agent" --privileged "amazon/amazon-ecs-agent-test:make"

run-functional-tests: testnnp test-registry ecr-execution-role-image telemetry-test-image
. ./scripts/shared_env && go test -tags functional -timeout=30m -v ./agent/functional_tests/...
. ./scripts/shared_env && go test -tags functional -timeout=32m -v ./agent/functional_tests/...
Contributor

Any reason for changing it to 32m?

Contributor Author

I added a functional test, and we already seemed to be close to the 30m timeout. Tests sometimes time out on TestTaskIPCNamespaceSharing (which takes ~57 seconds). The timeout panic is no longer a problem after bumping the timeout to 32m.

@@ -102,6 +108,8 @@ type DockerTaskEngineState struct {
ipToTask map[string]string // ip address -> task arn
v3EndpointIDToTask map[string]string // container's v3 endpoint id -> taskarn
v3EndpointIDToDockerID map[string]string // container's v3 endpoint id -> DockerId

client api.ECSClient
Contributor

It seems kind of weird to me to put the ECS client in TaskEngineState; it has nothing to do with task state other than simply providing an ECS client. Do you think it would be better to initialize it in the task metadata server?

Contributor Author
@linkar-ec2 commented Dec 7, 2018

I agree it seems out of place. But the task metadata server is not a struct and is initialized through static methods. The alternatives here would be:

  • pass the client as a parameter through the chain of calls, just like state
  • initialize a global variable in an init() function within the task metadata server

Because we already pass state through the chain of calls, I thought it best to place the client within the state, where it can be maintained.

Contributor Author

@haikuoliu Refactored to pass the client as a parameter through the call chain.

@@ -31,6 +31,9 @@ const (
// TaskMetadataPath specifies the relative URI path for serving task metadata.
TaskMetadataPath = "/v2/metadata"

// TaskWithTagsMetadataPath specifies the relative URI path for serving task metadata with Container Instance and Task Tags.
TaskWithTagsMetadataPath = "/v2/metadataWithTags"
Contributor

Just to confirm, did you discuss this with akram?

Contributor Author

I brought it up and he deferred to our team. The v2 and v3 metadata responses are built the same way; to be consistent, it is better to provide the Tags in both responses.

Contributor
@haikuoliu left a comment

Some minor comments

_, err := ECS.PutAccountSetting(&putAccountSettingInput)
assert.NoError(t, err)

awslogsPrefix := "ecs-functional-tests-v3-task-endpoint-validator"
Contributor

I think we need to add tags to the awslogs prefix to differentiate this from the regular task endpoint test.

Contributor Author

Added.

muxRouter.HandleFunc(v2.ContainerMetadataPath, v2.TaskContainerMetadataHandler(state, ecsClient, cluster, availabilityZone, containerInstanceArn, false))
muxRouter.HandleFunc(v2.TaskMetadataPath, v2.TaskContainerMetadataHandler(state, ecsClient, cluster, availabilityZone, containerInstanceArn, false))
muxRouter.HandleFunc(v2.TaskWithTagsMetadataPath, v2.TaskContainerMetadataHandler(state, ecsClient, cluster, availabilityZone, containerInstanceArn, true))
muxRouter.HandleFunc(v2.TaskMetadataPathWithSlash, v2.TaskContainerMetadataHandler(state, ecsClient, cluster, availabilityZone, containerInstanceArn, false))
Contributor

For consistency, please add the slash case, as we did for TaskMetadataPathWithSlash.

Contributor Author

Added.

@@ -29,9 +30,10 @@ const v3EndpointIDMuxName = "v3EndpointIDMuxName"

// TaskMetadataPath specifies the relative URI path for serving task metadata.
var TaskMetadataPath = "/v3/" + utils.ConstructMuxVar(v3EndpointIDMuxName, utils.AnythingButSlashRegEx) + "/task"
var TaskWithTagsMetadataPath = "/v3/" + utils.ConstructMuxVar(v3EndpointIDMuxName, utils.AnythingButSlashRegEx) + "/taskWithTags"
Contributor

Can you add a comment for this line? Exported vars/methods in Go need comments.

Contributor Author

Added.

resp.ContainerInstanceTags[*tag.Key] = *tag.Value
}
} else {
seelog.Errorf("Could not get container instance tags for %s: %s", containerInstanceArn, err.Error())
Contributor

If we are unable to retrieve tags, we'd better return an error instead of failing silently.

You can check the tags in the tags handler and return an error if they don't exist.

Contributor Author

There are two calls to ListTagsForResource, and either could fail depending on the customer's long ARN setting. I think the correct behavior is to fail silently.
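Putting this thread together, the fail-open flow around the snippet above looks roughly like the sketch below; tagLister and ecsTag stand in for the agent's ECS client types, and seelog is the logging library the snippet above already uses:

package handlers

import "github.com/cihub/seelog"

// ecsTag mirrors the ECS API's key/value tag shape (illustrative).
type ecsTag struct{ Key, Value string }

// tagLister stands in for the agent's ECS client (assumed interface).
type tagLister interface {
	ListTagsForResource(arn string) ([]ecsTag, error)
}

// fetchTaskAndInstanceTags sketches the fail-open flow discussed above: each
// of the two ListTagsForResource calls is attempted independently, and a
// failure (e.g. the account not having long ARNs enabled) is only logged,
// leaving the corresponding tag map empty rather than failing the response.
func fetchTaskAndInstanceTags(client tagLister, containerInstanceArn, taskArn string) (instanceTags, taskTags map[string]string) {
	if tags, err := client.ListTagsForResource(containerInstanceArn); err == nil {
		instanceTags = toMap(tags)
	} else {
		seelog.Errorf("Could not get container instance tags for %s: %s", containerInstanceArn, err.Error())
	}
	if tags, err := client.ListTagsForResource(taskArn); err == nil {
		taskTags = toMap(tags)
	} else {
		seelog.Errorf("Could not get task tags for %s: %s", taskArn, err.Error())
	}
	return instanceTags, taskTags
}

func toMap(tags []ecsTag) map[string]string {
	m := make(map[string]string, len(tags))
	for _, t := range tags {
		m[t.Key] = t.Value
	}
	return m
}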

@@ -186,6 +187,9 @@ func verifyTaskMetadataResponse(taskMetadataRawMsg json.RawMessage) error {
}

taskExpectedFieldNotEmptyArray := []string{"TaskARN", "Family", "Revision", "PullStartedAt", "PullStoppedAt", "Containers", "AvailabilityZone"}
if checkContainerInstanceTags {
taskExpectedFieldNotEmptyArray = append(taskExpectedFieldNotEmptyArray, "ContainerInstanceTags")
Contributor

Can we also check task tags here?

Contributor Author

Checking Task Tags is a bit tricky since you can only assign them after the task is created (task ARN assigned), at which point the validator would already be executing.

Contributor
@haikuoliu left a comment

LGTM

…t; Bumping functional test timeout from 30m to 32m
@linkar-ec2 merged commit 473dac4 into aws:dev Dec 11, 2018
@yumex93 added this to the 1.24.0 milestone Jan 3, 2019