batch container state change events #867

Merged
merged 6 commits into from
Aug 9, 2017

@adnxn adnxn commented Jun 27, 2017

Summary

This change updates the model for the ECS client and includes the corresponding code changes to SubmitTaskStateChange in the api package. It also adds the batching logic to eventhandler.

Implementation details

Two main points of interest:
In agent/eventhandler/task_handler.go, as ContainerStateChange events are propagated up, they are collected and bucketed by task ARN. Then, on a task state transition, we attach the collection of ContainerStateChange events to the TaskStateChange payload that is sent to the backend.

In agent/api/ecsclient/client.go, we made the changes required to reflect the model changes and also added logic to attach the container state change payloads to the task state change message.
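
A minimal sketch of the batching flow described above. The types here (`containerStateChange`, `taskStateChange`, `taskHandler`) are simplified stand-ins invented for illustration, not the agent's real `api` types, and clearing the bucket after a flush is an assumption. The sketch has no locking, so it is only safe from a single goroutine; the review discussion covers that point.

```go
package main

import "fmt"

// containerStateChange and taskStateChange are simplified stand-ins for
// api.ContainerStateChange and api.TaskStateChange (assumed shapes).
type containerStateChange struct {
	TaskArn       string
	ContainerName string
	Status        string
}

type taskStateChange struct {
	TaskArn    string
	Status     string
	Containers []containerStateChange
}

// taskHandler buckets container events by task ARN until a task event arrives.
// It is not safe for concurrent use in this form.
type taskHandler struct {
	batches map[string][]containerStateChange
}

func newTaskHandler() *taskHandler {
	return &taskHandler{batches: make(map[string][]containerStateChange)}
}

// batchContainerEvent collects a container event under its task ARN.
func (h *taskHandler) batchContainerEvent(ev containerStateChange) {
	h.batches[ev.TaskArn] = append(h.batches[ev.TaskArn], ev)
}

// flushBatch attaches the collected container events to the task event.
// Deleting the bucket afterwards is an assumption made for this sketch.
func (h *taskHandler) flushBatch(ev *taskStateChange) {
	ev.Containers = append(ev.Containers, h.batches[ev.TaskArn]...)
	delete(h.batches, ev.TaskArn)
}

func main() {
	h := newTaskHandler()
	h.batchContainerEvent(containerStateChange{TaskArn: "task-1", ContainerName: "web", Status: "RUNNING"})
	h.batchContainerEvent(containerStateChange{TaskArn: "task-1", ContainerName: "db", Status: "RUNNING"})

	task := taskStateChange{TaskArn: "task-1", Status: "RUNNING"}
	h.flushBatch(&task)
	fmt.Printf("%d container events attached to %s\n", len(task.Containers), task.TaskArn)
}
```

The effect is that the backend receives one task-level submission carrying its container events, rather than one call per container event.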

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Licensing

This contribution is under the terms of the Apache 2.0 License:
yes

trimmed := change.Reason[0:ecsMaxReasonLength]
statechange.Reason = &trimmed
} else {
statechange.Reason = &change.Reason

Please use aws.String instead.

Contributor Author

fixed.
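
For context, `aws.String` in the AWS SDK for Go takes a string and returns a pointer to a copy of it, which removes the need for a named temporary such as `trimmed`. The sketch below uses a local `stringPtr` helper as a stand-in so that it runs without the SDK; `buildReason` and the limit value are hypothetical names and numbers chosen for illustration.

```go
package main

import "fmt"

// maxReasonLength is a hypothetical limit for this sketch; the agent defines
// its own ecsMaxReasonLength.
const maxReasonLength = 16

// stringPtr mirrors what aws.String does: return a pointer to a copy of v.
func stringPtr(v string) *string {
	return &v
}

// buildReason returns nil for an empty reason, otherwise a pointer to the
// reason truncated to at most maxReasonLength bytes.
func buildReason(reason string) *string {
	if reason == "" {
		return nil
	}
	if len(reason) > maxReasonLength {
		return stringPtr(reason[:maxReasonLength])
	}
	return stringPtr(reason)
}

func main() {
	r := buildReason("container exited unexpectedly")
	fmt.Println(*r)
}
```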

}
}
stat := change.Status.String()
if stat == "DEAD" {

Do we really need this?

Contributor Author

i'm not sure tbh. the check was in the original SubmitContainerStateChange code. although api.ContainerStatus doesnt have a "DEAD" state, so we may be good to remove this. @aaithal any ideas?

stat = "STOPPED"
}

if stat != "STOPPED" && stat != "RUNNING" {

you can compare directly with api.ContainerStopped and api.ContainerRunning without converting the status to string.

Contributor Author

fixed.
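
The suggestion above can be sketched as follows, assuming a simplified `containerStatus` enum in place of the agent's `api.ContainerStatus`. All names in the block are stand-ins. The point is that the typed constants can be compared directly, and the string form is only needed when the value is written to the payload or a log line.

```go
package main

import "fmt"

// containerStatus is a simplified stand-in for the agent's api.ContainerStatus.
type containerStatus int

const (
	containerStatusNone containerStatus = iota
	containerCreated
	containerRunning
	containerStopped
)

// String returns the upper-case name used when the status is sent or logged.
func (s containerStatus) String() string {
	switch s {
	case containerCreated:
		return "CREATED"
	case containerRunning:
		return "RUNNING"
	case containerStopped:
		return "STOPPED"
	default:
		return "NONE"
	}
}

// isSubmittable compares enum values directly; no string conversion happens
// until the value is placed in a payload.
func isSubmittable(s containerStatus) bool {
	return s == containerRunning || s == containerStopped
}

func main() {
	for _, s := range []containerStatus{containerCreated, containerRunning, containerStopped} {
		fmt.Printf("%s submittable=%t\n", s, isSubmittable(s))
	}
}
```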

log.Info("Not submitting not supported upstream container state", "state", stat)
return nil
}
statechange.Status = &stat

same as above, please use aws.String instead.

Contributor Author

fixed.

statechange.Status = &stat

if change.ExitCode != nil {
exitCode := int64(*change.ExitCode)

same as above, you can use aws.IntValue instead.

Contributor Author

fixed.

bindIP := binding.BindIP
protocol := binding.Protocol.String()
networkBindings[i] = &ecs.NetworkBinding{
BindIP: &bindIP,

same as above, use aws.String or aws.Int64 instead.

Contributor Author

fixed.

@@ -47,13 +47,16 @@ type TaskHandler struct {
tasksToEvents map[string]*eventList
// tasksToEventsLock for locking the map
tasksToEventsLock sync.RWMutex

batchMap map[string][]api.ContainerStateChange

can you add a comment here?

Contributor Author

fixed.

return nil

default:
return errors.New("eventhandler: unable to determine event type from state change event")
}
}

// batchContainerEvent collects container state change events for a given task arn
func (handler *TaskHandler) batchContainerEvent(event api.ContainerStateChange) {
handler.batchMap[event.TaskArn] = append(handler.batchMap[event.TaskArn], event)

This may need a lock?

Contributor Author

i'm not sure if this needs a lock. the batchMap data is not shared across goroutines.

Contributor

I don't believe we guarantee that handler.AddStateChangeEvent is not called across goroutines.

Contributor Author

hrm. that's true. i'll make this change.
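
To illustrate the concern in this thread: Go maps are not safe for concurrent writes, and unsynchronized writers are a data race that the runtime may abort with "fatal error: concurrent map writes". The sketch below guards the map with a mutex and calls the add path from many goroutines. `batcher`, `event`, and `addConcurrently` are hypothetical names for this example, not the agent's code.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a minimal stand-in for a container state change.
type event struct {
	TaskArn   string
	Container string
}

// batcher guards its map with a mutex so add may be called from any goroutine.
type batcher struct {
	lock   sync.Mutex
	events map[string][]event
}

// add appends under the lock; defer guarantees the unlock on every return path.
func (b *batcher) add(ev event) {
	b.lock.Lock()
	defer b.lock.Unlock()
	b.events[ev.TaskArn] = append(b.events[ev.TaskArn], ev)
}

// count reads under the same lock.
func (b *batcher) count(taskArn string) int {
	b.lock.Lock()
	defer b.lock.Unlock()
	return len(b.events[taskArn])
}

// addConcurrently calls add from n goroutines and waits for all of them.
func addConcurrently(b *batcher, taskArn string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			b.add(event{TaskArn: taskArn, Container: fmt.Sprintf("c%d", i)})
		}(i)
	}
	wg.Wait()
}

func main() {
	b := &batcher{events: make(map[string][]event)}
	addConcurrently(b, "task-1", 100)
	fmt.Println(b.count("task-1"))
}
```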

// flushBatch attaches the task arn's container events to the TaskStateChange event that
// is being submitted to the backend
func (handler *TaskHandler) flushBatch(event *api.TaskStateChange) {
event.Containers = append(event.Containers, handler.batchMap[event.TaskArn]...)

same as above, this needs a lock.

Contributor Author

same as above.

@adnxn adnxn force-pushed the batch-state-change-events branch from 7dade26 to 9750ac7 Compare June 28, 2017 20:53
@@ -279,6 +279,51 @@ func (client *APIECSClient) getCustomAttributes() []*ecs.Attribute {
return attributes
}

func (client *APIECSClient) buildContainerStateChangePayload(change api.ContainerStateChange) *ecs.ContainerStateChange {

Contributor

nit: Can you move this function below SubmitTaskStateChange? As a general rule, I try to organize the code such that the caller is above the callee.

Contributor Author

sounds good.

@aaithal aaithal left a comment

mostly looks good. I have a bunch of minor comments.

status := change.Status

if status != api.ContainerStopped && status != api.ContainerRunning {
log.Info("Not submitting not supported upstream container state", "state", status)

Contributor

Can you use seelog instead here?

statechange.Status = aws.String(status.String())

if change.ExitCode != nil {
exitCode := int64(*change.ExitCode)

Contributor

please use aws.Int64Val instead

@@ -70,6 +70,9 @@ func New(p client.ConfigProvider, cfgs ...*aws.Config) *ECS {

// newClient creates, initializes and returns a new service client instance.
func newClient(cfg aws.Config, handlers request.Handlers, endpoint, signingRegion, signingName string) *ECS {
if len(signingName) == 0 {

Contributor

Let's stick to signingName == "" since this is a string type.

Contributor

what about the nil case?

@@ -47,13 +47,18 @@ type TaskHandler struct {
tasksToEvents map[string]*eventList
// tasksToEventsLock for locking the map
tasksToEventsLock sync.RWMutex
// batchMap is used to collect container events
// between task transitions
batchMap map[string][]api.ContainerStateChange

Contributor

Can you rename this as tasksToContainerStates ? If you do that, you can rename the lock as well

return nil

default:
return errors.New("eventhandler: unable to determine event type from state change event")
}
}

// batchContainerEvent collects container state change events for a given task arn
func (handler *TaskHandler) batchContainerEvent(event api.ContainerStateChange) {
handler.batchMapLock.Lock()

Contributor

please use defer for unlocking:

handler.batchMapLock.Lock()
defer handler.batchMapLock.Unlock()

handler.batchMap[event.TaskArn] = append(handler.batchMap[event.TaskArn], event)

// flushBatch attaches the task arn's container events to the TaskStateChange event that
// is being submitted to the backend
func (handler *TaskHandler) flushBatch(event *api.TaskStateChange) {
handler.batchMapLock.Lock()

Contributor

Same as above, use defer for unlocking

if err != nil {
log.Warn("Could not submit a task state change", "err", err)
return err
}

// WIP LOG payload

Contributor

When are you getting rid of this?

ContainerName: aws.String(change.ContainerName),
}

if change.Reason != "" {

Contributor

What about the nil case?

Contributor Author

so change api.ContainerStateChange is a struct value and can't be nil. at worst it may have its fields zeroed if no value is specified.
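
A small sketch of the point made in this reply, using a hypothetical `containerStateChange` struct rather than the agent's real type. A struct passed by value always exists; only its pointer-typed fields can be nil, and a plain `string` field is at worst the empty string.

```go
package main

import "fmt"

// containerStateChange is a simplified stand-in for api.ContainerStateChange.
type containerStateChange struct {
	ContainerName string
	Reason        string
	ExitCode      *int
}

// describe takes the struct by value, so the argument itself is never nil;
// only its pointer-typed fields can be.
func describe(c containerStateChange) string {
	reason := "<none>"
	if c.Reason != "" {
		reason = c.Reason
	}
	exit := "<unset>"
	if c.ExitCode != nil {
		exit = fmt.Sprint(*c.ExitCode)
	}
	return fmt.Sprintf("reason=%s exit=%s", reason, exit)
}

func main() {
	var zero containerStateChange // all fields take their zero values
	fmt.Println(describe(zero))

	code := 137
	fmt.Println(describe(containerStateChange{Reason: "OOM", ExitCode: &code}))
}
```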

This change updates the model for the ecs client and includes the
corresponding code changes to SubmitTaskStateChange. Secondly we've
added the batching logic to eventhandler, which collects container state
change events as they are generated and then attaches the collection of
events to the SubmitTaskStateChange payload as dictated by the model
changes.
This change updates the eventhandler tests to reflect the batching
changes.
This change updates existing test coverage. We also add a lock around
the map that aggregates container state change events. There is also a
minor fix to a test failure that was causing a nil pointer panic in the
agent/stats package.
@adnxn adnxn force-pushed the batch-state-change-events branch 3 times, most recently from a474c3b to 900e02c Compare July 25, 2017 20:03
This commit includes several minor changes to address reviewer comments
@adnxn adnxn force-pushed the batch-state-change-events branch from 900e02c to 19b2238 Compare July 25, 2017 20:43
@adnxn adnxn changed the title from "batch container state change events - WIP" to "batch container state change events" Jul 26, 2017
status := change.Status

if status != api.ContainerStopped && status != api.ContainerRunning {
seelog.Info("Not submitting not supported upstream container state", "state", status)

Contributor

Should this be logged as a warning?

Contributor Author

agreed.

// tasksToContainerStates is used to collect container events
// between task transitions
tasksToContainerStates map[string][]api.ContainerStateChange
tasksToContainerStatesLock sync.RWMutex

Contributor

Can you switch to using a single lock for this struct? Want to avoid the repeat of "deadly embrace" because of multiple locks

Contributor Author

will do. and yup 😅
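
The "deadly embrace" mentioned here is a deadlock: with two locks, two goroutines that acquire them in opposite orders can each hold one and wait forever for the other. A minimal sketch of the single-lock alternative follows; the `handler` type and its field names are hypothetical and only loosely modelled on the struct under review.

```go
package main

import (
	"fmt"
	"sync"
)

// handler keeps two maps behind one mutex. Field names are hypothetical.
type handler struct {
	lock            sync.Mutex
	taskEvents      map[string][]string // queued task events per task ARN
	containerEvents map[string][]string // batched container events per task ARN
}

func newHandler() *handler {
	return &handler{
		taskEvents:      make(map[string][]string),
		containerEvents: make(map[string][]string),
	}
}

// addContainerEvent batches a container event under the shared lock.
func (h *handler) addContainerEvent(arn, ev string) {
	h.lock.Lock()
	defer h.lock.Unlock()
	h.containerEvents[arn] = append(h.containerEvents[arn], ev)
}

// addTaskEvent queues a task event and drains the container batch in one
// critical section, so no second lock is taken while the first is held.
func (h *handler) addTaskEvent(arn, ev string) []string {
	h.lock.Lock()
	defer h.lock.Unlock()
	h.taskEvents[arn] = append(h.taskEvents[arn], ev)
	batch := h.containerEvents[arn]
	delete(h.containerEvents, arn)
	return batch
}

func main() {
	h := newHandler()
	h.addContainerEvent("task-1", "web RUNNING")
	h.addContainerEvent("task-1", "db RUNNING")
	fmt.Println(len(h.addTaskEvent("task-1", "RUNNING")))
}
```

The trade-off is coarser locking: operations on either map serialize behind the same mutex, which is usually acceptable when the critical sections are this short.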

containerEvents[i] = client.buildContainerStateChangePayload(containerEvent)
}

req.Containers = containerEvents

Since we combined several events into one, it may be good to log the actual events here, so that we know which state change was actually submitted.

Contributor Author

agreed. i've added info level logging in agent/eventhandler/task_handler.go to capture this info. we log other information about which events are submitted there as well.

@adnxn adnxn force-pushed the batch-state-change-events branch from d8d0646 to 6a096a4 Compare July 28, 2017 14:11
@adnxn adnxn force-pushed the batch-state-change-events branch from 6a096a4 to 26478c1 Compare July 28, 2017 14:27
status := change.Status

if status != api.ContainerStopped && status != api.ContainerRunning {
seelog.Warn("Not submitting unsupported upstream container state", "state", status)

Contributor

please use seelog.Warnf here.

Contributor Author

out of curiosity - why one over the other? the existing code seems to use seelog.Warn

Contributor

I prefer the formatted string kind because it gives you control over the output format and not having to deal with whatever golang thinks the stringified version of an object should look like.
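
The difference can be seen with the standard library alone. The sketch below assumes that an unformatted logger call such as `seelog.Warn` renders its operands the way `fmt.Sprint` does, and that `Warnf` behaves like `fmt.Sprintf`; the `containerStatus` type and both helper functions are stand-ins written for this example.

```go
package main

import "fmt"

// containerStatus is a simplified stand-in for api.ContainerStatus.
type containerStatus int

const (
	containerCreated containerStatus = iota + 1
	containerRunning
	containerStopped
)

func (s containerStatus) String() string {
	switch s {
	case containerCreated:
		return "CREATED"
	case containerRunning:
		return "RUNNING"
	case containerStopped:
		return "STOPPED"
	default:
		return "NONE"
	}
}

// unformatted mimics a Warn-style call: operands are joined with default
// formatting, and fmt.Sprint adds no space next to string operands.
func unformatted(status containerStatus) string {
	return fmt.Sprint("Not submitting unsupported upstream container state", "state", status)
}

// formatted mimics a Warnf-style call, where the layout is explicit.
func formatted(status containerStatus) string {
	return fmt.Sprintf("Not submitting unsupported upstream container state: %s", status.String())
}

func main() {
	fmt.Println(unformatted(containerCreated))
	fmt.Println(formatted(containerCreated))
}
```

Under that assumption, the key/value style arguments carried over from the previous logger run together in the unformatted message, while the format string makes the output predictable.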

client.EXPECT().SubmitContainerStateChange(sortaRedundantCont.(api.ContainerStateChange)).Do(func(interface{}) { wait.Done() })
client.EXPECT().SubmitTaskStateChange(notReplacedTask.(api.TaskStateChange)).Do(func(interface{}) { wait.Done() })
client.EXPECT().SubmitTaskStateChange(sortaRedundantTask.(api.TaskStateChange)).Do(func(interface{}) { wait.Done() })
time.Sleep(1 * time.Millisecond)

Contributor

Why the sleep here?

Contributor Author

i was using it to ensure ordering. ill change it to use waitgroups.
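
A sketch of enforcing ordering with `sync.WaitGroup` instead of a sleep. This is not the test from the PR; `submitInOrder` and the event labels are hypothetical. A sleep only makes the desired ordering likely, while waiting on a WaitGroup makes the second step start strictly after the first has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// submitInOrder runs two goroutines and uses WaitGroups, not sleeps, to make
// the second step start only after the first has finished.
func submitInOrder() []string {
	var order []string
	var firstDone, allDone sync.WaitGroup
	firstDone.Add(1)
	allDone.Add(2)

	go func() {
		defer allDone.Done()
		defer firstDone.Done() // runs first: deferred calls execute in LIFO order
		order = append(order, "container event")
	}()

	go func() {
		defer allDone.Done()
		firstDone.Wait() // blocks until the first goroutine has appended
		order = append(order, "task event")
	}()

	allDone.Wait()
	return order
}

func main() {
	fmt.Println(submitInOrder())
}
```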

@adnxn adnxn force-pushed the batch-state-change-events branch from 3fe6f58 to 38d5a99 Compare July 28, 2017 22:46
status := change.Status

if status != api.ContainerStopped && status != api.ContainerRunning {
seelog.Warnf("Not submitting unsupported upstream container state: %v", status)

Contributor

Can you also log details of the task here? It's not much use when debugging if that's not logged. Also, please use ".. : %s", status.String()

Contributor Author

fixed formatting. also - we are capturing details about both task and container events a layer above in agent/eventhandler/task_handler.go as they are passed to the client. did you mean we should capture debug logs here as well?

Contributor

yeah, when you're printing a warning/error, it's useful to have as much info as possible about what led to that in the message itself (because earlier lines might get lost because of other async routines getting in the way)

@@ -70,6 +70,9 @@ func New(p client.ConfigProvider, cfgs ...*aws.Config) *ECS {

// newClient creates, initializes and returns a new service client instance.
func newClient(cfg aws.Config, handlers request.Handlers, endpoint, signingRegion, signingName string) *ECS {
if signingName == "" {
signingName = "ecs"

Contributor

This should be a const (or, you can reuse ServiceName)

handler.taskHandlerLock.Lock()
defer handler.taskHandlerLock.Unlock()

seelog.Info("TaskHandler, batching container event :", event)

Contributor

Please use Infof here.

@adnxn adnxn force-pushed the batch-state-change-events branch from 38d5a99 to a41ff4f Compare August 3, 2017 16:51
This commit also contains changes to log formatting to address code
review comments.
@adnxn adnxn force-pushed the batch-state-change-events branch from a41ff4f to 211d4ba Compare August 3, 2017 17:07
@richardpen richardpen left a comment

LGTM. After this, you should work on fixing the leak in #798.

@adnxn adnxn merged commit 211d4ba into aws:dev Aug 9, 2017
adnxn added a commit that referenced this pull request Aug 9, 2017
This implements the changes required to batch container state
change events and include them with the task state change payload
jahkeup added a commit to jahkeup/amazon-ecs-agent that referenced this pull request Aug 11, 2017
@jhaynes jhaynes mentioned this pull request Aug 22, 2017
@samuelkarp samuelkarp added this to the 1.14.4 milestone Aug 22, 2017
@adnxn adnxn deleted the batch-state-change-events branch March 14, 2018 17:46