
Upgrade go docker client #987

Merged · richardpen merged 5 commits into aws:dev from upgrade-go-docker-client on Oct 12, 2017

Conversation

@richardpen commented Sep 20, 2017

Summary

Upgrade go docker client

Implementation details

The only change here is adding a buffer to the channel that receives Docker events, because go-dockerclient changed its behavior: if the listener's buffer is full, the event won't be sent. See the related code here
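
For context, a minimal sketch of the buffered-listener setup (illustrative only; it assumes go-dockerclient's AddEventListener API, and everything except containerEventsBufferSize is hypothetical):

package main

import (
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

// containerEventsBufferSize mirrors the constant added in this PR.
const containerEventsBufferSize = 1000

func main() {
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// A buffered channel leaves room for bursts of events; with the upgraded
	// go-dockerclient, an event sent to a listener whose buffer is full is
	// dropped instead of blocking the client.
	events := make(chan *docker.APIEvents, containerEventsBufferSize)
	if err := client.AddEventListener(events); err != nil {
		log.Fatal(err)
	}

	for event := range events {
		log.Printf("docker event: %s %s", event.Status, event.ID)
	}
}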

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Licensing

This contribution is under the terms of the Apache 2.0 License:

@@ -39,7 +39,8 @@ import (
 )

 const (
 	dockerDefaultTag = "latest"
+	containerEventsBufferSize = 1000
Contributor

Why was this changed? Please add a description in either the commit message or as inline documentation. Doesn't this open up the possibility of the ECS Agent processing multiple events from docker simultaneously? Have you analyzed the impact of the same on the container and task engines?

Author

The description is added in the Summary; I will add the same to the commit message.
I don't think the ECS Agent can process multiple events simultaneously; based on the code, the events will be handled serially. But I could be wrong, please correct me if that's not true.

Have you analyzed the impact of the same on the container and task engines?

Can you elaborate here? I don't think this will have an impact on the container or task engine, since the behavior of handling events doesn't change. The events will still be handled serially in the task engine.

@@ -39,7 +39,8 @@ import (
 )

 const (
 	dockerDefaultTag = "latest"
+	containerEventsBufferSize = 1000
Contributor

Why 1000?
Could you please add more details here?

Author

1000 was picked because the maximum number of containers that can run on an instance is ~1000. Let me know if you have better ideas.

Contributor

That's not accurate. There's a limit of 1023 containers when using the bridge network mode, but there is no limit for containers using host or none.

Contributor

A note/comment here is going to be useful.
It would be nice to calculate this value rather than using a fixed constant, imho.

@richardpen force-pushed the upgrade-go-docker-client branch from 9deeed2 to 5fed2e5 on September 21, 2017 18:41
@richardpen
Author

@aaithal @samuelkarp @vsiddharth I have changed this to the same behavior as before, where the goroutine that was used to send events was moved from go-dockerclient into the agent. Let me know if you have other suggestions.

@@ -712,13 +715,24 @@ func metadataFromContainer(dockerContainer *docker.Container) DockerContainerMet
return metadata
}

// CopyDockerEventsWithoutBlocking copies the event from input to output without blocking the input
func CopyDockerEventsWithoutBlocking(in chan *docker.APIEvents, out chan *docker.APIEvents) {
Contributor

I don't understand how this doesn't block. This just means that you have an effective buffer of 2*containerEventsBufferSize and if that's full, you'll still block...

Author

Even if out is full, reads from in won't block, since the loop simply reads and creates a goroutine to do the write. In that case these goroutines will be blocked until out isn't full, but go-dockerclient can keep writing to in so that the agent won't miss any events.

In a rare case, if creating the goroutines slows down (meaning the producer is much faster than the consumer), in could become full; that's why I gave in a buffer size of 100. This is the best way I can think of, please suggest.
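
A rough sketch of the pattern described above, with hypothetical names (this is a simplification, not the exact PR code):

import docker "github.com/fsouza/go-dockerclient"

// copyEventsWithoutBlocking illustrates the idea: the reader of in never
// blocks, because each event is handed to its own goroutine; only those
// goroutines block while out is full.
func copyEventsWithoutBlocking(in <-chan *docker.APIEvents, out chan<- *docker.APIEvents) {
	for event := range in {
		go func(e *docker.APIEvents) {
			out <- e // may block, but does not stall the reader of in
		}(event)
	}
}

The trade-off, discussed below, is that each pending event costs a goroutine and the relative order of delivery to out is no longer guaranteed.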

Contributor

I missed that you're creating a goroutine for each event. I think this is not a good idea; we'll have goroutine explosion and will lose any ordering that was provided by go-dockerclient.

You can implement a non-blocking, infinite (subject to how much memory you have) buffer through appending to a slice; a rough sketch follows this list. However, I think the more important question here is to determine what the correct behavior should be:

  • Do we rely on every single event being delivered? We have logic that tolerates event loss and reconciles with explicit inspects.
  • What happens when we lose an event? Our reconciliation runs every 10 minutes, so we could have a 10-minute window of inconsistency.
  • Do we need every event? No, there's only a small subset of events that we pay attention to at all.
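
A minimal sketch of the slice-backed buffer idea (hypothetical names, not the agent's implementation):

import (
	"sync"

	docker "github.com/fsouza/go-dockerclient"
)

// eventBuffer grows without bound (subject to available memory), so the
// producer side never blocks and event order is preserved.
type eventBuffer struct {
	lock   sync.Mutex
	events []*docker.APIEvents
}

// push appends an event; it never blocks the producer.
func (b *eventBuffer) push(event *docker.APIEvents) {
	b.lock.Lock()
	defer b.lock.Unlock()
	b.events = append(b.events, event)
}

// pop removes and returns the oldest buffered event; ok is false when the
// buffer is empty, so the consumer can poll or wait before retrying.
func (b *eventBuffer) pop() (event *docker.APIEvents, ok bool) {
	b.lock.Lock()
	defer b.lock.Unlock()
	if len(b.events) == 0 {
		return nil, false
	}
	event = b.events[0]
	b.events = b.events[1:]
	return event, true
}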

Author

I agree this is not a good idea, but it is the same as how the agent works today. Previously the goroutine to send each event was created in the go-dockerclient code; now they are created in the agent code.

Currently, go-dockerclient doesn't guarantee the order of the events sent, so I think the agent already has some tolerance for event ordering.

Do we rely on every single event being delivered? We have logic that tolerates event loss and reconciles with explicit inspects.
What happens when we lose an event? Our reconciliation runs every 10 minutes, so we could have a 10-minute window of inconsistency.

Ten minutes is really a big gap; we probably don't want task status change reports to have a 10-minute delay.

Do we need every event? No, there's only a small subset of events that we pay attention to at all.

We don't need all of these events, but the agent has to do the filtering itself; currently go-dockerclient doesn't support filters.

I will try to implement a non-blocking infinite buffer to solve the goroutine explosion problem while keeping the order of the events. Any concerns or suggestions?

Contributor

It might make more sense to filter here so we can discard early. It also might make more sense to do analysis of expected event volume under high load to determine if we can figure out an appropriate buffer size.

@richardpen force-pushed the upgrade-go-docker-client branch from 5fed2e5 to c5ed3ff on September 29, 2017 17:59
@richardpen
Author

@aaithal @samuelkarp @vsiddharth I did some tests starting and stopping ~800 containers: no events are missed with a buffer size of 100, while with a buffer size of 1 some events are still missed. Also, from the benchmark test, Docker produces at most about 60 events per second. Please let me know if you have other concerns.

@aaithal
Contributor

aaithal commented Sep 29, 2017

@richardpen Awesome! Thanks for doing that. Can you post the details of your experiments here (scripts if you have some) so that we can reproduce/rerun these tests in the future? My concern is that 60 events per second might be the max tps that the docker daemon is hitting today, but it can change with improvements made to that code base. It's not a good place to be, constantly worrying about the buffer size we have with each new release of docker. We should ensure that we have a better reconciliation story for missing events altogether, because at some point in time we will miss these events.

@richardpen
Author

@aaithal, I agree with you that this still depends on the tps of docker events, and that it isn't the only factor that impacts the behavior. It is also impacted by the speed of writes to the channel (go-dockerclient writing events) and of reads from the channel plus goroutine creation (the agent reading the channel and creating a goroutine to record the event). I'm not sure how much better we can do, but this should unblock us for now to upgrade go-dockerclient. We could have other options in the future, like making the reconciliation interval configurable or moving to a different client library.

The test I did for benchmarking the docker events spike launches short-lived containers on a high-performance instance and counts the output of docker events, e.g.:

#!/bin/bash

for i in {1..1000};
do
    docker run -d --rm ubuntu sleep 5s &
done

The test with the ecs-agent tries to launch 1000 short-lived containers (with sleep 1m) on a high-performance instance and counts the events sent by go-dockerclient versus the events received by the agent.

@aaithal
Contributor

aaithal commented Oct 2, 2017

@richardpen Thanks. I'd be a lot more comfortable with this change if we reduced our steady state polling/reconciliation time to 5m instead of the 10m that it's currently set to. I'd even say that something like 2m or 3m should be fine. But, at the very least, we should reduce it to 5m so that the worst-case latency of missing an event and reporting the same to the backend is reduced to half of what it's set to today.

containerTypeEvent = "container"
)

var ignoredDockerEvents = []string{"rename", "kill", "destroy", "pause", "exec_create", "exec_start", "top", "attach", "export", "pull", "push", "tag", "untag", "import", "delete"}
Contributor

Instead of doing this, we should filter events by specifying the type=container parameter for the /events API. We should add support for this in the vendored library here. I understand that this may require a new PR; if you want to address this separately, please add a TODO for that.

If you're doing a TODO with this PR, please reformat this list to be the reverse of what it is right now. That is, let's have a list of not ignored Docker events and accept only that as this list might/will grow over time. And, make it so that each list entry is added in a new line:

var events = []string{
 "event1",
 "event2",
 ...
}

}

// Serve starts reading from the input channel and writes to the buffer
func (buffer *InfiniteBuffer) Serve(events chan *docker.APIEvents) {
Contributor

The very common Serve() method from http is a blocking call and it might be confusing for us to have a method named Serve that's not blocking. Can you rename this to StartListener() or something?

}

// Write writes the event into the buffer
func (buffer *InfiniteBuffer) Write(event *docker.APIEvents) {
Contributor

CopyEvents might be a better name here?

type InfiniteBuffer struct {
events []*docker.APIEvents
empty bool
waitDone chan struct{}
Contributor

Can we use a sync.WaitGroup instead for this? It makes it simpler to understand the code.

richardpen added 2 commits October 6, 2017 21:48
@richardpen force-pushed the upgrade-go-docker-client branch from 854b23f to b09465f on October 9, 2017 17:58
@aaithal (Contributor) left a comment

Changes look good to me. Can you set this up on an instance and run a stress workload and make sure that the state transition latencies do not go up?

@richardpen
Author

From the benchmark test, latencies were reduced and the failure rate did not increase. Any other concerns?

@vsiddharth (Contributor) left a comment

LGTM :shipit:

@@ -42,6 +42,8 @@ const (
dockerDefaultTag = "latest"
// imageNameFormat is the name of a image may look like: repo:tag
imageNameFormat = "%s:%s"
// the buffer size will ensure agent doesn't miss any event from docker
Contributor

nit: dockerEventBufferSize is used to ensure ....

@richardpen merged commit 1045da5 into aws:dev on Oct 12, 2017
@richardpen deleted the upgrade-go-docker-client branch on November 21, 2017 01:04