Upgrade go docker client #987
Conversation
@@ -39,7 +39,8 @@ import (
)

const (
	dockerDefaultTag          = "latest"
	containerEventsBufferSize = 1000
Why was this changed? Please add a description in either the commit message or as inline documentation. Doesn't this open up the possibility of the ECS Agent processing multiple events from docker simultaneously? Have you analyzed the impact of the same on the container and task engines?
The description is added in the Summary; I will add the same in the commit message.
I don't think the ECS Agent can process multiple events simultaneously; based on the code, the events will be handled serially. But I could be wrong, please correct me if that's not true.

> Have you analyzed the impact of the same on the container and task engines?

Can you elaborate here? I don't think this will have an impact on the container or task engine, since the handling behavior doesn't change. The event will still be handled serially in the task engine.
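For illustration, a minimal sketch of the pattern being discussed, assuming `docker` is `github.com/fsouza/go-dockerclient`; `handleDockerEvent` is a hypothetical handler standing in for the agent's event handling, not its actual function:

```go
// Sketch: register a buffered listener channel with go-dockerclient. Events
// are still consumed serially by a single goroutine; the buffer only gives
// the docker client room to keep writing during bursts.
func listenForContainerEvents(client *docker.Client) error {
	events := make(chan *docker.APIEvents, containerEventsBufferSize)
	if err := client.AddEventListener(events); err != nil {
		return err
	}
	go func() {
		for event := range events {
			handleDockerEvent(event) // hypothetical handler; one event at a time
		}
	}()
	return nil
}
```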
@@ -39,7 +39,8 @@ import (
)

const (
	dockerDefaultTag          = "latest"
	containerEventsBufferSize = 1000
Why 1000? Could you please add more details here?
1000 was picked because the maximum number of containers that can run on an instance is ~1000. Let me know if you have better ideas.
That's not accurate. There's a limit of 1023 containers using the bridge network mode, but there is no limit on containers using host or none.
A note/comment here is going to be useful.
It would be nice to calculate this value rather than using a fixed constant, imho.
Force-pushed from 9deeed2 to 5fed2e5.
@aaithal @samuelkarp @vsiddharth I have changed this to the same behavior as before, where the goroutine that was used to send events was moved from go-dockerclient to the agent. Let me know if you have other suggestions.
@@ -712,13 +715,24 @@ func metadataFromContainer(dockerContainer *docker.Container) DockerContainerMetadata
	return metadata
}

// CopyDockerEventsWithoutBlocking copies the event from input to output without blocking the input
func CopyDockerEventsWithoutBlocking(in chan *docker.APIEvents, out chan *docker.APIEvents) {
I don't understand how this doesn't block. This just means that you have an effective buffer of 2*containerEventsBufferSize, and if that's full, you'll still block...
Even if out is full, the read of in won't be blocked, as it simply reads and creates a goroutine to write. In this case these goroutines will be blocked until out isn't full, but go-dockerclient can keep writing to in, so the agent won't miss any event.
In the rare case that creating the goroutines slows down, which means the producer is much faster than the consumer, in could become full; that's why I had a buffer size of 100 for in. This is the best way I can think of, please suggest.
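As I read the diff, the approach being described looks roughly like the sketch below (assuming `docker` is `github.com/fsouza/go-dockerclient`); this is an illustration of the mechanism, not the exact PR code:

```go
// Sketch: each event read from `in` is handed to its own goroutine, so the
// read loop never blocks even when `out` is full; only the per-event
// goroutines block until the consumer catches up.
func CopyDockerEventsWithoutBlocking(in chan *docker.APIEvents, out chan *docker.APIEvents) {
	for event := range in {
		event := event // capture the loop variable for the goroutine
		go func() {
			out <- event // blocks only this goroutine if out is full
		}()
	}
}
```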
I missed that you're creating a goroutine for each event. I think this is not a good idea; we'll have goroutine explosion and will lose any ordering that was provided by go-dockerclient.
You can implement a non-blocking, infinite (subject to how much memory you have) buffer through appending to a slice. However, I think the more important question here is to determine what the correct behavior should be.
- Do we rely on every single event being delivered? We have logic that tolerates event loss and reconciles with explicit inspects.
- What happens when we lose an event? Our reconciliation runs every 10 minutes, so we could have a 10-minute window of inconsistency.
- Do we need every event? No, there's only a small subset of events that we pay attention to at all.
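A minimal sketch of the slice-backed buffer suggested above, assuming `docker` is `github.com/fsouza/go-dockerclient` and `sync`/`time` are imported; the names and the polling interval are illustrative, not the PR's final code:

```go
// Sketch: appends never block the reader of `in`; a single drain goroutine
// forwards events in arrival order, so no per-event goroutines are created.
type eventBuffer struct {
	lock   sync.Mutex
	events []*docker.APIEvents
}

// copyFrom reads every event from `in` and appends it to the slice.
func (b *eventBuffer) copyFrom(in chan *docker.APIEvents) {
	for event := range in {
		b.lock.Lock()
		b.events = append(b.events, event)
		b.lock.Unlock()
	}
}

// drainTo forwards buffered events to `out`, preserving order.
func (b *eventBuffer) drainTo(out chan *docker.APIEvents) {
	for {
		b.lock.Lock()
		if len(b.events) == 0 {
			b.lock.Unlock()
			time.Sleep(10 * time.Millisecond) // simple polling; a WaitGroup or cond could avoid this
			continue
		}
		event := b.events[0]
		b.events = b.events[1:]
		b.lock.Unlock()
		out <- event
	}
}
```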
I agree this is not a good idea, but this is the same as how the agent works today. Previously the goroutine to send an event was created in the go-dockerclient code, but now they are created in the agent code.
Currently, go-dockerclient doesn't guarantee the order of the events sent, so I think the agent already has some tolerance for event ordering.

> Do we rely on every single event being delivered? We have logic that tolerates event loss and reconciles with explicit inspects.
> What happens when we lose an event? Our reconciliation runs every 10 minutes, so we could have a 10-minute window of inconsistency.

Ten minutes is really a big gap; we probably don't want the task status change report to have a 10-minute delay.

> Do we need every event? No, there's only a small subset of events that we pay attention to at all.

We don't need all of these events, but the agent has to do the analysis, since go-dockerclient currently doesn't support the filter.
I will try to implement a non-blocking infinite buffer to solve the goroutine explosion problem while keeping the order of the events. Any concerns or suggestions?
It might make more sense to filter here so we can discard early. It also might make more sense to do analysis of expected event volume under high load to determine if we can figure out an appropriate buffer size.
Force-pushed from 5fed2e5 to c5ed3ff.
@aaithal @samuelkarp @vsiddharth I did some tests with ~800 containers starting and stopping: no events were missed with a buffer size of 100, while with a buffer size of 1 some events were still missed. Also, from the benchmark test, docker can produce events at a rate of at most 60/second. Please let me know if you have other concerns.
@richardpen Awesome! Thanks for doing that. Can you post the details of your experiments here (scripts if you have some) so that we can reproduce/rerun these tests in the future? My concern still is that 60 events per second might be the max tps that the docker daemon is hitting today, but it can change with improvements made to that code base. It's not a good place to be in, constantly worrying about the buffer size with each new release of docker. We should ensure that we have a better reconciliation story for missing events altogether, because at some point in time we will miss these events.
@aaithal, I agree with you that this still depends on the tps of docker events, and this isn't the only factor that impacts the behavior; it will also be impacted by how fast the agent consumes the events. The test I did for benchmarking the docker events spike was launching short-lifecycle containers on a high-performance instance and counting the output of docker events.
The test with the ecs-agent was to launch 1000 short-lifecycle containers (with sleep 1m) on a high-performance instance, and count the events sent by go-dockerclient and the events received by the agent.
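For reference, a rough sketch of one way to measure the docker event rate with go-dockerclient; this is an illustrative reconstruction, not the script used for the numbers above, and the endpoint and buffer size are assumptions:

```go
package main

import (
	"log"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

// Counts how many Docker events arrive per second while containers are
// being started and stopped on the instance.
func main() {
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		log.Fatal(err)
	}
	events := make(chan *docker.APIEvents, 1000)
	if err := client.AddEventListener(events); err != nil {
		log.Fatal(err)
	}
	count := 0
	ticker := time.NewTicker(time.Second)
	for {
		select {
		case <-events:
			count++
		case <-ticker.C:
			log.Printf("events/second: %d", count)
			count = 0
		}
	}
}
```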
@richardpen Thanks. I'd be a lot more comfortable with this change if we reduced our steady state polling/reconciliation time.
agent/engine/docker_events_buffer.go
Outdated
	containerTypeEvent = "container"
)

var ignoredDockerEvents = []string{"rename", "kill", "destroy", "pause", "exec_create", "exec_start", "top", "attach", "export", "pull", "push", "tag", "untag", "import", "delete"}
Instead of doing this, we should filter events by specifying the type=container parameter for the /events API. We should add support for this in the vendored library here. I understand that this may require a new PR, and if you want to address this separately, please add a TODO for that.
If you're doing a TODO with this PR, please reformat this list to be the reverse of what it is right now. That is, let's have a list of Docker events that are not ignored and accept only those, as this list might/will grow over time. And, make it so that each list entry is added on a new line:
var events = []string{
    "event1",
    "event2",
    ...
}
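A possible shape for the inverted list, as a sketch; the event names below are illustrative examples, not the agent's final set, and `docker` is assumed to be `github.com/fsouza/go-dockerclient`:

```go
// Sketch: keep only the events the agent cares about and drop the rest early.
var acceptedDockerEvents = []string{
	"create",
	"start",
	"stop",
	"die",
	"oom",
}

// acceptEvent reports whether the event status is in the accepted list.
func acceptEvent(event *docker.APIEvents) bool {
	for _, status := range acceptedDockerEvents {
		if event.Status == status {
			return true
		}
	}
	return false
}
```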
agent/engine/docker_events_buffer.go
Outdated
}

// Serve starts reading from the input channel and writes to the buffer
func (buffer *InfiniteBuffer) Serve(events chan *docker.APIEvents) {
The very common Serve() method from http is a blocking call, and it might be confusing for us to have a method named Serve that's not blocking. Can you rename this to StartListener() or something?
agent/engine/docker_events_buffer.go
Outdated
}

// Write writes the event into the buffer
func (buffer *InfiniteBuffer) Write(event *docker.APIEvents) {
CopyEvents might be a better name here?
agent/engine/docker_events_buffer.go
Outdated
type InfiniteBuffer struct {
	events   []*docker.APIEvents
	empty    bool
	waitDone chan struct{}
Can we use a sync.WaitGroup instead for this? It makes it simpler to understand the code.
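A sketch of how a sync.WaitGroup could replace the waitDone channel; the field and method names here are illustrative, not necessarily the PR's final code, and `docker` is assumed to be `github.com/fsouza/go-dockerclient` with `sync` imported:

```go
// Sketch: the reader Add(1)s when the buffer runs empty and Wait()s; the
// next Write marks the buffer non-empty and calls Done() to wake it up.
type InfiniteBuffer struct {
	lock         sync.Mutex
	events       []*docker.APIEvents
	empty        bool
	waitForEvent sync.WaitGroup
}

func (b *InfiniteBuffer) Write(event *docker.APIEvents) {
	b.lock.Lock()
	defer b.lock.Unlock()

	b.events = append(b.events, event)
	if b.empty {
		b.empty = false
		b.waitForEvent.Done() // wake up the waiting reader
	}
}

func (b *InfiniteBuffer) Consume(out chan<- *docker.APIEvents) {
	for {
		b.lock.Lock()
		if len(b.events) == 0 {
			b.empty = true
			b.waitForEvent.Add(1)
			b.lock.Unlock()
			b.waitForEvent.Wait() // block until Write signals a new event
			continue
		}
		event := b.events[0]
		b.events = b.events[1:]
		b.lock.Unlock()
		out <- event
	}
}
```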
The docker events filter part has been moved to docker_events_buffer
Force-pushed from 854b23f to b09465f.
Changes look good to me. Can you set this up on an instance and run a stress workload and make sure that the state transition latencies do not go up?
From the benchmark test, the latencies were reduced and the failure rate didn't increase. Any other concerns?
LGTM
@@ -42,6 +42,8 @@ const (
	dockerDefaultTag = "latest"
	// imageNameFormat is the name of a image may look like: repo:tag
	imageNameFormat = "%s:%s"
	// the buffer size will ensure agent doesn't miss any event from docker
nit: dockerEventBufferSize is used to ensure ....
Summary
Upgrade go docker client
Implementation details
The only change here is that a buffer was added for the channel that receives docker events, since go-dockerclient changed its behavior so that if the listener's buffer is full, the event won't be sent. See related code here.
Testing
- make release
- go build -out amazon-ecs-agent.exe ./agent
- make test: pass
- go test -timeout=25s ./agent/...: pass
- make run-integ-tests: pass
- .\scripts\run-integ-tests.ps1: pass
- make run-functional-tests: pass
- .\scripts\run-functional-tests.ps1: pass

New tests cover the changes:
Description for the changelog
Licensing
This contribution is under the terms of the Apache 2.0 License: