-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add restart tracker to ecs-agent module #4162
Conversation
|
||
// RestartPolicy represents a policy that contains key information considered when | ||
// deciding whether or not a container should be restarted after it has exited. | ||
type RestartPolicy struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we move this into a separate file ecs-agent/api/container/restart/policy.go for clearer separation, and consider renaming the files from this PR to ecs-agent/api/container/restart/tracker.go and ecs-agent/api/container/restart/tracker_test.go for more concise naming?
rt.lock.RLock() | ||
defer rt.lock.RUnlock() | ||
rt.RestartCount++ | ||
rt.LastStartedAt = startedAt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the churn and not calling this out earlier:
Based off what was decided during internal design, this field should be named LastRestartAt
(not LastStartedAt
) and RecordRestart
method should not take in the started at time for the container from the container runtime, right? Otherwise this seems to assume that the container runtime of the calling Agent always updates the started at time for the container when a container is restarted. Per our previous internal discussions, such assumptions about the container runtime of the calling Agent should not be made as the code should be agnostic of architectural/implementation details of any calling Agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can rename the field, but the reason this is an argument to RecordRestart
is to ensure we call this method with the correct time value. LastStartedAt
will likely be passed into ShouldRestart
after the first restart of a container, so we want to make sure that is set when the container would normally get to RUNNING
, not necessarily the same time we would record a restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that started at time of a container will be updated by the underlying container runtime of the calling Agent. Based off previous internal discussions and the fact this code should be agnostic of the calling Agent (and its underlying container runtime), is that a safe assumption to make?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, removed time argument to RecordRestart
in favor of time.Now
and adjusted unit tests to round time checks to check within a millisecond of accuracy
|
||
func (rt *RestartTracker) GetLastStartedAt() time.Time { | ||
rt.lock.RLock() | ||
defer rt.lock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why do we have a lock here? Will rt
be accessed by more than 1 thread at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, for agents that implement multiple asynchronous containers that independently restart we may get simultaneous access. I believe it is also common coding practice in ecs-agent
RestartAttemptPeriod: 60 * time.Second, | ||
}) | ||
assert.Equal(t, 0, rt.RestartCount) | ||
for i := 1; i < 1000; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think doing this 1000 times is necessary - if I'm not mistaken, this means that this loop would repeatedly execute across all test runs without a need to. Can we factor out this for loop?
|
||
// After restarting, we should inform restart decisions with LastRestartedAt instead of the passed in startedAt time. | ||
rt.RecordRestart() | ||
shouldRestart, reason = rt.ShouldRestart(&exitCode, time.Now().Add(-61*time.Second), apicontainerstatus.ContainerRunning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Best practice would be to save time.Now().Add(-61*time.Second)
in a startedAt variable, similar to what you have done for TestShouldRestart
.
Summary
This change adds a new restart module to /api/container/. The restart tracker contains a configurable policy and some helper methods. There is an accompanying test file unit testing the tracker.
Implementation details
Restart Tracker
RecordRestart
andShouldRestart
to make a restart decision based on passed in container metadataShouldRestart
.ContainerRestartPolicy
as implemented in the ecsacs container module into the restart trackerTesting
This component has not been plugged in yet, but all written tests pass.
New tests cover the changes: yes
Description for the changelog
Add restart module
Does this PR include breaking model changes? If so, Have you added transformation functions?
no
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.