Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize ENI ack timer if needed upon restart #2219

Merged
merged 1 commit into from
Sep 25, 2019

Conversation

fenxiong
Copy link
Contributor

@fenxiong fenxiong commented Sep 23, 2019

Summary

Fix #2193. Previously, the ack timer of each ENI attachment is not initialized upon agent restarts. If agent is stopped and started after an ENI attachment is added to state but before it's acked, we will end up with one of these: (1) If the attachment has not expired, the agent will ack the attachment status, and will crash on NPE while attempting to stop the ack timer after the ack (which is likely what happened in #2193 (comment)); (2) If the attachment has expired, the ENI attachment is left in agent state (since the task will not start in that case, we won't be removing it during task cleanup). This commit fixes the two situations by attempting to initialize the ack timer upon restart, so that - (1) if the attachment has not expired, the ack timer will be initialized so that we won't get NPE when acking it; (2) if the attachment has expired, it gets removed from agent state.

Implementation details

This is implemented in a similar way as how we deal with uninitialized fields for task resource. Add an Initialize method on ENI attachment which initializes the ack timer, and call it in engine.synchronizeState which is called upon agent restarts.

Testing

New tests cover the changes: yes

Added unit tests. Manually tested the following situations:

  • Situation 1: Stop agent after ENI attachment is added to state, but before it's acked; restart agent before ENI attachment has expired.
    Before fix: agent panics on nil pointer reference when attempting to ack ENI attached status; After fix: agent successfully acks ENI attached status.

  • Situation 2: Stop agent after ENI attachment is added to state, but before it's acked; restart agent after ENI attachment has expired.
    Before fix: ENI ack timer is not started, but ENI attachment is left in agent state; After fix: ENI ack timer is not started, and ENI attachment is removed from state after agent starts.

  • Situation 3: Stop agent after ENI attachment is added to state and acked; restart agent.
    Before fix: ENI ack timer is not started, and ENI attachment is removed during task cleanup; After fix: we already had expected behavior in this case, verified that this is maintained.

Description for the changelog

Fixed a bug where the agent might crash if it's restarted right after launching a task in awsvpc network mode #2219

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Previously, the ack timer of each ENI attachment is not initialized upon agent restarts. If agent is stopped and started after an ENI attachment is added to state but before it's acked, we will end up with one of these: (1) If the attachment has not expired, the agent crashes on NPE while attempting to stop the ack timer after acking the attachment; (2) If the attachment has expired, the ENI attachment is left in agent state (since the task will not start in that case, we won't be removing it during task cleanup). This commit fixes the two situations by attempting to initialize the ack timer upon restart, so that - (1) if the attachment has not expired, the ack timer will be initialized so that we won't get NPE when acking it; (2) if the attachment has expired, it gets removed from agent state.
@fenxiong fenxiong requested a review from a team September 24, 2019 18:54
@fenxiong fenxiong merged commit ecb3ac1 into aws:dev Sep 25, 2019
@fenxiong fenxiong added this to the 1.32.0 milestone Sep 25, 2019
@fenxiong fenxiong mentioned this pull request Sep 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants