Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker: configure restart policy for networking pause container #15732

Merged
merged 1 commit into from
Jan 10, 2023

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Jan 9, 2023

This PR modifies the configuration of the networking pause contaier to include the "unless-stopped" restart policy. The pause container should always be restored into a running state until Nomad itself issues a stop command for the container.

This is not a perfect fix for #12216 but it should cover the 99% use case - where a pause container gets accidentally killed / exits for some reason. There is still a possibility where the pause container and main task container are stopped and started in the order where the bad behavior persists, but this is fundamentally unavoidable due to how docker itself abstracts and manages the underlying network namespace referenced by the containers.

Closes #12216

@shoenig shoenig force-pushed the b-pause-container-restart-policy branch from 1e2d74b to e90d3cb Compare January 9, 2023 19:40
This PR modifies the configuration of the networking pause contaier to include
the "unless-stopped" restart policy. The pause container should always be
restored into a running state until Nomad itself issues a stop command for the
container.

This is not a _perfect_ fix for #12216 but it should cover the 99% use case -
where a pause container gets accidently stopped / killed for some reason. There
is still a possibility where the pause container and main task container are
stopped and started in the order where the bad behavior persists, but this is
fundamentally unavoidable due to how docker itself abstracts and manages the
underlying network namespace referenced by the containers.

Closes #12216
@shoenig shoenig force-pushed the b-pause-container-restart-policy branch from e90d3cb to fe9bd3f Compare January 9, 2023 19:46
@shoenig shoenig added this to the 1.4.x milestone Jan 9, 2023
@shoenig shoenig added backport/1.2.x backport to 1.1.x release line backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line labels Jan 9, 2023
@shoenig shoenig marked this pull request as ready for review January 9, 2023 20:06
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! It seems like this should have always been like this 😅

@shoenig shoenig merged commit cd48910 into main Jan 10, 2023
@shoenig shoenig deleted the b-pause-container-restart-policy branch January 10, 2023 13:50
philrenaud pushed a commit that referenced this pull request Jan 23, 2023
This PR modifies the configuration of the networking pause contaier to include
the "unless-stopped" restart policy. The pause container should always be
restored into a running state until Nomad itself issues a stop command for the
container.

This is not a _perfect_ fix for #12216 but it should cover the 99% use case -
where a pause container gets accidently stopped / killed for some reason. There
is still a possibility where the pause container and main task container are
stopped and started in the order where the bad behavior persists, but this is
fundamentally unavoidable due to how docker itself abstracts and manages the
underlying network namespace referenced by the containers.

Closes #12216
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/1.2.x backport to 1.1.x release line backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failed pause containers result in failed task restarts and orphaned tasks
2 participants