zero downtime for upgrades #2140

Closed
pcrespov opened this issue Feb 9, 2021 · 1 comment
Labels: a:infra+ops (maintenance of infrastructure or operations; discussed in retro), bug (buggy, it does not work as expected)

pcrespov commented Feb 9, 2021

  • A new version is deployed.
  • The deployer updates the stack.
  • The user gets interrupted because the service is down, and sometimes gets a gateway error.
    • Instead, the old service should not be shut down until the new stack is in place.

Possible cause:

The swarm is already configured for zero downtime per service (i.e. a given service is turned off ONLY once its replacement has started). The problem might be that even when every service is ready, the state shared between services is not. For example, the new webserver is updated correctly but the Traefik proxy has not yet detected it. A front-end request arriving in that window would fail with a bad gateway error.
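
To make the suspected gap concrete, here is a minimal Python sketch that compares the services we expect to be routable with what the proxy currently reports. It is an illustration under assumptions, not the deployer's actual code: the function name `missing_backends` is hypothetical, and the input shape (a list of entries with `name`, `status` and `serverStatus` fields) is assumed to match what a Traefik v2 API listing of HTTP services returns, which should be verified against the Traefik version in use.

```python
from typing import Any, Dict, Iterable, List, Set


def missing_backends(
    expected: Iterable[str], proxy_services: List[Dict[str, Any]]
) -> Set[str]:
    """Return the expected service names the proxy cannot route to yet.

    `proxy_services` is assumed to be the parsed JSON list reported by the
    proxy (hypothetical shape, modeled on a Traefik v2 services listing).
    """
    routable = set()
    for svc in proxy_services:
        # Drop the provider suffix, e.g. "webserver@docker" -> "webserver".
        name = str(svc.get("name", "")).split("@", 1)[0]
        servers = svc.get("serverStatus") or {}
        # Routable only if the entry is enabled AND at least one server is up.
        if svc.get("status") == "enabled" and any(
            state == "UP" for state in servers.values()
        ):
            routable.add(name)
    return set(expected) - routable
```

A non-empty result corresponds to the situation described above: the swarm considers the new webserver started, but the proxy has no live server registered for it, so requests in that window can still hit a gateway error.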

Ideas to solve this problem

  • Incorporate more conditions into the "healthcheck" validation function (e.g. Traefik has discovered all backend services).
  • Possibly deploy the entire new stack separately first, validate it against a set of rules (e.g. all services healthy, all services connected, Traefik routes ready), and only then switch over.
  • Notify the frontend that a new version was deployed (see "Front-end notifies of a new version available" #2128).
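
The first two ideas amount to gating the switch on a set of named rules. A rough Python sketch of such a gate follows; all names (`failing_checks`, `wait_until_ready`, the rule labels in the usage note) are hypothetical, and the individual checks are left as plain callables because how they would query the swarm or the proxy is not specified in this issue.

```python
import time
from typing import Callable, Dict, List


def failing_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check once and return the names of those that did not pass."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises (e.g. connection refused) counts as not ready.
            ok = False
        if not ok:
            failed.append(name)
    return failed


def wait_until_ready(
    checks: Dict[str, Callable[[], bool]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll all checks until they pass or the timeout expires.

    Returns an empty list when every rule holds, otherwise the names of the
    rules still failing at the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        failed = failing_checks(checks)
        if not failed or clock() >= deadline:
            return failed
        sleep(interval_s)
```

A deployer could call it with something like `wait_until_ready({"services healthy": ..., "routes ready": ...})` and switch traffic only when the result is empty, keeping the old stack running otherwise. Returning the failing rule names, rather than a bare boolean, is a design choice to make a blocked upgrade easier to diagnose.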
pcrespov added the a:infra+ops label Feb 9, 2021
pcrespov assigned Surfict and unassigned pcrespov Feb 9, 2021
pcrespov added the bug label Feb 9, 2021

pcrespov commented

duplicated from #2212
