
Support more stable canary deploys in Singularity #2192

Merged: 29 commits into master from deploy_phases on Apr 20, 2021

Conversation

ssalinas (Member)

Current Mechanism:

  • New Deploy POSTed to API
  • Singularity sees this and launches new tasks via one of two paths:
    • Incremental Deploy
      • incremental count of tasks spun up
      • wait for new tasks to pass health check
      • add new tasks to load balancer + take equivalent number of old tasks out
      • poll for lb completion
      • mark old tasks as cleaning + shut down
      • repeat the above steps until all new tasks are launched (currently has an optional hook to wait for an api call or a set amount of time between 'steps' of the deploy; see the loop sketch after this list)
    • Regular Deploy
      • All tasks spun up at once
      • Wait for all to pass health check
      • If load balanced, add to the load balancer (this lb update also removes the old tasks)
      • Poll for load balancer completion
      • Mark old tasks as cleaning + shut them down
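
For reference, the incremental flow boils down to a loop roughly like the sketch below. The class and method names here are hypothetical stand-ins, not Singularity's actual SingularityDeployChecker internals.

```java
import java.util.List;

// Illustrative sketch of the incremental deploy loop described above.
abstract class IncrementalDeployLoop<T> {
  abstract List<T> launchTasks(int count);          // spin up the next step's tasks
  abstract void waitForHealthy(List<T> tasks);      // block until health checks pass
  abstract void swapInLoadBalancer(List<T> tasks);  // add new tasks, remove an equal number of old ones
  abstract void pollForLbCompletion();
  abstract void cleanOldTasks(int count);           // mark old tasks as cleaning + shut down
  abstract void waitForAdvanceSignal();             // optional hook: api call or timed wait

  void run(int targetInstances, int stepSize) {
    int launched = 0;
    while (launched < targetInstances) {
      int step = Math.min(stepSize, targetInstances - launched);
      List<T> newTasks = launchTasks(step);
      waitForHealthy(newTasks);
      swapInLoadBalancer(newTasks);
      pollForLbCompletion();
      cleanOldTasks(step);
      launched += step;
      if (launched < targetInstances) {
        waitForAdvanceSignal(); // pause between 'steps' of the deploy
      }
    }
  }
}
```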

New system:
Ideally we make lb completion a separate piece of state per task. Right now, the fact that an lb update can be tied to either a whole deploy or an individual task update makes the state polling rather confusing. Individual per-task state updates would let us be more granular, or even operate on only the lb state of a single task.
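
As a rough sketch, per-task lb state could look something like the enum below (the name and states are illustrative, not an existing Singularity type):

```java
// Hypothetical per-task load balancer state. Tracking this per task, rather
// than per deploy-wide lb update, lets the deploy checker poll or act on the
// lb status of a single task.
public enum TaskLbState {
  NOT_IN_LB,       // task not yet added to the load balancer
  ADD_PENDING,     // add request sent, polling for completion
  ACTIVE,          // in rotation
  REMOVE_PENDING,  // remove request sent, polling for completion
  REMOVED          // fully out of rotation
}
```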

Some new options I'd add to the deploy api:

  • canary count - similar to the incremental deploy step size, e.g. deploy 1 at a time, 2 at a time, etc. The difference is that this can be: deploy {count}, accept -> deploy the rest
  • canary acceptance trigger - let the move out of the canary phase be triggered by either an api call or simply by elapsed time
  • canary cycles - allow the canary phase to run X tasks at a time for this many cycles (e.g. replicating the existing incremental deploy functionality); these options are sketched below
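
Shaped as a settings object, those options could look roughly like the sketch below. The class, field, and enum names are assumptions about the eventual API, not the merged code.

```java
// Hypothetical shape for the new canary deploy options described above.
public class CanarySettingsSketch {
  public enum AcceptanceTrigger {
    API_CALL, // advance out of the canary phase on an explicit api call
    TIMED     // advance after a configured wait
  }

  private final int canaryCount;              // tasks per canary step (1 at a time, 2 at a time, ...)
  private final int canaryCycles;             // how many canary steps to run before deploying the rest
  private final AcceptanceTrigger trigger;    // what moves the deploy out of the canary phase
  private final long waitMillisBetweenCycles; // used when trigger == TIMED

  public CanarySettingsSketch(
    int canaryCount,
    int canaryCycles,
    AcceptanceTrigger trigger,
    long waitMillisBetweenCycles
  ) {
    this.canaryCount = canaryCount;
    this.canaryCycles = canaryCycles;
    this.trigger = trigger;
    this.waitMillisBetweenCycles = waitMillisBetweenCycles;
  }

  public int getCanaryCount() { return canaryCount; }
  public int getCanaryCycles() { return canaryCycles; }
  public AcceptanceTrigger getTrigger() { return trigger; }
  public long getWaitMillisBetweenCycles() { return waitMillisBetweenCycles; }
}
```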

I see the deploy flow as follows:

  • New POST to API
  • If not an atomic swap:
    • spin up canary task(s)
    • wait for health checks to pass
    • if load balanced, add the single canary to the lb
    • wait for a time period, an api call, or configured checks before proceeding
  • spin up all new tasks
  • wait for health checks to pass
  • add to lb
  • poll for lb completion
  • wait for acceptance phase api call or time trigger

So overall, the phases of the deploy look like:
Launch -> Health check -> LB Update -> Acceptance -> Finish and Clean Up
with the option of running these phases for a canary set of tasks before running them for all tasks.
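
Modeled as a simple state progression (a sketch, not an actual Singularity enum):

```java
// The deploy phases described above, in order.
public enum DeployPhase {
  LAUNCH,        // spin up tasks (the canary set first, if configured)
  HEALTH_CHECK,  // wait for launched tasks to pass health checks
  LB_UPDATE,     // add tasks to the load balancer and poll for completion
  ACCEPTANCE,    // wait on an api call, a timer, or configured checks
  FINISH         // finish the deploy and clean up old tasks
}
```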

Large-ish differences to call out:

  • This would deprecate the existing incrementalDeploy functionality, but allow callers to specify a canary count + number of cycles to get similar behavior (backwards compatible via the api)
  • The incremental deploy pause becomes a more formal acceptance phase; the plan is to add an extendable interface for checks to run during that phase (see the sketch after this list)
  • Atomic swap is now an option, but ideally not the default eventually. This could have ramifications for other rollouts that include things like LB changes during deploy time (i.e. should we reject lb updates for requests running a canary, since we don't know whether to use the old or new lb config?)
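
A minimal sketch of what that extendable acceptance interface could look like is below. The interface name, result states, and method signatures are assumptions, not the merged API.

```java
// Hypothetical acceptance-phase check, run after a deploy step's tasks are
// healthy and in the load balancer.
public interface DeployAcceptanceCheck {
  enum AcceptanceState {
    SUCCEEDED, // this step is accepted, the deploy can advance
    FAILED,    // fail the deploy (and roll back for atomic-style deploys)
    PENDING    // not decided yet, re-check on the next poll cycle
  }

  AcceptanceState checkDeployStep(String requestId, String deployId);

  // Surfaced in the deploy's failure message when the check fails.
  String getFailureMessage();
}
```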

@ssalinas changed the title: Support more stable canary deploys in Singularity → WIP - Support more stable canary deploys in Singularity (Mar 19, 2021)
@ssalinas (Member, Author)

Progress so far:

  • New model for canary settings
  • extracted a bunch of SingularityDeployChecker logic out into other methods/helpers so that it's a bit more readable
  • moved all of the SingularityDeploy object usages to avoid checking Optionals as often, making the logic simpler
  • same for SingularityDeployProgress
  • moved to storing lb update progress and associated tasks in the SingularityDeployProgress object

Still TODO:

  • Actually use the new canary settings properly
  • differentiate the lb update/health check behavior between atomicSwap and non-atomic deploys
  • implement extendable interface for checks that can run between deploy groups
  • Actually make all the tests pass...

@ssalinas (Member, Author)

Updates here:

  • Refactoring of deploy checker now includes:
    • Adding a spot for post-deploy-step hooks to run. This is the cause of all the extra deployChecker.checkDeploys() calls in the tests: it takes one additional cycle to complete the deploy because of the hooks (illustrated after this list). Might try to optimize that later
    • Uses the new canary settings object instead of the old individual incremental deploy settings fields
  • Current functionality is an extension of the old version; default behavior is still the same
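
Schematically, that extra cycle looks like this in a test (assuming a deployChecker under test; not a literal test from this PR):

```java
// Each checkDeploys() pass advances the deploy one cycle; the post-deploy-step
// hooks consume one extra cycle before the deploy can complete.
deployChecker.checkDeploys(); // advances the deploy step (launch/health/lb)
deployChecker.checkDeploys(); // runs the post-deploy-step hooks
deployChecker.checkDeploys(); // marks the deploy complete
```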

Next TODOs:

  • actually implement the extendable hooks and state management for each (failure messages, etc)
  • update to respect new canary cycle count field

@ssalinas (Member, Author)

Implementation of customizable hooks is now added, with a no-op implementation in tests. Still TODO tomorrow:

  • wire the custom hooks into the SingularityService class
  • canary cycle count
  • unit tests for customizable hooks
  • wire acceptance results to failure messages

@ssalinas changed the title: WIP - Support more stable canary deploys in Singularity → Support more stable canary deploys in Singularity (Mar 26, 2021)
@ssalinas added the staging (Merged to staging branch) label Apr 7, 2021
@ssalinas (Member, Author) commented Apr 7, 2021

Working nicely in staging on the first few runs. Still need to add docs for the new functionality here.

@ssalinas (Member, Author) commented Apr 8, 2021

Additional TODO on here:

  • Add a unit test to ensure we can add old deploy tasks back to the LB if acceptance checks fail for an atomic-style deploy

@ssalinas (Member, Author) commented Apr 8, 2021

Other TODO: (maybe configurably) disallow canary deploys when there is a custom LB property change

@pschoenfelder (Contributor)

🚢

@ssalinas merged commit 22829dd into master Apr 20, 2021
@ssalinas deleted the deploy_phases branch April 20, 2021 17:19
@ssalinas added this to the 1.5.0 milestone May 4, 2022