Better recovery #3709

Merged
merged 60 commits into from
Jul 22, 2024

Commits on Jul 22, 2024

  1. Flatten the structure of machine updates

    The goal of this is to make recovering from individual machine update
    failures easier, so that the entire deployment can succeed
    billyb2 committed Jul 22, 2024
    64db673
  2. Add recovery to machine updates

    The really cool part about these changes is that they didn't require
    much change to how machine updates work. All I really do is call
    updateMachines, but with our original state as the target instead of the
    state that we initially wanted to go to.

    I fixed a bug in lease clearing, since in some edge cases we weren't clearing leases correctly.

    I fixed a bug in machine waits, since we were sometimes causing an
    infinite loop by not giving the Wait function time to set waitErr, whoops
    billyb2 committed Jul 22, 2024
    6db4e43
  3. Allow destroying machines

    billyb2 committed Jul 22, 2024
    5aa0cef
  4. Update current state on every rollback attempt

    This was dumb to not do before, since otherwise the state obviously wouldn't be current when we roll back
    billyb2 committed Jul 22, 2024
    e333394
  5. remove debug code

    billyb2 committed Jul 22, 2024
    ef9acbe
  6. Don't recursively try to rollback

    We should only attempt a rollback when we initially try to update
    machines, not again during the rollback itself, obviously
    billyb2 committed Jul 22, 2024
    29c9a2b
  7. Disable deleting machines for now

    We need to avoid deleting unmanaged machines
    billyb2 committed Jul 22, 2024
    81c5bad
  8. fix up the code to create and destroy machines

    I wasn't testing that path correctly. Also, I added a quick optimization
    to waitForMachineState that avoids an unnecessary API call (it just checks
    the machine state right then and there)
    billyb2 committed Jul 22, 2024
    cb96bd1
  9. 1934ec3
  10. Check that configs are not equal before attempting an update

    Just another saved API call
    billyb2 committed Jul 22, 2024
    af3beea
  11. e649eff
  12. Small changes to focus on pushing forward deploys

    The main change was just focusing on newAppState instead of the old
    state. The rest is just cleaning up logging tbh
    billyb2 committed Jul 22, 2024
    2df9f45
  13. Fix waiting for multiple states

    If any one /wait finishes, we should just return
    billyb2 committed Jul 22, 2024
    33d1571
  14. Run machine based tests

    billyb2 committed Jul 22, 2024
    d0855b0
  15. Return on some unrecoverable errors + cache health check results

    The more interesting bit is caching health checks. If we're trying to
    push forward, there's no point in wasting bandwidth always retrying
    health checks. If a machine passes them once, it's good forever
    billyb2 committed Jul 22, 2024
    870bf35
  16. quick lint

    billyb2 committed Jul 22, 2024
    393d9ff
  17. aedf269
  18. more linting

    billyb2 committed Jul 22, 2024
    b9605f0
  19. eda2604
  20. Smoke checks

    billyb2 committed Jul 22, 2024
    2826af9
  21. 2972b27
  22. smoke checks

    billyb2 committed Jul 22, 2024
    76f7f3d
  23. 138e2d9
  24. 6b526af
  25. 3148eda
  26. ce1b30f
  27. a484615
  28. move lease collection out of restartMachines

    I prefer that the updateMachines function controls it
    billyb2 committed Jul 22, 2024
    14167c9
  29. Correctly update machines

    billyb2 committed Jul 22, 2024
    ba45439
  30. respect skipLaunch

    billyb2 committed Jul 22, 2024
    66dd69b
  31. 29c1dcd
  32. 9dfd3e2
  33. 038a367
  34. some code cleanup

    this is much easier to read
    billyb2 committed Jul 22, 2024
    a526558
  35. correctly catch unrecoverable errors

    I wasn't correctly using errors.As. I also cleaned up catching context
    canceled errors
    billyb2 committed Jul 22, 2024
    1e819d5
  36. e8e91d1
  37. remove unnecessary error check

    We do that in md.updateMachine
    billyb2 committed Jul 22, 2024
    008780a
  38. fix update by replace

    i forgot that lm isn't a pointer, so we need to check
    entry.leasableMachine for newly created machines
    billyb2 committed Jul 22, 2024
    175aa53
  39. f4e03bc
  40. 9f79bea
  41. Add tests for some of plan.go

    This covers most of the major functions and the major places where there
    could be issues
    billyb2 committed Jul 22, 2024
    9009e4d
  42. Add tracing to plan.go

    billyb2 committed Jul 22, 2024
    1482fd6
  43. lint

    billyb2 committed Jul 22, 2024
    caade93
  44. 4abfdca
  45. a few more clarifying comments

    also make sure to print to stderr
    billyb2 committed Jul 22, 2024
    02c11dd
  46. ensure that we use machinedeployment flapsClient

    also add some tests for updateorcreatemachine
    billyb2 committed Jul 22, 2024
    f207dac
  47. 299fcac
  48. 941b1f4
  49. Fix TestUpdateMachines

    md.warnAboutListenAddress requires this function
    billyb2 committed Jul 22, 2024
    0f4bcb9
  50. ee7829f
  51. b27654d
  52. c0266ef
  53. da18ca2
  54. bf82b9d
  55. Respect lease timeouts

    Also refresh the lease in the background
    billyb2 committed Jul 22, 2024
    95a3c94
  56. remove code to start machine

    we can't start machines if we have a lease acquired
    billyb2 committed Jul 22, 2024
    42c1964
  57. 4998e71
  58. b251e56
  59. set default for deploy-retries to 0

    I also added back the original deployment code, which is used if
    deploy-retries is set to 0. That way, we can roll this out more slowly
    without risking breaking user apps if there's some terrible bug
    billyb2 committed Jul 22, 2024
    a717b49
  60. get deploy-retries from launchdarkly by default

    Users can still set deploy-retries to whatever value they'd like, however.
    billyb2 committed Jul 22, 2024
    f5b3330