-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
swarm: Add update/rollback order #30261
Conversation
Still depends on swarmkit PR |
bc1f8c2
to
71b5f61
Compare
Rebased. This is still waiting for the swarmkit PR, but I've updated it to match a change in the swarmkit PR. An additional option called |
71b5f61
to
1e4820a
Compare
Rebased and updated to support the rollback case. Still waiting for the swarmkit PR. |
1e4820a
to
b748c60
Compare
Swarmkit PR was merged Updated this one to bring it in line, such as This should finally be ready to go now. |
Looks like CI is failing @aaronlehmann |
CI will be fixed when #31714 is merged. Until then, vendoring swarmkit is blocked. Let's try to make that happen. |
nit: Underscores vs dashes. I believe in the CLI we favor dashes (e.g. |
@aluzzardi: I think you're right about dashes vs. underscores. @thaJeztah can you please confirm? |
Oh, correct, yes. And, actually; the same is used for the API; https://github.com/docker/docker/blob/b36ce6f2f6cfdee4d376b58137dde2bc20fc4312/api/swagger.yaml#L304-L313 |
b748c60
to
72d1e5f
Compare
Updated to use |
cli/command/service/opts.go
Outdated
@@ -494,6 +496,8 @@ func addServiceFlags(flags *pflag.FlagSet, opts *serviceOptions) { | |||
flags.StringVar(&opts.update.onFailure, flagUpdateFailureAction, "pause", `Action on update failure ("pause"|"continue"|"rollback")`) | |||
flags.Var(&opts.update.maxFailureRatio, flagUpdateMaxFailureRatio, "Failure rate to tolerate during an update") | |||
flags.SetAnnotation(flagUpdateMaxFailureRatio, "version", []string{"1.25"}) | |||
flags.StringVar(&opts.update.order, flagUpdateOrder, "start-first", `Update order ("start-first"|"stop-first")`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be stop-first
. stop-first
is usually safe as tasks would not compete for resources, like CPU/mem/TCP ports, while start-first
is not.
https://github.com/docker/swarmkit/blob/master/cmd/swarmctl/service/flagparser/flags.go#L39
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that was a mistake.
72d1e5f
to
b836795
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
@@ -74,6 +74,7 @@ Options: | |||
--rollback-max-failure-ratio float Failure rate to tolerate during a rollback | |||
--rollback-monitor duration Duration after each task rollback to monitor for failure | |||
(ns|us|ms|s|m|h) (default 0s) | |||
--rollback-order string Rollback order ("start-first"|"stop-first") (default "stop-first") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stating default in service update
command is a little misleading. If you don't specify the flag, it keeps existing value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, however the other --update-*
flags have the same problem.
--update-delay duration Delay between updates (ns|us|ms|s|m|h) (default 0s)
--update-failure-action string Action on update failure ("pause"|"continue"|"rollback") (default "pause")
--update-monitor duration Duration after each task update to monitor for failure (ns|us|ms|s|m|h)
--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
Let's deal with this in a separate PR. Would you mind filing an issue?
@@ -175,6 +175,116 @@ func (s *DockerSwarmSuite) TestAPISwarmServicesUpdate(c *check.C) { | |||
map[string]int{image1: instances}) | |||
} | |||
|
|||
func (s *DockerSwarmSuite) TestAPISwarmServicesUpdateStartFirst(c *check.C) { | |||
const nodeCount = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodeCount
is not used here.
I've tried to rebuild rebuild/windowsRS1 several times and it always fails. Is this a known issue? |
ping @docker/core-swarmkit-maintainers Any chance of getting this in before the freeze? |
I am not a maintainer, but it LGTM. |
LGTM ping @aluzzardi |
daemon/cluster/convert/service.go
Outdated
case types.UpdateOrderStartFirst: | ||
converted.Order = swarmapi.UpdateConfig_START_FIRST | ||
default: | ||
return nil, fmt.Errorf("unrecongized update order %s", updateConfig.Order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo unrecongized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (IANAM)
This parameter controls the order of operations when rolling out an update task. Either the old task is stopped before starting the new one, or the new task is started first, and the running tasks will briefly overlap. This commit adds Rollout to the API, and --update-order / --rollback-order flags to the CLI. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
428fe0b
to
6763641
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Update order is a new parameter to control the order of operations when rolling out an updated task. Either the old task is stopped before starting the new one, or the new task is started first, and the running tasks will briefly overlap.
Previously,
stop_first
was the only supported behavior. The behavior ofstart_first
could be emulated by temporarily scaling up the service, so for awhile we hesitated to add it. However, there are some cases where it is awkward to change the number of replicas in a service just to support temporary overlap during updates. It doesn't work for some new features we have in mind like automatic preemption.This PR adds
Order
to the API, and--update-order
/--rollback-order
flags to the CLI. It adds an integration test that performs a rolling update in the newstart_first
mode.cc @aluzzardi @dongluochen