Add jobs support to CLI #2262
Added |
@dperny I see that a job will have to be force updated to run again, will it be possible to have a So that we don’t have to worry if the job should be created or force updated (when running a database migration job for example) Alternatively, would it be possible to create a job service without starting it, and have a command to run this job when we like? The benefits here would be that the job configuration always stays the same and can be included easily in a
@mathroc for whatever reason, I had not considered the possibility of an Second, to create a job without starting it, you can just set
initializing a service job with the problem with the database migration in my example was not that it could run twice, but that I thought I would have to query docker to see if the service job already exists and then either create the service or force update it. but with thx @dperny
Removed WIP. The support for jobs upstream was merged.
@thaJeztah Is there some chance / way this can be expected in the next release? I'm presuming there will be a docker-v20.04 ?
Rebased to hopefully fix merge conflicts.
The code itself looks good to me, but I need to take it for a spin to check the UX and the whole feature 👍
Hello! Is there an ETA for this feature?
Hi @mathroc, curious about which migration strategy you settled on. I'm thinking about something similar to what you've suggested:
```bash
$ docker service create --name mythrottledjob \
    --mode replicated-job \
```
Maybe `--kind=job --mode=replicated`?
Alternatively this could be `docker job create` even if the API is for a service.
While `docker job create` is probably clean(er), there are trade-offs.
Advantages:
- separate subcommand
- we could hide/remove flags that don't apply to jobs
Downside:
- given that they're both backed by a service, we need to filter out jobs from `docker service ls` (etc) and vice-versa.
- that might become confusing if we don't apply the same filter very strictly (`docker service rm <job service>` could otherwise remove a job)
- also think of `docker service create myjob`, which could show an error that `service myjob already exists`, but `docker service ls` wouldn't show it
Reviewing this together with @silvin-lubecki. I'll post notes along the way (sorry for the extra noise).

I tried running the example you included in the documentation:

    docker service create --name myjob \
      --mode replicated-job \
      bash "true"

Output looks good to me:

    job progress: 1 out of 1 complete [==================================================>]
    active tasks: 0 out of 0 tasks
    1/1: complete [==================================================>]
    job complete

What I think is slightly confusing, is that "REPLICAS" shows
Thinking if I can come with a better presentation for that 🤔

Trying with more replicas:

    docker service rm myjob
    docker service create --name myjob --mode replicated-job --replicas=4 bash "true"
    vbtoewcdxz17hfa14p2kua96r
    job progress: 4 out of 4 complete [==================================================>]
    active tasks: 0 out of 0 tasks
    1/4: complete [==================================================>]
    2/4: complete [==================================================>]
    3/4: complete [==================================================>]
    4/4: complete [==================================================>]
    job complete

    docker service ls
    ID             NAME    MODE             REPLICAS              IMAGE         PORTS
    vbtoewcdxz17   myjob   replicated job   0/4 (4/4 completed)   bash:latest
Slightly confusing:

    $ docker service scale myjob=2
    myjob: scale can only be used with replicated mode

The job is replicated, so perhaps we should change this to "cannot be used with jobs" instead of mentioning the
Looks like the compose schema (or validation) needs some updating; using this compose file:

    version: "3.9"
    services:
      job:
        image: bash
        command: "true"
        deploy:
          mode: "replicated-job"
          replicas: 6

I get an error:

    docker stack deploy -c docker-compose.yml mystack
    Creating network mystack_default
    service job: Unknown mode: replicated-job

I tried updating the compose code:

    diff --git a/cli/compose/convert/service.go b/cli/compose/convert/service.go
    index da182bbf..9ce91b90 100644
    --- a/cli/compose/convert/service.go
    +++ b/cli/compose/convert/service.go
    @@ -609,12 +609,12 @@ func convertDeployMode(mode string, replicas *uint64) (swarm.ServiceMode, error)
     	serviceMode := swarm.ServiceMode{}
     	switch mode {
    -	case "global":
    +	case "global", "global-job":
     		if replicas != nil {
     			return serviceMode, errors.Errorf("replicas can only be used with replicated mode")
     		}
     		serviceMode.Global = &swarm.GlobalService{}
    -	case "replicated", "":
    +	case "replicated", "replicated-job", "":
     		serviceMode.Replicated = &swarm.ReplicatedService{Replicas: replicas}
     	default:
     		return serviceMode, errors.Errorf("Unknown mode: %s", mode)

After that:

    docker service ls
    ID             NAME          MODE         REPLICAS   IMAGE         PORTS
    uq7b0h3v6ghf   mystack_job   replicated   0/6        bash:latest
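The `docker service ls` output above still reports `replicated` with `0/6`, which suggests that folding the job modes into the existing `Replicated` and `Global` fields is not enough for compose. A minimal Go sketch of a conversion that keeps the job modes distinct could look like the following. The types are local stand-ins written for this example (they only mirror the general shape of the swarm API types), and setting `MaxConcurrent` to the same value as `replicas` is an assumption, not something settled in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the swarm API types, defined here only so the
// example compiles on its own. They are illustrative, not the real package.
type ReplicatedService struct{ Replicas *uint64 }
type GlobalService struct{}
type ReplicatedJob struct {
	MaxConcurrent    *uint64
	TotalCompletions *uint64
}
type GlobalJob struct{}

type ServiceMode struct {
	Replicated    *ReplicatedService
	Global        *GlobalService
	ReplicatedJob *ReplicatedJob
	GlobalJob     *GlobalJob
}

// convertDeployMode maps a compose "deploy.mode" string to a ServiceMode,
// keeping job modes separate from regular service modes.
func convertDeployMode(mode string, replicas *uint64) (ServiceMode, error) {
	serviceMode := ServiceMode{}
	switch mode {
	case "global", "global-job":
		if replicas != nil {
			return serviceMode, errors.New("replicas can only be used with replicated or replicated-job mode")
		}
		if mode == "global-job" {
			serviceMode.GlobalJob = &GlobalJob{}
		} else {
			serviceMode.Global = &GlobalService{}
		}
	case "replicated-job":
		// Assumption: with no separate compose key for concurrency,
		// allow all completions to run at once.
		serviceMode.ReplicatedJob = &ReplicatedJob{
			MaxConcurrent:    replicas,
			TotalCompletions: replicas,
		}
	case "replicated", "":
		serviceMode.Replicated = &ReplicatedService{Replicas: replicas}
	default:
		return serviceMode, fmt.Errorf("Unknown mode: %s", mode)
	}
	return serviceMode, nil
}

func main() {
	n := uint64(6)
	m, err := convertDeployMode("replicated-job", &n)
	fmt.Println(m.ReplicatedJob != nil, m.Replicated != nil, err)
}
```

With a conversion along these lines, a service declared as `replicated-job` would no longer be reported as a plain `replicated` service, although the compose schema itself would presumably also need to accept the new mode values.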
Tried with multiple replicas:

    $ docker service create --name test --replicas=10 --max-concurrent=2 --mode=replicated-job bash true
    idja6562brblp995gnq7ps78i
    job progress: 10 out of 10 complete [==================================================>]
    active tasks: 0 out of 0 tasks
    1/10: complete [==================================================>]
    2/10: complete [==================================================>]
    3/10: complete [==================================================>]
    4/10: complete [==================================================>]
    5/10: complete [==================================================>]
    6/10: complete [==================================================>]
    7/10: complete [==================================================>]
    8/10: complete [==================================================>]
    9/10: complete [==================================================>]
    10/10: complete [==================================================>]
    job complete

Then tried `docker service ps`:

    $ docker service ps test
    ID             NAME                             IMAGE                NODE           DESIRED STATE   CURRENT STATE            ERROR   PORTS
    tti15rwuoglx   test.1                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    u0ef31o2mgyq   test.2                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    avvo2zxxysu8   test.3                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    l022qhhmz148   test.4                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    u3sc4nht7va9   test.5                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    zsgfzezhit0y   test.6                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    t1dx8jej05lq   test.7                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    teon4syxhul9   test.8                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    e7qb4wn8ew7h   test.9                           hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago
    lankjaond9iy   test.9h335bnxjxm95yf2n8uuz4kk3   hello-world:latest   4d35eab424f4   Complete        Complete 7 minutes ago

This part of the UI/UX looks nice so far 👍
Trying with a long running container as job; creating (as expected) continues waiting for it to complete, so I had to

    docker service create --mode=replicated-job --name=longy nginx:alpine
    qx9ol95pxk0i7ztz5291wspc8
    job progress: 0 out of 1 complete [> ]
    active tasks: 1 out of 1 tasks
    1/1: running [=============================================> ]
    ^COperation continuing in background.

After that, I killed the container:

    docker kill fdad641d54a6

Looking at `docker service ls`:

    docker service ls
    ID             NAME    MODE             REPLICAS              IMAGE          PORTS
    qx9ol95pxk0i   longy   replicated job   1/1 (0/1 completed)   nginx:alpine

I can see a new task was created for the service.
Is it expected that a new task is created if one fails, or should it terminate the job, and mark it as "failed"?
c8wgl7q4ndfd   frontend   replicated       5/5                   nginx:alpine
dmu1ept4cxcf   redis      replicated       3/3                   redis:3.0.6
iwe3278osahj   mongo      global           7/7                   mongo:3.3
hh08h9uu8uwr   job        replicated-job   1/1 (3/5 completed)   nginx:latest
is `1/1` correct here?
Yes. It implies that 1 task is still running, 3 tasks are completed, and 5 tasks are desired. This would imply the job is running 5 iterations one after another.
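This reading of the REPLICAS column can be sketched as a small Go formatting helper. `jobReplicas` and its parameter names are hypothetical and not taken from the PR; the sketch only reproduces the `running/desired (completed/total completed)` shape that appears in the outputs in this thread, and it assumes the second number is how many tasks are wanted running at once.

```go
package main

import "fmt"

// jobReplicas renders a REPLICAS cell for a replicated job.
// running:   tasks currently running
// desired:   tasks wanted running at once (assumed meaning of the second number)
// completed: tasks that have finished successfully
// total:     total completions the job should reach
func jobReplicas(running, desired, completed, total uint64) string {
	return fmt.Sprintf("%d/%d (%d/%d completed)", running, desired, completed, total)
}

func main() {
	// One task still running, three of five completions done.
	fmt.Println(jobReplicas(1, 1, 3, 5))
}
```

Keeping both pairs in one cell is what makes a finished job show `0/4 (4/4 completed)`: nothing is running any more, but all completions were reached.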
Jobs are a special kind of service designed to run an operation to completion
and then stop, as opposed to running long-running daemons. When a Task
belonging to a job exits successfully (return value 0), the Task is marked as
See my other comment; do we want failed tasks to be started / replaced / tried again by default? Or should it have `--restart-condition=none`?
One thing I'm thinking of; should we have an alias (on the CLI) for
I'm overall good with the current UX. I think that fully separating "jobs" from regular services would not be possible (because they share the same constructs under the hood). That said; it would be possible to add Some things I think should be addressed:
Discussing with @tonistiigi @cpuguy83 - haven't checked yet, but we need to check what happens if I create a job with
I'm opposed to the alias of
* Added two new modes accepted by the `--mode` flag
  * `replicated-job` creates a replicated job
  * `global-job` creates a global job.
* When using `replicated-job` mode, the `replicas` flag sets the `TotalCompletions` parameter of the job. This is the total number of tasks that will run
* Added a new flag, `max-concurrent`, for use with `replicated-job` mode. This flag sets the `MaxConcurrent` parameter of the job, which is the maximum number of replicas the job will run simultaneously.
* When using `replicated-job` or `global-job` mode, using any of the update parameter flags will result in an error, as jobs cannot be updated in the traditional sense.
* Updated the `docker service ls` UI to include the completion status (completed vs total tasks) if the service is a job.
* Updated the progress bars UI for service creation and update to support jobs. For jobs, a bar is displayed covering the overall progress of the job (the number of tasks completed over the total number of tasks to complete).
* Added documentation explaining the use of the new flags, and of jobs in general.

Signed-off-by: Drew Erny <derny@mirantis.com>
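The flag rules listed in this commit message can be sketched as a small Go validation step. This is a simplified illustration, not the code from the PR: `validateJobFlags`, `isJobMode`, and their parameters are hypothetical names, and rejecting `--max-concurrent` outside `replicated-job` mode is an assumption based on the flag being described as "for use with `replicated-job` mode".

```go
package main

import "fmt"

// isJobMode reports whether a --mode value names one of the two job modes.
func isJobMode(mode string) bool {
	return mode == "replicated-job" || mode == "global-job"
}

// validateJobFlags is a hypothetical helper. maxConcurrentSet says whether
// --max-concurrent was passed; updateFlags lists any update parameter
// flags (without the leading dashes) that were passed.
func validateJobFlags(mode string, maxConcurrentSet bool, updateFlags []string) error {
	// Assumption: --max-concurrent is only meaningful for replicated jobs.
	if maxConcurrentSet && mode != "replicated-job" {
		return fmt.Errorf("--max-concurrent can only be used with replicated-job mode, not %q", mode)
	}
	// Jobs cannot be updated in the traditional sense, so reject update flags.
	if isJobMode(mode) && len(updateFlags) > 0 {
		return fmt.Errorf("--%s cannot be used with mode %q", updateFlags[0], mode)
	}
	return nil
}

func main() {
	fmt.Println(validateJobFlags("replicated-job", true, nil))
	fmt.Println(validateJobFlags("global-job", false, []string{"update-parallelism"}))
}
```

Checking these combinations before building the service spec keeps the error close to the flag that caused it, rather than surfacing later from the daemon.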
Sorry for the delay (again)
I think it's ok to leave it out of compose for now; we should perhaps consider if we want it to be a separate "entity" inside compose files (instead of just an option for
There's something to be said for both sides; either I want (e.g.) a migration to run (but don't try to run it again if it failed), or have a guarantee that all my jobs will at least continue until completed. I think it's ok in the current implementation, as long as we're explicit about this in the documentation so that users are not caught by surprise
👍 mostly me thinking out loud; could also be easily added in future if there's a strong need for it, so no blocker from my perspective
@dperny I see you pushed after my previous comment; were there specific things you addressed/changed?
Yes. I fixed
LGTM, thanks a lot for that PR @dperny 🎉
LGTM. Let's merge; we can tweak docs later if needed 👍
thanks @dperny !
This is really good stuff! Question: I see some discussion w.r.t compose support, would that have its own PR too then? would it make it to 20.x.0 release? Thanks!
@Ohtar10 compose support has not been added yet. Some discussion may be needed if we implement this as "mode" for services, or if a new "jobs" top-level property is added. Perhaps "mode" could be implemented as (temporary?) solution, but may need some work; #2262 (comment) (contributions should be welcome though!)
Cool, thanks for the reply @thaJeztah. As an end-user, and after trying the feature from the test channel, I would say that if jobs will keep being part of In case the feature evolves to something more elaborate, e.g., cronjobs, additional configurable properties etc. then I think it would make sense to have a top-level property not only in the compose file but at docker CLI level as well, e.g.,
- What I did
Add support to the CLI for swarm jobs (moby/moby#40307).
Does not include compose support.
- How I did it
* Added two new modes accepted by the `--mode` flag
  * `replicated-job` creates a replicated job
  * `global-job` creates a global job.
* When using `replicated-job` mode, the `replicas` flag sets the `TotalCompletions` parameter of the job. This is the total number of tasks that will run
* Added a new flag, `max-concurrent`, for use with `replicated-job` mode. This flag sets the `MaxConcurrent` parameter of the job, which is the maximum number of replicas the job will run simultaneously.
* When using `replicated-job` or `global-job` mode, using any of the update parameter flags will result in an error, as jobs cannot be updated in the traditional sense.
* Updated the `docker service ls` UI to include the completion status (completed vs total tasks) if the service is a job.
- How to verify it
Includes automated tests for all major changes.
- Description for the changelog
Added CLI support for swarm jobs.