mismatch desired state of tasks #1414
Comments
The scheduler internally computes whether an allocation is in a terminal state or not by looking at both the desired state and the client state. In the above case the allocation is considered terminal even though its desired state is still `run`. This is the current state, and we are discussing internally if we can get rid of one of the states, but it's not super high priority. Regarding the problem you are trying to solve: very soon you should be able to summarize the state of the job via a new API, and you won't have to look into all the allocations to determine how many allocations are running. Please take a look at #1340, hoping that would work for you?
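For reference, a minimal sketch of querying such a per-group summary for this job over the HTTP API (assuming a local agent on the default port; the task group name in the illustrative output is not taken from this report):

```sh
# Fetch the per-task-group summary for the job.
curl -s http://localhost:4646/v1/job/echoer-docker/summary | jq '.Summary'
# Illustrative shape of the response:
# { "echoer": { "Queued": 0, "Complete": 0, "Failed": 1, "Running": 1, "Starting": 0, "Lost": 0 } }
```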
I am not sure that #1340 would really help me. I am basically looking for things that are supposed to have 2 instances running but instead only have 1. I would not want to alert on anything in a failed state, since that is somewhat expected. I really only want to alert on things that are supposed to be running, but are in (as you said) a "terminal" state.
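If the summary API does cover this, a sketch of the kind of check described here, comparing each group's running count against the count in the job spec (assuming the `/v1/job/<job>` and `/v1/job/<job>/summary` endpoints on a local agent; group handling is simplified):

```sh
#!/usr/bin/env bash
# Sketch: alert when a task group has fewer running allocations than its configured count.
JOB=echoer-docker
ADDR=http://localhost:4646

spec=$(curl -s "$ADDR/v1/job/$JOB")
summary=$(curl -s "$ADDR/v1/job/$JOB/summary")

echo "$spec" | jq -r '.TaskGroups[] | "\(.Name) \(.Count)"' | while read -r group want; do
  have=$(echo "$summary" | jq -r --arg g "$group" '.Summary[$g].Running // 0')
  if [ "$have" -lt "$want" ]; then
    echo "ALERT: group '$group' has $have running allocation(s), wants $want"
  fi
done
```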
All this ticket is about is transitioning the task to some other state when the job is re-submitted, instead of waiting for it to be GC'd. If I run a force GC on Nomad, this allocation does get removed because of the "terminal" state that it is in. Ah, which got me thinking: what I am doing is actually not safe at all. If I change the count to 2 and run the job, I end up with output that, without prior knowledge of the group actually being for count 2, looks like everything is happy.
Maybe my original thought was wrong about the …
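For reference, forcing that GC against a local agent looks roughly like this (a sketch, assuming the default address and the system GC endpoint):

```sh
# Trigger a cluster garbage collection, which removes terminal allocations, evaluations, and jobs.
curl -s -X PUT http://localhost:4646/v1/system/gc
```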
I don't think we'll change the meaning of `desired`, but I do think the root ask of this Issue is good. I'm going to close this in favor of #13053 since I think the new Issue has a bit less noise.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Operating system and Environment details
N/A, but Vagrant
Issue
Failed allocations with a desired state of `run` stay at the desired state `run` when the job is resubmitted.

Reproduction steps
Dockerfile
Job
Start the job. Then kill it and its image. This means that the task will transition into the failed state. If the job is in the failed state, `nomad status echoer-docker` shows the `desired` state to be `run` and the state as `failed`, as expected.

Wait until the job is dead, then run the docker build again to get the image back.
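A rough sketch of the kill-and-rebuild steps above (the `echoer` image name, the container lookup, and the build context are illustrative; the original Dockerfile and job file are not reproduced in this issue text):

```sh
docker ps                    # find the container Nomad started for the task
docker kill <container-id>   # kill the task's container
docker rmi -f echoer         # remove the image so the task cannot be restarted
# ...wait for the job to go dead, then restore the image:
docker build -t echoer .
```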
After this, resubmit the job to Nomad.

If you run `nomad status echoer-docker` at this point, you see:

I would expect that at least one of these tasks should have the `desired` state of `stop`, since the job definition states only 1 should be running at a time.

This is making it difficult for me to monitor Nomad for partially running jobs, where 1 of the tasks in a 2-task job is running but the other has been marked as failed and is not running anymore. Currently I have been looking for tasks with a `desired` state of `run` that are not `running`.
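For completeness, a rough sketch of that workaround against the HTTP API: list the job's allocations and flag those whose desired status is `run` but whose client status is not `running` (local agent on the default port assumed, job name as above):

```sh
# Sketch: allocations that are still "desired: run" but no longer running.
curl -s http://localhost:4646/v1/job/echoer-docker/allocations \
  | jq -r '.[]
           | select(.DesiredStatus == "run" and .ClientStatus != "running")
           | "\(.ID) \(.TaskGroup) desired=\(.DesiredStatus) client=\(.ClientStatus)"'
```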