In this test case, we have a host that cannot schedule containers because the Docker engine's certificate is bad. But Nomad retries only on the same host, which is pointless in this case.
2017/08/09 02:28:25.025425 [ERR] driver.docker: failed pulling container registry:5000/org/fam/app/service:v0.4: API error (500): {"message":"Get https://registry:5000/v1/_ping: x509: certificate signed by unknown authority"}
Our Mesos/Marathon cluster will (usually) select a different host, so chances are the service will then be up and running on another host, as opposed to being down. This might be due to bin packing vs. spread.
Perhaps there could be a restart mode for this behavior:
restart {
  mode = "relocate" # this is a specialization of "delay".
}
If the job is constrained to the host, then this is equivalent to delay. This would allow Nomad to still schedule successfully even when some hosts are not fully functional for some reason.
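To make the proposed semantics concrete, here is a minimal Python sketch of how a "relocate" retry could pick its next host. It is not Nomad code: the function name `choose_retry_node` and its parameters are made up for illustration, and it assumes the scheduler already knows which nodes satisfy the job's constraints.

```python
from typing import Iterable, Sequence


def choose_retry_node(eligible: Sequence[str],
                      failed: Iterable[str],
                      current: str) -> str:
    """Pick the node for the next start attempt (hypothetical "relocate" mode).

    eligible: nodes that satisfy the job's constraints, in preference order.
    failed:   nodes where an earlier start attempt already failed.
    current:  node where the most recent attempt failed.
    """
    # Treat the node that just failed as failed too.
    excluded = set(failed) | {current}

    # Prefer the first eligible node that has not failed yet.
    for node in eligible:
        if node not in excluded:
            return node

    # No alternative left (for example, the job is constrained to one host):
    # stay on the current node, which is the same as today's "delay" behavior.
    return current


if __name__ == "__main__":
    # Three eligible hosts, first attempt failed on host-a.
    print(choose_retry_node(["host-a", "host-b", "host-c"], [], "host-a"))  # host-b
    # Job constrained to a single host: falls back to retrying in place.
    print(choose_retry_node(["host-a"], [], "host-a"))  # host-a
```

The fallback branch is what makes this mode a specialization of "delay": with only one eligible host, it behaves exactly like a delayed retry on that host.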
samart changed the title from "Reschedule container that won't start on a host" to "Reschedule and relocate container that won't start on a host" on Aug 12, 2017
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version: 0.6