Health status not reflected in REST API until some time after start #1270
Comments
Just an update to this issue: reducing the reconciliation_interval on the Marathon masters shortened the time during which the tasks had no health check results, but it did not completely eliminate the problem.
Hi Scott, in 0.8.1 we fixed a bug that sounds very much like what you describe: health check results are lost even though the health checks are running, until reconciliation starts a second copy of the health check. From then on, the health check results are processed correctly. Please take a look at the latest 0.8.1 RC build. I am pretty confident that the problem will disappear with the new release. Regards,
Here is the old bug: #1082
Thanks @sttts - I've deployed 0.8.1 RC2 and confirmed that it does fix the issue. Thank you!
Hey all,
I'm seeing an issue on my Marathon 0.8.0/Mesos 0.21.1 cluster during deployment of a new version of a running application (or, sometimes, during deployment of a new application).
When I do the deployment, I see that Marathon waits until the new task becomes healthy before killing the old task, but there is a period of time when the new task reports that it has no successful health checks even though it's the only one running. I'll give an example timeline below:
08:45: Deployment of new application. The "tasks" section of /v2/apps/ returns this:
As you can see, there's a new task on mesosnode5 that doesn't have any health check results yet.
08:48:23 - App becomes healthy according to the Marathon logs:
However, at 08:48:35, a call to /v2/apps/app/identityportal returns this for the "tasks" section:
As you can see, there is no information in the health check results. It's not until 08:49:16 that the API call gets health check information:
This is causing problems because we have a process that synchronizes our load balancer with Marathon, and that process thinks that there are no healthy tasks available for a period of a couple of minutes. I have not yet tried 0.8.1, but I didn't immediately see any issues in the fix list that would apply to this situation.
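To illustrate why the load balancer sync sees "no healthy tasks" during this window, here is a minimal sketch of the kind of filter such a process might apply to the `/v2/apps/...` response. It assumes the response has the shape `{"app": {"tasks": [...]}}` and that each task carries `host`, `ports`, and a `healthCheckResults` list whose entries have an `alive` flag; the function name `healthy_tasks` and the sample data are hypothetical, not part of the original report.

```python
import json


def healthy_tasks(app_response):
    """Return (host, ports) pairs for tasks whose health checks all report alive.

    Assumes the Marathon app JSON shape {"app": {"tasks": [...]}}, where each
    task has a "healthCheckResults" list of objects with an "alive" flag.
    """
    tasks = app_response.get("app", {}).get("tasks", [])
    healthy = []
    for task in tasks:
        results = task.get("healthCheckResults") or []
        # An empty list means no health check result is attached to the task
        # yet, which is exactly the state described in this issue, so the
        # task is treated as not healthy. Null entries are also rejected.
        if results and all(r is not None and r.get("alive", False) for r in results):
            healthy.append((task.get("host"), task.get("ports", [])))
    return healthy


if __name__ == "__main__":
    # Hypothetical response: the only running task has no health check results.
    raw = """
    {"app": {"id": "/app/identityportal",
             "tasks": [{"id": "t1", "host": "mesosnode5", "ports": [31000],
                        "healthCheckResults": []}]}}
    """
    print(healthy_tasks(json.loads(raw)))
```

With an empty `healthCheckResults` list the filter returns no backends, so a sync process built this way would drain the load balancer until Marathon attaches results to the task, matching the gap observed between 08:48:23 and 08:49:16.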