This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Watchmen Down. #78

Open
nysky1 opened this issue Nov 15, 2016 · 4 comments

Comments

@nysky1

nysky1 commented Nov 15, 2016

I'm using the Docker version of watchmen. Love it! But every few months, watchmen runs into a problem: the service begins marking almost all of my sites (currently 12) as down when they are not. They are marked active again after about a minute (the check interval), only to be marked down again shortly thereafter. I generally just reboot the server, but today that didn't fix it. Can I send you a log or anything else that might help both of us? The issue is active as we speak.

@nysky1

nysky1 commented Nov 15, 2016

Running 3.3.1.

watchmenserver_1 | HWL check failed!. Error: {"code":"ETIMEDOUT","connect":true}
watchmenserver_1 | HWL is still down!. Error: {"code":"ETIMEDOUT","connect":true}
watchmenserver_1 | FFP (API) check failed!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | FFP (API) down!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | FFP (Mobile) check failed!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | FFP (Mobile) down!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | MM (Registration) check failed!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | MM (Registration) down!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}
watchmenserver_1 | MC Prod check failed!. Error: {"code":"ETIMEDOUT","connect":false}
watchmenserver_1 | MC Prod down!. Error: {"code":"ETIMEDOUT","connect":false}
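
To make a log like the one above easier to read at a glance, here is a minimal sketch that tallies the timeouts by kind. It assumes the errors come from the `request` HTTP library, where `connect: true` on an `ETIMEDOUT` means the connection was never established, while `ESOCKETTIMEDOUT` (or `ETIMEDOUT` with `connect: false`) means the connection opened but the response stalled. That interpretation, and the helper name `summarizeTimeouts`, are assumptions for illustration and not part of watchmen.

```javascript
// Hypothetical helper, not part of watchmen: tally timeout kinds in log text.
// Assumes each failure line ends with "Error: {json}" as in the log above.
function summarizeTimeouts(logText) {
  const counts = { connect: 0, read: 0, other: 0 };
  for (const line of logText.split('\n')) {
    const match = line.match(/Error: (\{.*\})\s*$/);
    if (!match) continue; // not a failure line
    let err;
    try {
      err = JSON.parse(match[1]);
    } catch (e) {
      continue; // payload was not valid JSON, skip it
    }
    if (err.code === 'ETIMEDOUT' && err.connect === true) {
      counts.connect += 1; // timed out while connecting (assumed meaning)
    } else if (err.code === 'ESOCKETTIMEDOUT' || err.code === 'ETIMEDOUT') {
      counts.read += 1; // connected, but the response stalled (assumed meaning)
    } else {
      counts.other += 1;
    }
  }
  return counts;
}

// Two lines copied from the log above.
const sample = [
  'watchmenserver_1 | HWL check failed!. Error: {"code":"ETIMEDOUT","connect":true}',
  'watchmenserver_1 | FFP (API) check failed!. Error: {"code":"ESOCKETTIMEDOUT","connect":false}'
].join('\n');

console.log(summarizeTimeouts(sample));
```

Note that this counts log lines, not incidents: each failure in the log above appears twice (once as "check failed" and once as "down"). A mix of both kinds across unrelated sites would point at the monitoring host rather than at any one site.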

@nysky1

nysky1 commented Nov 16, 2016

After watching the response time for each site from inside the Docker machine, I found most sites were taking MUCH longer than normal, which could be attributed to slow DNS lookups. Perhaps there were not enough threads available, thus causing the ESOCKETTIMEDOUT messages? Are there any guidelines on optimal thread configuration for Ubuntu? And would it be best to configure the threads inside run-monitor-server.js?

Something like...
....

var WatchMenFactory = require('./lib/watchmen');
var sentinelFactory = require('./lib/sentinel');

process.env.UV_THREADPOOL_SIZE = 10; //<-- Insert?

var RETURN_CODES = {
  OK: 0,
  BAD_STORAGE: 1,
  GENERIC_ERROR: 2
};

....
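
A slightly more defensive version of the idea above, as a sketch only. Node's `dns.lookup()` runs on the libuv thread pool, which has 4 threads by default, so slow lookups can queue behind each other. As far as I know, `UV_THREADPOOL_SIZE` is only read when the pool is first used, so the assignment should happen at the very top of the entry script, before any module starts async DNS, file, or crypto work. The helper names `configureThreadPool` and `timeLookups` are hypothetical, and the value 10 is just the number from the snippet above, not a recommendation.

```javascript
// Hypothetical helper: set the pool size unless the environment already defines it.
// Call this first thing in the entry script, before other requires.
function configureThreadPool(size, env = process.env) {
  if (!env.UV_THREADPOOL_SIZE) {
    env.UV_THREADPOOL_SIZE = String(size); // environment values are strings
  }
  return env.UV_THREADPOOL_SIZE;
}

configureThreadPool(10);

const dns = require('dns');

// Hypothetical helper: start all lookups at once and record how long each took.
// If late entries take far longer than early ones, lookups are probably queuing.
function timeLookups(hosts) {
  const started = Date.now();
  return Promise.all(hosts.map((host) => new Promise((resolve) => {
    dns.lookup(host, (err) => {
      resolve({ host, ms: Date.now() - started, error: err ? err.code : null });
    });
  })));
}
```

Calling `timeLookups` with the 12 monitored hostnames and logging the result, once with the default pool and once with a larger one, would show whether the pool size actually matters here. Setting `UV_THREADPOOL_SIZE` in the container's environment instead of in code avoids the ordering question entirely.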

Thanks!

@elboletaire

I'm having the same issue, but not only after a few months. Even on a fresh install, watchmen randomly reports some services as down when they are not.

The error messages tend to be ETIMEDOUT errors.

@nysky1

nysky1 commented Apr 11, 2017 via email
