This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Scheduler silently fails on malformed ZK URLs #950

Open
PerilousApricot opened this issue Sep 4, 2018 · 0 comments
Describe the bug
In a couple of places [1] [2], the user is instructed to suffix the ZK connection string with a directory (a ZK node?), /cook. If the user does this, the scheduler never connects to the Mesos master, for reasons I have not determined.

[1] https://github.com/twosigma/Cook/blob/master/scheduler/docs/configuration.adoc
[2] https://github.com/twosigma/Cook/blob/master/scheduler/example-prod-config.edn#L15

To Reproduce
Download the latest Cook, build it, and manually set the :zookeeper {:connection} config option to a value with a trailing /cook. The scheduler begins some preparatory work, then appears to hang, only periodically writing heartbeat messages to the log. I can turn this failure mode on and off by adding or removing that suffix.

Expected behavior
I'd expect an explicit crash in this case. I presume that the scheduler attempts to perform master election and fails because of the invalid ZK hostname. Since I never saw an error, and one of the final lines in the log is from Cook trying to find the mesos scheduler, I tried debugging that interaction when the true failure was elsewhere.
