Support for a ZooKeeper Master Detector #1

Open
tarnfeld opened this issue Jul 21, 2014 · 5 comments

Comments

@tarnfeld
Contributor

Just getting to grips with things, but I assume it's just a case of implementing one of those in pesos.detector?

@wickman
Owner

wickman commented Mar 28, 2015

Implemented at a835b12.

@tarnfeld
Contributor Author

Just giving this a go now on a staging cluster, actually. I'll close it if it seems to work fine.

@tarnfeld
Contributor Author

In general it seems to work OK (the ZooKeeper group aspect), but I think my testing around ZK also falls down with #15. In the event of an identical appointment, presumably the code that continues to reconnect to the known master should kick in? I think that bit is broken.

2015-03-28 03:21:01,841[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: None
2015-03-28 03:21:01,843[pesos.detector] FutureMasterDetector.appoint accepting appointment master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] New master detected: master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] Registering framework: framework {
  user: "tom"
  name: "xxx"
  hostname: "1.0.0.127.in-addr.arpa"
}

2015-03-28 03:21:01,844[pesos.scheduler] Setting transition watch from previous master: master@192.168.33.2:5050
2015-03-28 03:21:01,844[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: master@192.168.33.2:5050
2015-03-28 03:21:01,919[x.scheduler] Framework 20150328-031924-35760320-5050-1308-0000 registered to http://vagrant-ubuntu-trusty-64:5050
2015-03-28 03:21:01,961[x.scheduler] Handling 1 offers
2015-03-28 03:21:03,844[pesos.scheduler] Skipping registration because we are either connected or there is no appointed master.
2015-03-28 03:21:07,354[x.scheduler] Handling 1 offers
2015-03-28 03:21:13,358[x.scheduler] Handling 1 offers
2015-03-28 03:21:19,362[x.scheduler] Handling 1 offers
2015-03-28 03:21:23,611[compactor.context] Received disconnection from master@192.168.33.2:5050 but no stream found.
2015-03-28 03:21:30,659[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
2015-03-28 03:21:34,061[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
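
The log above is consistent with the following failure mode, shown here as a toy model rather than the actual pesos code. `ToyDetector`, its `watchers` list, and the callback-based `detect` signature are hypothetical stand-ins; the only assumption taken from the log is that an appointment identical to the current leader is skipped. Under that assumption, a watch registered against the current master never fires when the same master is announced again after a disconnect, so nothing prompts the scheduler to re-register.

```python
class ToyDetector:
    """Hypothetical model of a detector that ignores identical appointments."""

    def __init__(self):
        self.leader = None
        self.watchers = []

    def detect(self, previous, callback):
        # If the caller already knows the current leader, park the callback
        # until the leader changes; otherwise report the leader immediately.
        if previous == self.leader:
            self.watchers.append(callback)
        else:
            callback(self.leader)

    def appoint(self, leader):
        # Identical appointments are skipped, so parked callbacks stay parked.
        if leader == self.leader:
            return False
        self.leader = leader
        watchers, self.watchers = self.watchers, []
        for callback in watchers:
            callback(leader)
        return True


MASTER = "master@192.168.33.2:5050"
events = []
detector = ToyDetector()

detector.detect(None, events.append)    # initial watch, no leader known yet
first = detector.appoint(MASTER)        # accepted, the watch fires
detector.detect(MASTER, events.append)  # transition watch from the known master

# The connection to the master drops, then the same master is announced again.
second = detector.appoint(MASTER)       # skipped, the transition watch never fires
```

In this model the scheduler only hears about the master once, which is why some separate trigger on disconnect would be needed to start a new registration attempt.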

@wickman
Owner

wickman commented Mar 28, 2015

Thanks for the report. I'll take a closer look.

@tarnfeld
Contributor Author

Simply removing the check from here seems to do the trick, but I don't think that's the real solution.

Edit: I also added the following method to the scheduler:

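def exited(self, pid):
  # Only react when the process that went away is the current master.
  if pid == self.master:
    log.info('Disconnected from current master: %s', pid)
    # Schedule another detection pass after the retry interval.
    self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self.pid, 'detect')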
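
To show what that hook does in isolation, here is a minimal sketch that can be run without a cluster. `RecordingContext` and `ToyScheduler` are hypothetical stand-ins, not pesos or compactor classes, the `delay(seconds, pid, method)` call shape is copied from the snippet above, and the retry interval of 10 seconds is an assumed value.

```python
import logging

log = logging.getLogger("toy.scheduler")


class RecordingContext:
    """Hypothetical context that records delayed calls instead of running them."""

    def __init__(self):
        self.delayed = []

    def delay(self, seconds, pid, method, *args):
        self.delayed.append((seconds, pid, method, args))


class ToyScheduler:
    MASTER_DETECTION_RETRY_SECONDS = 10  # assumed value, for illustration only

    def __init__(self, context, pid, master):
        self.context = context
        self.pid = pid
        self.master = master

    def exited(self, pid):
        # Only react when the process that went away is the current master.
        if pid == self.master:
            log.info('Disconnected from current master: %s', pid)
            self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self.pid, 'detect')


MASTER = "master@192.168.33.2:5050"
context = RecordingContext()
scheduler = ToyScheduler(context, "scheduler@127.0.0.1:1234", MASTER)

scheduler.exited("other@192.168.33.3:5051")  # unrelated process, ignored
scheduler.exited(MASTER)                     # master lost, a 'detect' retry is queued
```

After the second call, `context.delayed` holds a single entry asking for `detect` to be dispatched to the scheduler's own pid once the retry interval elapses.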
