Support for a ZooKeeper Master Detector #1

tarnfeld · 2014-07-21T13:23:43Z

Just getting to grips with things, but I assume it's just a case of implementing one of those in pesos.detector?


wickman · 2015-03-28T00:28:17Z

Implemented at a835b12.

tarnfeld · 2015-03-28T03:01:46Z

Just giving this a go now on a staging cluster, actually. I'll close it if it seems to work fine.

tarnfeld · 2015-03-28T03:24:16Z

In general it seems to work OK (the ZooKeeper group aspect), but I think my testing around ZK also runs into #15. In the event of an identical appointment, presumably the code that keeps re-connecting to the known master should kick in? I think that bit is broken.

2015-03-28 03:21:01,841[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: None
2015-03-28 03:21:01,843[pesos.detector] FutureMasterDetector.appoint accepting appointment master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] New master detected: master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] Registering framework: framework {
  user: "tom"
  name: "xxx"
  hostname: "1.0.0.127.in-addr.arpa"
}

2015-03-28 03:21:01,844[pesos.scheduler] Setting transition watch from previous master: master@192.168.33.2:5050
2015-03-28 03:21:01,844[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: master@192.168.33.2:5050
2015-03-28 03:21:01,919[x.scheduler] Framework 20150328-031924-35760320-5050-1308-0000 registered to http://vagrant-ubuntu-trusty-64:5050
2015-03-28 03:21:01,961[x.scheduler] Handling 1 offers
2015-03-28 03:21:03,844[pesos.scheduler] Skipping registration because we are either connected or there is no appointed master.
2015-03-28 03:21:07,354[x.scheduler] Handling 1 offers
2015-03-28 03:21:13,358[x.scheduler] Handling 1 offers
2015-03-28 03:21:19,362[x.scheduler] Handling 1 offers
2015-03-28 03:21:23,611[compactor.context] Received disconnection from master@192.168.33.2:5050 but no stream found.
2015-03-28 03:21:30,659[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
2015-03-28 03:21:34,061[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
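
To make the failure mode in the log easier to follow, here is a minimal sketch of what seems to be happening. It is not the pesos code: `ToyDetector`, `ToyScheduler`, `on_change`, and `disconnected` are hypothetical names invented for illustration, and the only behaviour borrowed from the log is that an appointment identical to the current leader is skipped.

```python
class ToyDetector(object):
    """Hypothetical stand-in for a master detector; not the pesos API."""

    def __init__(self):
        self.leader = None
        self.listeners = []

    def on_change(self, callback):
        self.listeners.append(callback)

    def appoint(self, leader):
        # Identical appointments are skipped, so listeners never hear about
        # a master that went away and came back at the same address.
        if leader == self.leader:
            return False
        self.leader = leader
        for callback in self.listeners:
            callback(leader)
        return True


class ToyScheduler(object):
    """Hypothetical scheduler that only registers when told of a new master."""

    def __init__(self, detector):
        self.connected = False
        self.registrations = 0
        detector.on_change(self.new_master)

    def new_master(self, leader):
        self.registrations += 1
        self.connected = True

    def disconnected(self):
        self.connected = False


detector = ToyDetector()
scheduler = ToyScheduler(detector)
detector.appoint('master@192.168.33.2:5050')  # first appointment: registers
scheduler.disconnected()                      # connection to the master drops
detector.appoint('master@192.168.33.2:5050')  # skipped as identical
```

After the last call the toy scheduler is still disconnected and has registered only once, which matches the "skipping identical appointment" lines above: nothing prompts a re-registration when the same master is appointed again.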

wickman · 2015-03-28T03:27:27Z

Thanks for the report. I'll take a closer look.

tarnfeld · 2015-03-28T03:32:24Z

Simply removing the check from here seems to do the trick, but I don't think that's the real solution.

Edit: I also added the following method to the scheduler:

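def exited(self, pid):
  # Only react when the process that exited is the master we are registered with.
  if pid == self.master:
    log.info('Disconnected from current master: %s', pid)
    # Schedule another detection pass so the scheduler can re-register.
    self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self.pid, 'detect')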
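
For anyone wanting to poke at this handler in isolation, here is a rough, self-contained sketch. `RecordingContext` and `RetryingScheduler` are hypothetical test doubles, not pesos or compactor classes, and the retry interval of 10 seconds is an assumed value; only the `delay(seconds, pid, method)` call shape is taken from the snippet above.

```python
import logging

log = logging.getLogger(__name__)


class RecordingContext(object):
    """Hypothetical context that records delayed calls instead of running them."""

    def __init__(self):
        self.delayed = []

    def delay(self, seconds, pid, method):
        self.delayed.append((seconds, pid, method))


class RetryingScheduler(object):
    """Hypothetical scheduler holding just enough state to exercise exited()."""

    MASTER_DETECTION_RETRY_SECONDS = 10  # assumed value, for illustration only

    def __init__(self, context, pid, master):
        self.context = context
        self.pid = pid
        self.master = master

    def exited(self, pid):
        # Ignore exits from anything other than the current master.
        if pid == self.master:
            log.info('Disconnected from current master: %s', pid)
            # Queue a 'detect' call against ourselves after the retry interval.
            self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self.pid, 'detect')


context = RecordingContext()
scheduler = RetryingScheduler(
    context, 'scheduler@127.0.0.1:9000', 'master@192.168.33.2:5050')
scheduler.exited('slave@192.168.33.3:5051')   # unrelated pid: ignored
scheduler.exited('master@192.168.33.2:5050')  # current master: retry queued
```

Only the second call queues a delayed `detect`, so the retry fires when the current master goes away and not for any other process exit.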
