Watcher does not detect unavailable server. #34

0x7f · 2014-10-26T21:07:24Z

When creating a simple watcher like this:

var Etcd = require("node-etcd");
var etcd = new Etcd("127.0.0.1", 4001);
var watcher = etcd.watcher("/x/y/z", null, { recursive: true });
watcher.on("change", console.log);
watcher.on("error", console.error);

When the etcd server goes down, the watcher does not emit an error or signal in any other way that the server is unavailable. I would expect the "error" callback to be called.

stianeikeland · 2014-10-27T07:53:53Z

You can probably notice it by listening to the 'reconnect' event, but yeah, right now it will just try to reconnect forever and never fail (unless it receives some data it didn't expect).

I guess a configurable max retry would be in order here, and then have it fail with an 'error' event. Maybe try to unify this with the recent changes made for cluster setups.
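
Until something like that exists in the library, a user-side workaround could look roughly like the sketch below. It is only an illustration: `failAfterReconnects` and `maxReconnects` are hypothetical names, not part of node-etcd, and it assumes the watcher emits 'reconnect' once per retry and 'change' once it is receiving data again. The `stop()` call is guarded because its availability is also an assumption here.

```javascript
// Hypothetical helper, not part of node-etcd.
// Emits "error" on the watcher after `maxReconnects` consecutive
// "reconnect" events with no "change" event in between.
function failAfterReconnects(watcher, maxReconnects) {
  var attempts = 0;
  var failed = false;

  // Any successful change means the connection is healthy again.
  watcher.on("change", function () {
    attempts = 0;
  });

  watcher.on("reconnect", function () {
    if (failed) { return; }
    attempts += 1;
    if (attempts >= maxReconnects) {
      failed = true;
      // Assumption: stop the watcher if it exposes a stop() method.
      if (typeof watcher.stop === "function") { watcher.stop(); }
      watcher.emit("error", new Error(
        "etcd unreachable after " + attempts + " reconnect attempts"
      ));
    }
  });

  return watcher;
}

// Usage (assuming `etcd` is a node-etcd client):
//   var watcher = failAfterReconnects(etcd.watcher("/x/y/z"), 10);
//   watcher.on("error", console.error);
```

Resetting the counter on 'change' keeps short outages from adding up to a failure over a long-running process.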

0x7f · 2014-10-27T09:00:54Z

Yes, the retry would improve the API here as well.

0x7f · 2014-10-27T09:05:45Z

Hm, you say it retries forever. But why does it not catch up again when the whole cluster was down and then restarted? Can you reproduce this?

stianeikeland · 2014-10-27T10:07:19Z

Hmm, seems to catch up here for me:

E = require 'node-etcd'
e = new E ['127.0.0.1:4001', '127.0.0.1:4002', '127.0.0.1:4003']

w = e.watcher 'a'
w.on 'error', (e) -> throw e
w.on 'reconnect', console.log
w.on 'change', (d) ->
  process.stdout.write '.'
  # console.log d.node.value, d.node.modifiedIndex

setLoop = () ->
  e.set 'a', Math.random(), setLoop

setLoop()

Screen cap: http://static.eikeland.se.s3.amazonaws.com/etcd-kill.mp4

Sometimes it takes a few seconds because of timeouts, etc., but I haven't been able to fool it by killing nodes.

0x7f · 2014-10-28T08:55:45Z

Thank you for even creating a video! :)

Hm, I retried it here and yes, it works. Obviously, I was just not waiting long enough for the consumer to catch up again. Still, the consumer is slower to catch up than the producer (a couple of minutes vs. a couple of seconds). The difference in my test setup is that the consumer and producer run in separate processes. I don't know whether that makes any difference. Could you maybe also retry it with separate processes and check whether the producer catches up faster than the consumer?

Here is the consumer code I used:

var Etcd = require("node-etcd");
var etcd = new Etcd([
  "127.0.0.1:4001",
  "127.0.0.1:4002",
  "127.0.0.1:4003",
  "127.0.0.1:4004",
]);
var watcher = etcd.watcher("/x/y/z", null, { recursive: true });
watcher.on("error", function(err) { throw err; });
watcher.on("change", function() { process.stdout.write("."); });
watcher.on("reconnect", console.log);

and the producer:

var Etcd = require("node-etcd");
var etcd = new Etcd([
  "127.0.0.1:4001",
  "127.0.0.1:4002",
  "127.0.0.1:4003",
  "127.0.0.1:4004",
]);
function setLoop() {
  var key = "/x/y/z/a" + Math.random();
  etcd.set(key, "foo", null, function(err, result) {
    if (err) { throw err; }
    process.stdout.write(".");
    setTimeout(setLoop, 1/50); // delay is below 1 ms, so Node.js treats it as 1 ms
  });
}
setLoop();
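
To put numbers on the consumer's catch-up time instead of eyeballing the dots, something like the following could be attached to the watcher in the consumer process. This is a rough sketch: `measureCatchUp` is a hypothetical helper, not part of node-etcd, and it assumes the first 'reconnect' marks the start of an outage and the next 'change' marks recovery. The optional `now` parameter only exists so the clock can be replaced in a test.

```javascript
// Hypothetical helper, not part of node-etcd.
// Calls onCatchUp(ms) with the time between the first "reconnect" of an
// outage and the next "change" event received by the watcher.
function measureCatchUp(watcher, onCatchUp, now) {
  now = now || Date.now;
  var outageStart = null;

  watcher.on("reconnect", function () {
    // Only record the first reconnect of an outage.
    if (outageStart === null) { outageStart = now(); }
  });

  watcher.on("change", function () {
    if (outageStart !== null) {
      onCatchUp(now() - outageStart);
      outageStart = null;
    }
  });
}

// Usage in the consumer above:
//   measureCatchUp(watcher, function (ms) {
//     console.log("caught up after " + ms + " ms");
//   });
```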
