Watcher does not detect unavailable server. #34

Open
0x7f opened this issue Oct 26, 2014 · 5 comments

Comments

@0x7f

0x7f commented Oct 26, 2014

When creating a simple watcher like this:

var Etcd = require("node-etcd");
var etcd = new Etcd("127.0.0.1", 4001);
var watcher = etcd.watcher("/x/y/z", null, { recursive: true });
watcher.on("change", console.log);
watcher.on("error", console.error);

When the etcd server goes down, the watcher neither emits an error nor signals in any other way that the server is unavailable. I would expect the 'error' callback to be called.

@stianeikeland
Owner

You can probably detect it by listening for the 'reconnect' event, but yes, right now it will just keep trying to reconnect forever and never fail (unless it receives data it didn't expect).
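The 'reconnect' event can already serve as a rough, user-side availability signal. The sketch below is a hypothetical helper (`watchUnavailable` is not part of node-etcd); it assumes only that the watcher is an EventEmitter that emits 'reconnect' on each retry and 'change' when data arrives, as the snippets in this thread use.

```javascript
// Hypothetical helper, not part of node-etcd: counts consecutive
// 'reconnect' events and calls onUnavailable once the count reaches
// maxReconnects. A 'change' event shows the watch is alive again,
// so it resets the counter.
function watchUnavailable(watcher, maxReconnects, onUnavailable) {
  let consecutive = 0;
  let reported = false;

  function onReconnect() {
    consecutive += 1;
    if (!reported && consecutive >= maxReconnects) {
      reported = true; // report each outage only once
      onUnavailable(consecutive);
    }
  }

  function onChange() {
    consecutive = 0;
    reported = false;
  }

  watcher.on("reconnect", onReconnect);
  watcher.on("change", onChange);

  // Returns a function that detaches both listeners again.
  return function detach() {
    watcher.removeListener("reconnect", onReconnect);
    watcher.removeListener("change", onChange);
  };
}
```

With the watcher from the original report this could be used as `watchUnavailable(watcher, 5, function () { console.error("etcd unavailable"); })`. Because the counter only resets on 'change', a recovered cluster is not noticed until the watched key actually changes.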

I guess a configurable maximum number of retries would be in order here, after which it fails with an 'error' event. Maybe also try to unify this with the recent changes made for cluster setups.
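A configurable retry limit could look roughly like the following. This is a hypothetical sketch, not node-etcd's actual watcher: `RetryingWatcher`, the `request(cb)` function standing in for one watch request, and the `maxRetries`, `baseDelayMs`, `maxDelayMs` and `schedule` options are all illustrative names. It retries with capped exponential backoff, emits 'reconnect' on each retry, and gives up with an 'error' event once the retry budget is exhausted.

```javascript
const { EventEmitter } = require("events");

// Hypothetical sketch, not node-etcd's watcher: retries a failing watch
// request with capped exponential backoff and gives up with an 'error'
// event after maxRetries consecutive failures.
class RetryingWatcher extends EventEmitter {
  constructor(request, {
    maxRetries = 5,        // consecutive failures tolerated before giving up
    baseDelayMs = 100,     // delay before the first retry
    maxDelayMs = 30000,    // upper bound on any single retry delay
    schedule = (fn, ms) => setTimeout(fn, ms), // injectable for tests
  } = {}) {
    super();
    this.request = request; // request(cb): one watch attempt, cb(err, data)
    this.maxRetries = maxRetries;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.schedule = schedule;
    this.failures = 0;
    this.stopped = false;
  }

  start() {
    if (this.stopped) return;
    this.request((err, data) => {
      if (this.stopped) return;
      if (err) {
        this._retry(err);
        return;
      }
      this.failures = 0;        // a success resets the retry budget
      this.emit("change", data);
      this.start();             // wait for the next change
    });
  }

  stop() {
    this.stopped = true;
  }

  _retry(err) {
    this.failures += 1;
    if (this.failures > this.maxRetries) {
      this.stop();
      this.emit("error", new Error(
        "etcd unavailable after " + this.maxRetries + " retries: " + err.message));
      return;
    }
    // 1x, 2x, 4x, ... the base delay, capped at maxDelayMs.
    const delayMs = Math.min(this.maxDelayMs, this.baseDelayMs * Math.pow(2, this.failures - 1));
    this.emit("reconnect", { error: err, attempt: this.failures, delayMs: delayMs });
    this.schedule(() => this.start(), delayMs);
  }
}
```

Resetting `failures` on every successful response makes the limit apply to consecutive failures only, so a long-lived watch is not killed by occasional blips. The `schedule` option exists only so the timing can be replaced in tests.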

@0x7f
Author

0x7f commented Oct 27, 2014

Yes, a retry limit would improve the API here as well.

@0x7f
Author

0x7f commented Oct 27, 2014

Hm, you say it keeps retrying forever. But then why does it not catch up again when the whole cluster went down and was restarted? Can you reproduce this?

@stianeikeland
Owner

Hmm, it seems to catch up fine for me here:

E = require 'node-etcd'
e = new E ['127.0.0.1:4001', '127.0.0.1:4002', '127.0.0.1:4003']

w = e.watcher 'a'
w.on 'error', (e) -> throw e
w.on 'reconnect', console.log
w.on 'change', (d) ->
  process.stdout.write '.'
  # console.log d.node.value, d.node.modifiedIndex

setLoop = () ->
  e.set 'a', Math.random(), setLoop

setLoop()

Screen cap: http://static.eikeland.se.s3.amazonaws.com/etcd-kill.mp4

Sometimes it takes a few seconds because of timeouts etc., but I haven't been able to fool it by killing nodes.

@0x7f
Author

0x7f commented Oct 28, 2014

Thank you for even creating a video! :)

Hm, I retried it here and yes, it works. Apparently I just wasn't waiting long enough for the consumer to catch up again. Still, the consumer is much slower to catch up than the producer (a couple of minutes vs. a couple of seconds). The difference in my test setup is that the consumer and producer run in separate processes; I don't know whether that makes any difference. Could you also try it with separate processes and check whether the producer catches up faster than the consumer?

Here is the consumer code I used:

var Etcd = require("node-etcd");
var etcd = new Etcd([
  "127.0.0.1:4001",
  "127.0.0.1:4002",
  "127.0.0.1:4003",
  "127.0.0.1:4004",
]);
var watcher = etcd.watcher("/x/y/z", null, { recursive: true });
watcher.on("error", function(err) { throw err; });
watcher.on("change", function() { process.stdout.write("."); });
watcher.on("reconnect", console.log);

and the producer:

var Etcd = require("node-etcd");
var etcd = new Etcd([
  "127.0.0.1:4001",
  "127.0.0.1:4002",
  "127.0.0.1:4003",
  "127.0.0.1:4004",
]);
function setLoop() {
  var key = "/x/y/z/a" + Math.random();
  etcd.set(key, "foo", null, function(err, result) {
    if (err) { throw err; }
    process.stdout.write(".");
    setTimeout(setLoop, 1000 / 50); // assuming ~50 writes/s was intended; setTimeout takes milliseconds
  });
}
setLoop();
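To put numbers on "a couple of minutes vs. a couple of seconds", the consumer could timestamp each outage. `trackCatchUp` below is a hypothetical helper, again assuming only the 'reconnect' and 'change' events used above. Note that it measures from the first 'reconnect' of an outage (not from the moment the cluster actually went down) until the next 'change'.

```javascript
// Hypothetical measurement helper, not part of node-etcd: records when a
// watcher starts reconnecting and reports, on the next 'change', how many
// milliseconds it took to receive data again. `now` is injectable for tests.
function trackCatchUp(watcher, report, now = Date.now) {
  let outageStart = null;

  watcher.on("reconnect", () => {
    // Only the first reconnect of an outage marks its start.
    if (outageStart === null) outageStart = now();
  });

  watcher.on("change", () => {
    if (outageStart !== null) {
      report(now() - outageStart);
      outageStart = null;
    }
  });
}
```

In the consumer above this would be `trackCatchUp(watcher, function (ms) { console.log("caught up after " + ms + " ms"); });`. Since the producer writes continuously, the first 'change' after recovery should follow soon after the watch is re-established, so the figure roughly reflects the watcher's own reconnect delay.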
