
reporting node health #27

Closed
flavianmissi opened this issue Dec 2, 2018 · 9 comments

@flavianmissi

How is node health reported? Looked around in the code and couldn't see anything obvious.

Centrifugo's API has no healthcheck endpoint. While investigating how to create one, I noticed that health would need to be reported by each node. Is that correct?

To start simple, we could just ping the configured engine and report "ok" or "not ok".
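Something like this minimal sketch is what I have in mind (the `Engine` interface, `Ping` method, and handler are hypothetical names, not the actual centrifuge API):

```go
package health

import "net/http"

// Engine is a hypothetical stand-in for whatever the configured
// engine (memory, Redis, ...) would expose for a liveness ping.
type Engine interface {
	Ping() error
}

// Handler reports "ok" while the engine answers the ping and
// 503 "not ok" as soon as it stops doing so.
func Handler(engine Engine) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := engine.Ping(); err != nil {
			http.Error(w, "not ok", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	}
}
```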

I'm willing to send a PR, but would probably need a little guidance to get me started.

Great project btw.
Thanks!

@FZambia
Member

FZambia commented Dec 2, 2018

Hello, could you provide a bit more background on why you need this? Is this for Kubernetes or something similar?

Also, do you need this per node or per full setup (i.e. switching to another Centrifugo setup on a balancer, for example)?

I'm not sure at the moment which information I should provide, as I don't yet understand what you mean by node health.

@flavianmissi
Author

flavianmissi commented Dec 3, 2018

Hey, yes of course.

> Is this for Kubernetes or something similar?

Exactly. I deployed Centrifugo into multiple Kubernetes clusters and I'm currently using the admin / endpoint as a healthcheck, but it's not suitable for all my environments and it isn't a very trustworthy check either.

> Also, do you need this per node or per full setup (i.e. switching to another Centrifugo setup on a balancer, for example)?

I was envisioning something more like per container (so per node, I suppose?): if a container lost access to Redis for any reason, the healthcheck would start failing, and Kubernetes (or any similar tool) could detect this via the healthcheck and restart the problematic container.
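For reference, this is the kind of probe I'd wire up on the Kubernetes side (the /health path is hypothetical at this point; 8000 is Centrifugo's default port):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
```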

Hope this is clearer now - thanks for the quick answer.

@FZambia
Member

FZambia commented Dec 5, 2018

My first idea is adding a health status change here - this is the place where a node periodically sends information about itself to other nodes. If this operation fails, the engine is not working.

But this won't work properly when Redis sharding is used, because at the moment we try to deliver a control message using each Redis shard in order, so some shards can return an error.

It looks like if we want to check all shards, this feature requires a new Health method in the engine that returns health status taking into account all shards and any engine-specific details.
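A rough sketch of what I mean (the shard type and field names here are hypothetical, just to illustrate the shape):

```go
package engine

import "fmt"

// shard is a hypothetical handle to a single Redis shard.
type shard interface {
	Ping() error
}

// RedisEngine sketch: an engine backed by several application-level shards.
type RedisEngine struct {
	shards []shard
}

// Health reports unhealthy as soon as any shard fails a ping, so one
// broken shard marks the whole engine as down.
func (e *RedisEngine) Health() error {
	for i, s := range e.shards {
		if err := s.Ping(); err != nil {
			return fmt.Errorf("shard %d unhealthy: %v", i, err)
		}
	}
	return nil
}
```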

@flavianmissi
Author

> But this won't work properly when Redis sharding is used, because at the moment we try to deliver a control message using each Redis shard in order, so some shards can return an error.

Is this really something this healthcheck should be concerned about, though? On a cluster installation (a Redis cluster in this case), node failures are acceptable as long as the whole cluster is still operational.

> It looks like if we want to check all shards, this feature requires a new Health method in the engine that returns health status taking into account all shards and any engine-specific details.

Checking the health of each and every shard sounds like something Redis should be doing, not centrifuge. There's even a command for that in redis-cli: `redis-cli --cluster check 127.0.0.1:7000`. I think an exposed ping (or health) method in the engine should be enough. Then the Node would expose a status check method.
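Roughly (names hypothetical again):

```go
package node

// pinger is whatever single engine-level ping the engine exposes.
type pinger interface {
	Ping() error
}

// Node sketch: the node delegates its status check to one engine-level
// ping instead of knowing about individual shards.
type Node struct {
	engine pinger
}

func (n *Node) Health() error {
	return n.engine.Ping()
}
```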

Thoughts?

@FZambia
Member

FZambia commented Dec 18, 2018

> Is this really something this healthcheck should be concerned about, though? On a cluster installation (a Redis cluster in this case), node failures are acceptable as long as the whole cluster is still operational.

Centrifuge uses application-level sharding (not Redis Cluster), because PUB/SUB does not scale well in Redis Cluster - see redis/redis#2672 and the Redis mailing list. So we have to check each shard manually. Does that make sense?

Yep, the Node must have a Health method too, of course. Btw, take a look at the related PR I opened recently - it waits for this method to be implemented in this lib.

@flavianmissi
Author

> Centrifuge uses application-level sharding (not Redis Cluster), because PUB/SUB does not scale well in Redis Cluster - see redis/redis#2672 and the Redis mailing list. So we have to check each shard manually. Does that make sense?

Oh that explains it, it does make sense, thank you.

> Yep, the Node must have a Health method too, of course. Btw, take a look at the related PR I opened recently - it waits for this method to be implemented in this lib.

Great news! I see the PR is merged now - do you have any kind of timeline for when this will be generally available?

Thanks for making it happen btw 🎉

@FZambia
Member

FZambia commented Jan 7, 2019

It's already available in Centrifugo v2.1.0, but it just provides a health endpoint without doing extra checks inside the engine - i.e. here is the healthcheck handler. If you need more than that, we still have to implement it inside this lib first.
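A quick usage sketch (assuming the boolean health option described in the v2 docs; worth double-checking the exact name there):

```sh
# config.json: {"health": true} enables the endpoint, then:
curl -i http://localhost:8000/health
# expect HTTP 200 while the node is up
```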

@flavianmissi
Author

This satisfies my current needs, thank you!

@FZambia
Member

FZambia commented Jan 8, 2019

OK, great! I'll close this issue then, because my current needs are also satisfied with that simple health checker. If you need more from it, feel free to reopen!

@FZambia FZambia closed this as completed Jan 8, 2019