Distributed mode #1259

Closed
1 task done
mabed-fr opened this issue Feb 4, 2022 · 27 comments
Labels
area:core issues describing changes to the core of uptime kuma area:deployment related to how uptime kuma can be deployed feature-request Request for new features to be added

Comments

@mabed-fr

mabed-fr commented Feb 4, 2022

⚠️ Please verify that this feature request has NOT been suggested before.

  • I checked and didn't find similar feature request

🏷️ Feature Request Type

New Monitor

🔖 Feature description

Is it possible to have several instances of uptime-kuma controlled by a central point?
In distributed mode?
Connected by wirguard ?

Regards,

✔️ Solution

Is it possible to have several instances of uptime-kuma controlled by a central point?
In distributed mode?
Connected by wirguard ?

❓ Alternatives

No response

📝 Additional Context

Congratulations on this project. I would be glad to support it if any of my skills can help.

@mabed-fr mabed-fr added the feature-request Request for new features to be added label Feb 4, 2022
@mamiu

mamiu commented Feb 8, 2022

I like the idea of a "distributed mode" or HA mode (high availability mode), multi instance mode, multi hosts mode, fail safe mode, etc. (a few keywords so that this ticket can be found easily).

But what is "wirguard"? If you mean wireguard: You don't need a VPN tunnel to achieve something like that. Additional instances could be added via a private token (similar to how nodes are added to a Kubernetes cluster).
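
A minimal sketch of the private-token idea, assuming the central instance holds a shared secret and derives one token per node with an HMAC. The names `issue_join_token` and `verify_join_token` are hypothetical and are not part of Uptime Kuma.

```python
import hashlib
import hmac
import secrets


def issue_join_token(cluster_secret: bytes, node_name: str) -> str:
    """Return a token that binds node_name to the cluster secret (hypothetical scheme)."""
    mac = hmac.new(cluster_secret, node_name.encode("utf-8"), hashlib.sha256)
    return f"{node_name}.{mac.hexdigest()}"


def verify_join_token(cluster_secret: bytes, token: str) -> bool:
    """Check that the token was issued with this cluster secret."""
    node_name, sep, digest = token.rpartition(".")
    if not sep or not node_name:
        return False
    expected = hmac.new(
        cluster_secret, node_name.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), digest.encode("utf-8"))


secret = secrets.token_bytes(32)           # kept on the central instance
token = issue_join_token(secret, "probe-eu-1")  # handed to the new node
```

A node would present its token when it first connects, and the central instance would accept it only if `verify_join_token` returns `True`. No VPN tunnel is involved in this sketch, only ordinary authenticated requests.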

@adyanth

adyanth commented Mar 5, 2022

A distributed install definitely makes sense for something that monitors uptime for other software. Would not want it to go down along with the other apps.

@jesse2point0

jesse2point0 commented May 26, 2022

I would love this as a feature, if I could have a small instance running on a site and relaying to a master instance somewhere.

For example, say you are an MSP and you have a few line-of-business applications you want to monitor inside each customer's network, without exposing the endpoints directly or setting up VPNs. The local instance then relays or reports the stats to a central instance. Each client site may have an internal status page, but the MSP could publish those status pages centrally for all sites and customers.

@onedr0p

onedr0p commented Aug 7, 2022

It's kind of strange that an application that monitors other applications doesn't support running in high availability, but maybe that's outside the scope of this project. Uptime Kuma would need to support an external DB for data and something like Redis for the session cache. I'm also not aware whether Uptime Kuma writes anything else to disk, but if so, that would need to change as well to run HA.

@mabed-fr
Author

The project is brand new compared to what is on the market; it takes time to develop.

The main idea on my part was to have satellites in several countries, but HA is also possible.

If you want this functionality, do not hesitate to comment.

@officiallymarky

Yes!

@snth

snth commented Nov 10, 2022

We just started using uptime-kuma and it's awesome! Thank you so much for creating this and making it available!

Like many others in this thread, the thought naturally arose of "who will watch the watchers"? A distributed/high-availability configuration would be the Bee's Knees.

Until then we're thinking about having uptime-kuma monitored by BetterUptime or healthchecks.io, which given that it's a single service should fit in the free tier.

@MaxamServices

This would be awesome! It would also be good if the nodes could agree that a certain instance is down before sending the notification.

@cheuklam

It would be great! And if possible, add a config option that allows the notification to be sent only "if 2/3 of the deployments detect downtime".

Just yesterday I had a case where Kuma kept sending notifications non-stop (a timeout every minute), but when I accessed the application (which is hosted on AWS and has CloudWatch), it was completely fine. I guess there was a routing issue somewhere in between. Only 2 of the 50 applications monitored by Kuma had this issue, but it kept me awake from 5am.
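
The "2/3 of the deployments" rule can be sketched as a small threshold function. This is only an illustration under the assumption that each instance reports a simple up/down observation; `should_alert` and its parameters are hypothetical names, not an existing Uptime Kuma setting.

```python
from fractions import Fraction


def should_alert(
    observations: dict[str, bool],
    threshold: Fraction = Fraction(2, 3),
) -> bool:
    """Decide whether to notify.

    observations maps an instance name to True if that instance
    currently sees the monitored target as down.
    """
    if not observations:
        return False
    down = sum(1 for is_down in observations.values() if is_down)
    # Exact fraction arithmetic avoids float rounding at the 2/3 boundary.
    return Fraction(down, len(observations)) >= threshold
```

With three instances, a single instance hitting a routing problem would stay below the threshold and stay silent, while two agreeing instances would trigger the notification.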

@Computroniks Computroniks mentioned this issue Feb 13, 2023
1 task
@wokawoka

I agree, it would be great

@Computroniks
Contributor

Just going to link #84 here as it looks similar

@louislam louislam closed this as not planned Mar 23, 2023
@simcmoi

simcmoi commented Oct 21, 2023

It would be great. I have 1 server and 1 NAS. If I could install 2 Uptime Kuma instances in HA, it would be awesome!

@cheuklam

cheuklam commented Oct 22, 2023

Distributing across availability zones may be a difficult task, but I think we can do it in a simple way.
My request for distributed mode has 3 reasons:

  1. When the current Uptime Kuma (UK) node goes down, it treats all my monitored sites as having gone down and come back up once the UK service is back online, which doesn't look good. It was not the sites that were down but the UK service.
  2. We want to use UK because we want to ensure every service is up and running, and we have emergency plans for such cases. When the monitoring service itself is down, our alerts are gone. We can do HA / multi-AZ for the website but not for the monitoring service, which is a bit weird, I would say.
  3. Network issues can make a site unreachable in part of the world. Sometimes, due to a network operator's CDN service, the site may be available in Europe but stop working in the US. Personally, I run some lowest-tier VMs in different cloud regions (using the free tier) to check for such cases.
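
Reason 1 can be illustrated with a rough sketch that looks for holes in one instance's own heartbeat timestamps, so that a period where the monitor itself was offline is not counted as target downtime. The function name, the `tolerance` factor, and the timestamp representation are assumptions for illustration only.

```python
def find_monitor_gaps(
    beat_times: list[float],
    interval: float,
    tolerance: float = 2.0,
) -> list[tuple[float, float]]:
    """Return (start, end) pairs where consecutive heartbeats are too far apart.

    beat_times are the sorted times (in seconds) at which this instance
    recorded a check. A gap longer than interval * tolerance suggests the
    monitoring instance was offline, not that the monitored site was down.
    """
    gaps = []
    for previous, current in zip(beat_times, beat_times[1:]):
        if current - previous > interval * tolerance:
            gaps.append((previous, current))
    return gaps
```

For a 60-second check interval, `find_monitor_gaps([0, 60, 120, 600, 660], 60)` flags only the span between 120 and 600, which could then be shown as "no data" instead of as an outage.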

We can fix the above issues by running multiple instances on different servers, but then the data is not unified. That's why I am thinking of the following suggestions, which should be very simple to implement and would fix all of the above issues:

  • Multi instance data sync
    Steps
    a) Allow an instance name for each instance
    b) Add one more column to the report table that records, alongside the status changes, which instance each one came from
    c) In the notification channels, add one more option, "Uptime Kuma", so we can send status changes to other UK instances.
    d) When the service comes back up, check with the other "allied instances" and migrate any missing data. (Not very important but good to have)
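
Steps (a), (b), and (d) above can be sketched as follows. This is a minimal illustration, assuming status changes are plain records tagged with the instance that observed them; `StatusChange` and `merge_missing` are hypothetical names and do not reflect Uptime Kuma's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusChange:
    instance: str   # steps (a)/(b): name of the instance that recorded the change
    monitor: str    # which monitored target the change is about
    timestamp: int  # Unix time in seconds
    status: str     # "up" or "down"


def merge_missing(
    local: list[StatusChange],
    remote: list[StatusChange],
) -> list[StatusChange]:
    """Step (d): add records an allied instance has that we lack, ordered by time."""
    known = set(local)
    merged = list(local)
    for record in remote:
        if record not in known:
            known.add(record)
            merged.append(record)
    return sorted(merged, key=lambda c: (c.timestamp, c.instance, c.monitor))
```

Because records are compared by value, running the merge twice with the same remote data leaves the result unchanged, which matters if instances re-sync after every restart.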

  • Client only deployment
    A tiny Node.js / Python piece of code that asks the primary UK instance for the list of checks to run and returns the results. We could run this code on Lambda / function-based cloud services or Docker, so it can be deployed at very low or no cost to address issue 3 mentioned above.
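
A rough sketch of such a client-only probe, using only the Python standard library. How the probe obtains the target list and reports back is left as injected callables, because Uptime Kuma exposes no such endpoints today; `run_probe`, `fetch_targets`, and `report` are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_check(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def run_probe(
    fetch_targets: Callable[[], Iterable[str]],
    check: Callable[[str], bool],
    report: Callable[[dict[str, bool]], None],
) -> dict[str, bool]:
    """Ask the primary for targets, check each one, and send the results back."""
    results: dict[str, bool] = {}
    for target in fetch_targets():
        try:
            results[target] = check(target)
        except Exception:
            # A crashing check counts as a failed check for that target.
            results[target] = False
    report(results)
    return results
```

In a function-based deployment, one invocation would call `run_probe` once with `http_check` as the check. The primary instance would remain responsible for storing results and sending notifications.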

P.S. HA mode sounds fancy, but it is hard to do HA across multiple AZs without a lot of virtual IPs and SD-WAN, which involves a lot of infrastructure work. I think the method I described above minimizes the dependency on network infrastructure yet fixes the issues I listed. An HA setup only means keeping the service up and running. I don't think we need to make things too complicated, as a DB cluster and a heartbeat service together would already be more complicated than the whole project. I like UK because it is simple yet achieves its purpose.

@officiallymarky

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

@cheuklam

> It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

This is not difficult, but it is also not HA: once the main service is down, the clients have nowhere to report. But as I mentioned in solution point 2, it does solve some other issues.

@snth

snth commented Oct 23, 2023

I would also really like this feature because I just had my node with Uptime on it go down the other day and while most of my things don't require HA, it would be good to have that in a monitoring solution.

I don't know much about high availability setups or Uptime's internal architecture, but can't you push the difficult distributed consensus problem into some other component? For example, whatever your underlying storage layer is (Redis, Postgres, SQLite, ...), there are usually high availability solutions already available, so can't you perhaps leverage those?
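
One common way to push the coordination problem into the storage layer is a lease row that instances try to claim with a single atomic update. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a replicated database; the `lease` table and `try_acquire_lease` are hypothetical and are not how Uptime Kuma works.

```python
import sqlite3


def try_acquire_lease(
    db: sqlite3.Connection,
    node: str,
    now: float,
    ttl: float = 30.0,
) -> bool:
    """Return True if `node` holds the lease after this call.

    The lease is taken when it has expired, and renewed when this node
    already holds it. Only the lease holder would run checks and alert.
    """
    with db:  # commit on success, roll back on error
        db.execute(
            "CREATE TABLE IF NOT EXISTS lease ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), holder TEXT, expires REAL)"
        )
        db.execute(
            "INSERT OR IGNORE INTO lease (id, holder, expires) VALUES (1, NULL, 0)"
        )
        cursor = db.execute(
            "UPDATE lease SET holder = ?, expires = ? "
            "WHERE id = 1 AND (holder = ? OR expires <= ?)",
            (node, now + ttl, node, now),
        )
        return cursor.rowcount == 1
```

The safety of this pattern depends entirely on the database applying the `UPDATE` atomically and consistently across replicas, which is exactly the part delegated to the HA storage solution.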

@snth

snth commented Nov 10, 2023

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?


It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

@CommanderStorm
Collaborator

CommanderStorm commented Nov 10, 2023

> It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage

Actually, v2 does support (external and internal) MariaDB next to SQLite, and therefore also more complex setups like MariaDB Galera. See the progress here: https://github.com/louislam/uptime-kuma/milestone/24

For Postgres as a data backend see #959

@snth

snth commented Nov 10, 2023

Thanks @CommanderStorm . That's great to hear.

Where can I read more about the sqlite setup? Is the connection string for that configurable because then I could probably just use Dqlite for the backend. That would be great because I would really like to avoid the GlusterFS route if possible.

@CommanderStorm
Collaborator

I don't know what you need. The sqlite database is stored at db/kuma.db.
SQLite does not really have a connection string I know of... you just point at the file and go..
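
To illustrate "point at the file and go", here is a small sketch using Python's standard `sqlite3` module. It works on a throwaway file with a made-up `example` table, since the actual layout of db/kuma.db is not described in this thread.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Open the SQLite file at db_path and return its table names."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def demo() -> list[str]:
    """Create a temporary database file, then read it back by path alone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")
        with closing(sqlite3.connect(path)) as conn:
            conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY)")
            conn.commit()
        return list_tables(path)
```

The only "connection" input is a file path, which is why swapping in a networked or replicated backend is not just a matter of changing a connection string.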

We have never looked into whether dqlite is a possibility or whether it is something we should support (currently, I would argue that MariaDB is enough, but I am not a maintainer)
=> currently not officially supported
=> we won't consider changes to this part of the system breaking

Here is our contribution guide https://github.com/louislam/uptime-kuma/blob/5b6522a54edad9737fccf195f9eaa25c6fb9d0f6/CONTRIBUTING.md

@officiallymarky

> I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.
>
> Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:
>
> Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.
>
> WDYT?
>
> It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

Unless it is located geographically on a different Internet connection it really doesn’t improve the situation much.

@babytof

This comment was marked as spam.

@CommanderStorm
Collaborator

There has not been any news in the last four months.
We are still working out the kinks of V2.0

@CommanderStorm CommanderStorm added area:deployment related to how uptime kuma can be deployed area:core issues describing changes to the core of uptime kuma labels Apr 21, 2024
@JaneX8

JaneX8 commented May 5, 2024

I would love to see this feature. It would be great if multiple Uptime Kuma nodes could be linked, with an option on each check to select which nodes it should run on, and to use that as a fail condition, as in "report if all fail" or "report if N fail". Syncing the tasks would be better, because then each node can keep running in standalone mode if another is down. That makes it a distributed network of individual instances that can work standalone as well as cooperate, rather than, for example, workers that still depend on a master being online.
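
The per-check node selection and fail condition could look roughly like this. It is a sketch under the assumption that each check carries its own policy; `CheckPolicy` and `should_report` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckPolicy:
    nodes: list[str]                    # nodes selected to run this check
    min_failures: Optional[int] = None  # None means "report only if all fail"


def should_report(policy: CheckPolicy, failed_nodes: set[str]) -> bool:
    """Apply the "report if all fail" / "report if N fail" condition."""
    selected = set(policy.nodes)
    # Failures from nodes that were not selected for this check are ignored.
    failures = len(selected & failed_nodes)
    needed = len(selected) if policy.min_failures is None else policy.min_failures
    return needed > 0 and failures >= needed
```

For example, `CheckPolicy(["eu", "us", "asia"], min_failures=2)` would report once any two of the three selected nodes see a failure, while the default policy waits until all three agree.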

This way I would add Uptime Kuma on many of my geographically separated servers and simply make sure my checks work on all of them, without having to configure many different individual instances.

@CommanderStorm
Collaborator

@JaneX8
You can subscribe to #84 for updates.
Currently, our priorities are on different items such as #4500 and refactoring the monitoring items for better maintainability.

@pareis

pareis commented Oct 6, 2024

I think linking 2 nodes might not be sufficient. Typically, 3 nodes are required so that the 2 remaining nodes can figure out which node is disconnected and which are still "live". That's how distributed systems work if you want them to be reliable.
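
The arithmetic behind the 3-node point is the usual majority rule, sketched here with hypothetical helper names:

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """reachable_nodes counts the node itself plus the peers it can still reach."""
    return reachable_nodes >= majority(total_nodes)
```

With 2 nodes, a node that loses contact with its peer reaches only itself, and 1 is less than the majority of 2, so neither side can tell who is cut off. With 3 nodes, the two that still see each other meet the majority of 2 and the isolated one does not.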

I've been thinking that maybe we don't need this distributed mode in uptime-kuma itself. The same can be achieved by, say, running 2 Kumas in different regions with the same checks, both alerting via webhooks or similar into an alerting tool that can combine the different states using an OR or an AND operation. For example: source A says down, source B says up => up (depending on what you want). A sustained source B down could still be used to trigger a slower alert. This fits better in the context of an on-call setup where such a tool is already in use. In the hobbyist space, where we use Kuma to send alerts via email for example, this wouldn't be easily possible.
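
The combining logic in such an external alerting tool might be sketched like this. The function name, the `"page"` / `"slow"` / `"ok"` labels, and the 15-minute default are assumptions for illustration and do not describe any particular product.

```python
def classify(
    down_seconds: dict[str, float],
    mode: str = "and",
    sustained_after: float = 900.0,
) -> str:
    """Combine per-source states into "page", "slow", or "ok".

    down_seconds maps a source name to how long it has reported the
    target as down; 0 means that source currently reports it as up.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    if not down_seconds:
        return "ok"
    is_down = [seconds > 0 for seconds in down_seconds.values()]
    fire = all(is_down) if mode == "and" else any(is_down)
    if fire:
        return "page"
    # A single source that stays down long enough still raises a slower alert.
    if max(down_seconds.values()) >= sustained_after:
        return "slow"
    return "ok"
```

In AND mode, `{"A": 60, "B": 0}` is treated as up, `{"A": 60, "B": 30}` pages immediately, and `{"A": 1200, "B": 0}` produces only the slower alert.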

@officiallymarky

officiallymarky commented Oct 6, 2024

Ideally it would allow n nodes, but it's clear from the comments that this isn't a feature that will be added.
