
High Availability #520

Closed
calebmeyer opened this issue Aug 20, 2015 · 30 comments

Comments
@calebmeyer

Can this app be set up for high availability? If you have a cluster, can you have a load balancer, at least two application nodes, and several mongo nodes?


@Sing-Li
Member

Sing-Li commented Aug 20, 2015

We need to try it in a lab environment. Configure a mongo replica set with the oplog enabled. The load balancer needs to support sticky sessions if you have old browsers and/or older mobile clients. @calebmeyer -- do you have access to test facilities for HA configurations? If issues are found in such configurations, we can fix them. (Note: some optional packages, such as file upload to the local filesystem, are by design not HA-compatible.)
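
For reference, here is a minimal sketch of how a single Rocket.Chat app node could be started against a three-member replica set with oplog tailing; the hostnames, replica set name, and database names below are assumptions, not a tested configuration:

    # Sketch only: start one app node against an assumed replica set "rs0";
    # the hosts (mongo1/mongo2/mongo3), ports, and database names are placeholders.
    export PORT=3000
    export ROOT_URL=http://chat.example.local
    export MONGO_URL="mongodb://mongo1:27017,mongo2:27017,mongo3:27017/rocketchat?replicaSet=rs0"
    export MONGO_OPLOG_URL="mongodb://mongo1:27017,mongo2:27017,mongo3:27017/local?replicaSet=rs0"
    node main.js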

@calebmeyer
Author

I have access to some testing facilities. I will be trying this out. Can you recommend a load balancer?

@Sing-Li
Member

Sing-Li commented Aug 20, 2015

haproxy? Other community members may have more experience with other software load balancers. Please keep us posted on how it goes.
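
For anyone trying haproxy here, a minimal sketch of a config with source-IP stickiness in front of two app nodes; the addresses, ports, and timeouts are placeholders meant to illustrate the shape, not a vetted production setup:

    # haproxy.cfg sketch -- addresses and ports are placeholders
    defaults
        mode http
        timeout connect 5s
        timeout client  60s
        timeout server  60s
        timeout tunnel  1h          # keep long-lived websocket connections open

    frontend rocketchat_front
        bind *:80
        default_backend rocketchat_back

    backend rocketchat_back
        balance source              # sticky sessions by client source IP
        option forwardfor
        server app1 10.0.0.11:3000 check
        server app2 10.0.0.12:3000 check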

@geekgonecrazy
Contributor

I enjoy playing with these things. I'll give this a go.

So, the things we want tested:

  • a MongoDB cluster (a couple of nodes)
  • Rocket.Chat (a couple of instances, pointing at the mongo cluster's primary)
  • a load balancer (haproxy or nginx)

Anything else? I guess rattle it a bit and see what bolts fall out? :D

@Sing-Li
Member

Sing-Li commented Aug 22, 2015

My 2c

  • a variety of client browsers/devices (especially IE versions)
  • force-fail a mongo instance; force-fail a meteor instance; observe any interruption in the user experience. Expect some UI anomalies due to delayed session recovery/failover, but no data loss (see the failover sketch below).
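
One simple way to force the mongo failover mentioned above, assuming a replica set is already running (the hostname is a placeholder): ask the current primary to step down and watch the app nodes reconnect to the new primary.

    # Ask the current primary to step down for 60 seconds (hostname is a placeholder);
    # the shell connection may drop when the primary steps down -- that is expected.
    mongo --host mongo1 --eval 'rs.stepDown(60)'

    # Harsher variant: stop the mongod process on one member entirely.
    sudo service mongod stop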

@Sing-Li
Member

Sing-Li commented Sep 7, 2015

@calebmeyer any update on experiments with clusters of more than 4 RC instances? Thanks.

@calebmeyer
Author

I have filed a JIRA ticket for access to my organization's OpenStack cluster; I'm just waiting on a response. I'll keep you posted.

@Sing-Li
Member

Sing-Li commented Sep 7, 2015

Thanks, @calebmeyer ! 👍

@Sing-Li
Member

Sing-Li commented Sep 11, 2015

#769 <---- where are the HA guys when we really need them?! 😀

@geekgonecrazy
Contributor

We need to do some testing this weekend

@leefaus

leefaus commented Sep 15, 2015

@geekgonecrazy @Sing-Li

I have an HA solution up and running on Google Cloud. Here is the private Gist that describes what I did. I still need to tweak some of the docs, but it is running famously at http://130.211.152.251/.

@geekgonecrazy
Contributor

@leefaus sweet! Come across any issues?

Also did you use anything in front of your nodes?

@leefaus

leefaus commented Sep 15, 2015

@geekgonecrazy

The only issue is trying to automate setting up the MongoDB replica set. I am digging more into that. Also, on Google Cloud, HTTP Load Balancing doesn't support WebSockets, so you need to use Network Load Balancing. Other than that, everything went according to plan.
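
For what it's worth, the building block for automating that is usually a single rs.initiate() call run against one member once all the mongod processes are up; a minimal sketch (hostnames and replica set name are assumptions):

    # Initialize an assumed three-member replica set named "rs0".
    # Each mongod must already be running with --replSet rs0 for this to succeed.
    mongo --host mongo1 --eval 'rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "mongo1:27017" },
        { _id: 1, host: "mongo2:27017" },
        { _id: 2, host: "mongo3:27017" }
      ]
    })'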

@Sing-Li
Member

Sing-Li commented Sep 15, 2015

Thanks for the update! Automated re-provisioning of a failed mongo node would be cool ... k8s can do that for sure. But a websocket-capable load-balancing front end is going to take some testing and tinkering, I think. We need to come up with some way to blast it from the front, then introduce different failures at the back and see how the thing behaves 😸

What is NLB? Is it round robin?

Please keep us in the 'loop' :-)

marceloschmidt added this to the Roadmap milestone Sep 21, 2015
@calebmeyer
Author

@leefaus Thanks for the awesome starting point!
@Sing-Li I got access to our internal OpenStack. I'll be looking into setting up a cluster with:
1 HAProxy node
2 Rocket.Chat meteor nodes
3 MongoDB nodes

I'm new to setting all this up, so it will likely take some time while I learn the technologies (or at least how to configure them).

Can anyone recommend testing tools for hitting the frontend, or should I just script something?

@Sing-Li
Member

Sing-Li commented Sep 21, 2015

@calebmeyer thanks for the update! Two suggestions:

  1. Can you please 'up' the number of Rocket.Chat meteor nodes from 2 to either 6 or 10? The reason: the 24x7 demo chat we have running now is already on 4 nodes, and we see no problems at that tier so far.
  2. For front-end loading, I'd suggest scripting something yourself (a crude availability probe is sketched below). We will have a Rocket.Chat-specific, scalable load-test tool in the future.
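
As mentioned in point 2, even something crude works as a first pass; the sketch below (the proxy address is a placeholder) just polls the load balancer so you can see whether the front end keeps answering while nodes are killed behind it:

    # Poll the load balancer once per second and log the HTTP status code.
    while true; do
      code=$(curl -s -o /dev/null -w '%{http_code}' http://proxy.example.local/)
      echo "$(date +%T) HTTP $code"
      sleep 1
    done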

@geekgonecrazy
Contributor

@calebmeyer for load I'd recommend taking a look at Asteroid.

You could run multiple instances of a script built on it and hit the server with as many messages as you want.

@calebmeyer
Author

@Sing-Li will I need to set up sticky sessions?

@leefaus

leefaus commented Sep 29, 2015

@calebmeyer

Yes, you need sticky sessions.

@calebmeyer
Author

Hey all, sorry for the long downtime. I got one of the OpenStack admins to help me. I didn't get my 6-10 app nodes, but I have it running behind a proxy on our corporate network.

I used @leefaus's gist to help with the mongo replica set (thank you very much!), and I have my haproxy balancing via source.

However, I noticed in the Administration/Accounts section that avatars are stored on the filesystem. Is that intended? How did we get around that limitation for the demo?

@jonbaer

jonbaer commented Nov 13, 2015

I am interested in this topic as well. I have haproxy pointing to nodes managed by PM2, with only a single Mongo cluster at the moment. The thing I am mainly interested in, though, is how to stress test and what should be tested (as far as Meteor itself goes); it's my understanding that the DDP channel would be the bottleneck (do you test with something like https://github.com/observing/thor in that case?). I'd like to get a report from which I can determine the AWS/Rackspace requirements for N active users and connections with Rocket.Chat. I would also be interested in documentation of a good haproxy configuration (including SSL setup).

@Sing-Li
Member

Sing-Li commented Nov 14, 2015

Guys. Please open additional issues for capacity planning and load testing discussions.

There is enough complexity in just getting HA testing going alone.

@calebmeyer - so how many nodes do you have running? In Administration -> API, there is a setting to change avatarStore_type to GridFS. This will store the avatars in mongo for your test scenarios.

@calebmeyer
Author

@Sing-Li I have 6 nodes: 1 proxy, 2 app, 3 mongo. I see the setting for Avatar Storage Type; do I also need to clear the Avatar Storage Path? Currently it says /var/www/rocket.chat/uploads/avatar/.

You're right about HA being different from load testing, which is why I was surprised we'd need so many nodes for it. I figured that taking down one node should leave the app running for everyone, while taking down both would take it down (in a high-availability scenario).

@Sing-Li
Member

Sing-Li commented Nov 16, 2015

@calebmeyer I don't think you need to clear the path, but you might as well.

And yes. It is just that our demo-server production is already at 4 app-nodes, and it is basically stable. However, it does not usually encounter the barrage of anomalies that you will be subjecting your cluster to 😄

So now, we'll just sit back and await your 'breakage reports' 👍

@calebmeyer
Author

Time to bring in the chaos monkey :) Thanks for the quick response. Turns out the avatar issue I was seeing went away when I cleared my cache, so I can leave the GridFS settings as they are.

rodrigok modified the milestones: Roadmap, Important Feb 23, 2016
@richardwlu

How do we configure the MONGO_URL and MONGO_OPLOG_URL environment variables on a secondary instance of the app with a three-member mongo replica set? I noticed in this gist https://gist.github.com/leefaus/fd55eee32f1dc5918220 that there is a list of hosts for MONGO_URL but nothing for MONGO_OPLOG_URL.

Is the gist accurate for MONGO_URL, and should the second instance's MONGO_OPLOG_URL point to the primary mongod or to mongodb://localhost:27017/local?

We're currently running on CentOS with one instance of the app, and I am currently installing and bringing up another Rocket.Chat instance on another server. We will be adding 3 replica set members: one on the primary Rocket.Chat server, one on the secondary Rocket.Chat server, and another on a separate mongo server.
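
For reference, the usual Meteor convention is to point every app instance, primary or secondary, at the whole replica set, with MONGO_OPLOG_URL targeting the local database of the set rather than a single member; the hostnames and replica set name below are assumptions:

    export MONGO_URL="mongodb://mongo1:27017,mongo2:27017,mongo3:27017/rocketchat?replicaSet=rs0"
    export MONGO_OPLOG_URL="mongodb://mongo1:27017,mongo2:27017,mongo3:27017/local?replicaSet=rs0"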

@engelgabriel
Member

Please see #1867

engelgabriel modified the milestone: Important Dec 6, 2016
@benyanke

Not sure of the status of this, but I'm willing to do whatever's needed to test on my Docker cluster at home.

@geekgonecrazy
Contributor

See the referenced issue on the docs repo right above, as well as our documentation. Read through the various Docker install guides and you'll get a variety of ways to do your setup.
