Handling Failures
Our main point of failure is the signalling server. To recap, the signalling server coordinates the peers and the meeting rooms. We host 3 signalling servers on Amazon EC2 instances, and we have set them up for failover: if one fails, the next one takes over.
If a signalling server goes down before a peer-to-peer connection is established, users who load the page won't be able to see any meeting rooms. If the server goes down afterwards, there's no problem, because the connection between the peers has already been established.
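The failover order described above can be sketched as a small selection function. This is only an illustration, not our deployed configuration: the addresses, the `pickActiveServer` name, and the `isUp` health check are all hypothetical.

```typescript
// Hypothetical addresses; the real server addresses are not listed here.
const SIGNALLING_SERVERS: string[] = ["10.0.0.1", "10.0.0.2", "10.0.0.3"];

// Returns the first server that the health check reports as up,
// or null when every server is down.
function pickActiveServer(
  servers: string[],
  isUp: (server: string) => boolean
): string | null {
  for (const server of servers) {
    if (isUp(server)) {
      return server;
    }
  }
  return null;
}

// Example: the first server has failed, so the next one takes over.
const downServers: string[] = ["10.0.0.1"];
const active = pickActiveServer(
  SIGNALLING_SERVERS,
  (server) => downServers.indexOf(server) === -1
);
```

With the first server marked as down, `active` is the second address in the list. If every server is down, the function returns `null`, which corresponds to the case where users cannot see any meeting rooms.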
As part of project 2, we updated our architecture to handle signalling server failures as follows:
- We put a load-balancer in front of our signalling servers, which lets us scale horizontally (along the x-axis) by adding more signalling servers. The client-side code now only has to be aware of a single server: the load-balancer.
- A client now always connects to the load-balancer unless it is passed a "serverIp" parameter in the query string of its url, e.g. http://54.149.63.219/?serverIp=54.148.97.8&roomId=4e28-99dc. This is a generated url that is given to participants who want to join a call.
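As a rough sketch of the rule above, the client could decide which host to contact like this. The function names are hypothetical, and treating 54.149.63.219 (the host in the example url) as the load-balancer address is an assumption made for illustration only.

```typescript
// Assumption: the host in the example url is the load-balancer.
const LOAD_BALANCER_HOST = "54.149.63.219";

// Returns the host the client should use for signalling.
function resolveSignallingHost(pageUrl: string): string {
  const serverIp = new URL(pageUrl).searchParams.get("serverIp");
  // Fall back to the load-balancer when no server is pinned in the url.
  return serverIp !== null && serverIp !== "" ? serverIp : LOAD_BALANCER_HOST;
}

// Builds the url that is given to participants (hypothetical helper).
function buildInviteUrl(serverIp: string, roomId: string): string {
  const url = new URL(`http://${LOAD_BALANCER_HOST}/`);
  url.searchParams.set("serverIp", serverIp);
  url.searchParams.set("roomId", roomId);
  return url.toString();
}
```

A participant who opens the generated url is sent straight to the signalling server that holds the room, while a user who opens the bare page goes through the load-balancer.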
For more information on this architecture, please see the Architecture section of the wiki.
Our architecture now allows any of our signalling servers to go down without affecting connection initiation for clients. There are a few caveats:
- The load-balancer itself might go down. If this happens, we're hooped. In the future, we might be able to provision a fall-back load-balancer on AWS that is ready to take over.
- The signalling server in question might go down after a url has been generated and sent to participants. In this case, the initiating client has no choice but to refresh the page, generate a new url, and send it to the participants. We could make our application handle this better by automatically notifying the client when it happens and giving them a newly generated url via a pop-up, without having them reload the page. This is a UX change.
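The proposed UX change in the second caveat might look roughly like the sketch below. Nothing here is implemented: `Invite`, `makeInvite`, and `handleServerLoss` are hypothetical names, the `notify` callback stands in for the pop-up, and we assume the caller already knows a healthy server and has a room id for the recreated room.

```typescript
interface Invite {
  serverIp: string;
  roomId: string;
  url: string;
}

// Builds an invite whose url has the same shape as the generated urls
// given to participants.
function makeInvite(baseUrl: string, serverIp: string, roomId: string): Invite {
  const query =
    `serverIp=${encodeURIComponent(serverIp)}&roomId=${encodeURIComponent(roomId)}`;
  return { serverIp, roomId, url: `${baseUrl}/?${query}` };
}

// Called when the client notices that its signalling server is gone.
// Generates a replacement invite and reports it through `notify`
// instead of making the user reload the page.
function handleServerLoss(
  baseUrl: string,
  current: Invite,
  healthyServerIp: string,
  newRoomId: string,
  notify: (message: string) => void
): Invite {
  const next = makeInvite(baseUrl, healthyServerIp, newRoomId);
  notify(
    `Signalling server ${current.serverIp} is unavailable. New link: ${next.url}`
  );
  return next;
}
```

The initiating client would still have to send the new url to the participants. The sketch only removes the page reload.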