
Final Project


Welcome to the Schrödinger's cat wiki page!

Team members: Nawaz, Prashanth and Soumya

Our First Project: FoodCode.com

What is FoodCode.com?

  • FoodCode is a highly available, fault-tolerant and scalable website where you can browse recipes, upload your own creations and check out what your friends are cooking! Search based on the ingredients you have or the time available to try a new dish. Just sign up!

User Stories

  • Users can filter recipes by the ingredients needed or the time taken to prepare the dish.
  • Users can register for an account.
  • Registered users can upload their recipes and manage a profile that keeps track of the recipes they have liked and uploaded.
  • Registered users can see what their friends have recently tried cooking.
  • The website also recommends recipes to users based on their preferences.


Our idea behind this project:

  • We have created an online catalog website for searching and adding recipes. Our motivation was the lack of a good online directory of recipes. Users of our portal, whether friends, family members or anyone else, can add their recipes organized by cuisine genre.
  • Often we do not have hours to cook. As students especially, we are either in a hurry or do not have many ingredients at hand. For such scenarios, we need recipes that can be prepared quickly using only the available resources.

Functionality Provided

  • Both registered and guest users can search for any particular recipe, filtering by the time needed to prepare it and the ingredients used.
  • Users can register on the website to add new recipes.
  • Registered users can see their account details, which include their personal details along with the number of recipes they have added and the recipe IDs.
  • Guest users can search for recipes that other users have added but cannot add recipes themselves.
  • Users can also see the details of all users who have an account on the website.

Architecture

Napkin Diagram

  • Initial idea for building the application.
  • We decided on providing the user with the functionalities below.
  • Must-have services:
    • Login and user details management
    • Search and save recipe service
    • Front end
    • Routing server

Architecture Diagram

  • Architecture of deployed micro-services.

Number of Micro-services

  • We have developed five micro-services to support the application. This enabled us to de-couple the code and implement scalability concepts.
  • Each micro-service performs a specific task so that the system works efficiently and responds quickly to the user.
  • Micro-services are:
    • Login and User Detail Management service: developed in Java using Spring, Spring MVC, Spring Boot and PostgreSQL, with Swagger UI enabled. Provides stateless authentication to the user (a hedged sketch of such token handling follows this list).
    • Front End: developed in React. Lightweight and fluid in design; the user session is maintained on the browser side.
    • Routing Server: developed in Node.js. Setting up a routing server was a design decision: two databases have to be updated when a recipe is added, and this transaction is handled at the Node.js server so that client-side processing stays lightweight and fast.
    • Search Service: implemented in Django using PostgreSQL. This service manages all the recipe data, storing it in the database and retrieving it on demand.
    • Send Mail Service: notifies the user on successful account creation. It is also built on Spring; however, it is not exposed publicly and instead listens to the RabbitMQ server as a consumer. It sends out an email when the queue is populated with the right key and email ID.
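
The wiki does not show how the stateless authentication is implemented; a minimal sketch of token-based (JWT) handling as one possibility, assuming the jjwt library and illustrative names:

```java
// Hypothetical sketch of stateless token handling for the login service.
// Assumes the jjwt library (io.jsonwebtoken); all names are illustrative.
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import java.util.Date;

public class TokenService {
    private static final String SECRET = "change-me"; // would come from config

    // Issue a signed token at login; no session state is kept on the server.
    public String issueToken(String username) {
        return Jwts.builder()
                .setSubject(username)
                .setIssuedAt(new Date())
                .setExpiration(new Date(System.currentTimeMillis() + 3_600_000)) // 1 hour
                .signWith(SignatureAlgorithm.HS256, SECRET)
                .compact();
    }

    // Any replica can verify the token, which is what makes the service replicable.
    public String verify(String token) {
        return Jwts.parser()
                .setSigningKey(SECRET)
                .parseClaimsJws(token)
                .getBody()
                .getSubject();
    }
}
```

Because the token itself carries the user's identity, any replica of the login service can validate a request, which is what allows the service to be replicated freely (see "From Stateful to Stateless" below).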

Deployment Diagram

  • Current deployment on tacc.jetstream-cloud.org using Kubernetes.
  • Pods of the same color belong to one service, which spans multiple nodes.
  • Pods are created through deployments. A deployment creates a replica set, which in turn creates the pods.
  • This enables rolling updates and rollback in case of a faulty update (see the strategy sketch after this list).
  • All deployments are handled by Jenkins. Unless a deployment fails, the developer does not need to touch the cluster.
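
The actual manifests live in the repository; as a hedged illustration, the fragment of a deployment spec that enables this rolling-update behavior might look like:

```yaml
# Hypothetical fragment of a deployment spec; the numbers are illustrative.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # one extra pod may be created during an update
      maxUnavailable: 1   # at most one pod is taken down at a time
```

A faulty update can then be reverted with `kubectl rollout undo deployment/<name>`.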

Implementation

We have implemented the following solutions in our project :

  • Kubernetes: For Load Balancing, Scaling and Service Discovery.

Using Kubernetes, we created a cluster of several nodes: one master node and three worker nodes. Each node runs pods that host the various services. Kubernetes assigns services to pods in such a way that if one node goes down, the system stays up and running thanks to the active pods on other nodes.

  • RabbitMQ: For Message Passing.

RabbitMQ sits between the Java login service and the mail service. As soon as a sign-up request arrives, the login service publishes a message to RabbitMQ, the consumer picks it up, and the Java email service sends out an email. A hedged sketch of this flow follows.
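The wiki does not include the messaging code; a minimal Spring AMQP sketch of this flow, with an illustrative queue name (not taken from the repository):

```java
// Hypothetical sketch of the sign-up notification flow with Spring AMQP.
// The queue name and class names are illustrative. In the real system the
// publisher and consumer live in two separate services.
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

@Component
public class SignupPublisher {
    private final RabbitTemplate rabbitTemplate;

    public SignupPublisher(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    // Called by the login service after a successful sign-up.
    public void publish(String emailId) {
        rabbitTemplate.convertAndSend("signup-queue", emailId);
    }
}

@Component
class MailConsumer {
    // The mail service is not exposed publicly; it only consumes from the queue.
    @RabbitListener(queues = "signup-queue")
    public void onSignup(String emailId) {
        // send the welcome email, e.g. via JavaMailSender (omitted for brevity)
    }
}
```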

  • Jenkins: For Continuous Deployment.

We have created a Jenkinsfile for each of our services, which enables one-click deployment. As soon as we click Build Now, the service is deployed following the steps written in the Jenkinsfile. The developer does not have to perform each step by hand when testing a deployment, which avoids mistakes caused by manual errors. A hedged sketch follows.
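The Jenkinsfiles themselves are not reproduced in this wiki; a minimal declarative sketch of what one per-service pipeline might look like, with illustrative stage contents, labels and image names:

```groovy
// Hypothetical declarative Jenkinsfile; stage contents, the agent label and
// the image name are illustrative, not taken from the repository.
pipeline {
    agent { label 'deploy-host' }   // labels pick the host that runs the job
    stages {
        stage('Build') {
            steps { sh 'docker build -t example/login-service:latest .' }
        }
        stage('Push') {
            steps { sh 'docker push example/login-service:latest' }
        }
        stage('Deploy') {
            // the manifest name matches the files listed in the setup section
            steps { sh 'kubectl apply -f deployment-login.yaml' }
        }
    }
}
```

Each stage does one part of the deployment; the deployment is marked complete only if every stage succeeds.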

  • TravisCI: For Continuous Integration.

This makes it easy to test our code before we containerize it. A build-status badge is attached to each branch to show whether its current build passes. A minimal configuration sketch follows.
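The wiki does not show the CI configuration; a minimal .travis.yml for one of the Java services might look like this (values are illustrative):

```yaml
# Hypothetical .travis.yml for one of the Java services; values are illustrative.
language: java
jdk:
  - openjdk8
script:
  - mvn test   # run the unit tests before the code is containerized
```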

How to set it up

Automated setup

This application is configured with one-click deployment. Please open the git repository and change the branch to "releaseBranch_Assignment4". A commit to any file in this branch will build the project from scratch. You can access the application at 129.114.16.182:31012/.

Manual setup

To set up this project, you will need a working Kubernetes cluster and the git CLI. To install and set up a Kubernetes cluster, please follow the instructions on the official Kubernetes page. Once the cluster is up and running, perform the following steps.

  • git clone https://github.com/airavata-courses/Schrodinger-s-cat.git
  • git checkout feature-login_authentication
  • kubectl create -f deployment-login-db.yaml
  • kubectl create -f service-login-db.yaml
  • kubectl create -f deployment-login.yaml
  • kubectl create -f service-login.yaml
  • git checkout feature-nodejs_server
  • kubectl create -f deployment-node-server.yaml
  • kubectl create -f service-node-server.yaml
  • git checkout feature-react_ui
  • kubectl create -f deployment-front-react.yaml
  • kubectl create -f service-front-react.yaml
  • git checkout feature-search_recipes
  • kubectl create -f deployment-search-db.yaml
  • kubectl create -f service-search-db.yaml
  • kubectl create -f deployment-search-server.yaml
  • kubectl create -f service-search-server.yaml
  • git checkout feature-smtp_rabbitmq
  • kubectl create -f deployment-smtp-rabbitmq.yaml
  • kubectl create -f service-smtp-rabbitmq.yaml
  • kubectl create -f deployment-smtp-consumer.yaml

Evolution of FoodCode.com

Project 1

  • As part of project 1, we analyzed the functionality and designed a system with four micro-services: front end, search service, login service, and node service.

  • Search Service:

    • The search service handles all search requests from the user. It is developed in Python using the Django REST framework. The service stores all its data in a PostgreSQL database called "searchDb". We used the physical IP address of the machine to establish the connection between the search service and its database.
  • Login Service:

    • The login service handles all login requests and maintains the user data. It is developed in Java using the Spring REST architecture and the Hibernate framework. The service stores all its data in another PostgreSQL database called "loginDb". We used the physical IP address of the machine to establish the connection between the login service and its database.
  • Node Service:

    • The Node service routes requests among the micro-services. Every request from the user interface passes through the Node service, which forwards it to the right micro-service. It is developed with the Express.js framework.
  • Front End:

    • This service provides the front end of the system. It is developed using React, which makes it easy to build and modify. Apart from providing the front end, this service also maintains user sessions and protects the application's endpoints.

Continuous Integration

  • We implemented continuous integration using TravisCI. This simplified testing the code before packing it into containers. We attached a build-status badge to each branch to show its current build status.

Project 2

  • Though we had a working, functional system, there were a good number of steps and requirements to follow to get it up and running. On top of that, we deployed the system manually after every development cycle.
  • Building the system from scratch was therefore a significant overhead. The goals of this project were continuous deployment, one-click deployment, and containerization.

Continuous and one-click deployment:

  • We used Jenkins to automate deployment.
  • Jenkins has a good user community and can be extended with plug-ins. We ran Jenkins in a one-master, one-slave configuration; the two hosts are distinguished by labels, which lets us distribute services among several hosts.
  • We developed Jenkins jobs and pipeline scripts specific to each micro-service. The pipeline scripts are divided into stages, each performing one part of the deployment; if every stage succeeds, the deployment is marked complete and successful (see the Jenkinsfile sketch under Implementation).
  • Each job listens for commit events from GitHub and builds the micro-service on the host specified by the labels.

Containerization:

  • As we use three different programming languages to develop the micro-services, each micro-service has its own set of installation requirements.
  • A host will not necessarily have all the required environments installed. To avoid these installations, we containerized the applications.
  • We developed docker containers specific to each application. All installation requirements and startup steps are defined in the container images, and each image can be used on its own to start its micro-service.

Issues and Using Docker Swarm

  • Initially we used the host's network to manage communication among these micro-services. As the functionality grew, managing the cluster became tedious because we depended on host IP addresses for communication among containers.
  • To avoid this problem, we deployed the containers as services in a Docker Swarm cluster.
  • Swarm supports several network types, each with its own limitations. We used Swarm's overlay network, which enables communication among the containers and their respective hosts.
  • With this, we achieved automatic service discovery and load balancing, using one master node and two worker nodes. We also tried a dual-master mode but ran into authority problems when two masters managed the same cluster.
  • Swarm has a DNS service that manages the IP addresses of the containers. We assigned a name to each of our services; that name is discoverable by any container deployed on the same network (a hedged sketch of the commands follows this list).
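
The exact commands are not recorded in this wiki; a hedged sketch of how such an overlay network and a replicated service could be created with the Docker CLI (network, service and image names are illustrative):

```sh
# Hypothetical commands; network, service and image names are illustrative.
docker network create -d overlay appnet
docker service create --name search-server --network appnet --replicas 3 example/search:latest
# Any container on "appnet" can now reach the service by the DNS name "search-server".
docker service scale search-server=5
```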

Project 3

  • In the previous implementation, each micro-service ran in its own container with only one copy of the application serving requests. Having only one copy limits the capacity of the system.
  • In order to scale the application, we used Swarm's orchestrator. Swarm lets us specify the physical definition of containers and the total number of required replicas.
  • Physical definitions include CPU, memory, disk, etc. We used this facility to replicate services; all replicas of a service are identified by the service name.
  • Swarm distributes load in a round-robin fashion. We conducted load tests on this setup using JMeter, installed outside the cluster. The results of the load tests can be found in the wiki page for assignment 3.

Issues :

  • While running load tests with JMeter, we discovered many implementation flaws in retrieving requested data. These were identified and resolved in later PRs.

Project 4

  • The goal of project 4 was to implement the above architecture in Kubernetes and solve a distributed-systems problem. We used one master and three worker nodes to set up our application.
  • This configuration helps in rapid scaling of the micro-services. Moreover, we achieved rolling updates, high availability, scalability and fault tolerance.
  • We deployed our micro-services as deployments. Deployments in Kubernetes are entities that create replica sets, which in turn manage replication of the pods.
  • The pods are exposed to the outside world using services. Services map internal ports to node- and cluster-level ports, and the micro-services can reach each other through the service names.
  • Deployment files consist of the docker image details, the replication factor and the label associated with the pod. Service files contain the name of the service and a selector; labels and selectors map services to deployments (a hedged sketch follows this list).
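
The actual manifests live in the repository; as an illustration of the label/selector mapping described above, a hedged sketch with made-up names:

```yaml
# Hypothetical manifests showing the label/selector mapping; names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: search-server
spec:
  replicas: 3                      # replication factor
  selector:
    matchLabels:
      app: search                  # must match the pod template labels below
  template:
    metadata:
      labels:
        app: search
    spec:
      containers:
        - name: search
          image: example/search:latest   # docker image (placeholder)
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: search-service             # other micro-services reach the pods by this name
spec:
  selector:
    app: search                    # selects the pods labelled by the deployment above
  ports:
    - port: 8000
      targetPort: 8000
```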

Approaches and Issues

  • OpenStack provides no LoadBalancer service type. One could argue for defining our own load balancer using ZooKeeper, but see our view in the issues below.

Major Issues:

We think most of our learning came from resolving issues. The important ones are summarized below.

Login and User Management Service

Building the login container

  • When we shifted to Docker for building and containerizing our code, building the Java applications turned out to be a big challenge.
  • We decided to pull the MySQL server (later changed to a PostgreSQL server) out of the docker image and made a separate image for each function. (Design decision.)
  • Unlike a Python or JavaScript application, a Java application must be packaged as a jar before it can be deployed to a server. We faced a lot of issues containerizing the code. We now have a fully deployable login-service image with http, curl and maven available inside it, and it is still only about 230 MB.
  • To overcome the issue, we first built a Maven image, used it to build the Java application, and packed the result into another image. We were advised to take a pre-built Maven image, but we store db config files in the container and therefore had to build a custom one. A hedged multi-stage Dockerfile sketch of this approach follows.
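
The wiki does not reproduce the Dockerfile; one way to express the two-step build described above is a multi-stage Dockerfile, sketched here with illustrative image, path and jar names:

```dockerfile
# Hypothetical multi-stage Dockerfile; image, path and jar names are illustrative.
# Stage 1: build the jar with Maven (db config files could be copied in here,
# which is why a custom build stage was needed instead of a stock image).
FROM maven:3-jdk-8 AS build
WORKDIR /app
COPY . .
RUN mvn package -DskipTests

# Stage 2: copy only the jar into a small JRE image to keep the final size down.
FROM openjdk:8-jre-alpine
COPY --from=build /app/target/login-service.jar /login-service.jar
ENTRYPOINT ["java", "-jar", "/login-service.jar"]
```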

From Stateful to Stateless

  • There were clear improvements in moving the login service from a stateful to a stateless implementation.
  • With the stateless login service, we can replicate the service at will; the remaining issues were managing the token and handling it at the front end and at the routing server.
  • Future work is to add a check at the routing server to prevent re-use of spent tokens. Moreover, we could eliminate the login functionality from the Java-based service entirely and implement it at the node server.

Replicating the storage server

  • We learnt that replicating the storage server to handle a high influx of requests is a tough task. We tried replicating the server but saw inconsistent data across the replicas.
  • One fix we tried was to write backend jobs to update the data.
  • However, we dropped this implementation in the final release because the backend synchronization added extra overhead that was not manageable under autoscaling.

Front end

Form Headers

  • We had to manually set some headers in HTTP responses to make the browser accept cross-site requests. Though the React app is an independent web application, it does not make POST requests itself; it uses the browser to make HTTP requests.
  • Any POST request written in this application is actually performed by the browser, which is what introduces the complexity here.
  • Applications that talk directly to the UI have to be load-balanced using a load-balancer service from the cloud provider. A hedged sketch of the header configuration follows.
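
The exact headers are not listed in this wiki; one hedged way to set such cross-origin (CORS) headers in a Spring service looks like this (the mapping, origins and methods are illustrative):

```java
// Hypothetical CORS configuration for one of the Spring services; the mapping,
// origins and methods are illustrative, not taken from the repository.
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class CorsConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        // Equivalent to manually setting the Access-Control-Allow-* response
        // headers so the browser accepts responses from a different origin.
        registry.addMapping("/**")
                .allowedOrigins("*")   // the React app is served from another origin
                .allowedMethods("GET", "POST", "PUT", "DELETE");
    }
}
```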

Kubernetes implementation

Follow the official guide (very important)

  • Installing Kubernetes correctly is very important.
  • Apart from following the installation guide, execute sysctl net.bridge.bridge-nf-call-iptables=1 on every node before joining it to the cluster. This is needed only if you are using the Flannel network.

Max number of replicas = total number of worker nodes

  • All our Kubernetes services are of type NodePort. A NodePort service maps the application's port to a port on the node, so a pod's port occupies a port on its node. Hence, to replicate the deployments further, the total number of worker nodes has to be increased. A hedged service sketch follows.
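
As a hedged illustration of the port mapping (the numbers are illustrative; 31012 mirrors the application URL earlier on this page):

```yaml
# Hypothetical NodePort service; the port numbers are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: front-react
spec:
  type: NodePort
  selector:
    app: front-react
  ports:
    - port: 3000        # cluster-internal port
      targetPort: 3000  # port of the container inside the pod
      nodePort: 31012   # port opened on the nodes
```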

Database Replication:

This application runs its databases as containers. If we replicate these containers, they run as individual databases load-balanced by the orchestration framework. However, the application needs all replicas to be in the same state and share data. We explored the following solutions for maintaining state.

  • The first solution uses one of the replicas as master and the others as slaves. If the master goes down, one of the slaves becomes master. All insert and update requests are handled by the master while the slaves serve read requests; the master is responsible for propagating data changes to the replica pods.
  • The second solution uses a storage server to share the data among pods. The storage server must be mounted on all the Kubernetes nodes. In this approach, the performance of the database pods is limited by the performance of the storage server.
  • Various frameworks are being developed in Kubernetes to automate the scaling of databases. One such framework for the Postgres database is the "Postgres operator". The current version of this framework is not stable and is still under active development.

Jenkins Master-Slave configuration:

A Jenkins master node needs a successful SSH connection to its slave node in order to communicate. Although the SSH connection between the master and slave hosts worked perfectly, the Jenkins master was unable to communicate with the slave node. We observed this issue both with public-key and with username/password authentication.