This document illustrates the steps to configure and install the
dependencies required to run the vlabs-analytics-service application.
We use the setuptools module to write a setup.py file, which installs
all the Python library dependencies required to run the application.
from setuptools import setup
requires = [
'flask',
'flask-cors',
'flask-testing',
'requests',
'pyyaml',
'GitPython',
'gunicorn'
]
setup(
name='vlabs-analytics-service',
version='0.0.1',
install_requires=requires
)
- The Web Server Gateway Interface (WSGI) is a specification for a simple and universal interface between web servers and web applications or frameworks for the Python programming language.
- This application runs behind the nginx web server.
- The following code snippet in wsgi.py makes the connection between nginx and flask, Python's micro framework.
import sys, os
sys.path.insert(0, "/usr/share/nginx/html/")
from runtime.rest.app import create_app
from runtime.config import flask_app_config as config
application = create_app(config)
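To make the WSGI contract concrete, here is a minimal sketch (the names are illustrative, not taken from the service's code): a WSGI application is a single callable that receives the request environ and a start_response function and returns an iterable of byte strings. The Flask object returned by create_app() implements this same interface, which is what lets gunicorn serve it.

```python
# Minimal WSGI application: the server calls it once per request.
def application(environ, start_response):
    body = b"analytics-service is up"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Invoke it directly, the way a WSGI server such as gunicorn would.
def demo():
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(application({"REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body
```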
description "Gunicorn application server running analytics-service"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
setuid root
setgid www-data
chdir /usr/share/nginx/html/deployment
exec gunicorn --workers 3 --bind unix:analytics-service.sock -m 007 wsgi
server {
listen 80;
server_name localhost;
location / {
include proxy_params;
proxy_pass http://unix:/usr/share/nginx/html/deployment/analytics-service.sock;
}
}
- Install pip and the nginx server
sudo apt-get update
sudo apt-get install python-pip python-dev nginx
- Install the virtualenv package
sudo pip install virtualenv
- Create a virtual environment for Python
virtualenv analytics-service
- Activate the virtual environment
source analytics-service/bin/activate
- Clone the repository
git clone https://github.com/vlead/vlabs-analytics-service
- Check out the develop branch and build the sources
cd vlabs-analytics-service
git checkout develop
make readtheorg=true
- Install the prerequisites inside the virtual environment
cd build/code/deployment
python setup.py install
- Export PYTHONPATH as build/code to run the application
cd build/code
export PYTHONPATH=$(pwd)
- Configure application variables in runtime/config/system_config.py

# Application URL
APP_URL = "http://localhost:5000"

# Configure key
KEY = "defaultkey"

# Lab Data Service URL
LDS_URL = "http://lds.vlabs.ac.in"

# Analytics database (i.e. elasticsearch) URL
ANALYTICS_DB_URL = "http://192.168.33.3"

# Analytics database (i.e. elasticsearch) indexes & doc_types to store the
# analytics data
## Index to store vlabs analytics
VLABS_USAGE = "vlabs"

## Types to store openedx & nonopenedx usages
OPENEDX_USAGE = "openedx_usage"
NONOPENEDX_USAGE = "nonopenedx_usage"

# Path to the analytics file of nonopenedx labs, which is copied from the
# stats.vlabs.ac.in server
NONOPENEDX_USAGE_INFO_FILE_PATH = "/home/sripathi/output.txt"

### Credentials to analytics-db
USER = "username"
PASSWORD = "password"
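As a rough sketch of how these values fit together: usage documents are written to an Elasticsearch URL built from the database URL, an index, a doc_type, and a document id. The helper name es_endpoint below is hypothetical; the real request-building code lives under runtime/.

```python
# Configuration values as shown above.
ANALYTICS_DB_URL = "http://192.168.33.3"
VLABS_USAGE = "vlabs"              # index
OPENEDX_USAGE = "openedx_usage"    # doc_type

def es_endpoint(base_url, index, doc_type, doc_id):
    # Elasticsearch document URL: <base_url>/<index>/<doc_type>/<doc_id>
    return "%s/%s/%s/%s" % (base_url, index, doc_type, doc_id)
```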
- Run the flask application server
cd build/code/runtime/rest
python app.py
- Access the application in a browser
firefox http://localhost:5000
vlabs-analytics-service aggregates the analytics data from the different services. This is achieved by setting up a cron job on each service that pushes its analytics to the analytics-db, via the REST APIs of vlabs-analytics-service, at regular intervals.
- This server contains all the analytics of the labs (usage, hits and visits) running on the nonopenedx platform.
- Usage, hits and visits of labs running on the nonopenedx platform are processed by an Erlang program; the resulting statistics are written to the **output.txt** file on the **stats.vlabs.ac.in** server every 2 hours.
- Every line in output.txt has the following format
lab_id, lab_name, hits, visits, usages
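A sketch of parsing one such line into a record, assuming the comma-separated field order described above (parse_usage_line is a hypothetical helper, not part of the service's code):

```python
# Split one line of output.txt into its five fields and convert the
# numeric columns to integers.
def parse_usage_line(line):
    lab_id, lab_name, hits, visits, usages = [f.strip() for f in line.split(",")]
    return {"lab_id": lab_id,
            "lab_name": lab_name,
            "hits": int(hits),
            "visits": int(visits),
            "usages": int(usages)}
```

This assumes lab names themselves contain no commas.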
- Location of output.txt on the server
cd /root/
- Set up a cron job to copy the source file /root/output.txt on the stats.vlabs.ac.in server to the destination path /root/nonopenedx-usage.txt on the vlabs-analytics-service server every 3 hours
0 */3 * * * root rsync -avz /root/output.txt root@vlabs-analytics.vlabs.ac.in:/root/nonopenedx-usage.txt
- Ensure that the value of the configuration variable NONOPENEDX_USAGE_INFO_FILE_PATH matches the destination path from the previous step.
Openedx-platform VM running vlabs (http://vlabs.ac.in)
Java version 8 is a prerequisite for installing elasticsearch

sudo apt-add-repository ppa:webupd8team/java -y
sudo apt-get update -y
echo 'oracle-java8-installer shared/accepted-oracle-license-v1-1 select true' | sudo debconf-set-selections
sudo apt-get install oracle-java8-installer -y
Download and install the Public Signing Key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
You may need to install the apt-transport-https package on Debian before proceeding:
sudo apt-get install apt-transport-https
Run sudo apt-get update so the repository is ready for use, then install logstash with:
sudo apt-get update && sudo apt-get install logstash
To run the logstash service:
service logstash start
- Configuration to dump the login and logout nginx server logs into the analytics-db (i.e. elasticsearch) database service.
- Copy the code snippet below into /etc/logstash/conf.d/analytics.conf
input {
  file {
    path => "/home/sripathi/test-logs.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => ["message", "%{IP:clientip} \- \- \[%{MONTHDAY:day}/%{MONTH:month}/%{YEAR:year}\:%{TIME:time} \+%{INT:zone}\] \"%{WORD:method} %{URIPATHPARAM:api_endpoint} %{URIPROTO:protocol}/%{NUMBER:version}\" %{INT:status_code} %{INT:byte} %{NUMBER:byte1} \"%{URI:referrer}"]
  }
  geoip {
    source => "clientip"
  }
  if [month] == "Jan" { mutate { replace => { "month" => "01" } } }
  else if [month] == "Feb" { mutate { replace => { "month" => "02" } } }
  else if [month] == "Mar" { mutate { replace => { "month" => "03" } } }
  else if [month] == "Apr" { mutate { replace => { "month" => "04" } } }
  else if [month] == "May" { mutate { replace => { "month" => "05" } } }
  else if [month] == "Jun" { mutate { replace => { "month" => "06" } } }
  else if [month] == "Jul" { mutate { replace => { "month" => "07" } } }
  else if [month] == "Aug" { mutate { replace => { "month" => "08" } } }
  else if [month] == "Sep" { mutate { replace => { "month" => "09" } } }
  else if [month] == "Oct" { mutate { replace => { "month" => "10" } } }
  else if [month] == "Nov" { mutate { replace => { "month" => "11" } } }
  else { mutate { replace => { "month" => "12" } } }
  mutate {
    add_field => { "date" => "%{year}-%{month}-%{day}" }
    remove_field => ["year", "month", "day", "path", "host"]
  }
  if [api_endpoint] == "/dashboard" {
    if [status_code] != "200" {
      drop {}
    } else if [referrer] != "https://vlabs.ac.in/login?next=/dashboard" {
      drop {}
    }
  } else if [api_endpoint] == "/logout" {
    if [status_code] != "302" or [referrer] == "https://vlabs.ac.in/" {
      drop {}
    }
  } else {
    drop {}
  }
}

output {
  elasticsearch {
    hosts => "192.168.33.3:80"
    user => "user"
    password => "pswd"
    index => "vlabs"
    document_type => "openedx_user_session_analytics_%{date}"
  }
}
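The long if/else chain in the logstash filter simply maps month abbreviations to two-digit numbers so that a sortable YYYY-MM-DD date field can be assembled. The same logic expressed in Python, for clarity (build_date is illustrative, not part of the deployment):

```python
# Month abbreviation -> two-digit month number, as in the logstash filter.
MONTHS = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
          "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
          "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}

def build_date(year, month_abbr, day):
    # Mirrors the mutate/add_field step: "%{year}-%{month}-%{day}"
    return "%s-%s-%s" % (year, MONTHS[month_abbr], day)
```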
- This script gets the user analytics (registered, active and inactive users) from the openedx MySQL database and forms a JSON record.
- It then pushes the JSON data obtained in step (1) into the analytics database (i.e. elasticsearch).
#!/usr/bin/python
import MySQLdb
import json
import datetime
import requests
cursor = None
db = None
analytics_db_url = "http://192.168.33.3"
analytics_db_user = "<username>"
analytics_db_password = "<password>"
mysql_db_url = "localhost"
mysql_user = "<username>"
mysql_password = "<password>"
mysql_db = "edxapp"
def connect_db():
try:
global db
global cursor
db = MySQLdb.connect(mysql_db_url, mysql_user, mysql_password, mysql_db)
cursor = db.cursor()
except Exception as e:
print "Exception = %s" % (str(e))
exit(1)
def dis_connect_db():
try:
db.close()
except Exception as e:
print "Exception = %s" % (str(e))
exit(1)
def get_users_count(query):
try:
cursor.execute(query)
results = cursor.fetchall()
for row in results:
users_count = row[0]
return int(users_count)
    except Exception as e:
        print "Error: unable to fetch data: %s" % (str(e))
exit(1)
def push_data_to_analytics_db(data_dict):
index = "vlabs"
doc_type = "openedx_user_analytics"
date = data_dict['date']
analytics_db_api = "%s/%s/%s/%s" % \
(analytics_db_url, index, doc_type, date)
auth = (analytics_db_user, analytics_db_password)
headers = {'Content-Type': 'application/json'}
try:
r = requests.post(analytics_db_api, auth=auth, \
data=json.dumps(data_dict), headers=headers)
if r.status_code == 201 or r.status_code == 200:
print "data_dict is added = %s" % (data_dict)
else:
            print "Error in adding data_dict = %s" % (data_dict)
except Exception as e:
print "Exception = %s" % (str(e))
exit(1)
if __name__== "__main__":
### Connect to mysql database
connect_db()
### Form SQL Queries
active_users_query = "SELECT count(*) FROM auth_user WHERE is_active=1"
inactive_users_query = "SELECT count(*) FROM auth_user WHERE is_active=0"
registered_users_query = "SELECT count(*) FROM auth_user"
### Get users count
active_users = get_users_count(active_users_query)
inactive_users = get_users_count(inactive_users_query)
registered_users = get_users_count(registered_users_query)
### Form json dict
today_date = str(datetime.datetime.today()).split()[0]
data_dict = {
"date" : today_date,
"registered_users" : registered_users,
"active_users" : active_users,
"inactive_users" : inactive_users
}
### disconnect database connection
dis_connect_db()
    ### push data to analytics db
push_data_to_analytics_db(data_dict)
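For illustration, this is the shape of the JSON body the script above POSTs to &lt;ANALYTICS_DB_URL&gt;/vlabs/openedx_user_analytics/&lt;date&gt;, shown here with made-up counts (build_document is a hypothetical helper mirroring the data_dict construction):

```python
import json

# Assemble the user-analytics document exactly as the script does.
def build_document(date, registered, active, inactive):
    return {"date": date,
            "registered_users": registered,
            "active_users": active,
            "inactive_users": inactive}

doc = build_document("2017-06-01", 1500, 1200, 300)
payload = json.dumps(doc, sort_keys=True)
```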
print "deployment package"
<<logged_in_users>>
<<openedx_user_analytics>>