e-mission is a project to gather data about user travel patterns using phone apps, use the data to provide a personalized carbon footprint, and aggregate it so that it is available to urban planners and transportation engineers.
It has two components, the backend server and the phone apps. This is the backend server - the phone apps are available in the e-mission-phone repo.
The backend in turn consists of two parts - a summary of their code structure is shown below.
-
The webapp supports a REST API, and accesses data from the database to fulfill
the queries. A set of background scripts pull the data from external sources and
preprocess the results to ensure reasonable performance.
-
Install Mongodb
-
Windows: mongodb appears to be installed as a service on Windows devices and it starts automatically on reboot
-
OSX: Install homebrew, then use homebrew to install mongodb. Follow these instructions: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
-
Ubuntu: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
-
Start it at the default port
$ mongod
We will use a distribution of python that is optimized for scientific computing. The anaconda distribution is available for a wide variety of platforms and includes the python scientific computing libraries (numpy/scipy/scikit-learn) along with native implementations for performance. Using the distribution avoids native library inconsistencies between versions.
The distribution also includes its own version of pip, and a separate package management tool called 'conda'.
Make sure you install Python 2.7 because some libraries used in this code repository do not support Python 3.5 yet.
After you install the anaconda distribution, please ensure that it is in your
path, and that you are using the anaconda versions of common python tools such as
python and pip, e.g.
$ which python
/Users/shankari/OSS/anaconda/bin/python
$ which pip
/Users/shankari/OSS/anaconda/bin/pip
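The same check can be scripted. The sketch below uses only the standard library; `find_on_path` and `same_install_dir` are hypothetical helper names, and it is an assumption that "python and pip live in the same directory" is a good enough proxy for "both come from the anaconda distribution".

```python
import os


def find_on_path(tool, path=None):
    """Return the first executable called `tool` found on PATH, or None."""
    path = os.environ.get("PATH", "") if path is None else path
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def same_install_dir(python_path, pip_path):
    """True if both tools were found and sit in the same directory."""
    if python_path is None or pip_path is None:
        return False
    python_dir = os.path.dirname(os.path.abspath(python_path))
    pip_dir = os.path.dirname(os.path.abspath(pip_path))
    return python_dir == pip_dir


if __name__ == "__main__":
    python_path = find_on_path("python")
    pip_path = find_on_path("pip")
    print("python: {0}".format(python_path))
    print("pip:    {0}".format(pip_path))
    print("same directory: {0}".format(same_install_dir(python_path, pip_path)))
```

If the last line reports False, the two tools are being picked up from different installations and the path probably needs to be fixed before installing dependencies.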
$ pip install -r requirements.txt
# If you are running this in production over SSL, copy over the cherrypy-wsgiserver
$ cp api/wsgiserver2.py <dist-packages>/cherrypy/wsgiserver/wsgiserver2.py
If you are prompted for a password for 'https://github.com' after running "bower update", run "bower install" instead.
$ cd webapp
$ bower update
In order to test out changes to the webapp, you should make the changes locally, test them and then push. Then, deployment is as simple as pulling from the repo to the real server and changing the config files slightly.
Here are the steps for doing this:
-
On OSX, start the database (Note: mongodb appears to be installed as a service on Windows devices and it starts automatically on reboot).
$ mongod
-
Copy the following sample files. You should also configure the servers and keys in them if you wish to test the associated features, but can leave them filled with dummy values if you don't.
# For the location -> name reverse lookup. Client will lookup if not populated.
$ cp conf/net/ext_service/nominatim.json.sample conf/net/ext_service/nominatim.json
# Store entries to the stats database. Currently required, dependency should be removed soon
$ cp conf/net/int_service/giles_conf.json.sample conf/net/int_service/giles_conf.json
# Game integration.
$ cp conf/net/ext_service/habitica.json.sample conf/net/int_service/habitica.json
-
Start the server
$ ./e-mission-py.bash emission/net/api/cfc_webapp.py
-
Test your connection to the server
- Using a web browser, go to http://localhost:8080
- Using the iOS emulator, connect to http://localhost:8080
- Using the android emulator:
  - change server.host in conf/net/api/webserver.conf to 0.0.0.0, and
  - connect the app to the special IP for the current host in the android emulator - 10.0.2.2
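The addresses in the list above can be summarized in a small helper. This is only a sketch: the client labels (`"browser"`, `"ios_emulator"`, `"android_emulator"`) and the function names are made up for illustration, while the port 8080 and the 10.0.2.2 address come from the steps above.

```python
import socket


def server_url_for(client, host="localhost", port=8080):
    """Return the URL that the given kind of client should connect to."""
    if client == "android_emulator":
        # Inside the android emulator, the development machine is 10.0.2.2.
        host = "10.0.2.2"
    return "http://{0}:{1}".format(host, port)


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, OSError):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    for client in ("browser", "ios_emulator", "android_emulator"):
        print("{0}: {1}".format(client, server_url_for(client)))
    print("server listening locally: {0}".format(is_listening()))
```

`is_listening` only checks that the port accepts connections from the development machine; it does not confirm that the emulator itself can reach the server.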
You may also want to load some test data.
-
Sample timeline data from the data collection eval is available in the data-collection-eval repo.
-
You can choose to load either android data
results_dec_2015/ucb.sdb.android.{1,2,3}/timeseries/*
or iOS data
results_dec_2015/ucb.sdb.ios.{1,2,3}/timeseries/*
-
Data is loaded using the bin/debug/load_timeline_for_day_and_user.py script. It requires a timeline file and a user that the timeline is being loaded as. If you wish to view this timeline in the UI after processing it, you need to login with this email.
$ cd ..../e-mission-server
$ ./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py /tmp/data-collection-eval/results_dec_2015/ucb.sdb.android.1/timeseries/active_day_2.2015-11-27 shankari@eecs.berkeley.edu
-
Note that loading the data retains the object IDs. This means that if you load the same data twice with different user IDs, then only the second one will stick. In other words, if you load the file as user1@foo.edu and then load the same file as user2@foo.edu, you will only have data for user2@foo.edu in the database. This can be overridden using the --make-new flag - e.g.
$ ./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py -n /tmp/data-collection-eval/results_dec_2015/ucb.sdb.android.1/timeseries/active_day_2.2015-11-27 shankari@eecs.berkeley.edu
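To see why only the second load sticks, here is a toy model in which a plain dict keyed by object ID stands in for the database. It is not the loader's actual implementation: the `load_timeline` function, the `user` field and the use of `uuid` for fresh IDs are all assumptions made for the illustration.

```python
import copy
import uuid


def load_timeline(db, entries, user_email, make_new=False):
    """Store entries in `db` (a dict keyed by object ID) for one user."""
    for entry in entries:
        doc = copy.deepcopy(entry)
        doc["user"] = user_email
        if make_new:
            # Assumed behaviour of --make-new: give each entry a fresh ID.
            doc["_id"] = uuid.uuid4().hex
        # Saving under an existing ID replaces whatever was stored there.
        db[doc["_id"]] = doc


def users_in(db):
    """Return the sorted list of users that own at least one entry."""
    return sorted(set(doc["user"] for doc in db.values()))


if __name__ == "__main__":
    timeline = [{"_id": "a1", "data": 1}, {"_id": "a2", "data": 2}]

    db = {}
    load_timeline(db, timeline, "user1@foo.edu")
    load_timeline(db, timeline, "user2@foo.edu")
    print(users_in(db))  # IDs were retained, so user1's entries were replaced

    db = {}
    load_timeline(db, timeline, "user1@foo.edu")
    load_timeline(db, timeline, "user2@foo.edu", make_new=True)
    print(users_in(db))  # fresh IDs, so both users keep their entries
```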
You may need a larger or more diverse set of data than the given test data supplies. To create it you can run the trip generation script included in the project.
The script works by creating random noise around starting and ending points of trips.
You can fill out options for the new data in emission/simulation/input.json. The different options are as follows
- radius - the number of kilometers of randomization around starting and ending points (the amount of noise)
- starting centroids - addresses you want trips to start around, as well as a weight defining the relative probability a trip will start there
- ending centroids - addresses you want trips to end around, as well as a weight defining the relative probability a trip will end there
- modes - the relative probability a user will take a trip with the given mode
- number of trips - the number of trips the simulation should create
Run the script with
$ python emission/simulation/trip_gen.py <user_name>
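To make the options concrete, the sketch below shows one way weighted centroids, a noise radius and mode weights could be combined into random trips. It is not the code in emission/simulation/trip_gen.py and does not follow the format of input.json: the function names and dict keys are hypothetical, centroids are assumed to be already geocoded to latitude/longitude, and the conversion of roughly 111 km per degree of latitude is a flat-earth approximation.

```python
import math
import random


def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    threshold = rng.uniform(0, sum(weights))
    running = 0.0
    for item, weight in zip(items, weights):
        running += weight
        if threshold < running:
            return item
    return items[-1]  # guard against floating-point round-off


def add_noise(lat, lon, radius_km, rng):
    """Move a point by a random distance of at most radius_km."""
    distance = radius_km * math.sqrt(rng.random())  # uniform over the disc
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = distance * math.cos(bearing) / 111.0
    dlon = distance * math.sin(bearing) / (111.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def generate_trips(starts, ends, modes, radius_km, num_trips, seed=None):
    """Create num_trips trips with noisy endpoints and a weighted mode."""
    rng = random.Random(seed)
    mode_names = sorted(modes)
    trips = []
    for _ in range(num_trips):
        start = weighted_choice(starts, [c["weight"] for c in starts], rng)
        end = weighted_choice(ends, [c["weight"] for c in ends], rng)
        trips.append({
            "start": add_noise(start["lat"], start["lon"], radius_km, rng),
            "end": add_noise(end["lat"], end["lon"], radius_km, rng),
            "mode": weighted_choice(mode_names, [modes[m] for m in mode_names], rng),
        })
    return trips
```

A centroid with three times the weight of another is chosen as an endpoint about three times as often, and a larger radius spreads the generated endpoints further from the centroid.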
Because this user data is specifically designed to test our tour model creation, you can create fake tour models easily by running the make_tour_model_from_fake_data function in emission/storage/decorations/tour_model_queries.py
Once you have loaded the timeline, you probably want to segment it into trips and sections, smooth the sections, generate a timeline, etc. We have a unified script to do all of those, called the intake pipeline. You can run it like this.
$ ./e-mission-py.bash bin/intake_stage.py
Once the script is done running, places, trips, sections and stops will have been generated and stored in their respective mongodb collections, and the timelines for the last 7 days will have been stored in the usercache.
We also do some modelling on the generated data. This is much more time-intensive than the intake, but also does not need to run at the same frequency as the intake pipeline. So it is pulled out to its own pipeline. If you want to work on the modelling, you need to run this pipeline as well.
$ ./e-mission-py.bash bin/model_stage.py
-
Make sure that the anaconda python is in your path
$ which python
/Users/shankari/OSS/anaconda/bin/python
-
Run all tests.
$ ./runAllTests.sh
-
If you get import errors, you may need to add the current directory to PYTHONPATH.
$ PYTHONPATH=. ./runAllTests.sh
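If you launch the tests from another Python script rather than from the shell, the same environment tweak can be expressed as below. `env_with_pythonpath` is a hypothetical helper, not something that exists in the repo.

```python
import os


def env_with_pythonpath(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir first on PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = extra_dir + os.pathsep + existing
    else:
        env["PYTHONPATH"] = extra_dir
    return env


# Usage, from the top of the repository:
#   import subprocess
#   subprocess.call(["./runAllTests.sh"], env=env_with_pythonpath("."))
```

Prepending rather than replacing keeps any PYTHONPATH entries that were already set.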
Several exploratory analysis scripts are checked in as ipython notebooks into
emission/analysis/notebooks. All data in the notebooks is from members of the
research team who have provided permission to use it. The results in the
notebooks cannot be replicated in the absence of the raw data, but they can be
run on data collected from your own instance as well.
The notebooks are occasionally modified and simplified as code is moved out of them into utility functions. Original versions of the notebooks can be obtained by looking at other notebooks with the same name, or by looking at the history of the notebooks.
From the webapp directory
$ npm install karma --save-dev
$ npm install karma-jasmine karma-chrome-launcher --save-dev
Write tests in www/js/test. If you're interested in having karma in your path and globally set, run
$ npm install -g karma-cli
To run tests if you have karma globally set, run
$ karma start my.conf.js
in the webapp directory. If you didn't run the -g command, you can run tests with
$ ./node_modules/karma/bin/karma start
in the webapp directory
-
If a python execution fails to import a module, make sure to add the current directory to your PYTHONPATH.
-
If starting the server gives a CONNECTION_ERROR, make sure MongoDB is actively running when you attempt to start the server.
-
After running MongoDB, if you get an error that says dbpath does not exist (on Windows) or Data directory /data/db not found (on Mac), make sure to manually create the data directory as follows.
on Windows
% md c:\data\db\
or
on Mac (the user account running mongod must have read and write permissions for the data directory)
$ mkdir -p /data/db
$ chmod 777 /data/db
This site is currently designed to support commute mode tracking and aggregation. There is a fair amount of backend work that is more complex than just reading and writing data from a database. So we are not using any of the specialized web frameworks such as django or rails.
Instead, we have focused on developing the backend code and exposing it via a simple API. I have maintained separation between the backend code and the API glue so that we can swap out the API glue later if needed.
The API glue is currently Bottle, which is a single file webapp framework. I chose Bottle because it was simple, didn't use a lot of space, and, because it wasn't heavyweight, could easily be replaced with something more heavyweight later.
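The separation between backend code and API glue can be sketched as follows. To avoid guessing at the project's actual routes or Bottle handlers, this example uses a bare WSGI callable instead of Bottle; `mode_share`, `make_wsgi_app` and the `/mode_share` path are hypothetical names chosen for the illustration.

```python
import json


def mode_share(trips):
    """Backend logic: plain Python with no knowledge of HTTP."""
    counts = {}
    for trip in trips:
        counts[trip["mode"]] = counts.get(trip["mode"], 0) + 1
    return counts


def make_wsgi_app(trips):
    """API glue: a thin wrapper that exposes the backend over HTTP."""
    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/mode_share":
            body = json.dumps(mode_share(trips), sort_keys=True).encode("utf-8")
            start_response("200 OK", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    sample = [{"mode": "walk"}, {"mode": "bike"}, {"mode": "walk"}]
    make_server("localhost", 8080, make_wsgi_app(sample)).serve_forever()
```

Because `mode_share` never touches the request or response, replacing the glue with a different framework only means rewriting `make_wsgi_app`.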
The front-end is javascript based. In order to be consistent with the phone, it also uses angular + ionic. javascript components are largely managed using bower.
If you want to use this for anything other than development, you should really run it over SSL. In order to make the development flow smoother, the server has no security when it is running over HTTP as opposed to HTTPS. In that case, the JWT basically consists of the user email in plain text. This means that anybody who knows a user's email address can download their detailed timeline. This is very bad.
If you are using this to store real, non-test data, use SSL right now.
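The toy example below illustrates why a token that is just the email offers no protection, by contrasting it with a token that carries a signature only the server can produce. It is an illustration of the risk, not the server's token format or validation code; the function names and the HMAC scheme are assumptions.

```python
import hashlib
import hmac


def unsigned_token(email):
    """Development-style token: the email itself, so anyone can forge it."""
    return email


def signed_token(email, secret):
    """Token with an HMAC-SHA256 signature; `secret` is a bytes key."""
    signature = hmac.new(secret, email.encode("utf-8"), hashlib.sha256).hexdigest()
    return email + "." + signature


def verify_signed(token, secret):
    """Return the email if the signature matches, otherwise None."""
    email, _, signature = token.rpartition(".")
    expected = hmac.new(secret, email.encode("utf-8"), hashlib.sha256).hexdigest()
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return email
    return None


if __name__ == "__main__":
    secret = b"known-only-to-the-server"
    token = signed_token("user1@foo.edu", secret)
    forged = "user2@foo.edu." + token.rpartition(".")[2]
    print(verify_signed(token, secret))   # the genuine token is accepted
    print(verify_signed(forged, secret))  # the forged token is rejected
    print(unsigned_token("user2@foo.edu"))  # forging this needs only the email
```

Even a signed token can be captured in transit over plain HTTP, which is one more reason to run over SSL.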
If you are running this in production, you should really run it over SSL. We use cherrypy to provide SSL support. The default version of cherrypy in the anaconda distribution had some issues, so I've checked in a working version of the wsgiserver file.
TODO: clean up later
$ cp api/wsgiserver2.py <dist-packages>/cherrypy/wsgiserver/wsgiserver2.py
Also, now that we decode the JWTs locally, we need to use the oauth2client, which requires the PyOpenSSL library. This can be installed on ubuntu using the python-openssl package, but then it is not accessible using the anaconda distribution. In order to enable it for the conda distribution as well, use
$ conda install pyopenssl
Also, installing via requirements.txt does not appear to install all of the
requirements for the google-api-client. If you get mysterious "Invalid token"
errors for tokens that are correctly validated by the backup URL method, try to
uninstall and reinstall with the --upgrade option.
$ pip uninstall google-api-python-client
$ pip install --upgrade google-api-python-client
If you are running the server on shared, cloud infrastructure such as AWS, then note that the data is accessible by AWS admins by directly looking at the disk. In order to avoid this, you want to encrypt the disk. You can do this by:
- using an encrypted EBS store, but this doesn't appear to allow you to specify your own encryption key
- using a normal drive that is encrypted using cryptfs (http://sleepyhead.de/howto/?href=cryptpart, https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system). The standard AWS ubuntu AMI appears to have LUKS enabled, so you can follow the instructions with LUKS.
ubuntu@ip-10-203-173-119:/home/e-mission$ cryptsetup --help | grep luks
for luksFormat
-M, --type=STRING Type of device metadata: luks, plain,
...
In either of these cases, you need to reconfigure mongod.conf to point to data and log directories in the encrypted volume.
If you are associated with the e-mission project and will be integrating with our server, then you can get the key files from: https://repo.eecs.berkeley.edu/git/users/shankari/e-mission-keys.git
If not, please get your own copies of the following keys:
- Google Developer Console (stored in conf/net/keys.json)
  - iOS key (ios_client_key)
  - webApp key (client_key)
- Parse (coming soon)
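Before starting the server, it can be handy to check that the key file contains both entries. The sketch below assumes that `ios_client_key` and `client_key` are top-level fields of conf/net/keys.json; the actual layout of the file is not described here, so adjust the lookup if your file is nested differently. `missing_keys` and `load_key_conf` are hypothetical helpers.

```python
import json

REQUIRED_KEYS = ("ios_client_key", "client_key")


def missing_keys(conf):
    """Return the required key names that are absent or empty in conf."""
    return [name for name in REQUIRED_KEYS if not conf.get(name)]


def load_key_conf(path="conf/net/keys.json"):
    """Read the key configuration from disk."""
    with open(path) as fp:
        return json.load(fp)


# Usage, from the top of the repository:
#   missing = missing_keys(load_key_conf())
#   if missing:
#       print("Missing keys: {0}".format(", ".join(missing)))
```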