SNACES (Social Network Algorithm Contained Experiment System) is a Python library for downloading and analyzing Twitter data using the Tweepy library and the Twitter API.
This document was last updated on Jan 10, 2023.
To use the Twitter API and the tweepy package, you will need credentials for the Twitter API.
To obtain these credentials, sign up for a Twitter developer account.
Getting access may take several days. Once your application is approved, you will get four keys: consumer key, consumer secret, access token, and access token secret. Enter these four values into the file ./conf/credentials.py.
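Before starting a download, it can help to confirm that all four values are actually filled in. The following is a minimal sketch only: the key names and the `missing_credentials` helper are hypothetical, and the variable names used in the real ./conf/credentials.py may differ.

```python
# Hypothetical names for the four credentials; the names used in the real
# ./conf/credentials.py may differ.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def missing_credentials(credentials):
    """Return the names of required credentials that are absent or blank."""
    return [
        key
        for key in REQUIRED_KEYS
        if not str(credentials.get(key) or "").strip()
    ]


if __name__ == "__main__":
    example = {
        "consumer_key": "abc",
        "consumer_secret": "def",
        "access_token": "",  # blank values count as missing
        # access_token_secret is left out on purpose
    }
    print(missing_credentials(example))
```

An empty list means every credential is present; anything else names the values that still need to be entered.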
You will need MongoDB for data storage.
- Clone the Git repository to your workspace
- Run the install script ./scripts/install.sh
- Run python ./setup.py to set up the Pipfile
- Run pipenv shell to start a pipenv environment using the Pipfile
Note that you might also need to do the following inside the pipenv environment:
- pip install python-dateutil
- pip install matplotlib
- create the /core/log folder
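The log folder can also be created from Python so that the step is repeatable. This is a small sketch, assuming that /core/log is meant relative to some root directory that you pass in; the `ensure_log_dir` helper is hypothetical and is not part of SNACES.

```python
from pathlib import Path


def ensure_log_dir(root):
    """Create <root>/core/log if it does not exist and return its path."""
    log_dir = Path(root) / "core" / "log"
    # parents=True also creates "core"; exist_ok=True makes reruns harmless.
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir
```

For example, `ensure_log_dir(".")` creates core/log under the current working directory.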
The default configuration file we are using is:
/src/scripts/config/create_social_graph_and_cluster_config.yaml
A lot of work has been done on the clustering trial branch, which focuses on core detection.
See clustering_trial_requirements.txt for the packages required by the clustering trial branch. Note: the pygraphviz package may require additional installation steps; see the pygraphviz installation instructions.
The pipenv shell setup above may cause problems. An alternative is to use a conda environment with Python 3.9. After installing conda and activating your environment, run pip install -r clustering_trial_requirements.txt to install the required packages.
If pip reports conflicts while in the conda environment, try creating a virtual environment and installing the required packages there:
python -m venv env
source env/bin/activate
pip install -r clustering_trial_requirements.txt
Once the required packages are installed, you can run the core detection algorithm by executing
python detect_core_jaccard.py -n "hardmaru" -act "user retweets"
-n specifies the seed user and -act specifies the chosen activity set. The available activity sets are user retweets, friends, and user retweets ids.
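To illustrate how the two flags fit together, here is a sketch of an argparse parser that accepts the same command line. It is an illustration only: how detect_core_jaccard.py actually parses its arguments is not shown in this document, and the `build_parser` function and the `seed_user` and `activity_set` attribute names are hypothetical.

```python
import argparse

# The three activity sets listed above.
ACTIVITY_SETS = ("user retweets", "friends", "user retweets ids")


def build_parser():
    """Build a parser for the -n and -act flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the core detection command line."
    )
    parser.add_argument(
        "-n", dest="seed_user", required=True,
        help="screen name of the seed user",
    )
    parser.add_argument(
        "-act", dest="activity_set", required=True, choices=ACTIVITY_SETS,
        help="activity set to use",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.seed_user, args.activity_set)
```

Because the activity set names contain spaces, they need to be quoted on the command line, as in the example invocation above.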
- The main program can be started by running python ./SNACES.py.
- This starts the main loop, which prompts you to choose which process to trigger:
  - Download: downloads information from Twitter
  - Raw Tweet Processing: processes raw tweets
  - Word Frequency: performs word frequency operations on collected data
  - Social Graph: constructs a social graph from downloaded friends
  - Clustering: performs clustering algorithms on data
  - Community Expansion: performs community expansion on data
  - Community Expansion - Create Graph: produces graphs for results in community expansion
The only one we are currently using is Community Expansion.
For core detection, please check out the clustering_trials branch for the latest version of the core detection algorithm.
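The prompt-driven main loop can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code in SNACES.py: the numbered menu, the q-to-quit option, and the `choose_process` and `run_menu` helpers are all hypothetical, and the sketch only reports which process would run.

```python
# Process names as listed in this document.
PROCESSES = (
    "Download",
    "Raw Tweet Processing",
    "Word Frequency",
    "Social Graph",
    "Clustering",
    "Community Expansion",
    "Community Expansion - Create Graph",
)


def choose_process(raw_choice):
    """Map a 1-based menu number to a process name, or None if invalid."""
    try:
        index = int(raw_choice) - 1
    except ValueError:
        return None
    if 0 <= index < len(PROCESSES):
        return PROCESSES[index]
    return None


def run_menu(read=input, write=print):
    """Show the menu repeatedly until the user enters 'q'."""
    while True:
        for number, name in enumerate(PROCESSES, start=1):
            write(f"{number}. {name}")
        raw = read("Select a process (q to quit): ").strip()
        if raw.lower() == "q":
            break
        name = choose_process(raw)
        if name is None:
            write("Unknown option")
            continue
        # A real implementation would call the selected process here.
        write(f"Would run: {name}")


if __name__ == "__main__":
    run_menu()
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal.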
The dao module includes all the getters and setters that connect the data storage with our program.
The model module includes the data models, such as User and Tweets, that are used in our program.
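The split between the two modules can be sketched like this. Everything in the sketch is hypothetical: the fields on `User`, the `InMemoryUserDAO` class, and its method names are invented for illustration, and a plain dict stands in for the MongoDB-backed storage that the real getters and setters connect to.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical model object; the real User may have other fields."""
    id: int
    screen_name: str
    friend_ids: list = field(default_factory=list)


class InMemoryUserDAO:
    """Getter/setter pair over a dict, standing in for real data storage."""

    def __init__(self):
        self._users = {}

    def set_user(self, user):
        """Store the user, replacing any existing entry with the same id."""
        self._users[user.id] = user

    def get_user_by_id(self, user_id):
        """Return the stored user, or None if the id is unknown."""
        return self._users.get(user_id)


if __name__ == "__main__":
    dao = InMemoryUserDAO()
    dao.set_user(User(id=1, screen_name="hardmaru", friend_ids=[2, 3]))
    print(dao.get_user_by_id(1))
```

The point of the pattern is that the rest of the program works with model objects and never touches the storage layer directly.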
Utility functions describe the activity of a user within a community, and they play a large part in our analysis. We are actively exploring new utility functions.
The implementations of utility rankers are located in
/src/process/ranking/consumption_utility_ranker.py
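As a rough picture of what a utility ranker does, the sketch below scores each community member by how often they retweet other members and sorts by that score. This definition is an assumption made for illustration; it is not taken from consumption_utility_ranker.py, and the `rank_by_consumption` function and its input format are hypothetical.

```python
from collections import Counter


def rank_by_consumption(retweets, community):
    """Rank community members by retweets of other members' tweets.

    retweets: iterable of (retweeter, original_author) pairs.
    community: iterable of user names.
    Returns (user, score) pairs, highest score first, ties broken by name.
    """
    members = set(community)
    # Start every member at zero so inactive users still appear in the ranking.
    scores = Counter({user: 0 for user in members})
    for retweeter, author in retweets:
        if retweeter in members and author in members and retweeter != author:
            scores[retweeter] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "z")]
    print(rank_by_consumption(sample, ["a", "b", "c"]))
```

In the sample, the retweet of "z" is ignored because "z" is outside the community.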
The main code for community expansion is in
/src/process/community_expansion/
You will also use code in the following path for data analysis:
/src/process/data_analysis/
Please check out the clustering_trials branch for the latest version of the core detection algorithm.