A deep learning approach to tweet authorship verification based on Siamese networks. This project is currently under development for a deep learning MSc class @ CIn-UFPE.
The tweet dataset used in this project is openly provided by the University of Victoria; you may read more about it in the following reference:
Marcelo Luiz Brocardo and Issa Traore, "Continuous Authentication using Micro-Messages", Twelfth Annual International Conference on Privacy, Security and Trust (PST 2014), Toronto, Canada, July 23-24, 2014.
First, you should have a working Twitter Developer account. Then, substitute the placeholders in the tweet_fetcher.py file with your credentials and execute it. This script will fetch the tweets from the aforementioned dataset and store them in an appropriate directory.
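For orientation, the credential placeholders might look something like the sketch below. This assumes tweet_fetcher.py uses the tweepy library and that the dataset is distributed as tweet IDs to be re-fetched; the variable names and the example ID are illustrative, not the actual contents of the script.

import tweepy

# Replace these placeholders with your own Twitter Developer credentials
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch a single tweet by its ID (hypothetical example ID)
status = api.get_status(1234567890, tweet_mode="extended")
print(status.full_text)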
You must also download the GloVe Twitter 27B 50d embeddings:
$ wget http://nlp.stanford.edu/data/glove.twitter.27B.zip && unzip glove.twitter.27B.zip glove.twitter.27B.50d.txt -d embeddings/
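Once unzipped, the 50-dimensional vectors can be loaded into a plain dictionary, one entry per token. This is a generic sketch of the file format, not code from this repository:

import numpy as np

def load_glove(path):
    # Each line is: <token> followed by 50 space-separated floats
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove("embeddings/glove.twitter.27B.50d.txt")
print(glove["hello"].shape)  # (50,)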
Next, execute the save_tweets_to_csv.py file, which will unify the downloaded tweets into a single CSV file.
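Conceptually, this unification step amounts to walking the per-author directories and writing one row per tweet. The sketch below is a simplified illustration with a hypothetical directory layout and column names, not the script itself:

import csv
from pathlib import Path

# Hypothetical layout: tweet_data/<author>/<tweet_id>.txt
rows = []
for author_dir in Path("tweet_data").iterdir():
    if author_dir.is_dir():
        for tweet_file in author_dir.glob("*.txt"):
            rows.append((author_dir.name,
                         tweet_file.read_text(encoding="utf-8").strip()))

with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["author", "text"])  # hypothetical column names
    writer.writerows(rows)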
Afterwards, execute the dataset_creator.py file. You may modify its parameters, such as the number of authors and the number of tweets per author, in the main function.
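For a Siamese verification setup, the resulting dataset essentially consists of labeled tweet pairs (same author vs. different authors). The sketch below shows one plausible way such pairs could be built; the parameter names and column schema are hypothetical, not taken from dataset_creator.py:

import random
import pandas as pd

def make_pairs(df, n_authors=50, tweets_per_author=200, seed=42):
    # df is expected to have "author" and "text" columns (hypothetical schema)
    rng = random.Random(seed)
    authors = df["author"].unique()[:n_authors]
    pairs = []
    for author in authors:
        texts = df[df["author"] == author]["text"].tolist()[:tweets_per_author]
        others = df[df["author"] != author]["text"].tolist()
        for text in texts:
            # A real implementation would exclude pairing a tweet with itself
            pairs.append((text, rng.choice(texts), 1))   # same-author pair
            pairs.append((text, rng.choice(others), 0))  # different-author pair
    rng.shuffle(pairs)
    return pd.DataFrame(pairs, columns=["text_a", "text_b", "same_author"])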
Finally, you may execute the main.py file from the root of this repository. It will load the data, perform every pre-processing step, and then train our model alongside a comparable one, e.g.:
$ python src/main.py tweet_data/ embeddings/glove.twitter.27B.50d.txt data/datasets/same_authors_train.csv data/datasets/same_authors_val.csv data/datasets/same_authors_test.csv --variation=none
For more information on the accepted parameters and possible variations, execute it with the -h flag.
Please note that while we offer support for our baseline models through the svm.py file, we do not train them in the main pipeline. All experiments for these models were conducted using Google Colab.
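As a rough illustration only (not the contents of svm.py), an SVM baseline for the pair-classification task could represent each pair by the difference of its character n-gram TF-IDF vectors:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def fit_svm_baseline(pairs_train):
    # pairs_train: iterable of (text_a, text_b, label); hypothetical format
    texts_a, texts_b, labels = zip(*pairs_train)
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    vec.fit(list(texts_a) + list(texts_b))
    # Represent a pair by the element-wise absolute difference of its vectors
    X = abs(vec.transform(texts_a) - vec.transform(texts_b))
    clf = LinearSVC()
    clf.fit(X, np.asarray(labels))
    return vec, clf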