This repository contains the datasets we use to evaluate the kafka-salsa project. Kafka-salsa is an in-memory, graph-based tweet recommender system implemented on Kafka Streams. It uses an interaction graph between users and tweets (e.g., likes, writes, retweets) to recommend new tweets to a given user. To evaluate the performance and quality of different implementation approaches, we crawled our own bipartite graph dataset of user-tweet interactions from the Twitter API using our twitter-crawler project. The dataset is a CSV file with rows of the form user_id, tweet_id, interaction.

Find the full documentation, evaluation metrics, and results in the central repository kafka-salsa. Find a description of the dataset and our crawling strategy in the dataset's v1/README.
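
As a minimal sketch of how rows of the form user_id, tweet_id, interaction could be read into a bipartite graph, the Python snippet below builds two adjacency maps, one per side of the graph. It is not part of kafka-salsa. The function name `load_interactions`, the assumption that the file has no header row, and the sample rows are all hypothetical and chosen only for illustration. IDs are kept as strings.

```python
import csv
import io
from collections import defaultdict


def load_interactions(lines):
    """Build both sides of a bipartite user-tweet graph from CSV rows.

    Assumes each row is `user_id, tweet_id, interaction` with no header row.
    """
    user_to_tweets = defaultdict(set)
    tweet_to_users = defaultdict(set)
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        user_id, tweet_id, _interaction = (field.strip() for field in row)
        user_to_tweets[user_id].add(tweet_id)
        tweet_to_users[tweet_id].add(user_id)
    return user_to_tweets, tweet_to_users


# Invented sample rows for illustration, not taken from the real dataset.
sample = io.StringIO(
    "1, 100, like\n"
    "1, 101, retweet\n"
    "2, 100, write\n"
)
user_to_tweets, tweet_to_users = load_interactions(sample)
print(sorted(user_to_tweets["1"]))    # tweets that user 1 interacted with
print(sorted(tweet_to_users["100"]))  # users who interacted with tweet 100
```

To read a real file instead of the in-memory sample, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) to `load_interactions`. The interaction type is parsed but ignored here; a weighted or typed graph would need to keep it.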
This repository is part of a larger project. Here is a list of all related repositories:
- kafka-salsa: Reference implementation and project documentation.
- kafka-salsa-evaluation: Evaluation suite for kafka-salsa.
- twitter-crawler: Twitter API crawler for user-tweet-interaction data.
- twitter-dataset: Crawled datasets of user-tweet-interactions used in evaluation.
- Clone the repository:
  git clone git@github.com:philipphager/twitter-dataset.git
  or download individual files directly from GitHub.
- If you use the dataset to evaluate kafka-salsa, please see the instructions at kafka-salsa-evaluation.