Twitter Interaction Datasets

Abstract

This repository contains the datasets we use to evaluate the implementation of the kafka-salsa project. Kafka-salsa is an in-memory, graph-based tweet recommender system built on Kafka Streams. To recommend new tweets to a given user, it uses an interaction graph between users and tweets (e.g., likes, writes, retweets).

To evaluate the performance and quality of different implementation approaches, we crawled our own bipartite graph dataset of user-tweet interactions from the Twitter API using our twitter-crawler project. The dataset is a CSV file with the columns user_id, tweet_id, interaction.

The full documentation, evaluation metrics, and results are in the central repository kafka-salsa. A description of the dataset and our crawling strategy is in the dataset's v1/README.
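The snippet below is a minimal sketch of loading the user_id, tweet_id, interaction rows into the two adjacency maps of a bipartite user-tweet graph. It is not part of this repository or of kafka-salsa. It assumes the file has no header row unless the hypothetical `has_header` flag is set, and it treats interaction labels as opaque strings. The sample IDs and labels are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_interaction_graph(lines, has_header=False):
    """Build a bipartite user-tweet graph from CSV rows (hypothetical helper).

    Assumes each row is `user_id,tweet_id,interaction`. The interaction
    label is kept as an edge attribute; `has_header` is an assumed switch
    for files that begin with a header row.
    """
    # user_id -> {tweet_id: set of interaction labels on that edge}
    user_to_tweets = defaultdict(dict)
    # tweet_id -> set of user_ids that interacted with the tweet
    tweet_to_users = defaultdict(set)

    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the assumed header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        user_id, tweet_id, interaction = (field.strip() for field in row[:3])
        user_to_tweets[user_id].setdefault(tweet_id, set()).add(interaction)
        tweet_to_users[tweet_id].add(user_id)
    return user_to_tweets, tweet_to_users


# Usage with made-up sample rows; for a real file, pass open(path, newline="").
sample = io.StringIO("1,100,like\n1,101,retweet\n2,100,write\n")
users, tweets = load_interaction_graph(sample)
print(sorted(tweets["100"]))  # ['1', '2']
```

Keeping both directions of the adjacency makes it cheap to walk user → tweet → user, which is the kind of traversal a graph-based recommender performs on this data.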

Repository Overview

This repository is part of a larger project. Here is a list of all related repositories:

Installation

  1. Clone the repository with git clone git@github.com:philipphager/twitter-dataset.git, or download individual files directly from GitHub.
  2. If you use the dataset to evaluate kafka-salsa, please see the instructions at kafka-salsa-evaluation.