Overview

Parse comments from Reddit's /r/soccer to associate adjectives with soccer teams. Given an archive of comments, find which adjectives best describe each team.

Usage

Install Ruby.
Install bundler: http://bundler.io/
From the root directory, run ./scripts/run.rb --input-file INPUT-FILE --config-file CONFIG-FILE [--phases PHASES] [--debug]. The input file must be a .csv file with the comment body in the first column and the comment ID in the second.
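
To illustrate the expected input layout, here is a minimal sketch that parses CSV text with the comment body in the first column and the comment ID in the second. The helper name `read_comments` and the sample rows are hypothetical, and the sketch assumes the file has no header row; it is not the project's own loader.

```ruby
require 'csv'

# Hypothetical helper (not part of the project): parse CSV text whose first
# column is the comment body and whose second column is the comment ID.
# Assumes there is no header row.
def read_comments(csv_text)
  CSV.parse(csv_text).map do |row|
    { body: row[0].to_s, id: row[1].to_s }
  end
end

# Made-up sample rows; bodies containing commas must be quoted.
sample = <<~ROWS
  "Arsenal were brilliant today, truly clinical.",c1
  "Chelsea looked tired.",c2
ROWS

comments = read_comments(sample)
comments.each { |comment| puts "#{comment[:id]}: #{comment[:body]}" }
```

Quoting the body column matters because comment text routinely contains commas and line breaks.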

Example: ./scripts/run.rb --input-file input/sample.csv --config-file config/teams.yaml. The output for this sample is likely to be empty: the sample input contains too few adjectives, and those few are likely to be excluded by the popularity filter.

Real data can be downloaded from https://bigquery.cloud.google.com/dataset/fh-bigquery:reddit_comments

Query tables with SELECT body, id FROM <table-name> WHERE subreddit = 'soccer'.

Algorithm

Phase 1

Count team name/adjective pairs used in the same sentence.
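
The counting step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the method name `count_pairs`, the fixed team and adjective lists, and the naive sentence splitter are all hypothetical stand-ins for whatever the real code uses to recognise teams, adjectives and sentence boundaries.

```ruby
# Count (team, adjective) pairs that occur within the same sentence.
# `teams` and `adjectives` are hypothetical lowercase word lists.
def count_pairs(comment_bodies, teams:, adjectives:)
  counts = Hash.new(0)
  comment_bodies.each do |body|
    # Naive sentence split on '.', '!' and '?' (an assumption for this sketch).
    body.split(/[.!?]+/).each do |sentence|
      words = sentence.downcase.scan(/[a-z']+/).uniq
      found_teams = words & teams
      found_adjectives = words & adjectives
      # Every team in the sentence is paired with every adjective in it.
      found_teams.product(found_adjectives).each { |pair| counts[pair] += 1 }
    end
  end
  counts
end

counts = count_pairs(
  [
    'Arsenal were brilliant today. Chelsea looked tired!',
    'Brilliant goal by Arsenal. Clinical finish.'
  ],
  teams: %w[arsenal chelsea],
  adjectives: %w[brilliant clinical tired]
)
counts.each { |(team, adjective), n| puts "#{team},#{adjective},#{n}" }
```

In this made-up input, "Clinical finish." contributes nothing because no team name appears in the same sentence.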

Phase 2

Filter out blacklisted adjectives (nationalities, colors, ...).
Exclude the N most popular adjectives, which are too generic.
Score the remaining adjectives, promoting somewhat unusual words.
Keep only M adjectives per team.
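
The four steps above could look something like the sketch below. The README does not give the scoring formula, so the one used here (pair count divided by the square root of the adjective's total count) is purely an assumption chosen to favour rarer words. Keeping the M highest-scoring adjectives is likewise an assumption, and the method and parameter names are hypothetical.

```ruby
# Sketch of the filtering phase. `counts` maps [team, adjective] => count.
# The scoring formula and all parameter names are assumptions.
def select_adjectives(counts, blacklist:, exclude_top_n:, keep_m:)
  # 1. Drop blacklisted adjectives.
  counts = counts.reject { |(_team, adj), _n| blacklist.include?(adj) }

  # 2. Drop the N adjectives with the highest total count across all teams.
  totals = Hash.new(0)
  counts.each { |(_team, adj), n| totals[adj] += n }
  too_popular = totals.sort_by { |adj, total| [-total, adj] }
                      .first(exclude_top_n).map(&:first)
  counts = counts.reject { |(_team, adj), _n| too_popular.include?(adj) }

  # 3. Score each pair; dividing by sqrt(total) boosts less common adjectives.
  scored = counts.map { |(team, adj), n| [team, adj, n / Math.sqrt(totals[adj])] }

  # 4. Keep the M highest-scoring adjectives per team.
  scored.group_by(&:first).transform_values do |rows|
    rows.sort_by { |_team, adj, score| [-score, adj] }
        .first(keep_m)
        .map { |_team, adj, _score| adj }
  end
end

# Made-up counts for illustration.
sample_counts = {
  %w[arsenal good] => 10, %w[chelsea good] => 8,
  %w[arsenal red] => 5, %w[arsenal clinical] => 3,
  %w[arsenal leaky] => 1, %w[chelsea tired] => 2,
  %w[chelsea clinical] => 1
}
selected = select_adjectives(sample_counts, blacklist: %w[red], exclude_top_n: 1, keep_m: 1)
selected.each { |team, adjectives| puts "#{team}: #{adjectives.join(', ')}" }
```

Here "red" is removed by the blacklist and "good" by the popularity cut, leaving one distinctive adjective per team.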

Phase 3

Export results to .csv files per league.
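
A per-league export might be sketched as follows. The row layout (team name followed by its adjectives), the `<league>.csv` file naming, and the method name `export_leagues` are assumptions for illustration; the project's actual output format may differ.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical shapes: `leagues` maps a league name to its team names, and
# `adjectives_by_team` maps a team name to its selected adjectives.
# Writes one CSV file per league and returns the paths written.
def export_leagues(leagues, adjectives_by_team, output_dir)
  leagues.map do |league, teams|
    path = File.join(output_dir, "#{league}.csv")
    CSV.open(path, 'w') do |csv|
      # One row per team: the team name, then its adjectives.
      teams.each { |team| csv << [team, *adjectives_by_team.fetch(team, [])] }
    end
    path
  end
end

output_dir = Dir.mktmpdir
paths = export_leagues(
  { 'premier-league' => %w[arsenal chelsea] },
  { 'arsenal' => %w[clinical leaky], 'chelsea' => %w[tired] },
  output_dir
)
puts File.read(paths.first)
```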

Results and Publications

Original post on Reddit

BBC Article

Mirror Article

