tl;dr:
python metaclip/build_metadata.py
metaclip/build_metadata.py:wordnet_synsets
pip install nltk
python -m nltk.downloader wordnet
python -m nltk.downloader omw-1.4
metaclip/build_metadata.py:wiki_unigram
Keep unigrams with more than 100 occurrences.
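A minimal sketch of the frequency cutoff, not the code in `build_metadata.py`. The function name `filter_unigrams` and the toy token list are hypothetical; only the threshold of 100 comes from the text above.

```python
from collections import Counter

def filter_unigrams(tokens, min_count=100):
    """Return {unigram: count} for unigrams seen more than `min_count` times."""
    counts = Counter(tokens)
    # Strictly "more than": a unigram with exactly `min_count` hits is dropped.
    return {word: c for word, c in counts.items() if c > min_count}

# Toy corpus (hypothetical): "cat" passes, "dog" sits exactly on the cutoff.
tokens = ["cat"] * 150 + ["dog"] * 100 + ["emu"] * 3
kept = filter_unigrams(tokens)
```

Here `kept` contains only `cat`, since `dog` occurs exactly 100 times and `emu` only 3.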
metaclip/build_metadata.py:wiki_bigrams
Keep bigrams with pointwise mutual information (PMI) greater than 30.
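A rough sketch of PMI-based bigram filtering. This README does not say which PMI variant or scaling the script uses, so the example assumes the textbook definition, log2(p(a,b) / (p(a) p(b))), and leaves the cutoff as a parameter. The names `bigram_pmi` and `filter_bigrams` are hypothetical.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Map each adjacent token pair to its PMI (log base 2)."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())
    scores = {}
    for (a, b), count in bigram.items():
        p_ab = count / n_bi
        p_a = unigram[a] / n_uni
        p_b = unigram[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

def filter_bigrams(tokens, min_pmi):
    """Keep bigrams whose PMI is strictly greater than `min_pmi`."""
    return {bg: s for bg, s in bigram_pmi(tokens).items() if s > min_pmi}

# Toy corpus (hypothetical).
tokens = "new york is big and new york is old".split()
scores = bigram_pmi(tokens)
```

On a corpus this small, log2 PMI values stay in the low single digits, so a cutoff like 30 only makes sense under whatever scaling the actual script applies.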
metaclip/build_metadata.py:wiki_title
Keep titles with a view frequency greater than 70.
We randomly sample 25 days of pageviews from the past 5 years.
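One way to picture the sampling and the cutoff is sketched below. It is an illustration under assumptions, not the script's logic: the helper names, the fixed seed, the 365-day year, and summing views across the sampled days are all hypothetical choices. The README does not say how view frequency is aggregated.

```python
import random
from datetime import date, timedelta

def sample_days(end, years=5, k=25, seed=0):
    """Pick `k` distinct days uniformly from the `years` years before `end`."""
    start = end - timedelta(days=365 * years)  # assumption: 365-day years
    n_days = (end - start).days
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    offsets = rng.sample(range(n_days), k)
    return sorted(start + timedelta(days=o) for o in offsets)

def filter_titles(views_per_day, min_views=70):
    """Sum views per title over all given days; keep totals above `min_views`."""
    totals = {}
    for day_views in views_per_day:
        for title, views in day_views.items():
            totals[title] = totals.get(title, 0) + views
    return {t: v for t, v in totals.items() if v > min_views}

days = sample_days(date(2023, 1, 1))

# Toy pageview counts for two sampled days (hypothetical data).
views_per_day = [{"Python": 50, "Obscure_page": 1}, {"Python": 40}]
kept_titles = filter_titles(views_per_day)
```

With the toy data, `Python` totals 90 views and is kept, while `Obscure_page` is dropped.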