The current-events repository can be downloaded directly from: https://github.com/kiwicopple/current-events
- Install requirements
pip install -r requirements.txt
- Install spaCy (the apple extra below targets Apple silicon; omit it on other platforms)
pip install -U pip setuptools wheel
pip install -U 'spacy[apple]'
python -m spacy download en_core_web_sm
- Set up the spaCy entity linker
To install the package, run:
pip install spacy-entity-linker
Afterwards, the knowledge base (Wikidata) must be downloaded. This can be done by running:
python -m spacy_entity_linker "download_knowledge_base"
This downloads and extracts a ~500 MB file containing a preprocessed version of Wikidata.
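Before moving on to crawling, it can help to confirm that the packages from the install steps above are importable. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the import names checked are assumed from the commands above. It does not verify that the Wikidata knowledge base itself was downloaded.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the install commands above.
    required = ["spacy", "en_core_web_sm", "spacy_entity_linker"]
    missing = missing_packages(required)
    print("missing:", missing if missing else "none")
```

An empty result means all three packages can be imported from the current environment.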
- Crawl Wikipedia pages
- Extract events using the notebooks
- Crawl tweets
python -m src.crawler.twitter_crawler
- Restructure the crawled tweets into JSON files, saved to the folder
data/output/preprocessed/restructured
python -m src.preprocessor.restructure_data
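As a rough illustration of what a restructuring step like this might look like, the sketch below groups raw tweet records by event and writes one JSON file per event. The field names (id, event_id, text) and both function names are assumptions made for the example; the actual layout used by src.preprocessor.restructure_data may differ.

```python
import json
from pathlib import Path


def restructure(raw_tweets):
    """Group raw tweet records by event id (field names are hypothetical)."""
    grouped = {}
    for tweet in raw_tweets:
        grouped.setdefault(tweet["event_id"], []).append(
            {"id": tweet["id"], "text": tweet["text"]}
        )
    return grouped


def save_grouped(grouped, out_dir):
    """Write one JSON file per event into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for event_id, tweets in grouped.items():
        path = out_dir / f"{event_id}.json"
        path.write_text(
            json.dumps(tweets, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        paths.append(path)
    return paths
```

Calling save_grouped(restructure(raw), "data/output/preprocessed/restructured") would place the files in the folder named above.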
- Convert the JSON files into dataframes, saved to the folder
data/output/preprocessed/dataframes
=> used for populating the knowledge base (KB)
python -m src.preprocessor.dict2df
- Preprocess the tweets, saved to the folder
data/output/preprocessed/final
python -m src.preprocessor.preprocessing
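The exact cleaning rules applied by the preprocessing script are not listed here. As a hedged sketch of typical tweet cleaning, the hypothetical clean_tweet function below strips URLs, a leading retweet marker, and user mentions, and keeps hashtag words without the # sign. Treat it as an illustration, not a description of src.preprocessor.preprocessing.

```python
import re

URL_RE = re.compile(r"https?://\S+")
RT_RE = re.compile(r"^RT\s+", flags=re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+:?")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Apply a few common (assumed) tweet cleaning rules."""
    text = URL_RE.sub("", text).strip()  # drop links
    text = RT_RE.sub("", text)           # drop a leading "RT "
    text = MENTION_RE.sub("", text)      # drop @user and @user:
    text = text.replace("#", "")         # keep the hashtag word itself
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("RT @user: Flooding in #Jakarta https://t.co/abc"))
# prints "Flooding in Jakarta"
```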
- Merge all the preprocessed dataframes into
data/output/preprocessed/final/all.csv
=> used for entity linking
python -m src.preprocessor.merge_dataframes
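The merge step can be pictured as concatenating every CSV file in the final folder into all.csv. The sketch below does this with the standard library csv module to stay dependency-free; the real script may well use pandas instead, and the function name merge_csv_files is hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(in_dir, out_name="all.csv"):
    """Concatenate all CSV files in in_dir into in_dir/out_name."""
    in_dir = Path(in_dir)
    out_path = in_dir / out_name
    # Skip a previous merge result so reruns do not duplicate rows.
    sources = sorted(p for p in in_dir.glob("*.csv") if p.name != out_name)
    rows, fieldnames = [], []
    for src in sources:
        with src.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)  # union of columns, in first-seen order
            rows.extend(reader)
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return out_path, len(rows)
```

Columns missing from a given input file are left empty in the merged output.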
- Run entity linking on the event text
python -m src.entity_linking.wikidata_linker
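For orientation, this is roughly how spacy-entity-linker is used on a single piece of text, following the usage shown in that package's README. The wrapper function link_entities is hypothetical and is not taken from src.entity_linking.wikidata_linker. It needs spaCy, the en_core_web_sm model, and the downloaded knowledge base from the setup steps.

```python
def link_entities(text, model="en_core_web_sm"):
    """Return (wikidata_id, label) pairs for the entities linked in text."""
    import spacy  # imported lazily; requires the install steps above

    nlp = spacy.load(model)
    # "entityLinker" is the pipeline component provided by spacy-entity-linker.
    nlp.add_pipe("entityLinker", last=True)
    doc = nlp(text)
    return [(ent.get_id(), ent.get_label()) for ent in doc._.linkedEntities]


if __name__ == "__main__":
    for wikidata_id, label in link_entities("Apple opened a new office in Berlin."):
        print(wikidata_id, label)
```

For a whole file such as all.csv, loading the pipeline once and reusing it across rows avoids repeating the model load for every row.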
- Associate events with tweets
python src/event_detection/event_tweet_assoc.py
Reference: https://andrewhalterman.com/post/event-data-in-30-lines-of-python/