Natural Language Processing (NLP) project.
Using Term Frequency-Inverse Document Frequency (TF-IDF) and non-negative matrix factorization (NMF), I performed topic modelling on Donald Trump's tweets from July 19th to December 16th, 2020. The dataset comes from Kaggle.
Here are the steps that I followed:
- Step 1: Reading the data
- Step 2: Exploring the data
- Step 3: Cleaning the dataset
- Step 4: Tokenization
- Step 5: Stopword removal
- Step 6: Lemmatization
- Step 7: Topic modelling
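
The steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the function names (`preprocess`, `tfidf_matrix`, `nmf`), the tiny stopword list, and the four sample strings are all assumptions made up for the example (they are not taken from the Kaggle dataset). Lemmatization is left out because it needs an external library, and the NMF here uses plain multiplicative updates, which is only one of several ways to fit the factorization.

```python
import math
import random
import re

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "by", "on"}

def preprocess(text):
    """Clean a tweet, tokenize it, and drop stopwords (lemmatization omitted)."""
    # Lowercase, then strip URLs, @mentions, and #hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with one TF-IDF row per tokenized document."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    df = {t: sum(t in doc for doc in docs) for t in vocab}
    # Smoothed IDF, one common variant: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    matrix = [[doc.count(t) / max(len(doc), 1) * idf[t] for t in vocab]
              for doc in docs]
    return vocab, matrix

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into non-negative w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in v]
    h = [[rng.random() for _ in v[0]] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update for h, then for w; entries stay non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w, h

def frobenius_error(v, w, h):
    """Squared reconstruction error between v and w @ h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))

# Made-up placeholder texts, used only to exercise the pipeline.
tweets = [
    "Vote early and vote by mail! https://example.com",
    "Mail ballots and early vote totals are rising",
    "The economy is adding jobs and the stock market is up",
    "Great jobs numbers, the economy and stock market are strong",
]

docs = [preprocess(t) for t in tweets]
vocab, v = tfidf_matrix(docs)
w, h = nmf(v, k=2)

# Each row of h scores every term for one topic; show the highest-weighted terms.
for topic_id, weights in enumerate(h):
    top = sorted(zip(weights, vocab), reverse=True)[:3]
    print(topic_id, [term for _, term in top])
```

Each row of `w` tells how strongly a tweet belongs to each topic, and each row of `h` tells which terms define that topic. The number of topics `k` has to be chosen by hand, so it is worth trying a few values and reading the top terms for each.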
Please see the Jupyter notebook: Trump_Tweets_Topic_modelling.ipynb