- Louis Amaudruz
- Andrej Janchevski
- Timoté Vaucher
In this project, we tackle the binary classification of tweets: predicting whether the original tweet contained a positive or a negative emoji. To this end, we first apply state-of-the-art data preprocessing, identify task-specific features, and devise four models: a classic ML baseline, a GRU model using GloVe embeddings, and two transfer-learning models based on ULMFiT and BERT respectively. Our best classifier is the BERT model, which achieves 0.904 accuracy and F1-score on the competition test set.
To run our final model for the evaluation, please proceed to the BERT model README for setup instructions. To consult the other models, see their corresponding folders.
Link to the Competition leaderboard. Our team finished 2nd out of 37 participating teams / individuals.
| Model | Accuracy | F1-score |
|---|---|---|
| Classic ML | 0.770 | 0.783 |
| GloVe + GRUs | 0.881 | 0.883 |
| ULMFiT | 0.885 | 0.886 |
| BERT (bert-base-uncased) | 0.904 | 0.904 |
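As an illustration of the kind of pipeline the classic ML baseline represents, here is a minimal sketch using scikit-learn (TF-IDF features with logistic regression). This is an assumption for demonstration only; the exact features and classifier used in the project are documented in the baseline's folder.

```python
# Minimal sketch of a classic ML baseline for binary tweet classification.
# Toy data stands in for the preprocessed tweets; labels: 1 = positive, 0 = negative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "love this so much",
    "what a great day today",
    "worst movie ever",
    "so sad and disappointed",
]
labels = [1, 1, 0, 0]

# Word unigrams and bigrams weighted by TF-IDF, fed to a linear classifier
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
baseline.fit(tweets, labels)

preds = baseline.predict(["love today", "worst day ever"])
print(preds)
```

In a real run, the toy lists would be replaced by the preprocessed training tweets, and accuracy/F1 would be measured on a held-out split before submitting to the competition.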