The first task is to recover each complete paragraph given only its first and last few words. The second task is to classify the topics of the recovered paragraphs with machine learning models; since a paragraph can carry more than one topic, this is a multilabel classification problem.
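As an illustration of the first task, here is a minimal sketch of paragraph recovery. It assumes the source text is plain text with paragraphs separated by blank lines, and that matching may ignore case and whitespace differences. The names `normalize` and `fetch_paragraph` are hypothetical and are not taken from main.ipynb.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so matching tolerates line wraps."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fetch_paragraph(document: str, first_words: str, last_words: str) -> Optional[str]:
    """Return the first paragraph that starts with first_words and ends with last_words.

    Assumption: paragraphs are separated by one or more blank lines.
    Returns None when no paragraph matches.
    """
    start, end = normalize(first_words), normalize(last_words)
    for paragraph in re.split(r"\n\s*\n", document):
        flat = normalize(paragraph)
        if flat.startswith(start) and flat.endswith(end):
            return paragraph.strip()
    return None


# Hypothetical example document with three paragraphs.
document = (
    "Intro text here.\n\n"
    "The quick brown fox jumps over\nthe lazy dog near the river bank.\n\n"
    "Closing remarks."
)
result = fetch_paragraph(document, "The quick brown", "the river bank.")
```

Returning `None` on a miss keeps unmatched rows easy to spot, so they can be retried with a looser match instead of being filled with a wrong paragraph.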
The classification models used initially are Random Forest and BERT.
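For orientation, the Random Forest side of a multilabel setup might look like the sketch below. It assumes scikit-learn with TF-IDF features, which this README does not confirm, and the paragraphs and topic labels are made-up toy data. The actual pipeline lives in the notebooks.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each paragraph can carry several topics.
paragraphs = [
    "The team won the championship game last night.",
    "The startup released a new smartphone chip.",
    "Stock markets fell as tech shares dropped.",
    "The league signed a broadcasting deal worth millions.",
]
topics = [["sports"], ["tech"], ["finance", "tech"], ["finance", "sports"]]

# Turn the topic lists into a binary indicator matrix (one column per topic).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(topics)

# Assumption: TF-IDF bag-of-words features; the notebooks may use something else.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(paragraphs)

# A random forest accepts a 2D indicator target and predicts every label jointly.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, Y)

# Predict topics for an unseen paragraph and map the indicator row back to names.
new_X = vectorizer.transform(["Tech shares rallied after the chip launch."])
pred = clf.predict(new_X)
predicted_topics = mlb.inverse_transform(pred)
```

The indicator matrix is what makes the problem multilabel rather than multiclass: each column is an independent yes/no decision, so a paragraph can receive zero, one, or several topics.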
The main notebook is main.ipynb, and the finalized dataframe is to_fill_finalized_BERT.csv.
The topic_classification_BERT.ipynb notebook contains the full training code and predictions of the BERT model.