Transcriptions: EDA, Data Cleaning, and Topic Classification

The first task is to retrieve each complete paragraph given only its first and last few words. The second task is to classify the topics of the retrieved paragraphs with machine learning models. Because a paragraph can belong to more than one topic, this is a multilabel classification problem.
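As an illustration of the first task, here is a minimal sketch of one way to recover a paragraph from its opening and closing words with a regular expression. It is not taken from main.ipynb: the function name `fetch_paragraph`, its parameters, and the sample transcript are hypothetical, and the notebook may use a different approach.

```python
import re


def fetch_paragraph(text, first_words, last_words):
    """Return the first span of `text` that starts with `first_words`
    and ends with `last_words`, or None if no such span exists.

    Hypothetical helper for illustration only.
    """

    def to_pattern(words):
        # Escape each word and allow any whitespace (including line
        # breaks) between consecutive words.
        return r"\s+".join(re.escape(w) for w in words.split())

    # Non-greedy match so the span stops at the nearest closing words.
    pattern = to_pattern(first_words) + r".*?" + to_pattern(last_words)
    match = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


# Made-up transcript used only to demonstrate the call.
transcript = (
    "Hello everyone and welcome. Today we talk about budgets.\n"
    "The budget was approved last week. Thank you all for coming."
)
paragraph = fetch_paragraph(transcript, "Today we", "last week.")
```

The non-greedy match keeps the result as short as possible from the first occurrence of the opening words. Real transcriptions may need extra normalization (punctuation, casing, repeated phrases) before matching.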

The models used initially are Random Forest and BERT.
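In a multilabel setting, the targets for such models are commonly encoded as multi-hot vectors, with one column per topic. The sketch below shows that encoding in plain Python. The function name `multi_hot` and the topic names are assumptions made for illustration and do not come from the notebooks. Libraries such as scikit-learn offer an equivalent through `MultiLabelBinarizer`.

```python
def multi_hot(topic_lists):
    """Encode a list of topic lists as multi-hot rows.

    Returns (vocab, rows), where vocab is the sorted list of distinct
    topics and rows[i][j] is 1 if paragraph i carries topic vocab[j].
    Hypothetical helper for illustration only.
    """
    vocab = sorted({topic for topics in topic_lists for topic in topics})
    index = {topic: i for i, topic in enumerate(vocab)}
    rows = []
    for topics in topic_lists:
        row = [0] * len(vocab)
        for topic in topics:
            row[index[topic]] = 1
        rows.append(row)
    return vocab, rows


# Made-up labels: each inner list holds the topics of one paragraph.
labels = [["finance", "health"], ["health"], []]
vocab, targets = multi_hot(labels)
```

Each row can then serve as the target for one paragraph, so a model predicts every topic independently instead of choosing a single class.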

The main notebook is main.ipynb, and the finalized dataframe is to_fill_finalized_BERT.csv.

The topic_classification_BERT.ipynb notebook contains the full training code for the BERT model, along with its predictions.