Transcriptions EDA, Data Cleaning and their Topics Classification

The project has two tasks. The first is to retrieve each complete paragraph given only its first and last few words. The second is to classify the topics of the retrieved paragraphs with machine learning models, which is a multilabel classification problem.
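As an illustration of the first task, here is a minimal sketch of matching a paragraph by its opening and closing words. It is not taken from main.ipynb: the function names, the sample transcript, and the assumption that paragraphs are separated by blank lines are all hypothetical.

```python
import re


def _normalize(text):
    # Collapse runs of whitespace and lowercase so matching is tolerant
    # of spacing and capitalization differences.
    return re.sub(r"\s+", " ", text).strip().lower()


def fetch_paragraph(paragraphs, first_words, last_words):
    """Return the first paragraph that starts with first_words and
    ends with last_words, or None if no paragraph matches."""
    start = _normalize(first_words)
    end = _normalize(last_words)
    for paragraph in paragraphs:
        candidate = _normalize(paragraph)
        if candidate.startswith(start) and candidate.endswith(end):
            return paragraph
    return None


# Hypothetical transcript; paragraphs are assumed to be split on blank lines.
transcript = (
    "Welcome everyone to the weekly call.\n\n"
    "Today we will talk about  the budget and the hiring plan for next year.\n\n"
    "Thanks for joining, see you next week."
)
paragraphs = [p for p in transcript.split("\n\n") if p.strip()]
match = fetch_paragraph(paragraphs, "today we will", "for next year.")
```

The function returns the original, unnormalized paragraph, so downstream cleaning still sees the raw text. Real transcriptions may need fuzzier matching (for example, tolerance for punctuation differences), which this sketch does not attempt.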

The models used initially are Random Forest and BERT.
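Because the problem is multilabel, each paragraph can carry several topics at once. One common way to prepare targets for such models is a multi-hot (binary indicator) vector per paragraph. The sketch below shows that encoding in plain Python; the topic names and helper functions are hypothetical and are not taken from the notebooks.

```python
def build_label_index(label_sets):
    # Map every distinct topic to a fixed column position (sorted for stability).
    topics = sorted({topic for labels in label_sets for topic in labels})
    return {topic: i for i, topic in enumerate(topics)}


def to_multi_hot(labels, label_index):
    # One slot per known topic; 1 where the paragraph has that topic.
    row = [0] * len(label_index)
    for topic in labels:
        row[label_index[topic]] = 1
    return row


# Hypothetical topic annotations for three paragraphs.
paragraph_topics = [{"finance", "hiring"}, {"finance"}, set()]
label_index = build_label_index(paragraph_topics)
targets = [to_multi_hot(labels, label_index) for labels in paragraph_topics]
```

A paragraph with no topics becomes an all-zero row, which is what distinguishes multilabel targets from single-label (multiclass) ones.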

The main notebook and the finalized dataframe are main.ipynb and to_fill_finalized_BERT.csv respectively.

The topic_classification_BERT.ipynb notebook contains the full training code and predictions of the BERT model.
