Group 14 -- H2_B21_14 (IIT-Indore)
Awarded Silver Medal in the final result.
For technical details about the project, refer to Technical Report.pdf and Presentation.pdf.
For Headline Generation (Task 3 and Evaluation), first run the scripts Pegasus_FineTune.ipynb and T5_FineTune.ipynb to generate the saved models used for inference.
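As a minimal sketch of what that fine-tuning step boils down to, assuming a Hugging Face transformers setup: the starting checkpoint, sequence lengths, and training details below are illustrative placeholders, and the actual configuration lives in Pegasus_FineTune.ipynb.

    # Illustrative sketch only -- the real hyperparameters are in Pegasus_FineTune.ipynb.
    # "google/pegasus-xsum" is an assumed starting checkpoint, not necessarily the one used.
    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    checkpoint = "google/pegasus-xsum"
    tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
    model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

    def encode_batch(texts, headlines, max_in=512, max_out=64):
        """Tokenize article bodies (inputs) and headlines (labels) for seq2seq training."""
        batch = tokenizer(texts, truncation=True, max_length=max_in,
                          padding="max_length", return_tensors="pt")
        labels = tokenizer(headlines, truncation=True, max_length=max_out,
                           padding="max_length", return_tensors="pt")
        batch["labels"] = labels["input_ids"]
        return batch

    # ... training loop / Seq2SeqTrainer over the training data omitted ...

    # Persist the fine-tuned weights so Task3.ipynb / Evaluation.ipynb can reload them:
    model.save_pretrained("Pegasus")
    tokenizer.save_pretrained("Pegasus")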
The two output files, Output1 and Output2, are stored in the Outputs folder within the base directory.
Of the four headline-generation models discussed in our report, we chose the standalone Pegasus model, as it scored better on the majority of metrics.
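For reference, headline generation with the saved Pegasus folder looks roughly like the following sketch; generation settings such as the beam size are assumptions, not the notebook's exact values.

    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    # Load the fine-tuned model from the Pegasus folder produced by Pegasus_FineTune.ipynb.
    tokenizer = PegasusTokenizer.from_pretrained("Pegasus")
    model = PegasusForConditionalGeneration.from_pretrained("Pegasus")

    def generate_headline(text: str) -> str:
        """Generate a single headline for one preprocessed (translated) article."""
        batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        ids = model.generate(**batch, max_length=64, num_beams=4, early_stopping=True)
        return tokenizer.decode(ids[0], skip_special_tokens=True)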
The three tasks take roughly the following time to run:
- Task 1 (Theme Classification): 4.18 minutes
- Task 2 (Aspect-Based Sentiment Classification): 4.25 minutes
- Task 3 (Headline Generation): 33 minutes
Preprocessing may take a variable amount of time, depending on the number of requests made to the Google Translation API from a given IP address, and it may be affected by service unavailability when too many API requests are issued (for instance, when the code is run multiple times).
For stable translation runtimes, the paid version of the API can be used; however, we used the open-source version, in accordance with the competition rules.
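As an illustration, the translation step can be reproduced with the googletrans package, one open-source wrapper around the Google Translation API; assuming it here is our choice for the sketch, and the notebooks may use a different wrapper. The retry/backoff exists only because free usage is rate-limited per IP address.

    import time
    from googletrans import Translator  # assumed open-source wrapper; the notebooks may differ

    translator = Translator()

    def translate_to_english(text, retries=3, backoff=5.0):
        """Translate one record to English, retrying when the free API rate-limits the IP."""
        for attempt in range(retries):
            try:
                return translator.translate(text, dest="en").text
            except Exception:
                time.sleep(backoff * (attempt + 1))  # back off before retrying
        return text  # fall back to the original text if translation keeps failing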
The entire folder is arranged in the following manner:
There are five notebook scripts located in the base directory. A short description of each is given below:
- Task1&2.ipynb: Preprocesses the dataset, trains the DistilBERT theme classifier, and implements the ABSA code that extracts mobile brands and their corresponding sentiments (a training sketch for the classifier follows this list).
- Task3.ipynb: Computes metrics for the different Headline Generation models using the fine-tuned saved models.
- Pegasus_FineTune.ipynb: Fine-tunes Pegasus for the summarization task on the given dataset.
- T5_FineTune.ipynb: Fine-tunes the T5 model for the summarization task on the given dataset.
- Evaluation.ipynb: Evaluates the testing data; contains code for all three tasks.
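The theme-classification part of Task1&2.ipynb boils down to fine-tuning DistilBERT as a sequence classifier. A minimal sketch, assuming the Hugging Face Trainer API; the checkpoint, number of theme classes, CSV column names, and hyperparameters below are assumptions rather than the notebook's exact settings.

    import pandas as pd
    import torch
    from transformers import (DistilBertTokenizerFast,
                              DistilBertForSequenceClassification,
                              Trainer, TrainingArguments)

    NUM_THEMES = 5  # assumed number of theme classes
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    model = DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=NUM_THEMES)

    class ThemeDataset(torch.utils.data.Dataset):
        """Wraps translated texts and integer theme labels for the Trainer."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    # Column names below ("Text", "Label") are placeholders for whatever the CSV uses.
    df = pd.read_csv("Preprocessed_Text.csv")
    train_dataset = ThemeDataset(df["Text"].tolist(), df["Label"].tolist())

    args = TrainingArguments(output_dir="DistilBERT", num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.save_model("DistilBERT")  # reused later by Evaluation.ipynb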
The Outputs folder contains the two output files, Output1 and Output2, described below (an example layout is sketched after the list):
- Output1.csv: Contains TextID, Predicted Labels, and generated headlines.
- Output2.csv: Contains TextID, Predicted Labels, and Extracted Brands with their sentiments.
- Preprocessed_Text.csv: Contains the text data after preprocessing and translation. Located in the base directory.
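For illustration only, the two output files follow a layout along these lines; the column headers are inferred from the descriptions above and the row values are placeholders, not real predictions.

    import pandas as pd

    # Placeholder rows only -- real values come from the trained models at inference time.
    output1 = pd.DataFrame([{"TextID": "T001",
                             "Predicted Label": "Technology",
                             "Generated Headline": "sample generated headline"}])
    output1.to_csv("Outputs/Output1.csv", index=False)

    output2 = pd.DataFrame([{"TextID": "T001",
                             "Predicted Label": "Technology",
                             "Extracted Brand": "Samsung",
                             "Sentiment": "positive"}])
    output2.to_csv("Outputs/Output2.csv", index=False)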
There are three folders containing saved models fine-tuned on the training dataset (a loading sketch follows the list):
- T5: Contains the fine-tuned T5 model for Headline Generation.
- DistilBERT: Contains the fine-tuned DistilBERT model for Theme Classification.
- Pegasus: Contains the fine-tuned Pegasus model for Headline Generation.
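For reference, any of these folders can be reloaded with from_pretrained; the sketch below does it for the DistilBERT theme classifier (the index-to-theme mapping is defined in the notebooks, not here).

    import torch
    from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

    # Load the fine-tuned classifier from the DistilBERT folder.
    tok = DistilBertTokenizerFast.from_pretrained("DistilBERT")
    clf = DistilBertForSequenceClassification.from_pretrained("DistilBERT")

    def predict_theme(text: str) -> int:
        """Return the index of the predicted theme class for one translated text."""
        batch = tok(text, truncation=True, max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = clf(**batch).logits
        return int(logits.argmax(dim=-1))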
The requirements folder contains five .txt files listing the required library versions for each of the five notebooks mentioned above.
Contains the training data as provided under the problem statement.
Contains the evaluation dataset as provided.
Contains position standings for the entire event.
Team Leader - Aryan Verma
Aryan Rastogi and Vardhan Paliwal coded the project, created the Presentation, and contributed to the Technical Report.
Tarun Gupta served as the principal advisor for the entire project.
Aryan Verma and Smit Patel wrote the Technical Report.
Manav Trivedi, Gadge Prafull, Yash Kothekar, Siddhesh Shelke and Shivprasad Kadam contributed to the Technical Report.