This contains EDA and pipeline formation for cyberbullying dataset (youtub_parserd_dataset)

To access the pipeline go to pipeline.ipynb to test the pipeline on other dataset for "youtube_parsed_dataset".

Changes

Another file FinalPipeline2.ipynb is made in which we used numerical_columns as test parameters instead of processed_text so as to evaluate Accuracy and AUC scores on it.

FinalPipeline_Report.ipynb and FinalPipeline2_Report.ipynb are added in which I tried to provide a good explaination of code and its output for both FinalPipeline.ipynb and FinalPipeline2.ipynb respectively.

Test Resuts Random forest : Achieved AUC score of around 0.99 and outperformed every other model in both cases. Logistic Regression: Achieved an accuracy of approximately 91.9% and an AUC score of about 97.1%, indicating good per formance in distinguishing between cyberbullying and non-bullying instances. XGBoost: Outperformed logistic regression with higher accuracy (~94.8%) and AUC score (~98.8%), suggesting superior predictive capabilities. Decision Tree: Although providing acceptable performance with an accuracy of around 90.1% and an AUC score of about 90.4%, it lags behind logistic regression and XGBoost.

Change 2

LSTM model is integrated in a pipeline. The function is saved inside LSTM_pipeline.ipynb. AUC of 96.5 is obtained using LSTM model. Code description is also provided in the end.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

This contains EDA and pipeline formation for cyberbullying dataset (youtub_parserd_dataset)

To access the pipeline go to pipeline.ipynb to test the pipeline on other dataset for "youtube_parsed_dataset".

Another file FinalPipeline2.ipynb is made in which we used numerical_columns as test parameters instead of processed_text so as to evaluate Accuracy and AUC scores on it.

FinalPipeline_Report.ipynb and FinalPipeline2_Report.ipynb are added in which I tried to provide a good explaination of code and its output for both FinalPipeline.ipynb and FinalPipeline2.ipynb respectively.

LSTM model is integrated in a pipeline. The function is saved inside LSTM_pipeline.ipynb. AUC of 96.5 is obtained using LSTM model. Code description is also provided in the end.

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
cyberbullying		cyberbullying
EDA copy.ipynb		EDA copy.ipynb
EDA.ipynb		EDA.ipynb
FinalPipeline.ipynb		FinalPipeline.ipynb
FinalPipeline2.ipynb		FinalPipeline2.ipynb
FinalPipeline2_Report.ipynb		FinalPipeline2_Report.ipynb
FinalPipeline_Report.ipynb		FinalPipeline_Report.ipynb
LSTM_pipeline.ipynb		LSTM_pipeline.ipynb
README.md		README.md

polarisdhruv/GSOC_CALEL_TEST

Folders and files

Latest commit

History

Repository files navigation

This contains EDA and pipeline formation for cyberbullying dataset (youtub_parserd_dataset)

To access the pipeline go to pipeline.ipynb to test the pipeline on other dataset for "youtube_parsed_dataset".

Another file FinalPipeline2.ipynb is made in which we used numerical_columns as test parameters instead of processed_text so as to evaluate Accuracy and AUC scores on it.

FinalPipeline_Report.ipynb and FinalPipeline2_Report.ipynb are added in which I tried to provide a good explaination of code and its output for both FinalPipeline.ipynb and FinalPipeline2.ipynb respectively.

LSTM model is integrated in a pipeline. The function is saved inside LSTM_pipeline.ipynb. AUC of 96.5 is obtained using LSTM model. Code description is also provided in the end.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages