The purpose of the project was to familiarize us with the basic steps of applying data mining techniques, namely: collection, preprocessing / cleaning, conversion, application of data mining techniques, and evaluation. The implementation was done in Python using the scikit-learn and Keras libraries. The assignment consists of two (2) tasks: classification and duplicate detection.
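As a rough illustration of these steps, the sketch below runs a tiny text classification pipeline and a simple duplicate check with scikit-learn. It is not the project's actual code: the toy texts, the labels, the choice of TF-IDF with logistic regression, and the 0.6 similarity threshold are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Collection: hypothetical toy data standing in for the real dataset.
train_texts = [
    "stock markets rally as banks report profits",
    "central bank raises interest rates again",
    "team wins the championship final",
    "striker scores twice in the cup match",
]
train_labels = ["business", "business", "sport", "sport"]
test_texts = ["bank profits lift markets", "late goal wins the match"]
test_labels = ["business", "sport"]

# Preprocessing / conversion: lowercase, drop English stop words, TF-IDF.
# Application: a linear classifier (an illustrative choice, not the
# model used in the project).
classifier = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_texts, train_labels)

# Evaluation: compare predictions with the held-out labels.
predictions = classifier.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
print("accuracy:", accuracy)

# Duplicate detection: flag pairs whose TF-IDF cosine similarity
# reaches a threshold (0.6 is an assumed value for this toy data).
questions = [
    "How do I learn Python quickly?",
    "How can I learn Python quickly?",
    "What is the capital of France?",
]
vectors = TfidfVectorizer().fit_transform(questions)
similarity = cosine_similarity(vectors)
THRESHOLD = 0.6
duplicate_pairs = [
    (i, j)
    for i in range(len(questions))
    for j in range(i + 1, len(questions))
    if similarity[i, j] >= THRESHOLD
]
print("duplicate pairs:", duplicate_pairs)
```

Comparing every pair is quadratic in the number of texts, so on a large dataset the exhaustive loop would likely need to be replaced by an approximate method.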
Assignment directions are available in BigData-2020-2021-english.pdf.
Two (2) separate competitions were created on the Kaggle platform for the requirements of the assignment.
https://www.kaggle.com/c/bigdata2021duplicatedetection/leaderboard
https://www.kaggle.com/c/bigdata2021classification/leaderboard