This repository contains the completed code for Assignment 1 of COMP701 Data Mining and Knowledge Engineering (2020). In brief: DrugLib.py contains the pre-processing steps and two algorithms, trained on a 75/25 train-test split, that predict a drug's side effects from a combined reviews column.
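As a rough illustration of the "combined reviews column" and the 75/25 split described above, here is a minimal sketch using pandas. It is not taken from DrugLib.py: the column names (`benefitsReview`, `sideEffectsReview`, `commentsReview`), the `combinedReview` name, the helper functions, and the random seed are all assumptions made for the example.

```python
import pandas as pd

# Assumed names of the free-text review columns; adjust to match your copy of the data.
REVIEW_COLUMNS = ["benefitsReview", "sideEffectsReview", "commentsReview"]


def combine_reviews(df):
    """Return a copy of df with a 'combinedReview' column (hypothetical name)."""
    out = df.copy()
    text = out[REVIEW_COLUMNS].fillna("").astype(str)
    # Join the non-empty review parts of each row with single spaces.
    out["combinedReview"] = text.apply(
        lambda row: " ".join(part for part in row if part), axis=1
    )
    return out


def split_75_25(df, seed=42):
    """Randomly split df into 75% train and 25% test (index must be unique)."""
    train = df.sample(frac=0.75, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Tiny made-up frame, only to show the shapes involved.
demo = pd.DataFrame(
    {
        "benefitsReview": ["helped a lot", "no change"] * 4,
        "sideEffectsReview": [None, "mild nausea"] * 4,
        "commentsReview": ["took daily", "stopped early"] * 4,
    }
)
train, test = split_75_25(combine_reviews(demo))
print(len(train), len(test))
```

The split here is a plain random sample; the actual script may split differently (for example, stratified by the side-effect label).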
NOTE: You will need to download the data set from https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Druglib.com%29 so that DrugLib.py can read it. You will also need the libraries listed below, including nltk and textblob; string ships with Python and needs no installation.
numpy, pandas, string, textblob (TextBlob), nltk.corpus (stopwords), nltk.stem (WordNetLemmatizer)
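Before running DrugLib.py, it can help to confirm that these modules are importable. The following is a small sketch using only the standard library; the `REQUIRED_MODULES` list and the `missing_modules` helper are names invented for this example, not part of the repository.

```python
import importlib.util

# Top-level modules named in the list above ("string" is in the standard library).
REQUIRED_MODULES = ["numpy", "pandas", "string", "textblob", "nltk"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules, try: pip install " + " ".join(missing))
else:
    print("All required modules are importable.")

# Note: NLTK's stopword list and WordNet data are separate downloads,
# typically fetched once with nltk.download("stopwords") and nltk.download("wordnet").
```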