StackoverflowQuestions

How to use:

Download the training and testing data from the competition and place them in a 'data' folder (create it if it does not exist).
Run scripts/deduplicate.py to remove duplicate samples from the training set and store the indices of the repeated test samples.
Run 'python trainer.py generatePreprocess' to create the TF-IDF and cv models.
Run scripts/calculate_distribution.py to create the inverse tag ordering/mapping.
Run 'python trainer.py' to generate the model.
Run 'python predictor.py' to generate predictions on the non-repeated test samples.
Run scripts/zipper.py to generate the final submission file.
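
The steps above can also be chained from a single helper, so a failed step stops the run before later steps consume stale output. The following is a minimal sketch, not part of the repository: the names STEPS, build_commands and run_pipeline are hypothetical, and it assumes that the scripts/ files are launched with the Python interpreter from the repository root and take no arguments.

```python
import subprocess
import sys
from pathlib import Path

# Steps in the order given in this README. Running the scripts/ files
# through the Python interpreter with no arguments is an assumption.
STEPS = [
    ["scripts/deduplicate.py"],
    ["trainer.py", "generatePreprocess"],
    ["scripts/calculate_distribution.py"],
    ["trainer.py"],
    ["predictor.py"],
    ["scripts/zipper.py"],
]


def build_commands(python=sys.executable):
    """Return the full argument list for each pipeline step."""
    return [[python, *step] for step in STEPS]


def run_pipeline(data_dir="data"):
    """Run every step in order, stopping at the first failure."""
    if not Path(data_dir).is_dir():
        raise FileNotFoundError(f"Expected competition data in '{data_dir}/'")
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() from the repository root would then execute the six commands in sequence once the 'data' folder is in place.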
