Oh no! Since its creation, the famous school of wizards, Hogwarts, had never known such an offense. The forces of evil have bewitched the Sorting Hat. It no longer responds and is unable to fulfill its role of sorting the students into their houses.
The new academic year is approaching. Thankfully, Professor McGonagall was able to take action in such a stressful situation, since it is impossible for Hogwarts not to welcome new students... She decided to call on you, a muggle "data scientist" able to work miracles with the tool that all muggles know how to use: a "computer". Despite the intrinsic reluctance of many wizards, the director of the school welcomes you to her office to explain the situation. You are here because her informant discovered that you are able to recreate a magic Sorting Hat using your muggle tools.
You explain to her that in order for your "muggle" tools to work, you need student data. Hesitantly, Professor McGonagall gives you a dusty spellbook. Fortunately for you, a simple "Digitalis!" and the book turns into a USB stick.
Which Hogwarts course has a homogeneous score distribution between all four houses?
- -d: Display all the histograms.
- -f: Show the histogram of feature 'n'.
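As a rough illustration of how the question can be answered, here is a minimal plotting sketch. It assumes a `dataset_train.csv` file with a 'Hogwarts House' column and numeric course columns; the file name, column names and the example course are assumptions, not the project's fixed interface.

```python
# A minimal sketch: overlay one histogram per house for a single (assumed) course.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("dataset_train.csv")     # assumed training file
course = "Arithmancy"                     # hypothetical example course

# A course is "homogeneous" when the four per-house histograms overlap closely.
for house, group in df.groupby("Hogwarts House"):
    plt.hist(group[course].dropna(), bins=30, alpha=0.5, label=house)

plt.title(f"{course} score distribution per house")
plt.xlabel("Score")
plt.ylabel("Number of students")
plt.legend()
plt.show()
```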
What are the two features that are similar?
- -f1: Specify the first feature to use.
- -f2: Specify the second feature to use.
- n1 and n2: indexes of the features to use
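A similar sketch for the scatter plot step: plot one course column against another, colored per house, and look for points that fall along a straight line. The two feature names below are only examples taken from the observation later in this README; they are assumptions about the dataset's column names.

```python
# A minimal sketch: scatter two (assumed) course columns against each other.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("dataset_train.csv")                       # assumed training file
f1, f2 = "Astronomy", "Defense Against the Dark Arts"       # assumed column names

# Two similar features produce points that line up along a straight line.
for house, group in df.groupby("Hogwarts House"):
    plt.scatter(group[f1], group[f2], s=8, alpha=0.6, label=house)

plt.xlabel(f1)
plt.ylabel(f2)
plt.legend()
plt.show()
```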
Some features are homogeneous across houses or strongly correlated with other ones, so keeping them is not necessary for training the model and can lead to a more complex hypothesis that causes overfitting. Our choice was to remove the following (see the sketch after this list):
- Arithmancy: Homogeneous
- Astronomy: Similar to 'Defense Against the Dark Arts'
- Transfiguration: Semi-similar to 'History of Magic'
- Potions: Semi-homogeneous
- Care of Magical Creatures: Semi-homogeneous
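Dropping these columns before training could look like the sketch below; the file name and column spellings are assumed to match the dataset used in the plotting sketches above.

```python
# A sketch of removing the redundant columns before training (names assumed).
import pandas as pd

df = pd.read_csv("dataset_train.csv")
dropped = [
    "Arithmancy",
    "Astronomy",
    "Transfiguration",
    "Potions",
    "Care of Magical Creatures",
]
features = df.drop(columns=dropped)   # keep only the columns used for training
```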
- -BGD: Batch Gradient Descent Algorithm
- -SGD: Stochastic Gradient Descent Algorithm
Outputs a file named ./weights.csv that contains the weights of the model.
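A hedged sketch of what the training step might look like: one-vs-all logistic regression trained with either batch (BGD) or stochastic (SGD) gradient descent, then dumped to ./weights.csv. The feature subset, learning rate, epoch count and CSV layout are all assumptions made for illustration, not the project's actual configuration.

```python
# logreg_train sketch -- one-vs-all logistic regression with BGD or SGD.
import numpy as np
import pandas as pd

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, houses, lr=0.1, epochs=1000, mode="BGD"):
    # X: (m, n) standardized features with a leading bias column of ones.
    m, n = X.shape
    weights = np.zeros((len(houses), n))
    for i, house in enumerate(houses):
        t = (y == house).astype(float)            # 1 for this house, 0 otherwise
        theta = np.zeros(n)
        for _ in range(epochs):
            if mode == "BGD":                     # one step over the full set
                grad = X.T @ (sigmoid(X @ theta) - t) / m
            else:                                 # SGD: one step per random sample
                j = np.random.randint(m)
                grad = X[j] * (sigmoid(X[j] @ theta) - t[j])
            theta -= lr * grad
        weights[i] = theta
    return weights

if __name__ == "__main__":
    df = pd.read_csv("dataset_train.csv").dropna(subset=["Hogwarts House"])
    cols = ["Herbology", "Divination", "Charms"]  # hypothetical feature subset
    X = df[cols].fillna(df[cols].mean())
    X = (X - X.mean()) / X.std()                  # standardize each feature
    X = np.c_[np.ones(len(X)), X.to_numpy()]      # prepend the bias column
    y = df["Hogwarts House"].to_numpy()
    houses = sorted(set(y))
    W = train_one_vs_all(X, y, houses)
    pd.DataFrame(W, index=houses, columns=["bias"] + cols).to_csv("weights.csv")
```

In practice the trade-off between the two modes is the usual one: BGD takes one stable step per epoch over the whole training set, while SGD makes noisier but much cheaper updates per sample.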
At the end of training, the program outputs the following (computed on the training set):
- Accuracy of the model: 98.06%
- Confusion Matrix
- F1 Score
- Balanced Accuracy: 98.71%
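All of these metrics can be derived from the confusion matrix; the sketch below shows one way to compute them with plain numpy, given arrays of true and predicted house names (the function names are just illustrative, not the project's API).

```python
# Metrics sketch: confusion matrix, accuracy, per-class F1, balanced accuracy.
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    idx = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1            # rows = true house, cols = predicted
    return cm

def scores(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # per-class recall
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    balanced_accuracy = recall.mean()      # mean of the per-class recalls
    return accuracy, f1, balanced_accuracy
```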
- -p: Print the results with the students' names to stdout
Outputs a file named ./houses.csv that contains the indexes and the predicted house assigned to each student.
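A sketch of the prediction step under the same assumptions as the training sketch above: load the saved weights, score every student with each one-vs-all classifier, keep the most probable house, and write ./houses.csv with an Index and a Hogwarts House column (the test file name and its Index column are assumptions).

```python
# logreg_predict sketch -- pick the house with the highest one-vs-all probability.
import numpy as np
import pandas as pd

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = pd.read_csv("weights.csv", index_col=0)   # produced by the training sketch
houses = weights.index.to_list()
cols = [c for c in weights.columns if c != "bias"]

test = pd.read_csv("dataset_test.csv")              # assumed evaluation file
X = test[cols].fillna(test[cols].mean())
# NOTE: a real run should reuse the training set's mean/std; this sketch
# restandardizes on the test set only for brevity.
X = (X - X.mean()) / X.std()
X = np.c_[np.ones(len(X)), X.to_numpy()]            # prepend the bias column

probs = sigmoid(X @ weights.to_numpy().T)           # shape (students, houses)
pred = [houses[i] for i in probs.argmax(axis=1)]

pd.DataFrame({"Index": test["Index"], "Hogwarts House": pred}).to_csv(
    "houses.csv", index=False
)
```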
pip3 install pandas
pip3 install matplotlib
pip3 install seaborn
- ✔️ 125 [ Accuracy: (training data: 98.06%) (evaluation data: 99%) ]
- Professor McGonagall is very happy with the results 🥳
- Oussama Belouche 1337
- Anas Elouargui 1337