Finding the Higgs Boson - Machine Learning challenge

The Higgs boson is a particle that gives other particles their mass, and its discovery is crucial because it indicates the existence of new physics principles. Classifying whether an observed event comes from a Higgs boson (signal) or from some other process or particle (background) is therefore a significant problem, and improvements over current solutions remain highly valuable.

We explore various methods and propose a binary classifier based on logistic regression, which achieves an accuracy of 80% on the AICrowd platform.

Model structure

All the data provided by CERN is in the folder /data, which contains 2 files:
train.csv - Training set of 250000 events. The file starts with the ID column, followed by the label column and then 30 feature columns.
test.csv - Test set of 568238 events, in the same format as train.csv except that the label is missing.
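The column layout described above (ID, label, then features) can be read with numpy alone. The following is a minimal sketch, not the repository's own loader: the function name `load_csv`, the header names, and the assumption that labels are stored as the strings `s` (signal) and `b` (background) are illustrative, and the in-memory sample uses 3 feature columns instead of 30.

```python
import io

import numpy as np


def load_csv(source):
    """Load a CSV laid out as: ID column, label column, then feature columns."""
    # Read everything as strings first, because the label column is not numeric.
    raw = np.atleast_2d(
        np.genfromtxt(source, delimiter=",", skip_header=1, dtype=str)
    )
    ids = raw[:, 0].astype(int)
    # Assumption: 's' marks signal and 'b' marks background; map them to 1 and 0.
    labels = (raw[:, 1] == "s").astype(int)
    features = raw[:, 2:].astype(float)
    return ids, labels, features


# Tiny in-memory stand-in for train.csv (hypothetical header and values).
sample = io.StringIO(
    "Id,Prediction,f0,f1,f2\n"
    "100000,s,1.5,0.7,0.2\n"
    "100001,b,0.3,2.0,1.1\n"
)
ids, labels, features = load_csv(sample)
```

Passing a file path such as `data/train.csv` instead of the `StringIO` object should work the same way, provided the real file follows the assumed label encoding.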

The dataset was downloaded from https://github.com/epfml/ML_course/tree/master/projects/project1/data.
In the folder /docs you can find the description of the project (project1_description.pdf) and our report (report.pdf).
The folder /pretrained_data contains files with the weights obtained while training on the best set of parameters.
The folder /scripts has the following files:
data_processor.py - All the preprocessing and refining of the raw data. It contains methods that standardize and scale the data, split it into different sets (by jet number, or into training and test sets), perform feature expansion, etc.
model.py - Contains methods for training and validating the model. Prediction, final evaluation, and submission creation are also implemented here.
implementation.py - The 6 methods used for classification: linear regression using gradient descent, linear regression using stochastic gradient descent, least squares regression using normal equations, ridge regression using normal equations, logistic regression using GD or SGD, and regularized logistic regression using GD or SGD. The file also contains additional methods such as loss and gradient computations.
run.py - The model runner; it runs the model with the best parameters.
plots.py - Methods for plotting the data.
proj1_helpers.py - Helpers received for the project.
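To make the scripts above more concrete, here is a minimal sketch of standardization followed by regularized logistic regression trained with gradient descent, using numpy only. It is not the code from implementation.py: the function names, signatures, hyperparameters, and the synthetic data are assumptions chosen for illustration, and this version penalizes the bias weight along with the others.

```python
import numpy as np


def standardize(x):
    """Scale each feature column to zero mean and unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - mean) / std


def sigmoid(t):
    """Logistic function; the input is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30, 30)))


def reg_logistic_regression(y, tx, lambda_, initial_w, max_iters, gamma):
    """Gradient descent on the mean log-loss plus lambda_ * ||w||^2."""
    w = initial_w.copy()
    n = len(y)
    for _ in range(max_iters):
        # Gradient of the mean log-loss plus gradient of the L2 penalty.
        grad = tx.T @ (sigmoid(tx @ w) - y) / n + 2 * lambda_ * w
        w -= gamma * grad
    return w


def predict(tx, w):
    """Label an event as signal (1) when the predicted probability is >= 0.5."""
    return (sigmoid(tx @ w) >= 0.5).astype(int)


# Synthetic, linearly separable toy data (not the CERN dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

# Prepend a column of ones so the first weight acts as the bias term.
tx = np.c_[np.ones(len(x)), standardize(x)]
w = reg_logistic_regression(
    y, tx, lambda_=1e-3, initial_w=np.zeros(3), max_iters=500, gamma=0.5
)
accuracy = (predict(tx, w) == y).mean()
```

A stochastic variant would compute `grad` on a random mini-batch of rows in each iteration instead of on the full matrix.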

Running the model

To run the code, you first need to have Python installed. The only external library the model uses is numpy; if you want to generate plots, you should also install matplotlib.

The required libraries are listed in requirements.txt. To install them, run pip install -r requirements.txt (Python 2) or pip3 install -r requirements.txt (Python 3).

To reproduce our best results, simply run the Python script run.py. By default it runs the best model, based on logistic regression, for a small number of iterations starting from the pretrained weights, because training from scratch takes a long time. To train from scratch instead, set USE_PRETRAINED=False in run.py.
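The effect of such a flag can be sketched as follows. This is a hypothetical illustration, not the contents of run.py: apart from the name `USE_PRETRAINED`, the helper `training_setup`, the file name `weights.npy`, and the iteration counts are all assumptions.

```python
import os
import tempfile

import numpy as np

USE_PRETRAINED = True  # flag name taken from the text; everything else is illustrative


def training_setup(weights_path, n_features, use_pretrained):
    """Return (initial weights, iteration count) for the chosen mode."""
    if use_pretrained:
        # Warm start from saved weights and run only a few iterations.
        return np.load(weights_path), 10  # illustrative iteration count
    # Cold start from zeros and run a long training loop.
    return np.zeros(n_features), 10_000  # illustrative iteration count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.npy")  # hypothetical file name
    np.save(path, np.arange(3.0))  # stand-in for stored pretrained weights
    warm_w, warm_iters = training_setup(path, 3, USE_PRETRAINED)
    cold_w, cold_iters = training_setup(path, 3, False)
```

The returned weights and iteration count would then be passed to whichever training routine is in use.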

Authors

Irina Bejan: irina.bejan@epfl.ch
Nevena Drešević: nevena.dresevic@epfl.ch
Marija Katanić: marija.katanic@epfl.ch
