Kaggle Datathon: DataHub 2.0
Music is as much a powerful form of human expression as it is entertainment. Over time, it has developed from the earliest calls and rhythms into a huge variety of genres. The stark contrast between the simplicity of folk songs, the complexity of classical symphonies and the hypnotic rhythms of dance music suggests that each song can be assigned to a category based on its musical elements. For example, genres can be defined by the instruments they use. A piece played in a certain style with orchestral instruments could be classified as classical music. Similarly, if the instruments were heavily distorted guitars, we would classify the piece as rock or heavy metal. Likewise, drum and bass uses a very fast tempo (high BPM) and is primarily electronic.
Many music aggregator applications currently rely on machine learning to power their recommendation engines and curate playlists.
In this challenge, you are expected to use the given dataset to develop a machine learning model that classifies music into genres based on the relevant features.
Your goal is to predict the correct genre of each music record, given their respective features!
The evaluation metric for this competition is Categorization Accuracy - the percentage of predictions that are correct.
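Categorization accuracy is the number of correct predictions divided by the total number of predictions. Multiply it by 100 to express it as a percentage. Below is a minimal sketch in plain Python. The function name `categorization_accuracy` is hypothetical. scikit-learn's `accuracy_score` computes the same quantity.

```python
def categorization_accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    # Count positions where the predicted genre equals the true genre.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

For example, `categorization_accuracy(["rock", "pop", "jazz", "rock"], ["rock", "pop", "rock", "rock"])` returns `0.75`, since three of the four hypothetical predictions are correct.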
Use the Kaggle API command below to download the dataset:
$ kaggle competitions download -c datahub-2021
Or use Git to clone this repository:
$ git clone https://github.com/Alpha-github/Kaggle_Competition_Datahub2021.git
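The Kaggle CLI usually saves competition files as a single zip archive in the current directory. This sketch assumes the archive is named `datahub-2021.zip`, after the competition slug; check the actual file name after downloading. It uses Python's standard `zipfile` module to extract the archive and list the CSV files it contains. The helper `extract_dataset` and the `data` output folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path="datahub-2021.zip", dest="data"):
    """Extract the downloaded archive into `dest` and return the CSV file names.

    The default archive name is an assumption based on the competition slug.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Collect every CSV, including any placed in sub-folders of the archive.
    return sorted(p.name for p in dest_dir.rglob("*.csv"))
```

After downloading, calling `extract_dataset()` should list the files described in the table below.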
Files | Description |
---|---|
train.csv | The training set |
test_x.csv | The test set |
Sample_submission.csv | A sample submission file in the correct format |
metaData.csv | Supplemental information about the data |
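A minimal loading sketch with pandas follows. The `id` column is mentioned for the submission file. The target column name `genre` is an assumption, so check `train.csv` and `metaData.csv` for the actual column names. The helper `load_data` is hypothetical.

```python
import pandas as pd


def load_data(train_path, test_path, target="genre", id_col="id"):
    """Load the CSVs and split them into features, labels and test ids.

    `target="genre"` is a hypothetical column name; adjust it to the real one.
    """
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    y = train[target]
    # Drop the label and the identifier so neither is used as a feature.
    X = train.drop(columns=[target, id_col], errors="ignore")
    test_ids = test[id_col]
    X_test = test.drop(columns=[id_col], errors="ignore")
    return X, y, X_test, test_ids
```

Typical usage: `X, y, X_test, test_ids = load_data("train.csv", "test_x.csv")`.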
Output File | Description |
---|---|
submission.csv | Contains the id of each test record and its predicted genre |
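A sketch of writing the output file with pandas. The column headers `id` and `genre` are assumptions; match them to `Sample_submission.csv` before submitting. The helper `write_submission` is hypothetical.

```python
import pandas as pd


def write_submission(test_ids, predictions, path="submission.csv",
                     id_col="id", target="genre"):
    """Write one row per test record: its id and the predicted genre.

    The header names are assumptions; match them to Sample_submission.csv.
    """
    submission = pd.DataFrame({id_col: list(test_ids), target: list(predictions)})
    # index=False keeps pandas' row index out of the CSV.
    submission.to_csv(path, index=False)
    return submission
```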
The program preprocesses the data with pandas and then builds a predictive classification model.
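The exact preprocessing steps are not described here. The following is one common approach, offered as a hypothetical sketch rather than the repository's code. It assumes the useful features are numeric and keeps only those columns. It fills missing values with training-set medians and encodes genre names as integers, which a Keras softmax output needs. The helpers `preprocess` and `encode_labels` are illustrative names.

```python
import pandas as pd


def preprocess(X, X_test):
    """Hypothetical preprocessing: keep numeric columns and impute medians.

    Medians are computed on the training data only and reused for the test
    set, so no information leaks from the test data.
    """
    numeric_cols = X.select_dtypes(include="number").columns
    X = X[numeric_cols]
    X_test = X_test[numeric_cols]
    medians = X.median()
    return X.fillna(medians), X_test.fillna(medians)


def encode_labels(y):
    """Map genre names to integer codes (sorted alphabetically)."""
    codes, genres = pd.factorize(y, sort=True)
    return codes, list(genres)
```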
Because accuracy is the evaluation metric, three models have been built: two with scikit-learn (GaussianNB and DecisionTreeClassifier) and one deep learning model with TensorFlow Keras. The model that gives the highest accuracy is chosen.
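The selection step can be sketched as follows: score each candidate on a held-out validation split and keep the most accurate one. This is an illustration, not the repository's exact code. The Keras model is left out to keep the sketch light, but it would be scored on the same split. The helper `pick_best_model` and the 80/20 split are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def pick_best_model(X, y, random_state=0):
    """Fit each candidate on a training split and keep the most accurate one.

    Only the two scikit-learn models are shown; a Keras model would be
    scored the same way on the same validation split.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y)
    candidates = {
        "GaussianNB": GaussianNB(),
        "DecisionTreeClassifier": DecisionTreeClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # For classifiers, .score() returns mean accuracy on the given data.
        scores[name] = model.score(X_val, y_val)
    best = max(scores, key=scores.get)
    return best, candidates[best], scores
```

Selecting on validation accuracy rather than training accuracy matters here, because a decision tree can fit its own training data almost perfectly.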
Replace PATH_TO_TRAIN_CSV and PATH_TO_TEST_CSV with the paths to your train.csv and test_x.csv files.
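A quick existence check catches path typos before any training starts. This sketch assumes the two placeholders are plain string constants in the script. The example values and the helper `check_paths` are hypothetical.

```python
from pathlib import Path

# Placeholder names from the instructions; the values are example paths only.
PATH_TO_TRAIN_CSV = "data/train.csv"
PATH_TO_TEST_CSV = "data/test_x.csv"


def check_paths(*paths):
    """Raise FileNotFoundError for the first path that is not an existing file."""
    for p in paths:
        if not Path(p).is_file():
            raise FileNotFoundError(f"CSV not found: {p}")
    return True
```

Call `check_paths(PATH_TO_TRAIN_CSV, PATH_TO_TEST_CSV)` at the top of the script.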
Feel free to experiment with the deep learning model by tweaking its hyperparameters, number of layers, optimizer, loss function, etc.
Important: Be careful not to overfit your model; otherwise it will not perform well on the test dataset.
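One way to spot overfitting is to compare training and validation accuracy as model capacity grows. This sketch uses the decision tree's `max_depth` as the capacity knob. A large gap between the two scores suggests the model is memorizing the training set. For the Keras model, monitoring validation loss during training serves the same purpose. The helper `depth_report` and the depth values are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def depth_report(X, y, depths=(2, 4, 8, None), random_state=0):
    """Return {max_depth: (train_accuracy, validation_accuracy)}.

    max_depth=None lets the tree grow until its leaves are pure.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y)
    report = {}
    for depth in depths:
        tree = DecisionTreeClassifier(max_depth=depth, random_state=random_state)
        tree.fit(X_train, y_train)
        report[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val))
    return report
```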
The final predictions on the test data are stored in submission.csv.
NOTE: Training neural networks is hardware intensive, so it is better to run the model on Google Colab.
- Python - An easy-to-learn programming language that is fun to play with.
- Pandas - A fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
- TensorFlow Keras - The core open source library to help you develop and train ML models.
- Scikit-learn - Simple and efficient tools for predictive data analysis, accessible to everybody and reusable in various contexts. Built on NumPy, SciPy, and matplotlib.
- Matplotlib - A plotting library for the Python programming language and its numerical mathematics extension NumPy.
- NumPy - A library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- Logging - Python has a built-in logging module that writes status messages to a file or any other output stream.
- Seaborn - A data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python.
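Below is a minimal sketch of using the built-in logging module to record training progress, for example each model's validation accuracy. The logger name, message format and helper `get_logger` are illustrative choices.

```python
import logging


def get_logger(name="datahub", log_file=None, level=logging.INFO):
    """Return a logger that writes timestamped status messages.

    Messages go to stderr, and also to `log_file` if one is given.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = [logging.StreamHandler()]
        if log_file:
            handlers.append(logging.FileHandler(log_file))
        for handler in handlers:
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Example usage: `log = get_logger(log_file="training.log")` followed by `log.info("GaussianNB validation accuracy: %.3f", acc)`.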
Install TensorFlow using pip
System requirements:
Python 3.6–3.9. Python 3.9 support requires TensorFlow 2.5 or later; Python 3.8 support requires TensorFlow 2.2 or later.
Important: For more information on proper installation and GPU setup, see the official TensorFlow installation guide.
pip install tensorflow
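The requirements above can be encoded in a small check that runs before installing. It encodes only the stated versions: Python 3.6–3.9 supported, TensorFlow 2.5+ for Python 3.9 and 2.2+ for Python 3.8. Newer TensorFlow releases support newer Python versions, so treat this as a snapshot of the requirements at the time of writing. The helper `tensorflow_requirement` is hypothetical.

```python
import sys

# Minimum TensorFlow versions stated in the requirements above.
MIN_TF_FOR_PYTHON = {(3, 8): "2.2", (3, 9): "2.5"}
SUPPORTED_RANGE = ((3, 6), (3, 9))


def tensorflow_requirement(version=None):
    """Return (supported, minimum_tf) for a (major, minor) Python version.

    Defaults to the running interpreter. `minimum_tf` is None when the
    requirements state no minimum or the version is unsupported.
    """
    major_minor = tuple(version or sys.version_info[:2])
    low, high = SUPPORTED_RANGE
    supported = low <= major_minor <= high
    minimum_tf = MIN_TF_FOR_PYTHON.get(major_minor) if supported else None
    return supported, minimum_tf
```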
Install NumPy using pip
pip install numpy
Install Pandas using pip
pip install pandas
Install scikit-learn using pip
pip install scikit-learn
Install Seaborn using pip
pip install seaborn
To download Python, visit the official Python Downloads page.