This work evaluates and compares the predictive ability of several models on the provided datasets. The goal is to select the methodology that performs best on unseen data in the test set.
To that end, the work process is described in detail and supported by graphics that visualize the decisions made and the results obtained. The aim is not only to present the final results but also to explain the methodological choices, the approaches adopted, and the key considerations that influence how the results are interpreted.
We address two different problems, each associated with a specific dataset. For the regression task, predicting the popularity of songs, we use the datasets train_ap1_mcp_23_24_train.csv and test_ap1_mcp_23_24_test.csv. For the time series task, forecasting births, we use the dataset nacimientos_2016_2021.csv.
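Before running anything, it can help to confirm that the three dataset files are where the code expects them. The following is a minimal sketch rather than part of the project's code: the helper name missing_datasets is hypothetical, and it assumes the CSV files sit directly inside the 'code' folder.

```python
from pathlib import Path

# File names as given in the text above.
REGRESSION_TRAIN = "train_ap1_mcp_23_24_train.csv"
REGRESSION_TEST = "test_ap1_mcp_23_24_test.csv"
TIME_SERIES = "nacimientos_2016_2021.csv"


def missing_datasets(folder="code"):
    """Return the expected dataset file names that are not found in `folder`.

    Hypothetical helper: assumes the CSV files are stored directly in
    `folder`, not in a subdirectory.
    """
    base = Path(folder)
    expected = [REGRESSION_TRAIN, REGRESSION_TEST, TIME_SERIES]
    return [name for name in expected if not (base / name).is_file()]
```

Calling missing_datasets() from the repository root returns an empty list when all three files are present. In Google Colab the folder argument would need to point to wherever the files were uploaded.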
The work is organized around these datasets: the first two sections are dedicated to the regression problem and the last one to the time series.
All code, CSV files, and any other files needed to run the code are available in the 'code' folder of this repository. The code has been adapted to run easily in the Google Colab environment. A comprehensive report of the work is available in Spanish; an English translation is under consideration.