Data Scientist
With an interdisciplinary background, my interests lie at the intersection of mathematics, statistics, and computer science, which naturally drew me towards the captivating realm of data science.
Throughout my journey in data science, I have been extensively exploring all facets of developing effective business solutions. My process begins by thoroughly understanding the underlying business problem and continues with the meticulous development of innovative model solutions, culminating in their successful deployment.
I am excited about the possibilities that data science offers, and I look forward to contributing my skills and knowledge to different projects.
Contacts:
In this project, several machine learning models were trained to classify whether a sound wave fire extinguishing system can extinguish a fire. A publicly available dataset was used, which provides the results of several experiments carried out with sound wave fire extinguishing systems. The project was implemented in R, using packages such as caret and the tidyverse.
After extensive experiments and tests, the best trained model achieved an accuracy of 96.6%. This result is promising and could have a meaningful impact on the safety and efficiency of fire extinguishing systems.
I developed a project with the objective of building Machine Learning models to predict the energy consumption of electric cars. I used a real dataset, applying techniques such as Linear Regression, Random Forest, Decision Trees and SVM. The Linear Regression model with Ridge regularization showed the best performance in terms of prediction accuracy, measured by RMSE.
The project was implemented in R with the caret and tidyverse packages, covering the full workflow from data preparation to model evaluation. The model was chosen based on RMSE, which penalizes large discrepancies between predictions and actual values more heavily than small ones.
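To illustrate why RMSE is sensitive to large errors, here is a minimal sketch in plain Python (not the R code used in the project). The function names and the toy numbers are assumptions made up for this example; they are not taken from the project's dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally sized sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison with RMSE."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, chosen so both prediction sets have the same MAE.
actual = [10.0, 12.0, 15.0, 20.0]
even_errors = [12.0, 14.0, 17.0, 22.0]    # every prediction is off by 2
one_big_error = [10.0, 12.0, 15.0, 28.0]  # a single prediction is off by 8

print("MAE :", mae(actual, even_errors), mae(actual, one_big_error))
print("RMSE:", rmse(actual, even_errors), rmse(actual, one_big_error))
```

Both prediction sets have the same mean absolute error, but the set with a single large miss has twice the RMSE, which is the behavior that motivated using RMSE for model selection.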
Which traffic incidents occur most frequently? What age group is most involved in traffic incidents? What is the most common event in incidents? Are passengers or pedestrians the main victims of incidents?
These and other questions are the focus of this project, which provides insights through the analysis of publicly available real data.
Apache Spark was used to read and process the data efficiently, simulating data processing in a distributed cluster of computers. Spark SQL was employed to manipulate and query the data for further analysis. In addition, Python libraries such as pandas, seaborn, Plotly and Matplotlib were used to create informative and visually appealing graphs of the findings.
In this project, I developed a machine learning model that forecasts sales for 1,115 stores over the next 6 weeks, with an average MAPE of 11%. The model uses an XGBoost regressor trained on data from the 1,115 stores spanning 942 days.
Taking the model error into account, the best-case scenario forecasts 287,609,874.99 in sales.
I also created a Telegram chatbot that returns graphs of the daily sales evolution under the expected, worst and best scenarios, for each store over the next 6 weeks.
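The relationship between MAPE and the worst and best scenarios can be sketched as follows. This is an assumption about how such scenarios are commonly derived (scaling the forecast down and up by the MAPE), not a description of the project's actual code; the function names and numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.11 means 11%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def scenarios(prediction, mape_value):
    """Assumed scenario rule: shift the forecast by +/- the MAPE fraction."""
    return {
        "worst": prediction * (1 - mape_value),
        "expected": prediction,
        "best": prediction * (1 + mape_value),
    }


# Hypothetical validation data: every forecast is off by 10% of the actual value.
actual_sales = [100.0, 200.0, 400.0]
forecast_sales = [110.0, 180.0, 440.0]

error = mape(actual_sales, forecast_sales)

# Hypothetical 6-week forecast for a single store.
store_scenarios = scenarios(1000.0, error)
print(f"MAPE: {error:.2%}")
for name, value in store_scenarios.items():
    print(f"{name}: {value:.2f}")
```

Under this assumed rule, a forecast with an 11% MAPE would give a best scenario 11% above the expected value and a worst scenario 11% below it.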