
This repository contains my implementations of basic machine learning algorithms and metrics, data analysis, and hyperparameter optimization techniques for improving model performance.


maria-snarava/portfolio-ml


Machine Learning Engineer Portfolio

Welcome to my portfolio! This repository showcases projects demonstrating my skills and experience in various machine-learning techniques and applications.

This project demonstrates the process of predicting customer churn using machine learning techniques. Customer churn is when customers stop doing business with a company.

Overview

Customer churn, also called customer attrition, is a critical metric for many businesses, and predicting it lets them take proactive steps to retain customers. I start with Logistic Regression as the baseline model because it is easy to implement and interpret; it achieves an accuracy of 79.29% in predicting customer churn. I then improve this model with feature engineering and hyperparameter tuning, and experiment with more advanced algorithms, each with its own hyperparameter tuning. Finally, I compare the metrics of all models against the Logistic Regression baseline and choose the best one. The best model is XGBClassifier, with an accuracy of 95.737% and a precision of 0.91 for the True class on the independent test set.
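The baseline-then-tune-then-compare workflow described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic (generated with `make_classification`), the parameter grid is an arbitrary assumption, and a tuned Decision Tree stands in for the more advanced models so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def evaluate(model):
    """Return accuracy and positive-class precision on the held-out test set."""
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
    }


# Step 1: baseline model.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 2: a second model with hyperparameter tuning (the grid values are illustrative).
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5, 20]},
    scoring="accuracy",
    cv=5,
).fit(X_train, y_train)

# Step 3: compare every candidate against the baseline on the same test set.
results = {
    "logistic_regression": evaluate(baseline),
    "decision_tree_tuned": evaluate(search.best_estimator_),
}
for name, metrics in results.items():
    print(name, metrics)
```

Evaluating every model with the same `evaluate` helper on the same held-out split keeps the comparison against the baseline fair.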

Key Features

  • Data preprocessing
  • Feature engineering
  • Implementation of logistic regression using scikit-learn
  • Model evaluation
  • Hyperparameter Tuning
  • Decision Tree
  • XGBoost
  • Support Vector Machines

Results

The best model is XGBClassifier, with an accuracy of 95.737% and a precision of 0.91 for the True class on the independent test set.
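For readers unfamiliar with the two metrics quoted here, the sketch below shows how accuracy and precision for the True class are computed from labels and predictions. The labels are made-up toy values, not the project's results, and the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=True):
    """Of all samples predicted as `positive`, the fraction that really are."""
    true_labels_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_labels_of_predicted:
        return 0.0  # no positive predictions at all
    hits = sum(t == positive for t in true_labels_of_predicted)
    return hits / len(true_labels_of_predicted)


# Toy labels: True means the customer churned.
y_true = [True, False, False, True, False, False, True, False]
y_pred = [True, False, False, False, False, True, True, False]

print(accuracy(y_true, y_pred))   # 6 of 8 predictions are correct
print(precision(y_true, y_pred))  # 2 of 3 predicted churners really churned
```

Precision for the True class matters in a churn setting because it measures how many of the customers flagged as likely to churn actually do.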

Technologies Used

  • Python
  • Pandas and NumPy for data manipulation
  • Scikit-learn for model building and evaluation
  • Matplotlib and Seaborn for data visualization
  • imbalanced-learn (imblearn) for handling class imbalance
  • XGBoost

Overview

This project uses the "Tweets about the Top Companies from 2015 to 2020" dataset, which features tweets related to major NASDAQ-listed companies; the tweets used here were posted between 01-01-2019 and 31-12-2019. Sentiment analysis was performed using the NLTK library, along with a financial dictionary to better capture the nuances of financial terms. Engagement metrics were computed based on the number of likes, retweets, and comments each tweet received. You can find the preprocessing steps and code in the project's GitHub repository.
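The idea of dictionary-based sentiment plus an engagement score can be illustrated with a small plain-Python sketch. It is a stand-in for illustration only and does not reproduce the NLTK-based pipeline: the lexicon words and weights, the field names, and the choice of a simple sum for engagement are all assumptions.

```python
import re

# Hypothetical mini-lexicon of financial terms and weights (illustrative values only).
FINANCIAL_LEXICON = {
    "bullish": 1.0,
    "beat": 0.5,
    "upgrade": 0.5,
    "bearish": -1.0,
    "miss": -0.5,
    "downgrade": -0.5,
}


def sentiment_label(text, lexicon=FINANCIAL_LEXICON):
    """Sum the lexicon weights of the words in `text` and map the total to a label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def engagement(tweet):
    """Assumed engagement metric: likes + retweets + comments."""
    return tweet["likes"] + tweet["retweets"] + tweet["comments"]


# Made-up example tweets.
tweets = [
    {"text": "Analysts are bullish on AAPL after earnings beat",
     "likes": 10, "retweets": 4, "comments": 1},
    {"text": "Downgrade incoming, feeling bearish",
     "likes": 3, "retweets": 0, "comments": 2},
    {"text": "New store opens downtown",
     "likes": 1, "retweets": 0, "comments": 0},
]

labeled = [(sentiment_label(t["text"]), engagement(t)) for t in tweets]
print(labeled)
```

A domain dictionary helps because words such as "beat" or "miss" carry a clear sentiment in earnings news that a general-purpose lexicon may not capture.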

Key Features

  • Number of tweets by company plot (Histogram, pie chart, timeline)
  • Engagement plot (Histogram, pie chart, timeline)
  • Detailed analysis for a selected company:
      • Number of tweets by sentiment for the selected company (histogram, pie chart, timeline)
      • Number of tweets by sentiment vs. stock price for the selected company
      • Word cloud by sentiment for the selected company
      • Random tweet about the selected company by sentiment
      • Top 5 most engaging tweets about the selected company by sentiment

Technologies Used

  • Python
  • Pandas for data manipulation
  • Streamlit
  • NLTK library

Upcoming Projects

  • Recommender System for online store using deep learning and content-based filtering
  • Product classification by Image with Convolutional Neural Networks
  • ...

About Me

I am a Backend Developer with 6 years of experience in web development and e-commerce with Magento 2, using PHP, SQL, and JS. I have helped create and maintain some of the most popular Magento extensions, used in more than 40,000 stores worldwide.

My Computer Science degree from Belarusian State University gave me a solid foundation in programming, algorithms, and software engineering principles. I am now expanding my skill set into Data Science: I am studying Machine Learning, with a focus on Deep Learning, so that I can integrate it with backend development to create smarter, more efficient applications.

Contact

I am looking for new challenges and collaboration. If you're seeking a backend developer who combines technical expertise with a passion for innovation, feel free to contact me via direct messages or by e-mail.
