This repository contains a portfolio of data science projects I completed for academic, self-learning, and hobby purposes. They are presented as IPython notebooks, Python scripts, and R Markdown files.
For a more visually pleasing way to browse the portfolio, check out abdo.tech.
The R portfolio is coming soon!
Note: The data used in the projects (available under the Dataset directory) is for demonstration purposes only. The datasets are free public datasets available on Kaggle.
-
-
- Predicting Adult Income: Build a model that accurately predicts whether an individual makes more than $50,000 a year, using the Adult Data Set from the UCI Machine Learning Repository.
- Predicting Cars Classes: Experiment with the KNN algorithm to predict the class label of the selected data. Use the default KNN configuration with at least two different values of k, then try custom KNN configurations with at least 5-fold cross-validation.
- Credit Card Approval: Use the KNN algorithm to help banks decide whether to approve or reject each customer's credit card application. The most critical confusion-matrix metric here is specificity, also known as the true negative rate (TNR).
-
- Predicting Cars Classes: Experiment with the Decision Tree algorithm to predict the class label of the selected data.
-
- Predicting Cars Classes: Experiment with the Naive Bayes algorithm to predict the class label of the selected data.
-
- Predicting Cars Classes: Experiment with the SVM algorithm to predict the class label of the selected data.
- Credit Card Approval: Use the SVM algorithm to help banks decide whether to approve or reject each customer's credit card application. The most critical confusion-matrix metric here is specificity, also known as the true negative rate (TNR).
-
- Predict Diabetes Progression: Build a linear regression model, fitted with least squares or gradient descent, to predict diabetes progression.
-
Tools: scikit-learn, pandas, seaborn, Matplotlib, NumPy.
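The Credit Card Approval projects above single out specificity (true negative rate) as the key metric. As a minimal sketch of what that number measures, the snippet below computes TN / (TN + FP) in plain Python. The function name, the label encoding (0 = reject, 1 = approve), and the sample labels are assumptions made for illustration; they are not taken from the notebooks, which may compute the metric differently.

```python
def specificity(y_true, y_pred, negative_label=0):
    """Return TN / (TN + FP): the share of actual negatives predicted as negative."""
    pairs = list(zip(y_true, y_pred))
    # True negatives: actual negative, predicted negative.
    tn = sum(1 for t, p in pairs if t == negative_label and p == negative_label)
    # False positives: actual negative, predicted anything else.
    fp = sum(1 for t, p in pairs if t == negative_label and p != negative_label)
    if tn + fp == 0:
        raise ValueError("y_true contains no negative samples")
    return tn / (tn + fp)


# Hypothetical labels: 0 = reject, 1 = approve.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]

tnr = specificity(y_true, y_pred)
print(f"specificity (TNR) = {tnr:.2f}")
```

With this encoding, a high specificity means that few applicants who should be rejected are approved by mistake.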
-
-
- Cluster Automotives: Group automobiles with agglomerative clustering based on MPG and displacement.
-
Tools: scikit-learn, pandas, seaborn, Matplotlib, NumPy.
-
-
- Association Rules for Covid Symptoms: Mine association rules with the Apriori algorithm to show how likely different COVID-19 symptoms are to occur together.
- Market Basket Analysis for Online Retail: Determine which products are most often bought together and identify how customers' figure size affects purchasing patterns, giving better insight into inventory planning and stock management.
-
Tools: mlxtend, pandas, seaborn, Matplotlib, NumPy.
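Association rules such as those in the two projects above are usually ranked by support, confidence, and lift. The sketch below computes these three metrics for a single rule in plain Python. It is not the Apriori algorithm itself and not the notebooks' code (which rely on Mlxtend); the function names and the tiny symptom data set are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a   # estimate of P(consequent | antecedent)
    lift = confidence / s_c   # > 1 means the items co-occur more often than chance
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical symptom records, one set per patient.
transactions = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"cough"},
    {"fever", "fatigue"},
    {"fever", "cough"},
]

metrics = rule_metrics(transactions, {"fever"}, {"cough"})
print(metrics)
```

A lift below 1, as in this toy data, indicates that the two items appear together slightly less often than they would if they were independent.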
-
- Dresses Sales Forecast for ModCloth Using ARIMA: Forecast dress sales for the ModCloth online retailer. For the first model, the differencing order d is chosen by testing the stationarity of the time series, and the p and q values are read from the PACF (partial autocorrelation function) and ACF (autocorrelation function) plots, respectively. For the second model, the (p, d, q) parameters are determined using Auto ARIMA.
- Dresses Sales Forecast for ModCloth Using Xgboost: Forecast dress sales for ModCloth using XGBoost. Two models with different learning rates are implemented, and MAE, MSE, and RMSE are calculated for each. The final forecast covers 20 periods (20 months).
- Aircrafts Crashes Forecast Using Xgboost: Aviation casualty forecasting using XGBoost: analyze historical airplane crash and fatality data and forecast future casualties for 20 periods (20 months).
Tools: statsmodels, XGBoost, scikit-learn, pandas, seaborn, Matplotlib, NumPy.
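The forecasting projects above compare models by MAE, MSE, and RMSE. For reference, here is a minimal plain-Python sketch of those three error metrics. The function name and the sample sales figures are made up for illustration and are not results from the notebooks, which may use library helpers instead.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # same units as the data
    return {"mae": mae, "mse": mse, "rmse": rmse}


# Hypothetical monthly sales and forecasts.
actual = [120.0, 135.0, 150.0, 160.0]
predicted = [110.0, 140.0, 150.0, 170.0]

scores = forecast_errors(actual, predicted)
print(scores)
```

RMSE penalizes large misses more heavily than MAE, so looking at both helps show whether a model's error comes from a few bad months or is spread evenly.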
-
-
- Passion Fruit Classification using Convolutional Neural Network: Build a CNN architecture to classify several cultivars of Passion Fruit (Markisa), namely Sweet Passion Fruit (Markisa Manis), Yellow Passion Fruit (Markisa Kuning), Purple Passion Fruit (Markisa Ungu), and Big Passion Fruit (Markisa Besar).
- Classification of COVID-19 from Chest X-ray images using Transfer Learning: Build a CNN-based model with DenseNet201 transfer learning to detect COVID-19, lung opacity, and viral pneumonia in patients from chest X-ray images. The model reaches a training accuracy of 94.5%, a validation accuracy of 96.49%, and a validation AUC of 99.39%. The results suggest that transfer learning is an effective, robust, and easily deployable approach for COVID-19 detection.
-
Tools: TensorFlow, Keras, Colab, pandas, seaborn, Matplotlib, NumPy.
-
-
- Twitter Sentiment Analysis for Cryptocurrency Price Prediction: Build a system that connects to the Twitter API v2 to collect posts related to BTC and stores them in a MySQL database, then uses Hugging Face, an NLP library, to run sentiment analysis on them. The library labels each post as "Positive," "Negative," or "Neutral," and the label is stored back in the MySQL database.
-
Tools: Transformers, Hugging Face, Colab, Twitter API, pandas, NumPy, MySQL.
I also work with many other kinds of technologies. You can find a general portfolio here.
If you liked what you saw and want to chat with me about the portfolio, work opportunities, or collaboration, send me an email at a7med.abdu@gmail.com.
If this portfolio inspired you, gave you ideas for your own portfolio, or helped you in any way, please consider following my GitHub profile for newly updated content.