Repository containing portfolio of data science and machine learning projects. Presented in the form of iPython Notebooks

mch-fauzy/data-science

Data Science Handbook

This repository contains a portfolio of data science and machine learning projects, presented as IPython notebooks and PDF reports.

Notes

Fundamentals

| No | Notebook | Description |
| --- | --- | --- |
| 1 | NumPy Overview | Overview of how to use NumPy |
| 2 | Pandas Overview | Overview of how to use pandas |
| 3 | Matplotlib Overview | Overview of how to use Matplotlib for data visualization |
| 4 | Seaborn Overview | Overview of how to use seaborn for data visualization |

EDA - Data Preparation and Preprocessing

| No | Notebook | Description |
| --- | --- | --- |
| 1 | Feature Engineering: Variable Types & Characteristics | Collection of variable types and characteristics, such as missing-data mechanisms (MCAR, MAR, MNAR), cardinality, distributions, linear model assumptions, outliers, and variable magnitude |
| 2 | Feature Engineering: Univariate Missing Data Imputation | Collection of univariate missing data imputation techniques, such as mean, median, and mode imputation, arbitrary value, end of distribution, random sample, and many more |
| 3 | Feature Engineering: Multivariate Missing Data Imputation | Multivariate missing data imputation with KNN and MICE |
| 4 | Feature Engineering: Categorical Encoding | Collection of categorical encoding techniques, such as rare label encoding, one-hot encoding, WoE (weight of evidence) encoding, and other monotonic relationship encodings |
| 5 | Feature Engineering: Variable Transformation | Collection of variable transformation techniques that make non-Gaussian distributions more suitable for linear models, such as the log, Box-Cox, and Yeo-Johnson transformers |
| 6 | Feature Engineering: Discretization | Collection of discretization methods, such as equal-width discretization, equal-frequency discretization, K-means discretization, and many more |
| 7 | Feature Selection: Filter Methods | Collection of filter methods for feature selection, such as removing constant, quasi-constant, and duplicated features, checking multicollinearity, mutual information, ANOVA, and many more |
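
As a quick illustration of two techniques named in the table above (median imputation from notebook 2 and equal-width / equal-frequency discretization from notebook 6), here is a minimal sketch using pandas. It is not taken from the notebooks: the toy data and the column name `age` are assumptions made only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name "age" is illustrative only.
df = pd.DataFrame({"age": [22.0, 35.0, np.nan, 58.0, 41.0, np.nan, 29.0, 64.0]})

# Univariate imputation: fill missing values with the median of the observed values.
median_age = df["age"].median()
df["age_imputed"] = df["age"].fillna(median_age)

# Equal-width discretization: 3 bins that each span the same range of values.
df["age_equal_width"] = pd.cut(df["age_imputed"], bins=3, labels=False)

# Equal-frequency discretization: up to 4 bins holding roughly equal numbers of rows.
# duplicates="drop" merges bins when tied values produce identical bin edges.
df["age_equal_freq"] = pd.qcut(
    df["age_imputed"], q=4, labels=False, duplicates="drop"
)

print(df)
```

Median imputation creates ties at the imputed value, so the equal-frequency bins may end up only approximately balanced.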

Modelling and Analysis

| No | Notebook | Report | Dashboard | Description |
| --- | --- | --- | --- | --- |
| 1 | E-Commerce Sales Performance and Customer RFM Behavior Analysis | PDF | Tableau Dashboard Story | E-commerce companies want to know their sales performance and customer behavior. The goals of this analysis are to understand customer behavior and to recommend ways to increase sales and customer satisfaction |
| 2 | Credit Default Risk_Home Credit_Light GBM | PDF | - | Credit default risk classification and debtor grading using LightGBM, with SHAP for model explainability |
| 3 | Book Recommendation System_Content and Item-based Collaborative Filtering | PDF | - | Build a book recommendation system to help users choose books based on the books they have purchased |
| 4 | Article Topic Classification_Kumparan_Light GBM | PDF | - | Build a model to classify article topics based on their content using TF-IDF vectorization |
| 5 | Airplane Passengers_SARIMA Forecasting | - | - | Seasonal forecasting of the number of airplane passengers, evaluated with walk-forward validation |
| 6 | Sales Advertising_Linear Regression | - | - | Sales prediction based on advertising spend |
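
The walk-forward validation mentioned for the forecasting notebook can be sketched as follows. This is not the notebook's code: to keep the example dependency-light it swaps SARIMA for a seasonal-naive forecaster (predict the value from one season earlier), and the function name `walk_forward_validation` and the synthetic series are assumptions made for illustration.

```python
import numpy as np


def walk_forward_validation(series, n_test, season_length):
    """One-step walk-forward validation of a seasonal-naive forecaster.

    The last `n_test` points are held out. At each step the model forecasts
    the next point from the history seen so far, then the true value is
    appended to the history before the next forecast is made.
    """
    history = list(series[:-n_test])
    test = list(series[-n_test:])
    predictions = []
    for actual in test:
        # Seasonal-naive forecast: repeat the value observed one season ago.
        predictions.append(history[-season_length])
        # Walk forward: reveal the true observation to the model.
        history.append(actual)
    errors = np.array(test, dtype=float) - np.array(predictions, dtype=float)
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return predictions, rmse


# Synthetic series with an exact period of 4, purely illustrative.
series = [10, 20, 30, 40] * 5
predictions, rmse = walk_forward_validation(series, n_test=4, season_length=4)
print(predictions, rmse)
```

With a real model such as SARIMA, the forecasting line inside the loop would be replaced by refitting (or updating) the model on `history` and predicting one step ahead. The surrounding loop stays the same.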
