GitHub - ashish-kamboj/Data-Science: EDA and Machine Learning Models in R and Python (Regression, Classification, Clustering, SVM, Decision Tree, Random Forest, Time-Series Analysis, Recommender System, XGBoost)

Packages/Libraries used for data analysis and building machine learning models

Data Operations	Python	R
Data Manipulation	Pandas	dplyr, plyr, tidyr, stringr, data.table, lubridate (for date manipulation)
Data Visualization	matplotlib, seaborn	ggplot2, cowplot, ggthemes, scales
Recommender Model		recommenderlab
Text Mining	nltk, spaCy	tm, tidyverse
ML Models	scikit-learn, PyCaret	randomForest, rpart, caret, lm, glm, forecast, tseries, kernlab

ML and EDA Projects

Projects	Algorithms	Programming Languages
Abnormal Blood Pressure Classification	Logistic Regression, Decision Trees, Random Forest, XGBoost, LightGBM and other classification algorithms	Python
AirBnB Price Prediction	XGBoost	R
Amaze Payment Solution EDA	EDA	R
Amazon Marketplace Best Sellers Identification	K-Means	Python
Beer Recommendation System	Collaborative Filtering, Content-Based Filtering	R
Breast Cancer Prediction	AdaBoost	Python
Car Pricing Model	Linear Regression	R
Credit Card Defaulter	Random Forest	R, Python
Credit Risk Analysis	EDA	R
Credit Worthiness For Rural India	Linear/Lasso/Ridge/Elastic Net Regression, Decision Tree Regressor, Random Forest Regressor and other Regression algorithms	Python
Customer Segmentation	K-Means, hierarchical clustering	R
Digital Media Company Viewership Prediction		Python
Email Classification	Linear SVM	R, Python
Employee Attrition Model	Logistic Regression	R
Global Investment Trends	EDA	R
Handwritten Digit Recognition	SVM (Linear and RBF)	R
Heart Disease Classification	Decision Tree	Python
Housing Price Prediction	Linear Regression (OLS)	Python
Letter Recognition	SVM (Linear and RBF)	Python
Loan Defaulter-EDA	EDA	R
Monthly Income	Decision Tree	R
Movie Recommendation System	Collaborative Filtering, Content-Based Filtering	R
RTO Prediction	Logistic Regression, Decision Trees, Random Forest, XGBoost, LightGBM and other classification algorithms	Python
SMS Classification	Multinomial and Bernoulli Naive Bayes	Python
Saavn_Ecomm_Ads_Segmentation	Clustering (k-prototype)	R
Sales and Demand Forecasting	Time-Series (ARMA, ARIMA)	R
Telecom Churn Model	Logistic Regression	R, Python
Transaction-data-analysis-and-prediction	Time-Series (ARIMA)	R
Uber Supply-Demand Gap	EDA	R

Additional Reading

To know more about dummy variables (here)
Why it's necessary to create dummy variables (here)
Missing Values Imputation
When to Normalize or Standardize the variables?
- Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization
- Linear Regression :: Normalization (Vs) Standardization
Various scaling techniques (here)
Recursive Feature Elimination (RFE) - scikit-learn (here)
- Recursive feature elimination repeatedly builds a model (for example an SVM or a regression model), picks either the best- or worst-performing feature (for example based on coefficients), sets that feature aside, and repeats the process with the remaining features. This continues until all features in the dataset are exhausted. Features are then ranked by the order in which they were eliminated. As such, it is a greedy optimization for finding the best-performing subset of features. Read more at this link
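The elimination loop described above can be sketched with scikit-learn's `RFE` class. This is a minimal illustration, not code from any project in this repository: the synthetic dataset, the choice of `LinearRegression` as the estimator, and the target of 3 features are all assumptions made for the example. Note that `RFE` stops once `n_features_to_select` features remain rather than eliminating every feature, but it still ranks the eliminated ones.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative): 8 features, only 3 of which carry signal.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=0.1, random_state=0)

# Refit the model repeatedly, dropping the feature with the smallest
# coefficient magnitude one at a time (step=1) until 3 features remain.
selector = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X, y)

# support_ is a boolean mask of kept features; ranking_ assigns 1 to kept
# features and larger numbers to features that were eliminated earlier.
print("selected mask:", selector.support_)
print("ranking:", selector.ranking_)
```

Swapping in a different estimator (for example a linear SVM) only requires that it expose `coef_` or `feature_importances_` after fitting.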
Parametric vs. non-parametric models: in short and in detail
Regression guarantees interpolation of data, not extrapolation
- Interpolation means using the model to predict the dependent variable for independent-variable values that lie within the range of the data you already have. Extrapolation, on the other hand, means predicting the dependent variable for independent-variable values that lie outside the range of the data the model was built on.
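A small numerical sketch can make the difference concrete. The data below is made up for illustration: a curved relationship (y = x squared) is observed only for x between 0 and 10, and a straight line is fitted to it. The query points 5 and 30 are arbitrary choices standing in for "inside" and "outside" the training range.

```python
import numpy as np

# Illustrative data: a curved relationship observed only for x in [0, 10].
x_train = np.linspace(0.0, 10.0, 50)
y_train = x_train ** 2

# Fit a straight line by ordinary least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    """Prediction of the fitted line at x."""
    return slope * x + intercept

x_inside, x_outside = 5.0, 30.0  # inside vs. outside the training range
err_inside = abs(predict(x_inside) - x_inside ** 2)
err_outside = abs(predict(x_outside) - x_outside ** 2)

print(f"absolute error at x={x_inside}: {err_inside:.2f}")
print(f"absolute error at x={x_outside}: {err_outside:.2f}")
```

The line is a tolerable approximation within the observed range, but the error grows quickly once the query point leaves it, because nothing in the training data constrains the model there.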
Optimization Methods (here)
Regularization in Machine Learning (here)
A brief overview of Feature Scaling (here)
When to standardise, when to normalise (here)
- When and why to standardize a variable
All about when and how to do train_test_split and preprocessing
- Things to know before train and test split
- Data Preparation without data leakage
Dimensionality Reduction Algorithms (here)
Feature Selection (here)
Naive Bayes Classification explanation (here)
Factor Analysis
- Introduction to factor analysis
- Factor analysis Notes
- Theory and practice questions on factor analysis
Implementing recommendation systems
- Recommender systems 101 – A step-by-step practical example in R
- A framework for developing and testing recommendation algorithms
- Netflix implementation of recommendation engine
Understanding ROC curve (here)
Feature Engineering and its importance (here)
Explanation of linear or linearity in Linear Regression
- The term 'linear' in linear regression refers to the linearity in the coefficients, i.e. the target variable y is linearly related to the model coefficients. It does not require that y should be linearly related to the raw attributes or features. Feature functions could be linear or non-linear.
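As a minimal sketch of this point, the example below fits a model that is quadratic in the feature x but linear in its coefficients. The data is synthetic and noise-free, and the coefficient values 1, 2 and 3 are arbitrary choices for illustration. Because the feature functions (1, x, x squared) enter the model as plain columns, ordinary least squares still applies.

```python
import numpy as np

# Synthetic, noise-free target: y = 1 + 2*x + 3*x^2 (non-linear in x).
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Design matrix of feature functions: [1, x, x^2].
# The model y = b0*1 + b1*x + b2*x^2 is linear in (b0, b1, b2).
design = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on the transformed features.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated coefficients:", coef)
```

The same approach works for other feature functions, such as logarithms or interaction terms, as long as the coefficients themselves appear linearly.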
Techniques for handling Class Imbalance in Dataset
- 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset
- 10 Techniques to deal with Imbalanced Classes in Machine Learning
- Class Imbalance - Different Perspective
XGBoost
- XGBoost Algorithm - Medium
- A Gentle Introduction to XGBoost for Applied Machine Learning
LightGBM
- What is LightGBM, How to implement it? How to fine tune the parameters?
- How to Develop a Light Gradient Boosted Machine (LightGBM) Ensemble
Logistic Regression (here)
Voting Ensembles
- ML|Voting Classifier using Sklearn
- How to Develop Voting Ensembles With Python
- How VOTing classifiers work!
Time-Series forecasting in Python (AR, MA, ARIMA, SARIMA and SARIMAX models) (here)
Multivariate time-series forecasting
- A Multivariate Time Series Guide to Forecasting and Modeling in Python
- Multivariate time series forecasting
Missing values Imputation
- 6 Different Ways to Compensate for Missing Values In a Dataset (Data Imputation with examples)
LightGBM Vs XGBoost
- Which algorithm takes the crown: Light GBM vs XGBOOST?
Gradient Descent
- Gradient Descent For Machine Learning
Gradient
- What Is a Gradient in Machine Learning?
Stochastic Gradient
- Stochastic Gradient Descent — Clearly Explained !!
- Stochastic Gradient Descent Algorithm With Python and NumPy
Clustering
- 10 Clustering Algorithms With Python
- Clustering Algorithm for data with mixed Categorical and Numerical features
- Understanding K-Means, K-Means++ and, K-Medoids Clustering Algorithms
- Clustering datasets having both numerical and categorical variables
- K-Modes Clustering
KNN Overview and finding optimal value of K (here)
Which Classification metric to choose and when?
- The 5 Classification Evaluation metrics every Data Scientist must know
- Classification Metrics & Thresholds Explained
- 24 Evaluation Metrics for Binary Classification (And When to Use Them)
Ways of Encoding Categorical variables
- Smarter Ways to Encode Categorical Data for Machine Learning

Related Mathematics

Model Evaluation (here)

Regression
- R-squared/Adj. R-squared
- Root Mean Squared Error (RMSE) / Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
Classification (here)
- Accuracy, Precision, and Recall
- Log Loss/Binary Crossentropy
- Categorical Crossentropy
- Confusion Matrix
- F1 Score
- AUC
