Data Operations | Python | R |
---|---|---|
Data Manipulation | pandas | dplyr, plyr, tidyr, stringr, data.table, lubridate (for date manipulation) |
Data Visualization | matplotlib, seaborn | ggplot2, cowplot, ggthemes, scales |
Recommender Model | | recommenderlab |
Text Mining | nltk, spaCy | tm, tidyverse |
ML Models | scikit-learn, PyCaret | randomForest, rpart, caret, lm, glm, forecast, tseries, kernlab |

Projects | Algorithms | Programming Languages |
---|---|---|
Abnormal Blood Pressure Classification | Logistic Regression, Decision Trees, Random Forest, XGBoost, LightGBM and other classification algorithms | Python |
AirBnB Price Prediction | XGBoost | R |
Amaze Payment Solution EDA | EDA | R |
Amazon Marketplace Best Sellers Identification | K-Means | Python |
Beer Recommendation System | Collaborative Filtering, Content-Based Filtering | R |
Breast Cancer Prediction | AdaBoost | Python |
Car Pricing Model | Linear Regression | R |
Credit Card Defaulter | Random Forest | R, Python |
Credit Risk Analysis | EDA | R |
Credit Worthiness For Rural India | Linear/Lasso/Ridge/Elastic Net Regression, Decision Tree Regressor, Random Forest Regressor and other Regression algorithms | Python |
Customer Segmentation | K-Means, hierarchical clustering | R |
Digital Media Company Viewership Prediction | | Python |
Email Classification | Linear SVM | R, Python |
Employee Attrition Model | Logistic Regression | R |
Global Investment Trends | EDA | R |
Handwritten Digit Recognition | SVM (Linear and RBF) | R |
Heart Disease Classification | Decision Tree | Python |
Housing Price Prediction | Linear Regression (OLS) | Python |
Letter Recognition | SVM (Linear and RBF) | Python |
Loan Defaulter-EDA | EDA | R |
Monthly Income | Decision Tree | R |
Movie Recommendation System | Collaborative Filtering, Content-Based Filtering | R |
RTO Prediction | Logistic Regression, Decision Trees, Random Forest, XGBoost, LightGBM and other classification algorithms | Python |
SMS Classification | Multinomial and Bernoulli Naive Bayes | Python |
Saavn_Ecomm_Ads_Segmentation | Clustering (k-prototype) | R |
Sales and Demand Forecasting | Time-Series (ARMA, ARIMA) | R |
Telecom Churn Model | Logistic Regression | R, Python |
Transaction-data-analysis-and-prediction | Time-Series (ARIMA) | R |
Uber Supply-Demand Gap | EDA | R |
- To know more about dummy variables (here)
- Why it's necessary to create dummy variables (here)
- Missing Values Imputation
- When to Normalize or Standardize the variables?
- Various scaling techniques (here)
- Recursive Feature Elimination (RFE) - scikit-learn (here)
  - Recursive feature elimination repeatedly builds a model (for example an SVM or a regression model), picks the best or worst performing feature (for example based on coefficients), sets that feature aside, and repeats the process with the remaining features. This continues until all features in the dataset are exhausted. Features are then ranked according to when they were eliminated. As such, it is a greedy optimization for finding the best performing subset of features. Read more at this link
- Parametric vs. non-parametric models, in short and in detail
- Regression is reliable for interpolation within the data, not for extrapolation beyond it
  - Interpolation means using the model to predict the dependent variable for independent values that lie within the range of the data you already have. Extrapolation, on the other hand, means predicting the dependent variable for independent values that lie outside the range of the data the model was built on.
- Optimization Methods (here)
- Regularization in Machine Learning (here)
- A brief overview of Feature Scaling (here)
- When to standardise, when to normalise (here)
- All about when and how to do train_test_split and preprocessing
- Dimensionality Reduction Algorithms (here)
- Feature Selection (here)
- Naive Bayes Classification explanation (here)
- Factor Analysis
- Implementing recommendation systems
- Understanding ROC curve (here)
- Feature Engineering and its importance (here)
- Explanation of "linear" (linearity) in Linear Regression
  - The term "linear" in linear regression refers to linearity in the coefficients, i.e. the target variable y is linearly related to the model coefficients. It does not require y to be linearly related to the raw attributes or features. The feature functions themselves can be linear or non-linear.
- Techniques for handling Class Imbalance in Dataset
- XGBoost
- LightGBM
- Logistic Regression (here)
- Voting Ensembles
- Time-Series forecasting in Python (AR, MA, ARIMA, SARIMA and SARIMAX models) (here)
- Multivariate time-series forecasting
- Missing values Imputation
- LightGBM Vs XGBoost
- Gradient Descent
  - Gradient
  - Stochastic Gradient
- Clustering
- KNN Overview and finding optimal value of K (here)
- Which Classification metric to choose and when?
- Ways of Encoding Categorical variables
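
The recursive feature elimination loop described in the list above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's `RFE` implementation: it assumes ordinary least squares without an intercept as the model, uses absolute coefficient size as the importance score (only meaningful when features are on comparable scales), and runs on a small made-up dataset. The helper names `solve_least_squares` and `rfe_ranking` are hypothetical.

```python
def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def rfe_ranking(X, y):
    """Return feature indices ordered from first eliminated to last kept."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        # Refit the model on the features that are still in play.
        subset = [[row[j] for j in remaining] for row in X]
        coefs = solve_least_squares(subset, y)
        # Drop the feature with the smallest absolute coefficient.
        weakest = min(range(len(remaining)), key=lambda k: abs(coefs[k]))
        eliminated.append(remaining.pop(weakest))
    eliminated.extend(remaining)
    return eliminated


# Made-up data: feature 0 matters most, feature 2 a little, feature 1 not at all.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 3.0, 1.0],
    [4.0, 1.5, 2.5],
    [5.0, 2.5, 2.0],
    [6.0, 0.5, 3.0],
]
y = [3.0 * a + 0.5 * c for a, _, c in X]

ranking = rfe_ranking(X, y)
print("Elimination order (least to most important):", ranking)
```

Features eliminated earlier are considered less important, so the last index in `ranking` is the one this greedy procedure keeps longest.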
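
Two of the regression notes in the list above (linearity in the coefficients, and interpolation versus extrapolation) can be illustrated with one small experiment. The sketch below is an assumption-laden toy: the data follow a made-up relationship `y = x^2` on the range 0 to 5, and the helpers `fit_least_squares`, `predict`, `line_features` and `quad_features` are hypothetical names. Both fits are linear regression because the prediction is a weighted sum of the features, even when one feature is `x ** 2`.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    n = len(rows[0])
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, features):
    # The prediction is linear in the coefficients, whatever the features are.
    return sum(c * f for c, f in zip(coefs, features))


def line_features(x):
    return [1.0, x]            # intercept and raw feature


def quad_features(x):
    return [1.0, x, x ** 2]    # adds a non-linear feature function


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # training range is [0, 5]
ys = [x ** 2 for x in xs]             # assumed true relationship

line_coefs = fit_least_squares([line_features(x) for x in xs], ys)
quad_coefs = fit_least_squares([quad_features(x) for x in xs], ys)

x_inside, x_outside = 2.5, 10.0       # inside vs. outside the training range
line_err_inside = abs(predict(line_coefs, line_features(x_inside)) - x_inside ** 2)
line_err_outside = abs(predict(line_coefs, line_features(x_outside)) - x_outside ** 2)
quad_err_outside = abs(predict(quad_coefs, quad_features(x_outside)) - x_outside ** 2)

print("straight line, error when interpolating:", round(line_err_inside, 3))
print("straight line, error when extrapolating:", round(line_err_outside, 3))
print("quadratic features, error when extrapolating:", round(quad_err_outside, 6))
```

The straight-line model is tolerable inside the training range and far off outside it. The quadratic-feature model extrapolates well here only because its features happen to match the assumed relationship; with real data that match is not known in advance.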
Model Evaluation (here)
- Regression
  - R-squared / Adjusted R-squared
  - Root Mean Squared Error (RMSE) / Mean Squared Error (MSE)
  - Mean Absolute Error (MAE)
- Classification (here)
  - Accuracy, Precision, and Recall
  - Log Loss / Binary Cross-Entropy
  - Categorical Cross-Entropy
  - Confusion Matrix
  - F1 Score
  - AUC
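
Several of the metrics listed above can be computed by hand in a few lines, which helps when checking what a library reports. The following is a minimal sketch in plain Python using made-up labels and predictions; the function names are hypothetical, binary labels are assumed to be 0/1, and in practice `sklearn.metrics` offers equivalents.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def log_loss(y_true, probs, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def regression_metrics(y_true, y_pred):
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r ** 2 for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - (mse * n) / ss_tot,
    }


# Made-up example data.
clf = classification_metrics([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(clf)
print(reg)
```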