samaujs/Data-Science


Data Mining with Machine Learning (Scikit-Learn)

(A) Data Preprocessing and Feature Engineering (Exploratory Data Analysis)

  • Studying the feature statistics
  • Impute missing values (with mean, median, mode)
  • Aggregation
  • Sampling
  • Dimensionality reduction (PCA)
  • Feature subset selection
  • Feature creation
  • Discretization and binarization (with Gini Index / Entropy)
  • Variable transformation and binning
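
A minimal sketch of two of the preprocessing steps above, assuming a single numeric feature and class labels: median imputation with scikit-learn's `SimpleImputer`, followed by supervised binarization that picks the cut point minimizing the size-weighted entropy (or Gini index) of the labels. The helpers `entropy`, `gini` and `best_split` and the toy data are hypothetical illustrations, not code from this repository.

```python
import numpy as np
from sklearn.impute import SimpleImputer


# Hypothetical helpers (not from this repository).
def entropy(labels: np.ndarray) -> float:
    """Shannon entropy (bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gini(labels: np.ndarray) -> float:
    """Gini index of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())


def best_split(x: np.ndarray, y: np.ndarray, impurity=entropy) -> float:
    """Threshold on x that minimises the size-weighted impurity of y.

    Candidate thresholds are midpoints between consecutive distinct values
    of x, so both sides of every candidate split are non-empty.
    """
    values = np.unique(x)
    if values.size < 2:
        raise ValueError("x needs at least two distinct values to split")
    candidates = (values[:-1] + values[1:]) / 2.0
    best_t, best_score = candidates[0], np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        score = (left.size * impurity(left) + right.size * impurity(right)) / y.size
        if score < best_score:
            best_t, best_score = t, score
    return float(best_t)


# Hypothetical toy data: one numeric feature with a missing value, binary labels.
X = np.array([[1.0], [2.0], [np.nan], [3.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1])

# Impute with the column median; "mean" and "most_frequent" (mode) also work.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)
x = X_imputed[:, 0]

# Supervised binarization: split at the entropy-minimising threshold.
threshold = best_split(x, y, impurity=entropy)
x_binary = (x > threshold).astype(int)
print("threshold:", threshold)  # midpoint between 5.5 (imputed) and 8.0
print("binarized:", x_binary)
```

Passing `impurity=gini` swaps the criterion; applying `best_split` recursively to each side would give multi-interval discretization rather than a single binary cut.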

(B) Building a Machine Learning Pipeline (e.g. Scikit-Learn fit and transform)

  • Hyperparameter tuning and optimization (e.g. GridSearchCV)
  • K-fold cross-validation
  • Regressors trained with gradient descent
  • Decision trees and tree ensembles (Random Forest, XGBoost)
  • Support Vector Machines
  • Deep Learning (Keras, PyTorch with GPU)
  • Ensemble Learning (Bagging, Boosting)
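
A minimal sketch of a Scikit-Learn pipeline (scaling, then PCA, then a classifier) tuned with `GridSearchCV` over 5-fold stratified cross-validation. It assumes the bundled Iris dataset and a support vector machine purely for illustration; the parameter grid is an arbitrary example, not the settings used in this repository.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset; keep a held-out test set outside the search.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each step is fit on training data only, then used to transform.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", SVC()),
])

# Example grid: "<step>__<param>" addresses parameters inside the pipeline.
param_grid = {
    "pca__n_components": [2, 3, 4],
    "clf__C": [0.1, 1.0, 10.0],
    "clf__kernel": ["linear", "rbf"],
}

# K-fold cross-validation (stratified so each fold keeps the class balance).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping the scaler and PCA inside the `Pipeline` means each cross-validation fold refits them on its own training split, so validation folds never leak into preprocessing.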

(C) Postprocessing

  • Filtering patterns
  • Visualization
  • Pattern Interpretation
  • Predictions

(D) Conclusions

  • Model interpretation and performance evaluation
  • Documentation
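
A minimal sketch of the evaluation and interpretation step, assuming a Random Forest trained on the bundled Iris data (chosen only for illustration): held-out accuracy, a confusion matrix and per-class precision/recall for performance evaluation, and Scikit-Learn's `permutation_importance` for model interpretation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, stratify=data.target, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Performance evaluation on held-out data.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
print(f"Accuracy: {acc:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Model interpretation: mean drop in accuracy when one feature's values
# are randomly shuffled in the held-out set (larger drop = more important).
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(
    zip(data.feature_names, result.importances_mean),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name:<20s} {score:.3f}")
```

Permutation importance is computed on held-out data here so that it reflects what the model relies on when generalizing, not what it memorized during training.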

References:
[1] Ethem Alpaydin, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010.
[2] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
[3] Alice Zheng and Amanda Casari, Feature Engineering for Machine Learning, O'Reilly Media, 2018.
