(A) Data Preprocessing and Feature Engineering (Exploratory Data Analysis)
- Studying the feature statistics
- Imputing missing values (with the mean, median, or mode)
- Aggregation
- Sampling
- Dimensionality reduction (PCA)
- Feature subset selection
- Feature creation
- Discretization and binarization (with Gini Index / Entropy)
- Variable transformation and binning
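
Two of the preprocessing steps listed above, imputing missing values and binning, can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `impute` and `equal_width_bins` and the `ages` column are hypothetical, missing entries are assumed to be encoded as `None`, and equal-width binning is only one of several discretization schemes (the Gini/entropy-based splits mentioned above are not shown).

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when the column is constant (hi == lo).
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical numeric column with two missing entries.
ages = [22, None, 35, 58, None, 41]
filled = impute(ages, strategy="median")
bins = equal_width_bins(filled, n_bins=3)
print(filled)
print(bins)
```

The fill value is computed from the observed entries only. In a real pipeline it would normally be computed on the training split and reused on the test split, so that no information leaks from test data.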
(B) Building a Machine Learning Pipeline (e.g. Scikit-Learn fit and transform)
- Hyperparameter tuning / optimization (e.g. GridSearchCV)
- K-fold cross-validation
- Regressors (Gradient Descent)
- Decision Trees (Random Forest, XGBoost)
- Support Vector Machines
- Deep Learning (Keras, PyTorch with GPU)
- Ensemble Learning (Bagging, Boosting)
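
As a rough sketch of how three of the items above fit together, the following plain-Python example fits a one-variable linear regressor by batch gradient descent, scores it with K-fold cross-validation, and picks a learning rate from a small grid. It deliberately avoids Scikit-Learn so that each step is visible; the function names, the toy data, and the grid values are all assumptions made for illustration. In practice, library tools such as `KFold` and `GridSearchCV` would replace the hand-written loops.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_val_mse(xs, ys, k, lr):
    """Average held-out MSE over k folds for a given learning rate."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        # Fit on the training folds only, then score on the held-out fold.
        w, b = fit_linear_gd([xs[i] for i in train], [ys[i] for i in train], lr=lr)
        scores.append(sum((w * xs[i] + b - ys[i]) ** 2 for i in test) / len(test))
    return sum(scores) / len(scores)

# Toy, noise-free data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
ys = [2 * x + 1 for x in xs]

# Minimal grid search: one candidate list, scored by cross-validated MSE.
grid = [0.001, 0.01, 0.05]
results = {lr: cross_val_mse(xs, ys, k=5, lr=lr) for lr in grid}
best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

The folds here are contiguous and unshuffled to keep the example deterministic. With real data the rows are usually shuffled first, or stratified for classification, before splitting.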
(C) Postprocessing
- Filtering patterns
- Visualization
- Pattern Interpretation
- Predictions
(D) Conclusions
- Model interpretation and performance evaluation
- Documentation
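
For the performance-evaluation step, a binary classifier is commonly summarized with confusion-matrix counts and the metrics derived from them. The sketch below is one possible way to compute these by hand; the function name `binary_metrics` and the label lists are hypothetical, and the choice of accuracy, precision, recall, and F1 is an assumption, since the outline does not name specific metrics.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts and common metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_metrics(y_true, y_pred)
print(report)
```

Accuracy alone can be misleading on imbalanced classes, which is why precision and recall are reported alongside it here.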
References:
[1] Introduction to Machine Learning (2nd Ed.), by Ethem Alpaydin, The MIT Press, 2010.
[2] Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005.
[3] Feature Engineering for Machine Learning, by Alice Zheng and Amanda Casari, O'Reilly Media, 2018.