Data analysis task covering the full data science life cycle
Author: Davain Edwards
Please work through the notebooks in order. The only exception is notebook 2, which can be skipped.
Using the datasets known.csv and unknown.csv, the following notebooks walk through all the steps of the data science life cycle:
- 1_business_understanding.ipynb
- 2_data_mining.ipynb <-- Can be skipped!
- 3_data_cleaning.ipynb
- 4_data_exploration.ipynb
- 5_feature_engineering.ipynb
- 6_predictive_modelling_with_pycaret.ipynb <-- Final model training, test evaluation and model saving!
- 6_predictive_modelling_with_sklearn.ipynb <-- Testing and error analysis
- 7_data_visualization.ipynb
The walkthrough thus covers:
- Business Understanding
- Data Mining
- Data Cleaning
- Exploratory Data Analysis
- Feature Engineering
- Predictive Modeling with Hyperparameter Tuning (Small Error Analysis)
- Data Visualization
Requirements:
- pyenv
- python==3.8.5
To create the virtual environment and install the dependencies, use the following commands:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
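Since the requirements pin python==3.8.5, it can help to confirm which interpreter the virtual environment actually uses before running the notebooks. The following is a minimal sketch, not part of the project; the names `PINNED_VERSION` and `matches_pinned` are hypothetical and introduced only for illustration.

```python
import sys

# Version pinned in the requirements of this README.
PINNED_VERSION = (3, 8, 5)


def matches_pinned(version_info=None, pinned=PINNED_VERSION):
    """Return True when major.minor.micro equals the pinned version."""
    if version_info is None:
        # Default to the interpreter that is running this script.
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(pinned)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_pinned():
        print(f"Python {current} matches the pinned version.")
    else:
        print(f"Python {current} differs from the pinned 3.8.5.")
```

Run it with the virtual environment activated so that it reports the interpreter inside `.venv`.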
To clear the outputs of a notebook in place, run:
jupyter nbconvert --clear-output --inplace [NOTEBOOK.ipynb]
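The nbconvert command handles one notebook at a time. As a rough sketch, the snippet below builds the same command for every notebook listed in this README, in order. The helper name `clear_output_commands` is hypothetical, and the script only prints the commands rather than executing them.

```python
import shlex

# Notebook file names as listed in this README, in execution order.
NOTEBOOKS = [
    "1_business_understanding.ipynb",
    "2_data_mining.ipynb",  # can be skipped when running the project
    "3_data_cleaning.ipynb",
    "4_data_exploration.ipynb",
    "5_feature_engineering.ipynb",
    "6_predictive_modelling_with_pycaret.ipynb",
    "6_predictive_modelling_with_sklearn.ipynb",
    "7_data_visualization.ipynb",
]


def clear_output_commands(notebooks=NOTEBOOKS):
    """Build one 'jupyter nbconvert --clear-output --inplace' command per notebook."""
    return [
        ["jupyter", "nbconvert", "--clear-output", "--inplace", notebook]
        for notebook in notebooks
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for command in clear_output_commands():
        print(shlex.join(command))
```

To actually execute the commands, each list could be passed to `subprocess.run(command, check=True)` from the directory that contains the notebooks.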