This notebook presents some data imputation techniques using the training dataset provided by the WiDS Datathon 2021. We already performed an exploratory data analysis on this data. If you haven't seen it, you can find it here.
The main goal of this study was to investigate how the mean and standard deviation of the data are affected by different data imputation techniques.
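To make that goal concrete, here is a minimal sketch of the kind of comparison we have in mind. It uses mean imputation (one of the techniques listed below) on a small made-up array; the values are illustrative assumptions, not taken from the WiDS data.

```python
import numpy as np

# Toy feature with missing entries (illustrative values, not from the WiDS data).
x = np.array([2.0, 4.0, np.nan, 8.0, np.nan, 10.0])

# Statistics computed over the observed (non-missing) values only.
observed_mean = np.nanmean(x)
observed_std = np.nanstd(x)

# Replace every NaN with the mean of the observed values.
x_filled = np.where(np.isnan(x), observed_mean, x)

filled_mean = x_filled.mean()
filled_std = x_filled.std()

print(f"mean: {observed_mean:.3f} -> {filled_mean:.3f}")
print(f"std:  {observed_std:.3f} -> {filled_std:.3f}")
```

In this toy case the mean is unchanged, while the standard deviation shrinks because the filled values add no spread around the mean.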
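We explore two types of data imputation: univariate, which fills a feature's missing values using only that feature's observed values, and multivariate, which also uses the other features.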
The univariate techniques we explore are:
- Constant
- Mean
- Median
- Most Frequent
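The four univariate strategies above map onto the `strategy` parameter of `SimpleImputer`. The sketch below is only an illustration: the single-column array and the `fill_value` of 0.0 are assumptions chosen for the example, not the settings used in the rest of the notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One toy feature with a single missing value (illustrative data).
X = np.array([[1.0], [2.0], [2.0], [np.nan], [7.0]])

# One imputer per univariate strategy; fill_value only applies to "constant".
strategies = {
    "constant": SimpleImputer(strategy="constant", fill_value=0.0),
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "most_frequent": SimpleImputer(strategy="most_frequent"),
}

# Fit each imputer on the column and fill the missing entry.
filled = {name: imp.fit_transform(X) for name, imp in strategies.items()}

for name, X_filled in filled.items():
    print(
        f"{name:>13}: filled value={X_filled[3, 0]:.2f}, "
        f"mean={X_filled.mean():.2f}, std={X_filled.std():.2f}"
    )
```

Printing the mean and standard deviation next to each filled value gives a first feel for how much each strategy shifts them.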
The multivariate methods are:
- KNN
- Iterative
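For the multivariate case, a minimal sketch with `KNNImputer` and `IterativeImputer` could look like the following. The two-column array, `n_neighbors=2`, `max_iter=10` and `random_state=0` are assumptions made for the example. Note that `IterativeImputer` is still experimental in scikit-learn, so it has to be enabled explicitly before it can be imported.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Two correlated toy features; the second one has a missing value (illustrative data).
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [5.0, 10.0],
])

# KNN: average the second feature over the 2 rows closest in the first feature.
knn_imputer = KNNImputer(n_neighbors=2)
X_knn = knn_imputer.fit_transform(X)

# Iterative: model each feature with missing values as a function of the others
# (the default estimator is BayesianRidge).
iter_imputer = IterativeImputer(max_iter=10, random_state=0)
X_iter = iter_imputer.fit_transform(X)

print("KNN filled value:      ", X_knn[2, 1])
print("Iterative filled value:", X_iter[2, 1])
```

Unlike the univariate strategies, both imputers use the first column to decide what to put in the second one, so the filled value follows the relationship between the features.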
We used `SimpleImputer`, `KNNImputer` and `IterativeImputer` from scikit-learn in the imputation process. For the visualization, we adopted seaborn and matplotlib.
Hope you enjoy it ;)