Data Imputation Techniques

This notebook presents several data imputation techniques using the training dataset provided by the WiDS Datathon 2021. We already performed an exploratory data analysis on this data. If you haven't seen it, you can find it here.

The main goal of this study was to investigate how different data imputation techniques affect the mean and standard deviation of the data.

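We explore two types of data imputation: univariate, which fills a feature's missing values using only that feature's observed values, and multivariate, which also uses the other features.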

The univariate techniques we explore are:

- Constant
- Mean
- Median
- Most Frequent
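
The four univariate strategies listed above map onto the `strategy` parameter of scikit-learn's `SimpleImputer`. The following is a minimal sketch, not the notebook's own code: the toy column `X` and the constant fill value of `0.0` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy single-feature column with two missing values.
# Illustrative data only; it is not taken from the WiDS dataset.
X = np.array([[1.0], [2.0], [2.0], [np.nan], [10.0], [np.nan]])

# One imputer per univariate strategy. The fill value 0.0 is an assumption.
strategies = {
    "constant": SimpleImputer(strategy="constant", fill_value=0.0),
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "most_frequent": SimpleImputer(strategy="most_frequent"),
}

# Fit each imputer on the column and keep the filled result as a 1-D array.
filled = {name: imp.fit_transform(X).ravel() for name, imp in strategies.items()}

for name, col in filled.items():
    print(f"{name:>13}: {col}")
```

Each strategy computes its fill value from the observed entries of the same column only, which is what makes these techniques univariate.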

The multivariate methods are:

- KNN
- Iterative
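
The two multivariate methods can be sketched with `KNNImputer` and `IterativeImputer`. This is a hedged example under stated assumptions: the toy matrix `X`, `n_neighbors=2`, `max_iter=10`, and `random_state=0` are illustrative choices, not the settings used in the notebook. Note that `IterativeImputer` is still marked experimental in scikit-learn and has to be enabled explicitly before it can be imported.

```python
import numpy as np
# IterativeImputer is experimental; this import enables it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Toy data where the second feature is twice the first; one value is missing.
# Illustrative data only; it is not taken from the WiDS dataset.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [5.0, 10.0],
])

# KNN: fill the gap with the average of the nearest rows' values,
# where distance is measured on the features that are present.
knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X)

# Iterative: model each feature with missing values as a function of the
# other features (BayesianRidge by default) and predict the missing entries.
iterative = IterativeImputer(max_iter=10, random_state=0)
X_iter = iterative.fit_transform(X)

print("KNN:      ", X_knn[2])
print("Iterative:", X_iter[2])
```

Unlike the univariate strategies, both imputers use the first feature to estimate the missing value in the second, so the fill value follows the relationship between the columns.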

We used SimpleImputer, KNNImputer and IterativeImputer from scikit-learn for the imputation, and seaborn and matplotlib for the visualization.
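
To illustrate the kind of comparison this study makes, here is a small sketch that measures the mean and standard deviation of one feature before and after imputation. It runs on synthetic data, not on `TrainingWiDS2021.csv`; the helper `summarize`, the 30% missing rate, and the imputer settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Synthetic data: 200 rows, 3 features (assumption, for illustration only).
rng = np.random.default_rng(0)
X_missing = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# Remove roughly 30% of the values in the first feature.
mask = rng.random(200) < 0.3
X_missing[mask, 0] = np.nan


def summarize(col):
    """Return (mean, sample standard deviation) of a 1-D array."""
    return float(np.mean(col)), float(np.std(col, ddof=1))


imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
}

# Baseline: statistics of the observed (non-missing) values only.
stats = {"observed only": summarize(X_missing[~mask, 0])}
for name, imp in imputers.items():
    stats[name] = summarize(imp.fit_transform(X_missing)[:, 0])

for name, (mean, std) in stats.items():
    print(f"{name:>13}: mean={mean:.3f}  std={std:.3f}")
```

Mean imputation leaves the mean of the feature unchanged but shrinks its standard deviation, because every filled value sits exactly at the center of the observed distribution.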

Hope you enjoy it ;)

chainao/Data-Imputation
