Day 48 of #dailycoding challenge ⬇️

Today we look at the #mice package: Multivariate Imputation by Chained Equations.

The #mice package implements a method for dealing with missing data. It creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data.

The main arguments of the mice() function are:

 data : A data frame or a matrix containing the incomplete data. Missing values are coded as NA

 m : Number of multiple imputations (the default is 5)

 maxit : Number of iterations (the default is 5)

 method : Can be either a single string, or a vector of strings with length length(blocks), specifying the imputation method to be used for each column in data. If specified as a single string, the same method will be used for all blocks. Commonly used methods include:

 pmm (Predictive Mean Matching): the default for numeric variables

 logreg (Logistic Regression): suitable for categorical variables with 2 levels

 polyreg (Polytomous logistic regression): suitable for unordered categorical variables with more than two levels

 polr (Proportional odds model): suitable for ordered categorical variables with more than two levels

 predictorMatrix : A numeric matrix with length(blocks) rows and ncol(data) columns, containing 0/1 entries that specify the set of predictors to be used for each target block. A 1 means the column variable is used as a predictor for the target in that row.
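As a rough illustration of the method and predictorMatrix arguments, the sketch below builds both with the mice helpers make.method() and make.predictorMatrix() and then adjusts one entry by hand. The data frame iris_na, the seed values, and the choice to drop Species as a predictor of Sepal.Length are assumptions made only for this example.

```r
library(mice)

# Illustrative data: blank out 15 random values (10%) in every column of iris
set.seed(48)
iris_na <- iris
for (col in names(iris_na)) {
  iris_na[sample(nrow(iris_na), 15), col] <- NA
}

# Default method per column, chosen by mice from each column's type
meth <- make.method(iris_na)
print(meth)

# Default predictor matrix: every variable predicts every other variable
pred <- make.predictorMatrix(iris_na)

# Arbitrary example tweak: do not use Species when imputing Sepal.Length
pred["Sepal.Length", "Species"] <- 0

imp <- mice(iris_na, m = 5, maxit = 5,
            method = meth, predictorMatrix = pred,
            seed = 500, printFlag = FALSE)

# Methods actually used for each column
imp$method
```

Passing a named vector for method lets numeric and categorical columns use different models in the same call, which a single string such as "pmm" cannot do.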

Happy coding and learning!

```r
library(mice)        # multiple imputation
library(missForest)  # provides prodNA()
library(VIM)         # provides aggr()

data <- iris
head(data)

# Randomly set 10% of the values to NA with prodNA() from the missForest package
set.seed(48)  # make the missingness pattern reproducible
iris.mis <- prodNA(data, noNA = 0.1)
head(iris.mis)

# Missing data patterns
md.pattern(iris.mis)

# Number of NAs for each variable
sapply(iris.mis, function(x) sum(is.na(x)))

# Visualize the missing data with aggr() from the VIM package
miss_plot <- aggr(iris.mis,
                  numbers = TRUE, sortVars = TRUE,
                  labels = names(iris.mis), cex.axis = 0.7,
                  gap = 3, prop = TRUE,
                  ylab = c("Proportion of missing data", "Combinations"))

# Create 5 imputed data sets with predictive mean matching, 50 iterations each
imputed_Data <- mice(iris.mis, m = 5, maxit = 50, method = "pmm", seed = 500)
summary(imputed_Data)
```
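Once the imputations exist, a common next step is to extract a completed data set or to fit a model on all imputed data sets and pool the results. The sketch below uses complete(), with() and pool() from mice. It rebuilds a small incomplete copy of iris so that it runs on its own, and the names iris_na, imp, completed and fit, as well as the regression formula, are assumptions chosen for illustration.

```r
library(mice)

# Illustrative data: blank out 15 random values in each numeric column of iris
set.seed(48)
iris_na <- iris
for (col in names(iris_na)[1:4]) {
  iris_na[sample(nrow(iris_na), 15), col] <- NA
}

imp <- mice(iris_na, m = 5, maxit = 5, method = "pmm",
            seed = 500, printFlag = FALSE)

# Extract the first of the 5 completed data sets
completed <- complete(imp, 1)
sum(is.na(completed))

# Fit the same linear model on every imputed data set, then pool the estimates
fit <- with(imp, lm(Sepal.Length ~ Sepal.Width + Petal.Length))
summary(pool(fit))
```

Pooling across all m data sets, rather than analysing a single completed one, keeps the uncertainty introduced by the missing values in the final standard errors.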