This is a Pharo library for transforming data to manage missing values.
To install the project, open the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):
Metacello new
    baseline: 'AIDataImputers';
    repository: 'github://pharo-ai/data-imputers/src';
    load.
To add this project as a dependency of your own project, include the following lines in your baseline method:
spec
    baseline: 'AIDataImputers'
    with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].
If you are new to baselines and Metacello, check out the Baselines tutorial on the Pharo Wiki.
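For context, here is a minimal sketch of where the dependency lines could sit inside a complete baseline method. The class `BaselineOfMyProject` and the package `MyProject` are hypothetical names used only for illustration; replace them with your own.

```smalltalk
"Instance-side method of a hypothetical class BaselineOfMyProject (a subclass of BaselineOf)."
baseline: spec
    <baseline>
    spec for: #common do: [
        "Declare the external dependency on the data imputers."
        spec
            baseline: 'AIDataImputers'
            with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].
        "Hypothetical package of your own project that requires the imputers."
        spec
            package: 'MyProject'
            with: [ spec requires: #( 'AIDataImputers' ) ] ]
```

With a declaration along these lines, loading your own baseline should also load the imputers before the package that requires them.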
I can be used to fill the missing values of a collection like this:
| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
AISimpleImputer new
    useMostFrequent;
    fit: collection;
    transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
I can also be used to fill the missing values of a DataFrame:
AISimpleImputer mostFrequent fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ))
I am a simple imputer whose goal is to fill missing values in 2D collections.
Using me takes three steps. The first is to choose the value that replaces the missing values:
#useAverage (default value)
#useMedian
#useMostFrequent
#useConstant:
Then you need to use #fit: to let me compute the replacement values. Once that is done, you can use #statistics to get those values.
Finally, you can use #transform: to fill the missing values of a 2D collection.
An alternative is to use #fitAndTransform: if you want to fill the missing values of the same collection that is used to compute the replacement values.
Example:
| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
AISimpleImputer new
    useMostFrequent;
    fit: collection;
    statistics; "This method returns the replacement values once the imputer is fitted. In this case => #( 7 2 5 6 )"
    transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
or
AISimpleImputer new
    useMostFrequent;
    fitAndTransform: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
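Because #fit: and #transform: are separate steps, the replacement values can presumably be computed on one collection and then applied to another. The following is a minimal sketch of that idea, using only the selectors described above. The `newRows` data is hypothetical, and the sketch assumes #transform: accepts any 2D collection with the same number of columns as the one given to #fit:.

```smalltalk
| training newRows imputer replacements filled |
"Hypothetical data: the imputer learns from training and is then applied to newRows."
training := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
newRows := #( #( nil 3 nil 1 ) #( 4 nil 8 nil ) ).

imputer := AISimpleImputer new.
imputer useMostFrequent.
imputer fit: training.

"Keep the replacement values computed by #fit: so they can be inspected."
replacements := imputer statistics.

"Fill the missing values of a different collection (assumption: same column count)."
filled := imputer transform: newRows.
```

Holding the imputer in a variable, rather than sending everything in one cascade, makes it possible to keep the result of #statistics and to reuse the fitted imputer on several collections.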
I can also be used with a DataFrame:
AISimpleImputer new
    useMostFrequent;
    fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ))
It is also possible to change the missing value if you want to replace something other than nil:
AISimpleImputer new
    useMostFrequent;
    missingValue: false;
    fitAndTransform: #( #( 7 2 5 6 ) #( 7 false 5 9 ) #( 10 2 false 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"