This project contains transformers for missing value imputation

pharo-ai/data-imputers

Data Imputers

This is a Pharo library for transforming data to manage missing values.

How to install it?

To install the project, open a Playground (Ctrl+O+W) in your Pharo image and execute the following Metacello script (select it, then press the Do it button or Ctrl+D):

Metacello new
  baseline: 'AIDataImputers';
  repository: 'github://pharo-ai/data-imputers/src';
  load.

How to depend on it?

To add a dependency on this project to your own project, include the following lines in your baseline method:

spec
  baseline: 'AIDataImputers'
  with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].

If you are new to baselines and Metacello, check out the Baselines tutorial on the Pharo Wiki.

Quick Start

I can be used to fill the missing values of a collection like this:

| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
	
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"

I can also be used to fill missing values of a DataFrame:

AISimpleImputer mostFrequent fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) )) 

Simple Imputer

I am a simple imputer whose goal is to fill missing values in 2D collections.

Using me takes three steps. The first is to choose the strategy that determines the value used to replace missing values:

  • #useAverage (default)
  • #useMedian
  • #useMostFrequent
  • #useConstant:

Then send me #fit: so that I can compute the replacement values, one per column. Once that is done, #statistics returns those values.

Finally you can use #transform: to fill the missing values of a 2D collection.

Alternatively, use #fitAndTransform: when the collection used to compute the replacement values is also the one whose missing values you want to fill.

Example:

| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
	
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	statistics; "This method returns the replacement values once the imputer is fitted. In this case => #( 7 2 5 6 )"
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"

or

AISimpleImputer new
	useMostFrequent;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
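
Because #fit: and #transform: are separate steps, the replacement values can be computed on one collection and then applied to another. The following is a minimal sketch rather than a documented use case: it assumes #transform: accepts any 2D collection with the same number of columns as the fitted one, and the variable names (training, newData, imputer, filled) are illustrative only.

```smalltalk
| training newData imputer filled |
"Collection used to compute the replacement values."
training := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
"A different collection (hypothetical data) whose missing values we want to fill."
newData := #( #( nil 3 nil 1 ) ).

imputer := AISimpleImputer new.
imputer useMostFrequent.
"Step 2: compute one replacement value per column from the training data."
imputer fit: training.
"Step 3: fill the nil entries of the other collection with those values."
filled := imputer transform: newData.
```

Keeping the fitted imputer in a variable, instead of using a cascade, makes it possible to reuse it for several calls to #transform:.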

I can also be used with a DataFrame:

AISimpleImputer new
	useMostFrequent;
	fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) )) 

It is also possible to change the missing value, in case you want to replace something other than nil:

AISimpleImputer new
	useMostFrequent;
	missingValue: false;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 false 5 9 ) #( 10 2 false 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
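
The other strategies listed earlier plug into the same steps. As a hedged sketch, the snippet below fits two imputers on the same collection, one with #useAverage and one with #useMedian, and reads back their replacement values with #statistics. The variable names are illustrative, and no expected values are shown because they depend on how each strategy treats the missing entries.

```smalltalk
| collection averages medians |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).

"In a cascade, the value of the whole expression is the result of the last message, here #statistics."
averages := AISimpleImputer new
	useAverage;
	fit: collection;
	statistics.

medians := AISimpleImputer new
	useMedian;
	fit: collection;
	statistics.
```

Inspecting averages and medians in the Playground is a quick way to compare strategies before deciding which one to use with #transform:.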