Add support for missing value cleaner #11

Closed
rastala opened this issue Jun 6, 2017 · 1 comment

Comments

@rastala
Contributor

rastala commented Jun 6, 2017

Add an Estimator that computes a replacement value for missing values, such as the mean, median, or mode, from the training data. The Estimator then produces a Model that can be applied to replace missing values.
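To make the Estimator/Model split concrete, here is a minimal sketch in plain Python rather than Spark. The names `MissingValueEstimator`, `MissingValueModel`, and `_is_missing` are hypothetical and chosen only for illustration; they are not the API that was eventually merged. Rows are assumed to be dictionaries, and `None` or NaN is assumed to mark a missing value.

```python
import math
import statistics
from dataclasses import dataclass


def _is_missing(value):
    # Assumption: None and float NaN both count as "missing".
    return value is None or (isinstance(value, float) and math.isnan(value))


@dataclass(frozen=True)
class MissingValueModel:
    """Hypothetical fitted model: one replacement value per column."""
    replacements: dict

    def transform(self, rows):
        # Replace missing entries only in the columns that were fitted.
        return [
            {
                col: self.replacements[col]
                if col in self.replacements and _is_missing(val)
                else val
                for col, val in row.items()
            }
            for row in rows
        ]


class MissingValueEstimator:
    """Hypothetical estimator: learns replacement values from training rows."""

    def __init__(self, columns, strategy="mean"):
        self.columns = list(columns)
        self.strategy = strategy

    def fit(self, rows):
        stat = {
            "mean": statistics.mean,
            "median": statistics.median,
            "mode": statistics.mode,
        }[self.strategy]
        replacements = {}
        for col in self.columns:
            observed = [r[col] for r in rows if not _is_missing(r[col])]
            # Raises statistics.StatisticsError if a column has no observed values.
            replacements[col] = stat(observed)
        return MissingValueModel(replacements)


train = [
    {"age": 30.0, "city": "Oslo"},
    {"age": None, "city": "Oslo"},
    {"age": 50.0, "city": "Rome"},
]
model = MissingValueEstimator(["age"], strategy="mean").fit(train)
filled = model.transform(train)
```

The statistic is computed once in `fit` and stored on the model, so the same replacement values can later be applied to data the estimator never saw.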

The missing value cleaner should support one or more input columns. Different types should be supported as follows:

  • Floating point numbers: mean, median, mode
  • Integer numbers: median, mode
  • Strings and categoricals: mode
  • Vectors and other composite types: not supported
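The type rules above could be enforced with a small lookup table, sketched here in plain Python. `SUPPORTED` and `check_strategy` are hypothetical names, and Python types stand in for column types, with strings and categoricals taking the mode only. The issue does not say why integers exclude the mean; one plausible reading is that the mean of integers is generally not itself an integer.

```python
# Hypothetical mapping from value type to the strategies allowed for it.
SUPPORTED = {
    float: {"mean", "median", "mode"},
    int: {"median", "mode"},
    str: {"mode"},
}


def check_strategy(value_type, strategy):
    """Return the strategy if it is allowed for value_type, else raise."""
    allowed = SUPPORTED.get(value_type)
    if allowed is None:
        # Vectors and other composite types are not supported.
        raise TypeError(f"unsupported column type: {value_type.__name__}")
    if strategy not in allowed:
        raise ValueError(
            f"strategy {strategy!r} is not supported for {value_type.__name__}"
        )
    return strategy
```

Calling `check_strategy(int, "mean")` raises `ValueError`, while `check_strategy(list, "mode")` raises `TypeError` because composite types are rejected outright.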

The missing value cleaner should be a PipelineStage so it is compatible with SparkML pipelines.

@drdarshan
Contributor

Closing since the basic missing value cleaner transform is now merged in #22.
