Outlier detection as a preprocessing step #135

Doppe1g4nger · 2018-11-20T01:39:50Z

I think it would be worth adding an option to run an outlier detection algorithm (sklearn has some good ones) during the preprocessing stage. Based on the results, we could either throw out outliers that might hurt performance or dynamically switch the TPOT accuracy metric to one that is more outlier-resistant.
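
As a rough illustration of the detection step, here is a minimal sketch using scikit-learn's `IsolationForest`. The helper name `flag_outliers`, the 5% contamination rate, and the synthetic data are all assumptions made for the example, not anything that exists in this project. It returns a mask instead of deleting rows, so the decision to drop anything stays with the caller.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def flag_outliers(X, contamination=0.05, random_state=0):
    """Return a boolean mask that is True for rows the detector treats as inliers.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    detector = IsolationForest(contamination=contamination, random_state=random_state)
    labels = detector.fit_predict(X)  # 1 = inlier, -1 = outlier
    return labels == 1


# Synthetic data: 200 ordinary rows plus 5 rows far away from the bulk.
rng = np.random.default_rng(0)
X_bulk = rng.normal(0.0, 1.0, size=(200, 3))
X_far = rng.uniform(20.0, 30.0, size=(5, 3))
X = np.vstack([X_bulk, X_far])
y = rng.normal(size=len(X))

keep = flag_outliers(X)
X_clean, y_clean = X[keep], y[keep]  # dropping is optional; the mask can just be reported
```

Other scikit-learn detectors such as `LocalOutlierFactor` or `EllipticEnvelope` expose the same `fit_predict` convention and could be swapped in.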

I thought of this because one of the datasets I'm working with has a few outliers, and I think they are causing TPOT to try really hard to find a model that drastically improves performance on those few points when it should instead be finding a marginally better fit for the vast majority of the data.
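
To make the concern concrete, here is a small sketch with made-up numbers showing how a couple of badly predicted points dominate mean squared error while barely moving the median absolute error. The function names `mse` and `median_abs_err` are local to this example. scikit-learn ships an equivalent `median_absolute_error` with a matching `neg_median_absolute_error` scorer string; whether that is the right replacement metric here is an open assumption.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: every residual is squared, so large ones dominate."""
    return float(np.mean((y_true - y_pred) ** 2))


def median_abs_err(y_true, y_pred):
    """Median absolute error: unaffected by a small fraction of huge residuals."""
    return float(np.median(np.abs(y_true - y_pred)))


# Made-up example: 98 predictions are off by 0.1, 2 are off by 50.
y_true = np.zeros(100)
y_pred = np.full(100, 0.1)
y_pred[:2] = 50.0

score_mse = mse(y_true, y_pred)             # driven almost entirely by the 2 outliers
score_med = median_abs_err(y_true, y_pred)  # reflects the typical 0.1 error
```

An optimizer minimizing the first score is rewarded mostly for fixing the two extreme points, which is the behavior described above.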

ardunn · 2018-11-20T02:36:48Z

I like this idea, but we need to be careful not to discard outliers that people want. In matsci predictions the outliers are often exactly what we are looking for (which material is the hardest, most conductive, etc.), and being able to predict them is important.

For the Analytics part, the outlier analysis should mainly be looking at which predictions were farthest from their true values, and possibly why. We can also look at the outliers based on actual value.
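
A minimal sketch of both views, using NumPy only. The function names `largest_errors` and `extreme_targets` are hypothetical and the arrays are made up for illustration.

```python
import numpy as np


def largest_errors(y_true, y_pred, k=5):
    """Indices of the k predictions farthest from their true values, worst first."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.argsort(-abs_err)[:k]


def extreme_targets(y_true, k=5):
    """Indices of the k samples with the largest actual values, largest first."""
    return np.argsort(-np.asarray(y_true))[:k]


# Made-up example values.
y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.1, 2.0, 2.5, 4.0])

worst = largest_errors(y_true, y_pred, k=2)  # samples to inspect for "why"
top = extreme_targets(y_true, k=2)           # outliers by actual value
```

Comparing the two index sets shows whether the model's worst misses coincide with the extreme-valued materials that users may care about most.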

Doppe1g4nger added the enhancement label Nov 20, 2018

ardunn removed the enhancement label Dec 11, 2018

ardunn closed this as completed Feb 8, 2019
