I think it would be worth adding an option to run an outlier detection algorithm (sklearn has some good ones) during the preprocessing stage. Based on the results, we could either throw out outliers that might hurt performance or dynamically switch the TPOT scoring metric to one that is more outlier-resistant.
I thought of this because one of the datasets I'm working with has a few outliers, and I think they are causing TPOT to try really hard to find a model that drastically improves performance on those few points when it should instead be finding a marginally better fit for the vast majority of the data.
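To make the two options concrete, here is a rough sketch, not a proposed implementation. It uses scikit-learn's `IsolationForest` as one possible detector and `median_absolute_error` as one possible outlier-resistant metric. The helper name `flag_outliers`, the `contamination` value, and the synthetic data are all made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import mean_squared_error, median_absolute_error


def flag_outliers(X, y, contamination=0.05, random_state=0):
    """Return a boolean mask that is True for rows flagged as outliers.

    Hypothetical helper: the target is appended as an extra column so that
    unusual target values can be flagged, not only unusual feature rows.
    """
    data = np.column_stack([X, y])
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    labels = detector.fit_predict(data)  # -1 = outlier, 1 = inlier
    return labels == -1


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
coef = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ coef + rng.normal(scale=0.1, size=200)
y[:5] += 25.0  # inject a few extreme targets

# Option 1: drop the flagged rows before handing the data to the optimizer.
mask = flag_outliers(X, y)
X_clean, y_clean = X[~mask], y[~mask]

# Option 2: keep every row but score with a metric that a handful of
# extreme points cannot dominate. Predictions from the true coefficients
# stand in for a fitted model here.
y_pred = X @ coef
mse = mean_squared_error(y, y_pred)        # inflated by the 5 injected points
medae = median_absolute_error(y, y_pred)   # barely affected by them
```

For option 2, scikit-learn also exposes this metric as the scorer string `"neg_median_absolute_error"`; whether that is the right thing to pass to TPOT's `scoring` argument for a given dataset is an assumption that would need checking.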
I like this idea, but we need to be careful that we aren't discarding outliers that people want. Often in matsci predictions the outliers are exactly what we are looking for (which material is the hardest, most conductive, etc.), and being able to predict them is important.
For the Analytics part, the outlier analysis should mainly look at which predictions were farthest from their true values, and possibly why. We can also look at outliers based on the actual value.
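As a rough illustration of those two views, the sketch below ranks samples by absolute prediction error and separately flags samples whose true value is extreme. The function names, the z-score rule, and the threshold are assumptions made for the example, not anything that exists in the codebase.

```python
import numpy as np


def rank_by_error(y_true, y_pred, top_k=3):
    """Return (indices, errors) of the top_k samples with the largest absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    order = np.argsort(abs_err)[::-1][:top_k]  # largest error first
    return order, abs_err[order]


def extreme_by_value(y_true, z_thresh=1.5):
    """Return indices whose true value is more than z_thresh std devs from the mean."""
    y_true = np.asarray(y_true, dtype=float)
    std = y_true.std()
    if std == 0:
        # All values identical: nothing can be extreme.
        return np.array([], dtype=int)
    z = (y_true - y_true.mean()) / std
    return np.flatnonzero(np.abs(z) > z_thresh)


# Toy values, purely for illustration.
y_true = [1.0, 2.0, 3.0, 10.0, 2.5]
y_pred = [1.1, 2.0, 2.0, 4.0, 2.4]

worst_idx, worst_err = rank_by_error(y_true, y_pred, top_k=2)
extreme_idx = extreme_by_value(y_true)
```

Keeping the two lists separate matters for the matsci case: a sample that is extreme by value but predicted well is a success, while one that appears in both lists is the kind of outlier worth investigating.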