[ML] Validate at least one feature is available for DF analytics #55876

dimitris-athanasiou · 2020-04-28T16:53:17Z

Previously, we checked that at least one supported field existed
when the _explain API was called. However, for analyses with
required fields (e.g. regression) we did not account for the fact
that the dependent variable is not a feature. Thus, if the source
index contains only the dependent variable field, there are no
features to train a model on.

This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
ExtractedFieldsDetector and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.
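
The check described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Java change: `FieldSelection`, `count_features`, and `validate_at_least_one_feature` are hypothetical names, and the error text is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldSelection:
    """Hypothetical stand-in for one field considered for the analysis."""
    name: str
    is_included: bool
    is_required: bool  # e.g. the dependent variable of a regression


def count_features(fields: List[FieldSelection]) -> int:
    # A required field such as the dependent variable is included in the
    # analysis but is not a feature, so it is not counted here.
    return sum(1 for f in fields if f.is_included and not f.is_required)


def validate_at_least_one_feature(fields: List[FieldSelection]) -> None:
    # Assumed to run when the job is started, not when it is explained.
    if count_features(fields) == 0:
        raise ValueError("no features available for analysis")
```

With only a dependent variable in the index, `count_features` returns 0 and the start-time validation rejects the job, even though one supported field exists.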

For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.
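
A small sketch of that split in behavior, assuming an analysis without required fields. The function names, the set of supported types, and the messages are all assumptions made for illustration, not the real API or its output.

```python
from typing import Dict, List

# Assumed set of supported mapping types, for illustration only.
SUPPORTED_TYPES = {"long", "double", "boolean", "keyword"}


def explain(mappings: Dict[str, str]) -> List[dict]:
    """Never fails for lack of features; reports why each field is excluded."""
    report = []
    for name, mapping_type in sorted(mappings.items()):
        included = mapping_type in SUPPORTED_TYPES
        entry = {"name": name, "is_included": included}
        if not included:
            entry["reason"] = f"unsupported type [{mapping_type}]"
        report.append(entry)
    return report


def start(mappings: Dict[str, str]) -> None:
    """Rejects the job when no field can serve as a feature."""
    if not any(entry["is_included"] for entry in explain(mappings)):
        raise ValueError("no features available; call explain to see why")
```

Here `start` raises for an index whose fields are all of unsupported types, while `explain` on the same mappings still returns a per-field reason the user can inspect.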

Closes #55593


elasticmachine · 2020-04-28T16:53:19Z

Pinging @elastic/ml-core (:ml)

przemekwitek

LGTM

dimitris-athanasiou · 2020-04-29T06:23:45Z

@elasticmachine update branch

…#55876) (#55914) Closes #55593 Backport of #55876

dimitris-athanasiou added >bug :ml Machine learning v8.0.0 v7.8.0 labels Apr 28, 2020

przemekwitek self-requested a review April 28, 2020 16:59

przemekwitek approved these changes Apr 28, 2020


Merge branch 'master' into validate-at-least-one-feature-available

56eaae4

dimitris-athanasiou merged commit 0666be5 into elastic:master Apr 29, 2020

dimitris-athanasiou deleted the validate-at-least-one-feature-available branch April 29, 2020 07:51

dimitris-athanasiou mentioned this pull request Apr 29, 2020

[7.x][ML] Validate at least one feature is available for DF analytics… #55914

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
