[ML] DFA job gets stuck when no field except the dependent variable is included in the analysis #55593
Comments
Pinging @elastic/ml-core (:ml)
@blookot Could you please explain how you're indexing the data?
I'm loading the CSV file using the Data Visualizer, @dimitris-athanasiou.
Thank you @blookot. I have reproduced the issue. You have uncovered a bug that occurs because there are no features in this dataset; there is only the dependent_variable. I think there are 2 issues to fix here:
We'll proceed to fix them both. Once again, thank you for reporting this. It helps us make the feature better!
Hi @dimitris-athanasiou
PS: CPU is running at 100% (on my ML node) until I stop the job!
Indeed, your use case is a time series analysis. You can use an anomaly detection job to model the data and then use the forecast feature in order to predict when the disk will be full. Having said that, we're planning to revisit
Thanks for the note! I noticed that too. We'll make sure to fix this issue.
Yes, I've been playing (successfully) with single metric & forecast.
Just to add to this, the regression model we use isn't immediately well suited to extrapolation, as needed for forecasting. Getting it to work in this fashion needs some explicit handling in inference and also judicious feature creation. As @dimitris-athanasiou says, using this functionality to enhance our forecasting capabilities (particularly to include additional explanatory variables) is definitely something on the roadmap.
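For context on the suggested workaround, here is a minimal sketch of an anomaly detection job plus a forecast request against the Elasticsearch ML APIs. The job id, index, bucket span, and field names are made-up examples rather than the reporter's actual setup, and the job would still need a datafeed (or posted data) and to be opened before forecasting.

```
# Hypothetical single-metric job modelling disk usage over time
PUT _ml/anomaly_detectors/disk-usage-forecast
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      { "function": "mean", "field_name": "disk_used_pct" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

# After the job has been opened and has processed the data,
# request a forecast of the next 30 days
POST _ml/anomaly_detectors/disk-usage-forecast/_forecast
{
  "duration": "30d"
}
```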
We were previously checking that at least one supported field existed when the _explain API was called. However, in the case of analyses with required fields (e.g. regression) we were not accounting for the fact that the dependent variable is not a feature, and thus, if the source index only contains the dependent variable field, there are no features to train a model on. This commit adds a validation that at least one feature is available for analysis.

Note that we also move that validation away from `ExtractedFieldsDetector` and the _explain API and straight into the _start API. The reason for doing this is to allow the user to use the _explain API in order to understand why they would be seeing an error like this one. For example, the user might be using an index that has fields, but they are of unsupported types. If they start the job and get an error that there are no features, they will wonder why that is. Calling the _explain API will show them that all their fields are unsupported. If the _explain API was failing instead, there would be no way for the user to understand why all those fields are ignored.

Closes elastic#55593
(The same change was merged as #55876 and backported in #55914.)
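As an illustration of the behaviour described above, the _explain API can be called with the same configuration a job would use, and its field_selection section reports, per field, whether it is included and the reason when it is not. A minimal sketch, with hypothetical index and field names:

```
# Explain which fields would be used as features for this analysis
# (index and field names are hypothetical)
POST _ml/data_frame/analytics/_explain
{
  "source": { "index": "disk-usage" },
  "analysis": {
    "regression": {
      "dependent_variable": "disk_used_pct"
    }
  }
}
```

If every field other than the dependent variable is of an unsupported type, the field_selection entries make that visible, while the _start call is the place that now fails with the "no features" validation error.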
Elasticsearch version (bin/elasticsearch --version): 7.6.2
JVM version (java -version): running on ESS
Description of the problem including expected versus actual behavior:
I'm running a regression data frame analytics job and it stops at 50% (loading data is at 100% and analyzing is at 0%).
I can't understand why...
Steps to reproduce:
Here is an example of the ML job:
The logs don't show anything:
disk_usage.txt
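The job configuration itself is not reproduced in this thread. For illustration only, a regression job of the shape that triggers this bug (a source index containing nothing but the dependent variable; all names below are hypothetical) would look roughly like:

```
# Hypothetical regression job where the source index has only the
# dependent variable, i.e. no features at all
PUT _ml/data_frame/analytics/disk-usage-regression
{
  "source": { "index": "disk-usage" },
  "dest": { "index": "disk-usage-regression-results" },
  "analysis": {
    "regression": {
      "dependent_variable": "disk_used_pct"
    }
  }
}

# Before the fix this start request left the job stuck in the analyzing
# phase; with the fix it is rejected with a validation error instead
POST _ml/data_frame/analytics/disk-usage-regression/_start
```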