[ML] Improve error handling when using transforms with inference processors #50135
Comments
Pinging @elastic/ml-core (:ml/Transform)
Yes, 100%, I am surprised that this is not the case now.
At the inference processor level, this is not currently possible. When pipelines are created, there is no opportunity to asynchronously check an index, and we cannot pause the thread that is doing the creation, as that affects the stack as a whole. The transform might be able to do this in a roundabout way, but that conflates two different features. Transforms might be able to check the first couple of docs at [...]
In addition to validation, error handling of the bulk index failure should be improved, related to #50122. We already do this for script errors and circuit breaker exceptions. After that fix, such an error would immediately set the transform to failed without going into the retry loop.
If a pipeline referenced by a transform does not exist, we should not allow the transform to be created. We do allow the pipeline existence check to be skipped with `defer_validation`, but if the pipeline still does not exist on `_start`, the transform will fail to start. relates: #50135
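The behaviour described in this commit message can be sketched roughly as follows. This is only an illustration of the rule (check on create unless deferred, always check on `_start`), not the actual Elasticsearch implementation; `ValidationError`, `validate_on_put`, `validate_on_start` and the `known_pipelines` set are hypothetical names.

```python
class ValidationError(Exception):
    """Hypothetical error: the transform references a missing pipeline."""


def validate_on_put(dest_pipeline, known_pipelines, defer_validation=False):
    """Return True if the transform may be created.

    The pipeline existence check is skipped when defer_validation is set
    or when the transform does not reference a pipeline at all.
    """
    if dest_pipeline is None or defer_validation:
        return True
    if dest_pipeline not in known_pipelines:
        raise ValidationError(
            f"pipeline with id [{dest_pipeline}] does not exist"
        )
    return True


def validate_on_start(dest_pipeline, known_pipelines):
    """The check on _start is never deferred."""
    return validate_on_put(dest_pipeline, known_pipelines, defer_validation=False)
```

With this shape, a transform created with deferred validation is accepted, but starting it still fails until the pipeline has been created.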
treat resource not found and illegal argument exceptions as irrecoverable errors. relates: #50135
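As a rough illustration of that change: irrecoverable error types fail the transform immediately, while other errors go through the retry loop until the failure limit is exceeded. The sketch below is not the transform indexer's code. The error type strings, the state names and `next_state` are assumptions, and the limit of 10 is taken from the "more than 10 failures" message quoted in this issue.

```python
# Error types treated as irrecoverable (names are assumed for illustration).
IRRECOVERABLE = {
    "resource_not_found_exception",
    "illegal_argument_exception",
}

# Taken from the "task encountered more than 10 failures" message.
MAX_FAILURES = 10


def next_state(error_type, previous_failures):
    """Decide what happens after a bulk indexing error.

    Returns "failed" for irrecoverable errors or once the failure count
    exceeds MAX_FAILURES; otherwise returns "retry".
    """
    if error_type in IRRECOVERABLE:
        return "failed"
    if previous_failures + 1 > MAX_FAILURES:
        return "failed"
    return "retry"
```

Under this rule a missing pipeline or missing model fails the transform on the first attempt instead of after several minutes of retries.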
Enhancements made:
errors are both in [...]
I wonder what is left? As @benwtrent mentioned, we could fail at start if the pipeline is broken. However, I would use the [...]
@sophiec20 anything else? I suggest closing this issue and, if needed, opening separate follow-up issues for further enhancements.
Agreed. Appropriate steps have been taken to improve error handling, so closing this ticket.
Found in 7.6.0-SNAPSHOT
It is possible to create and update a transform that uses a pipeline that does not exist.
For example:
Returns
acknowledged: true
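A request of roughly the following shape would set up this situation. It is a sketch only: the destination index `transform-01` and the pipeline id `this-does-not-exist` come from the log line in this issue, while the transform id, source index, pivot and sync settings are hypothetical placeholders.

```python
import json


def build_transform_request(transform_id, dest_index, pipeline_id):
    """Build the path and body of a create-transform request.

    Only dest.index and dest.pipeline reflect values seen in this issue;
    the source, pivot and sync sections are made-up placeholders.
    """
    path = f"_transform/{transform_id}"
    body = {
        "source": {"index": "my-source-index"},  # hypothetical
        "pivot": {  # hypothetical grouping and aggregation
            "group_by": {"user": {"terms": {"field": "user_id"}}},
            "aggregations": {"total": {"sum": {"field": "amount"}}},
        },
        "dest": {"index": dest_index, "pipeline": pipeline_id},
        # A sync section makes the transform continuous (field is hypothetical).
        "sync": {"time": {"field": "timestamp", "delay": "60s"}},
    }
    return path, body


path, body = build_transform_request(
    "transform-01", "transform-01", "this-does-not-exist"
)
print("PUT", path)
print(json.dumps(body, indent=2))
```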
If you start this transform, then we log audit messages saying:
Transform encountered an exception: org.elasticsearch.xpack.transform.transforms.ClientTransformIndexer$BulkIndexingException: Bulk index experienced failures. See the logs of the node running the transform for details. Will attempt again at next scheduled trigger.
And by looking in the logs on disk, we see
[2019-12-12T11:49:24,655][DEBUG][o.e.a.b.T.BulkRequestModifier] [node3] failed to execute pipeline [_none] for document [transform-01/_doc/MTWq3Ia7HoVvarOSxUzsmioAAAAAAAAA] java.lang.IllegalArgumentException: pipeline with id [this-does-not-exist] does not exist
This continuous transform failed after several minutes with
task encountered more than 10 failures; latest failure:
Secondly, if the pipeline refers to a model that does not exist, then we have similar audit messages asking you to look in the log files, which have the error:
Caused by: org.elasticsearch.ResourceNotFoundException: Could not find trained model [this-model-does-not-exist]
This continuous transform also failed after several minutes with
task encountered more than 10 failures; latest failure:
Should we:
(although reindex does not do this.)
Regardless of whether we validate the pipeline and/or model exists, we should surface the errors as audit messages so that the user does not have to look at log files on disk.
(Note:
if you use reindex with a pipeline that does not exist, it returns all docs as 400 failures;
if you use reindex with a pipeline that exists but refers to a model that does not exist, it returns all docs as 404 failures.)
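To make the audit message suggestion concrete, here is a minimal sketch of how per-document failures could be condensed into one message. It assumes a bulk response shaped like the `_bulk` API's JSON (an `items` list whose entries carry `status` and `error.type` / `error.reason`); `summarize_bulk_failures` and the message wording are hypothetical.

```python
def summarize_bulk_failures(bulk_response, max_reasons=3):
    """Condense bulk item failures into one audit-style message.

    Returns None when no item failed. Identical failure reasons are
    reported once, and at most max_reasons distinct reasons are shown.
    """
    reasons = []
    failed = 0
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action, e.g. {"index": {...}}.
        for result in item.values():
            error = result.get("error")
            if error is None:
                continue
            failed += 1
            text = (
                f"{error.get('type')}: {error.get('reason')} "
                f"(status {result.get('status')})"
            )
            if text not in reasons:
                reasons.append(text)
    if failed == 0:
        return None
    shown = "; ".join(reasons[:max_reasons])
    return f"Bulk index experienced {failed} failure(s): {shown}"
```

Reporting each distinct reason once keeps the message short when every document fails for the same cause, as happens with a missing pipeline.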
cc @hendrikmuhs @benwtrent