[ML] Delete forecast API (#31134) #33218
Conversation
Pinging @elastic/ml-core
Looks good. Left some minor comments and a bigger one about whether we should handle deletion of multiple forecasts at once.
@@ -161,7 +161,8 @@
    public static final String REST_JOB_NOT_CLOSED_REVERT = "Can only revert to a model snapshot when the job is closed.";
    public static final String REST_NO_SUCH_MODEL_SNAPSHOT = "No model snapshot with id [{0}] exists for job [{1}]";
    public static final String REST_START_AFTER_END = "Invalid time range: end time ''{0}'' is earlier than start time ''{1}''.";

    public static final String REST_NO_SUCH_FORECAST = "No forecast with id [{0}] exists for job [{1}]";
    public static final String REST_BAD_FORECAST_STATE = "Forecast [{0}] for job [{1}] needs to be either FAILED or FINISHED to be deleted";
The name could be more descriptive here. Something like `REST_CANNOT_DELETE_FORECAST_IN_CURRENT_STATE`.
        return;
    }

    if (DELETABLE_STATUSES.contains(forecastRequestStats.getStatus())) {
We could reduce the indentation here by first checking whether the status is not deletable, then proceeding with the deletion. I'll leave it to your preference whether you want to change it.
    DeleteByQueryRequest deleteByQueryRequest = buildDeleteByQuery(jobId, forecastId);
    executeAsyncWithOrigin(client, ML_ORIGIN, DeleteByQueryAction.INSTANCE, deleteByQueryRequest, ActionListener.wrap(
        response -> {
            if (response.getDeleted() > 0) {
We should handle error paths here. We need to check if the request timed out and we need to check if we got any bulk failures. I noticed we don't do that from the expired data removers, but arguably we should also be doing better error checking in those.
    @Override
    protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient client) throws IOException {
        String jobId = restRequest.param(Job.ID.getPreferredName());
        String forecastId = restRequest.param(Forecast.FORECAST_ID.getPreferredName());
As the delete may take a while, we should be allowing for timeouts. See datafeed deletion as an example.
    public static class Request extends ActionRequest {

        private String jobId;
        private String forecastId;
Unlike the other delete requests we currently have, this one deletes results instead of resources. I can imagine the UI allowing users to select multiple forecasts and then delete them in bulk. It would be a pain for the UI to have to perform a request per forecast. I think we should consider allowing this to handle deletion of multiple forecasts. Users could pass a comma-separated list of forecast IDs.
@@ -276,6 +278,55 @@ public void testOverflowToDisk() throws Exception {

    }

    public void testDelete() throws Exception {
We should also add a YML test in forecast.yml. Ping me if you need a tutorial on how those work.
Also, don't forget to add this API to the client once it's merged in.
@dimitris-athanasiou most definitely :)
        .minimumShouldMatch(1)
        .must(QueryBuilders.termsQuery(Result.RESULT_TYPE.getPreferredName(), ForecastRequestStats.RESULT_TYPE_VALUE))
        .should(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
If the `forecastId` is not `_all`, I think we should add a terms query here with the requested forecast IDs. It makes this more efficient, it saves having to filter the forecast IDs later on, and it dodges the 10K limit in the odd case.
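For illustration, the suggested query could look roughly like the following in the query DSL (the field names follow the ML results mapping; the exact bool structure is left to the implementation):

```json
{
  "bool": {
    "filter": [
      { "term":  { "job_id": "my-job" } },
      { "term":  { "result_type": "model_forecast_request_stats" } },
      { "terms": { "forecast_id": ["forecast-1", "forecast-2"] } }
    ]
  }
}
```

With the forecast IDs in the query, only matching stats documents come back, so no post-filtering is needed and the search `size` cap is no longer a correctness concern.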
        .should(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
        .size(MAX_FORECAST_TO_SEARCH);
    SearchRequest searchRequest = new SearchRequest(RESULTS_INDEX_PATTERN);
Here we should set `AnomalyDetectorsIndex.jobResultsAliasedName`. The aliases contain a filter on the `job_id`, which means we won't match forecasts of other jobs. This is important to dodge the 10K limit.
    ActionListener<SearchResponse> forecastStatsHandler = ActionListener.wrap(
        searchResponse -> deleteForecasts(searchResponse, request, listener),
        e -> listener.onFailure(new ElasticsearchException("An error occurred while searching forecasts to delete", e)));
If the `forecast_id` is `_all`, we need to check the cluster setting `action.destructive_requires_name` (see https://www.elastic.co/guide/en/elasticsearch/guide/current/_deleting_an_index.html). If that is `true`, we shouldn't allow `_all`.
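For reference, the setting in question can be toggled like this (a sketch via the settings API; the action itself would read the value from cluster state rather than making a REST call):

```json
PUT _cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}
```

When the setting is `true`, wildcard and `_all` expressions are rejected for destructive operations, and delete-forecast should behave consistently with that.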
        e -> listener.onFailure(new ElasticsearchException("An error occurred while searching forecasts to delete", e)));

    SearchSourceBuilder source = new SearchSourceBuilder();
    source.query(QueryBuilders.boolQuery()
The query here should be in a filter context (see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html). When we are not interested in scoring, we should be using the filter context.
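As a rough sketch of the suggestion, the same term clauses moved from `must` into a `filter` clause (which skips scoring and lets Elasticsearch cache the clauses):

```json
{
  "bool": {
    "filter": [
      { "term": { "job_id": "my-job" } },
      { "term": { "result_type": "model_forecast_request_stats" } }
    ]
  }
}
```

In the Java builder API this corresponds to using `boolQuery().filter(...)` instead of `boolQuery().must(...)`.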
        .minimumShouldMatch(1)
        .must(QueryBuilders.termsQuery(Result.RESULT_TYPE.getPreferredName(), ForecastRequestStats.RESULT_TYPE_VALUE))
        .should(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
This won't be necessary when we use the job results index alias (see below).
        new TimeoutException("Unable to delete all requested forecasts. Deleted a total of " + response.getDeleted()));
        return;
    }
    if (response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) {
This is checking there are no failures. It should be changed to check there are failures.
    DeleteByQueryRequest request = new DeleteByQueryRequest(searchRequest)
        .setAbortOnVersionConflict(false)
        .setSize(MAX_FORECAST_TO_SEARCH)
        .setSlices(5);

    searchRequest.indices(RESULTS_INDEX_PATTERN);
Similarly, use job results index alias here.
        .must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))
-       .must(QueryBuilders.termQuery(Forecast.FORECAST_ID.getPreferredName(), forecastId)));
+       .must(QueryBuilders.termQuery(Forecast.FORECAST_ID.getPreferredName(), forecastToDelete)));
You can use a `termsQuery` instead, where you pass all the IDs, and you don't have to do the for loop.
    "allow_no_forecasts": {
      "type": "boolean",
      "required": false,
      "description": "Whether to ignore if `_all` or `*` matches no forecasts"
Also remove `*` from here.
    - do:
        headers:
          Authorization: "Basic eF9wYWNrX3Jlc3RfdXNlcjp4LXBhY2stdGVzdC1wYXNzd29yZA==" # run as x_pack_rest_user, i.e. the test setup superuser
        xpack.ml.post_data:
I don't see this setup being necessary for any of the tests. However, it would be nice to test the delete actually removes forecasts. You will need to index a forecast manually (a forecast stats doc plus a couple of forecast docs). We follow a similar approach in the YAML tests of the results.
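A sketch of what indexing a forecast stats document manually might look like in the YAML test (the index name, document ID scheme, and field values here are illustrative assumptions, not the exact results mapping):

```yaml
- do:
    index:
      index: .ml-anomalies-shared
      type: doc
      id: delete-forecast-job_model_forecast_request_stats_some-forecast
      body:
        job_id: "delete-forecast-job"
        forecast_id: "some-forecast"
        result_type: "model_forecast_request_stats"
        forecast_status: "finished"
- do:
    indices.refresh:
      index: .ml-anomalies-shared
```

After the delete-forecast call, the test can search for the document and assert a zero hit count to verify the forecast was actually removed.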
        return;
    }
-   if (response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) {
+   if ((response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) == false) {
        Tuple<RestStatus, Throwable> statusAndReason = getStatusAndReason(response);
We discussed that we should ignore version conflict exceptions as they imply the document has already been deleted. Won't those be included in the bulk failures? If yes, I don't see where we filter them out.
@dimitris-athanasiou I tested with `ignore` set to both false and true. When `ignore` is true, no exception is thrown on a version conflict; when it is false, I do get an exception. So, based on my testing, I don't think any filtering is necessary.
I added a test that fails when that field is false (as it does not expect an exception) and passes when it is true, so that a regression will alert us if this behavior changes.
Which `ignore` are you referring to?
    if (MetaData.ALL.equals(forecastsExpression) || Regex.isMatchAllPattern(forecastsExpression)) {
        return new HashSet<>(allStats);
    XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(
The parser should be created in a try-with-resource.
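The pattern being suggested, shown with a plain JDK reader rather than the actual `XContentParser` so that it compiles standalone:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TryWithResourcesDemo {

    // The resource declared in the try header is closed automatically when the
    // block exits, whether normally or via an exception -- no explicit close()
    // or finally block is needed.
    static String readFirstLine(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirstLine("first line\nsecond line"));
    }
}
```

The same shape applies to the parser: declare it in the try header and let the runtime close it, which also closes the underlying source stream.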
Almost there! Left 2 more comments that for some reason GitHub shows as outdated.
-   XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(
-       NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, hit.getSourceRef().streamInput());
-   allStats.add(ForecastRequestStats.STRICT_PARSER.apply(parser, null));
+   try (InputStream stream = hit.getSourceRef().streamInput();
Here, we don't need the stream in a separate variable. The parser owns it and will handle closing the stream.
LGTM
I reverted this commit from 6.x via bf01cda because it broke compilation there.
* Delete forecast API (elastic#31134)
* Merge elastic/master into …e-default-distribution (213 commits), including:
  * ML: Fix build after HLRC change
  * Fix inner hits retrieval when stored fields are disabled (_none_) (elastic#33018)
  * SQL: Show/desc commands now support table ids (elastic#33363)
  * Mute testValidateFollowingIndexSettings
  * HLRC: Add delete by query API (elastic#32782)
  * [ML] The sort field on get records should default to the record_score (elastic#33358)
  * [ML] Minor improvements to categorization Grok pattern creation (elastic#33353)
  * [DOCS] fix a couple of typos (elastic#33356)
  * Disable assemble task instead of removing it (elastic#33348)
  * Simplify the return type of FieldMapper#parse. (elastic#32654)
  * [ML] Delete forecast API (elastic#31134) (elastic#33218)
  * Introduce private settings (elastic#33327)
  * [Docs] Add search timeout caveats (elastic#33354)
  * TESTS: Fix Race Condition in Temp Path Creation (elastic#33352)
  * Fix from_range in search_after in changes snapshot (elastic#33335)
  * TESTS+DISTR.: Fix testIndexCheckOnStartup Flake (elastic#33349)
  * Null completion field should not throw IAE (elastic#33268)
  * Adds code to help with IndicesRequestCacheIT failures (elastic#33313)
  * Prevent NPE parsing the stop datafeed request. (elastic#33347)
  * HLRC: Add ML get overall buckets API (elastic#33297)
  * ...
Integration tests were added to cover the action. Verified that the API is published appropriately and works manually as well.
Addresses feature request #31134.