Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Show model plot info for partitions that do not contain anomalies #53839

Conversation

darnautov
Copy link
Contributor

Summary

Resolves #52541. Allow selection of partitions that do not contain anomalies.

In the original solution to populate a partitions dropdown, there were a fetching of the first 500 records from .ml-anomalies-* and extracting partition values. Eventually, many options were omitted. I've created an endpoint to retrieve distinct partitions with aggregations(for now, I limited each aggregation to 100). In the follow-up PR for #52618, I'll extend this endpoint to support wildcard queries for partitions.

Checklist

@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@@ -402,11 +402,111 @@ export function resultsServiceProvider(callWithRequest) {
return definition;
}

/**
* Gets the record of partition filed - values pairs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: small typo field - value

};
}

function getFieldObject(fieldName, values, field) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should probably define this before it gets used as good practice.

Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a question about the structure of the terms aggregations.

* @param latestMs
* @returns {Promise<*>}
*/
async function getPartitionFieldsValues(criteriaFields, earliestMs, latestMs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the structure of the query+aggs here.
For example partition_field_name will return all field names, and partition_field_values will return all values.
Is the query done so we can be sure there will be only one partition_field_name returned? Otherwise the values could refer to different names.
To be more on the safe side, the aggregations could be done nested so the values agg is nested below the names agg. That way we'd have a definitive relationship between names+values defined in the response data.

@@ -124,6 +124,11 @@ declare interface Ml {

results: {
getMaxAnomalyScore: (jobIds: string[], earliestMs: number, latestMs: number) => Promise<any>;
fetchPartitionFieldsValues: (
criteriaFields: Array<{ fieldName: string; fieldValue: any }>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, I think job ID should be passed as a separate parameter, rather than including it in criteriaFields.

@@ -532,3 +532,11 @@ export function getScheduledEventsByBucket(
})
);
}

export function fetchPartitionFieldsValues(
criteriaFields: Array<{ fieldName: string; fieldValue: any }>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, for consistency with the other functions in here, I think job ID should be passed as a separate parameter, rather than including it in criteriaFields.

over_field: overField,
by_field: byField,
} = await mlResultsService
.fetchPartitionFieldsValues(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above here needs amending, as it no longer no populates the dropdowns with the values for the top records.

};
}),
{
range: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented in the follow-up, should consider adding an extra term to the filter here, for result_type, depending on whether model plot has been enabled or not.

@darnautov darnautov closed this Jan 6, 2020
@darnautov
Copy link
Contributor Author

Check #53879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Single metric model plot info should be shown for partitions that do not contain anomalies
6 participants