
[ML] Using a script_field as a partition_field_name fails validation in AD advanced job wizard #76075

Closed
stevedodson opened this issue Aug 27, 2020 · 6 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Anomaly Detection ML anomaly detection :ml v7.11.0

Comments

@stevedodson
Contributor

Kibana version:

7.8.1

Elasticsearch version:

7.8.1

Server OS version:

macOS 10.15.6

Browser version:

Chrome Version 84.0.4147.125

Browser OS version:

macOS 10.15.6

Original install method (e.g. download page, yum, from source, etc.):

Download page

Describe the bug:

When specifying a script_field as the partition_field_name, the ML anomaly detection job fails to validate.

Steps to reproduce:

  1. Load kibana_sample_data_logs dataset (via Kibana)
  2. Create 'Advanced' ML AD job.
  3. Create datafeed with script field e.g.
{
  "datafeed_id": "",
  "job_id": "",
  "indices": [
    "kibana_sample_data_logs"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "script_fields": {
    "geo_src_dest": {
      "script": {
        "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
      }
    }
  }
}
  4. Set partition_field_name to geo_src_dest
  5. Progress through the wizard - it fails at the validate phase.

Expected behavior:

Successful creation of the job. For advanced jobs, we should be less strict on validation.

Screenshots (if relevant):

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):

Any additional context:

@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@peteharverson peteharverson added bug Fixes for quality problems that affect the customer experience v7.10.0 labels Aug 27, 2020
@peteharverson
Contributor

The issue here is that the step to obtain the cardinality of the partitioning and influencer field(s), which is passed to the model memory limit estimate endpoint, is failing. There are probably three options for obtaining a figure to use for the cardinality:

  1. Use a hard-coded value, say 500, but this could obviously be far too high (if the field were a boolean, for example) or far too low.
  2. Pop up a dialog asking the user what they think the cardinality will be.
  3. Run a query that executes the script over the same time range used to gather the other field counts, so that the cardinality can be estimated just as accurately. The UI would have to fetch this estimate, in the same way it fetches estimates for the simple fields.
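
Option 3 might look roughly like the following request against the sample data. This is only a sketch: the time range, the agg name, and the timestamp field are illustrative assumptions, and the script source is copied from the datafeed config above.

```
POST kibana_sample_data_logs/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": { "gte": "now-30d", "lte": "now" }
    }
  },
  "aggs": {
    "geo_src_dest_cardinality": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
        }
      }
    }
  }
}
```

The resulting cardinality value could then be fed into the model memory limit estimate in the same way as the per-field cardinalities are today.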

@droberts195
Contributor

@benwtrent please can you come up with some steps to generate the necessary cardinality search to find the cardinality of the script's output field(s) based on the datafeed config, the job config and the time range being used for estimating the cardinality of the simple fields. You can work with @darnautov if you need more background about how it works today and what information is available. The UI team can implement the steps, but need help with the process for generating the appropriate search.

@benwtrent
Member

From what I can tell, the cardinality agg supports script fields

If the UI team adds a check to switch the cardinality agg from field: value to script: { lang: painless, source: ... }, they could provide the cardinality for the named script field to the API call.
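
As a sketch of that switch (assuming the geo_src_dest script field from the datafeed above; the agg names are illustrative, and clientip is just an example mapped field from the sample data), the two forms side by side:

```
{
  "aggs": {
    "clientip_cardinality": {
      "cardinality": { "field": "clientip" }
    },
    "geo_src_dest_cardinality": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
        }
      }
    }
  }
}
```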

@benwtrent
Member

It seems to me that if the ML Kibana app keeps an internal map of script_field_name: script_source, it could be used to decide which cardinality agg format to emit.

Then the resulting agg values could be passed on as normal, given their field names.

@qn895
Member

qn895 commented Dec 10, 2020

Closing via #81923

@qn895 qn895 closed this as completed Dec 10, 2020
Projects
None yet
Development

No branches or pull requests

7 participants