
[ML] Using a script_field as a partition_field_name fails validation in AD advanced job wizard #76075

Closed
stevedodson opened this issue Aug 27, 2020 · 6 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Anomaly Detection ML anomaly detection :ml v7.11.0

Comments

@stevedodson
Contributor

Kibana version:

7.8.1

Elasticsearch version:

7.8.1

Server OS version:

macOS 10.15.6

Browser version:

Chrome Version 84.0.4147.125

Browser OS version:

macOS 10.15.6

Original install method (e.g. download page, yum, from source, etc.):

Download page

Describe the bug:

When specifying a script_field as the partition_field_name, the ML anomaly detection job fails to validate.

Steps to reproduce:

  1. Load kibana_sample_data_logs dataset (via Kibana)
  2. Create 'Advanced' ML AD job.
  3. Create datafeed with script field e.g.
{
  "datafeed_id": "",
  "job_id": "",
  "indices": [
    "kibana_sample_data_logs"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "script_fields": {
    "geo_src_dest": {
      "script": {
        "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
      }
    }
  }
}
  4. Set partition_field_name to geo_src_dest
  5. Progress through the wizard - it fails at the validate phase.

Expected behavior:

Successful creation of the job. For advanced jobs, we should be less strict on validation.

Screenshots (if relevant):

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):

Any additional context:

@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@peteharverson peteharverson added bug Fixes for quality problems that affect the customer experience v7.10.0 labels Aug 27, 2020
@peteharverson
Contributor

The issue here is that the step to obtain the cardinality of the partitioning and influencer field(s), which is passed to the model memory limit estimate endpoint, is failing. There are probably three options for obtaining a figure to use for the cardinality:

  1. Use a hard-coded value, say 500, but this could obviously be far too high (if the field were a boolean, for example) or far too low.
  2. Pop up a dialog asking the user what they think the cardinality will be.
  3. Run a query that executes the script over the same time range used to gather the other field counts, so that the cardinality can be estimated just as accurately. The UI would have to fetch this estimate, in the same way it fetches estimates for the simple fields.
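
Option 3 might look roughly like the following request against the sample data. This is only a sketch: the time range, the agg name, and the timestamp field are illustrative assumptions, and the script source is copied from the datafeed config above.

```
POST kibana_sample_data_logs/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": { "gte": "now-30d", "lte": "now" }
    }
  },
  "aggs": {
    "geo_src_dest_cardinality": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
        }
      }
    }
  }
}
```

The resulting cardinality value could then be fed into the model memory limit estimate in the same way as the per-field cardinalities are today.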

@droberts195
Contributor

@benwtrent please can you come up with some steps to generate the necessary cardinality search to find the cardinality of the script's output field(s) based on the datafeed config, the job config and the time range being used for estimating the cardinality of the simple fields. You can work with @darnautov if you need more background about how it works today and what information is available. The UI team can implement the steps, but need help with the process for generating the appropriate search.

@benwtrent
Member

From what I can tell, the cardinality agg supports script fields

If the UI team adds a check to switch the cardinality agg from field: value to script: { lang: painless, source: ... }, they could provide the cardinality for the named script field to the API call.
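
As a sketch of that switch (assuming the geo_src_dest script field from the datafeed above; the agg names are illustrative, and clientip is just an example mapped field from the sample data), the two forms side by side:

```
{
  "aggs": {
    "clientip_cardinality": {
      "cardinality": { "field": "clientip" }
    },
    "geo_src_dest_cardinality": {
      "cardinality": {
        "script": {
          "lang": "painless",
          "source": "doc['geo.src'].value+' '+doc['geo.dest'].value"
        }
      }
    }
  }
}
```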

@benwtrent
Member

It seems to me that if the ML Kibana app keeps an internal map of script_field_name: script_source, it could be used to decide which cardinality agg format to emit.

Then the resulting agg values could be passed on as normal, given their field names.

@qn895
Member

qn895 commented Dec 10, 2020

Closing via #81923

@qn895 qn895 closed this as completed Dec 10, 2020
Projects
None yet
Development

No branches or pull requests

7 participants