Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] DF Analytics job creation: Add 'excludes' input field to form #53856

Conversation

alvarezmelissa87
Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 commented Dec 30, 2019

Summary

Related meta issues:

#47890
#51310

Adds Excludes field to creation form that allows user to select fields or create custom patterns to exclude from analysis. All supported fields are included by default. Only supported field types for that type of job are available for selection in Excludes input (as unsupported fields will already be excluded).

Also updates supported types for dependent_variable for each type of job according to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/ml-dfa-analysis-objects.html#regression-resources-standard

Adds error message for when a dependent_variable with more than 2 distinct values is selected for a classification job.

image

image

image

image

An initial iteration had both an Includes and an Excludes input but felt more prone to letting the user create an invalid job config since the excludes list has to specify which fields from the includes to exclude.

For example, two inputs allow you to create something like

"analyzed_fields" : {
        "includes" : [
          "Carrier",
          "dayOfWeek",
          "DestWeather",
          "Cancelled"
        ],
        "excludes" : [
          "AvgTicketPrice"
        ]
      },

which will break with an illegal_state_exception with reason: null when the the _start endpoint is hit.

When what you need is

"analyzed_fields" : {
        "includes" : [
          "dayOfWeek",
          "DestWeather",
          "Cancelled"
        ],
        "excludes" : [
          "dayOfWeek"
        ]
      }

That approach could work but it would all have to be handled on the front end as the returned error doesn't really give us a useful message to present to the user as to what the problem is.

It felt a lot more intuitive to just include all by default and only exclude undesired fields. The user can also work with the advanced editor for additional flexibility.

I'm happy to hear any feedback on this choice and update the way this works if anyone feels strongly about having both options in the form itself.

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

For maintainers

- [ ] This was checked for breaking API changes and was labeled appropriately
- [ ] This includes a feature addition or change that requires a release note and was labeled appropriately

@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@kibanamachine
Copy link
Contributor

💔 Build Failed

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just added some minor suggestions. One more thought: As the code gets more complex with alle the different job types and options, it might be a good opportunity to make shouldAddFieldOption() and shouldAddAsDepVarOption() pure functions, move them out of the component into the main module scope and export them so we can create unit tests for them.

'Optionally select fields to be excluded from analysis. All other supported fields will be included',
})}
>
<EuiComboBox
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clear this if the user changes the source index? That seems to be the behavior with the dependent variable input, which is cleared when the source index changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be the behavior. Reset when the sourceIndex changes change made here: 24bdeb3

@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-df-analytics-include-exclude branch from fe2183d to 5fb02c5 Compare January 6, 2020 21:18
Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM!

@darnautov
Copy link
Contributor

LGTM 👍

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested latest changes and LGTM

@alvarezmelissa87
Copy link
Contributor Author

retest

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@alvarezmelissa87 alvarezmelissa87 merged commit 865462a into elastic:master Jan 7, 2020
alvarezmelissa87 added a commit to alvarezmelissa87/kibana that referenced this pull request Jan 7, 2020
…lastic#53856)

* Add exclude fields input to df analytics creation form

* rename explain api endpoint to general explainDataFrameAnalytics

* wip: use explain api for exclude fields

* show error message if classification depVar has cardinality of > 2

* update types

* updates after conflict resolution

* prevent creation if more than 2 distinct classes for class job. create form field validation file
@alvarezmelissa87 alvarezmelissa87 deleted the ml-df-analytics-include-exclude branch January 7, 2020 15:52
alvarezmelissa87 added a commit that referenced this pull request Jan 7, 2020
…53856) (#54136)

* Add exclude fields input to df analytics creation form

* rename explain api endpoint to general explainDataFrameAnalytics

* wip: use explain api for exclude fields

* show error message if classification depVar has cardinality of > 2

* update types

* updates after conflict resolution

* prevent creation if more than 2 distinct classes for class job. create form field validation file
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jan 7, 2020
…le-saved-objects

* 'master' of github.com:elastic/kibana: (86 commits)
  [Uptime] Added date range filter into expanded list query (elastic#52609)
  [SIEM] Add react/display-name eslint rule (elastic#53107)
  [SIEM] Enable eslint prefer-template rule (elastic#53983)
  Elasticsearch snapshots automation (elastic#53706)
  [SIEM] Enable eslint react/no-children-prop (elastic#53985)
  [DOCS][Reporting] Adds info about using a custom reporting index (elastic#54052)
  Changing background color to align with EUI color (elastic#54060)
  Fix linting issues (elastic#54068)
  NP Migration: Move doc views registry and existing doc views into discover plugin (elastic#53465)
  [ML] DF Analytics job creation: Add 'excludes' input field to form (elastic#53856)
  EMT-issue-65: add endpoint list api (elastic#53861)
  [SIEM] Fix doubled drag handles in Timeline (elastic#52679)
  Functional tests: refactor visualize_page (elastic#53845)
  [Dashboard] Redesign empty screen in readonly mode (elastic#54073)
  [ML] Support search for partitions on Single Metric Viewer (elastic#53879)
  update apm index pattern (elastic#54095)
  Add data ui for envoyproxy Metricbeat Module (elastic#53476)
  [ML] persist the brush when expanded to full width (elastic#54020)
  Skip failing test (elastic#54100)
  [APM] Show errors on the timeline instead of under the transaction (elastic#53756)
  ...

# Conflicts:
#	src/legacy/core_plugins/console/public/np_ready/application/contexts/services_context.tsx
#	src/legacy/core_plugins/console/public/np_ready/application/index.tsx
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jan 7, 2020
…/kibana into feature/console-saved-objects

* 'feature/console-saved-objects' of github.com:jloleysens/kibana: (86 commits)
  [Uptime] Added date range filter into expanded list query (elastic#52609)
  [SIEM] Add react/display-name eslint rule (elastic#53107)
  [SIEM] Enable eslint prefer-template rule (elastic#53983)
  Elasticsearch snapshots automation (elastic#53706)
  [SIEM] Enable eslint react/no-children-prop (elastic#53985)
  [DOCS][Reporting] Adds info about using a custom reporting index (elastic#54052)
  Changing background color to align with EUI color (elastic#54060)
  Fix linting issues (elastic#54068)
  NP Migration: Move doc views registry and existing doc views into discover plugin (elastic#53465)
  [ML] DF Analytics job creation: Add 'excludes' input field to form (elastic#53856)
  EMT-issue-65: add endpoint list api (elastic#53861)
  [SIEM] Fix doubled drag handles in Timeline (elastic#52679)
  Functional tests: refactor visualize_page (elastic#53845)
  [Dashboard] Redesign empty screen in readonly mode (elastic#54073)
  [ML] Support search for partitions on Single Metric Viewer (elastic#53879)
  update apm index pattern (elastic#54095)
  Add data ui for envoyproxy Metricbeat Module (elastic#53476)
  [ML] persist the brush when expanded to full width (elastic#54020)
  Skip failing test (elastic#54100)
  [APM] Show errors on the timeline instead of under the transaction (elastic#53756)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants