[ML] Implement new rules design #31110

dimitris-athanasiou · 2018-06-05T13:50:26Z

Rules allow users to supply a detector with domain
knowledge that can improve the quality of the results.
The model detects statistically anomalous results but it
has no knowledge of the meaning of the values being modelled.

For example, a detector that performs a population analysis
over IP addresses could benefit from a list of IP addresses
that the user knows to be safe. Then anomalous results for
those IP addresses will not be created and will not affect
the quantiles either.

Another example would be a detector looking for anomalies
in the median value of CPU utilization. A user might want
to inform the detector that any results where the actual
value is less than 5 is not interesting.

This commit introduces a custom_rules field to the Detector.
A detector may have multiple rules which are combined with or.

A rule has 3 fields: actions, scope and conditions.

Actions is a list of what should happen when the rule applies.
The current options include skip_result and skip_model_update.
The default value for actions is the skip_result action.

Scope is optional and allows for applying filters on any of the
partition/over/by field. When not defined the rule applies to
all series. The filter_id needs to be specified to match the id
of the filter to be used. Optionally, the filter_type can be specified
as either include (default) or exclude. When set to include
the rule applies to entities that are in the filter. When set to
exclude the rule only applies to entities not in the filter.

There may be zero or more conditions. A condition requires applies_to,
operator and value to be specified. The applies_to value can be
either actual, typical or diff_from_typical and it specifies
the numerical value to which the condition applies. The operator
(lt, lte, gt, gte) and value complete the definition.
Conditions are combined with and and allow to specify numerical
conditions for when a rule applies.

A rule must either have a scope or one or more conditions. Finally,
a rule with scope and conditions applies when all of them apply.

elasticmachine · 2018-06-05T13:50:28Z

Pinging @elastic/ml-core

dimitris-athanasiou · 2018-06-05T14:52:02Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportOpenJobAction.java

@@ -700,6 +704,18 @@ void setMaxMachineMemoryPercent(int maxMachineMemoryPercent) {
        }
    }

+    // Visible for testing
+    static void checkJobWithRulesRequiresMinVersionOnAllNodes(String jobId, ClusterState clusterState) {


@droberts195 Actually, I think this is not enough. In the case of a job relocating, validate will not be called, right? So if at that point there's a node running on a version earlier than 6.4 it may be assigned to it. I think I need to add a check in the node selection code as well.

Yes, if we go down the road of banning use of functionality until the entire cluster is on a particular version then we need to do it when the job is created, not when it's opened.

If we can't do this for some reason and need to do a check when opening a job then it needs to be in the allocation decision, so we might as well then use our existing functionality that takes job version into account when allocating to nodes.

The problem with checking when the job is created is we then also need to ban earlier nodes from joining the cluster. It feels too excessive to do that for this. I'll switch to rejecting incompatible nodes from taking on jobs with rules.

davidkyle · 2018-06-05T14:43:40Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/FilterRef.java

+    public static final ConstructingObjectParser<FilterRef, Void> CONFIG_PARSER =
+            new ConstructingObjectParser<>(FILTER_REF_FIELD.getPreferredName(), false,
+                    a -> new FilterRef((String) a[0], (FilterType) a[1]));
+    public static final Map<MlParserType, ConstructingObjectParser<FilterRef, Void>> PARSERS = new EnumMap<>(MlParserType.class);


Do you need this map if the individual parsers are public static?

This follows the same pattern we've been using in other classes to ensure unknown fields are ignored while parsing from cluster state but still be strict when parsing from a user request.

davidkyle · 2018-06-05T14:45:30Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/FilterRef.java

+        return Objects.hash(filterId, filterType);
+    }
+
+    public String getFilterId() {


and public FilterType getFilterType()...

davidkyle · 2018-06-05T15:16:51Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/DetectionRule.java

-            parser.declareField(Builder::setConditionsConnective, p -> {
-                if (p.currentToken() == XContentParser.Token.VALUE_STRING) {
-                    return Connective.fromString(p.text());
+            parser.declareObject(Builder::setScope, (p, c) -> {


It would be neater to create a parser for RuleScope and use that here

I couldn't figure how to do that given that it has no named fields. Any ideas?

I have at least extracted the code into a parser method in RuleScope. Looks cleaner now.

droberts195 · 2018-06-06T09:43:50Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/DetectionRule.java

-            for (RuleAction action : actions) {
-                action.writeTo(out);
-            }
+        if (out.getVersion().onOrAfter(Version.V_6_4_0)) {


There's a lack of symmetry here. The stream input constructor doesn't check the version so either this shouldn't either or both should. Not checking would be reasonable if the protection of old versions is done at a higher level.

This is preventing a rule to be written out to a node that doesn't know how to read it. When it comes to reading there is no issue as it would only read from nodes that are >= 6.4. Am I missing something?

OK, on the input side we're relying on the rules list in 6.3 and below being an empty list. We discussed that and said it was fine because nobody should have used a rule before 6.4, and if they did it was unsupported and hence acceptable for the transport to break.

On the output side I think a change is necessary though. The way it is at the moment if you have a list of 2 rules in 6.4 then that will be serialised as a list of 2 objects each with no fields. Instead I think it needs to be serialised as an empty list, because 6.3 won't understand the objects with no fields. (Let me know if that's wrong.)

If my thinking is correct then this version check can be removed but instead we need one in Detector.java around this line:

out.writeList(rules);

If the remote node is below version 6.4 it needs to write an empty list instead of rules.

You're absolutely right. Will push a fix for this shortly.

droberts195 · 2018-06-06T10:12:36Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Detector.java

+                    case DIFF_FROM_TYPICAL:
+                        return true;
+                    case TIME:
+                    default:


Should default throw an exception rather than return false? If a future enum value is added we don't know what would be most appropriate to return here.

droberts195 · 2018-06-06T10:26:41Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportOpenJobAction.java

@@ -700,6 +704,18 @@ void setMaxMachineMemoryPercent(int maxMachineMemoryPercent) {
        }
    }

+    // Visible for testing
+    static void checkJobWithRulesRequiresMinVersionOnAllNodes(String jobId, ClusterState clusterState) {


Yes, if we go down the road of banning use of functionality until the entire cluster is on a particular version then we need to do it when the job is created, not when it's opened.

If we can't do this for some reason and need to do a check when opening a job then it needs to be in the allocation decision, so we might as well then use our existing functionality that takes job version into account when allocating to nodes.

droberts195 · 2018-06-06T14:05:29Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Detector.java

+                    case TYPICAL:
+                    case DIFF_FROM_TYPICAL:
+                        return true;
+                    case TIME:


Now default is throwing an exception, does TIME need a separate return false?

Good catch. Also realised I'm missing tests here. I'll add them in together with the fix.

Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 is not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by field. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. Conditions can be zero or more. A condition requires `applies_to` and `condition` to be specified. The `applied_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the `condition` applies. The `condition` has an `operator` (`lt`, `lte`, `gt`, `gte`) and a numerical `value`. Conditions are combined with `and` and allow to specify numerical conditions for then a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.

dimitris-athanasiou · 2018-06-12T16:25:08Z

retest this please

* master: Remove RestGetAllAliasesAction (#31308) Temporary fix for broken build Reenable Checkstyle's unused import rule (#31270) Remove remaining unused imports before merging #31270 Fix non-REST doc snippet [DOC] Extend SQL docs Immediately flush channel after writing to buffer (#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (#31249) move security ingest processors to a sub ingest directory (#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299) CCS: don't proxy requests for already connected node (#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (#31298) [Rollup] Metric config parser must use builder so validation runs (#31159) [ML] Check licence when datafeeds use cross cluster search (#31247) Add notion of internal index settings (#31286) Test: Remove broken yml test feature (#31255) REST hl client: cluster health to default to cluster level (#31268) [ML] Update test thresholds to account for changes to memory control (#31289) Log warnings when cluster state publication failed to some nodes (#31233) Fix AntFixture waiting condition (#31272) Ignore numeric shard count if waiting for ALL (#31265) [ML] Implement new rules design (#31110) index_prefixes back-compat should test 6.3 (#30951) Core: Remove plain execute method on TransportAction (#30998) Update checkstyle to 8.10.1 (#31269) Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202) Modify pipelining handlers to require full requests (#31280) Revert upgrade to Netty 4.1.25.Final (#31282) Use armored input stream for reading public key (#31229) Fix Netty 4 Server Transport tests. Again. REST hl client: adjust wait_for_active_shards param in cluster health (#31266) REST high-level Client: remove deprecated API methods (#31200) [DOCS] Mark SQL feature as experimental [DOCS] Updates machine learning custom URL screenshots (#31222) Fix naming conventions check for XPackTestCase Fix security Netty 4 transport tests Fix race in clear scroll (#31259) [DOCS] Clarify audit index settings when remote indexing (#30923) Delete typos in SAML docs (#31199) REST high-level client: add Cluster Health API (#29331) [ML][TEST] Mute tests using rules (#31204) Support RequestedAuthnContext (#31238) SyncedFlushResponse to implement ToXContentObject (#31155) Add Get Aliases API to the high-level REST client (#28799) Remove some line length supressions (#31209) Validate xContentType in PutWatchRequest. (#31088) [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024) Suppress extras FS on caching directory tests Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)" Revert "Fix snippets in upgrade docs" Fix snippets in upgrade docs [DOCS] Added 6.3 info & updated the upgrade table. (#30940) LLClient: Support host selection (#30523) Upgrade to Netty 4.1.25.Final (#31232) Enable custom credentials for core REST tests (#31235) Move ESIndexLevelReplicationTestCase to test framework (#31243) Encapsulate Translog in Engine (#31220) HLRest: Add get index templates API (#31161) Remove all unused imports and fix CRLF (#31207) [Tests] Fix self-referencing tests [TEST] Fix testRecoveryAfterPrimaryPromotion [Docs] Remove mention pattern files in Grok processor (#31170) Use stronger write-once semantics for Azure repository (#30437) Don't swallow exceptions on replication (#31179) Limit the number of concurrent requests per node (#31206) Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044) Move java version checker back to its own jar (#30708) [test] add fix for rare virtualbox error (#31212)

Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 is not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by field. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. There may be zero or more conditions. A condition requires `applies_to`, `operator` and `value` to be specified. The `applies_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the condition applies. The `operator` (`lt`, `lte`, `gt`, `gte`) and `value` complete the definition. Conditions are combined with `and` and allow to specify numerical conditions for when a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.

* 6.x: Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360) [ML] Implement new rules design (#31110) (#31294) Remove RestGetAllAliasesAction (#31308) CCS: don't proxy requests for already connected node (#31273) Rankeval: Fold template test project into main module (#31203) [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359) More detailed tracing when writing metadata (#31319) Add details section for dcg ranking metric (#31177)

dimitris-athanasiou added >feature review v7.0.0 :ml Machine learning v6.4.0 labels Jun 5, 2018

dimitris-athanasiou mentioned this pull request Jun 5, 2018

[ML] Implement new rules design elastic/ml-cpp#119

Merged

dimitris-athanasiou commented Jun 5, 2018

View reviewed changes

davidkyle reviewed Jun 6, 2018

View reviewed changes

droberts195 reviewed Jun 6, 2018

View reviewed changes

dimitris-athanasiou force-pushed the implement-new-rules-design branch from 0f0710d to 76d1696 Compare June 12, 2018 10:41

dimitris-athanasiou added 7 commits June 12, 2018 14:33

Flatten rule condition object

b88ed36

Address review comments

2898567

Fix time condition function support check

e8b4282

Write empty rules list to nodes with earlier versions

cebce61

Extract RuleScope parser

cb6f3be

Add comment why Operator.EQ was not added

fde79b3

Fix unit test

4d3ff56

dimitris-athanasiou force-pushed the implement-new-rules-design branch from 76d1696 to 4d3ff56 Compare June 12, 2018 21:21

dimitris-athanasiou merged commit 5c77ebe into elastic:master Jun 13, 2018

dimitris-athanasiou deleted the implement-new-rules-design branch June 13, 2018 10:20

dimitris-athanasiou mentioned this pull request Jun 13, 2018

[ML] Implement new rules design (#31110) #31294

Merged

dimitris-athanasiou added the backport pending label Jun 13, 2018

dimitris-athanasiou removed the backport pending label Jun 15, 2018

This was referenced Jun 29, 2018

[ML] Add ability to create and edit detector rules from the anomalies table elastic/kibana#20339

Closed

[ML] Add ability to view and edit filter lists for use in detector rules elastic/kibana#20341

Closed

lcawl mentioned this pull request Aug 14, 2018

[DOCS] Adds machine learning 6.4.0 highlights #32861

Merged

Mpdreamz mentioned this pull request Sep 21, 2018

[meta] 6.4.0 release elastic/elasticsearch-net#3397

Closed

89 tasks

Mpdreamz mentioned this pull request Oct 22, 2018

[meta] 6.5.0 Release elastic/elasticsearch-net#3457

Closed

codebrain mentioned this pull request Jan 28, 2019

[meta] 6.6.0 Release elastic/elasticsearch-net#3552

Closed

48 tasks

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

codebrain mentioned this pull request Mar 19, 2019

[meta] 6.7.0 Release elastic/elasticsearch-net#3615

Closed

24 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ML] Implement new rules design #31110

[ML] Implement new rules design #31110

dimitris-athanasiou commented Jun 5, 2018 •

edited

Loading

elasticmachine commented Jun 5, 2018

dimitris-athanasiou Jun 5, 2018

droberts195 Jun 6, 2018

dimitris-athanasiou Jun 6, 2018

davidkyle Jun 5, 2018

dimitris-athanasiou Jun 6, 2018

davidkyle Jun 5, 2018

davidkyle Jun 5, 2018

dimitris-athanasiou Jun 6, 2018

dimitris-athanasiou Jun 7, 2018

droberts195 Jun 6, 2018

dimitris-athanasiou Jun 6, 2018

droberts195 Jun 6, 2018 •

edited

Loading

dimitris-athanasiou Jun 7, 2018

droberts195 Jun 6, 2018

droberts195 Jun 6, 2018

droberts195 Jun 6, 2018

dimitris-athanasiou Jun 7, 2018

dimitris-athanasiou commented Jun 12, 2018

[ML] Implement new rules design #31110

[ML] Implement new rules design #31110

Conversation

dimitris-athanasiou commented Jun 5, 2018 • edited Loading

elasticmachine commented Jun 5, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

droberts195 Jun 6, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dimitris-athanasiou commented Jun 12, 2018

dimitris-athanasiou commented Jun 5, 2018 •

edited

Loading

droberts195 Jun 6, 2018 •

edited

Loading