-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Implement new rules design #31110
[ML] Implement new rules design #31110
Conversation
Pinging @elastic/ml-core |
@@ -700,6 +704,18 @@ void setMaxMachineMemoryPercent(int maxMachineMemoryPercent) { | |||
} | |||
} | |||
|
|||
// Visible for testing | |||
static void checkJobWithRulesRequiresMinVersionOnAllNodes(String jobId, ClusterState clusterState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droberts195 Actually, I think this is not enough. In the case of a job relocating, validate
will not be called, right? So if at that point there's a node running on a version earlier than 6.4 it may be assigned to it. I think I need to add a check in the node selection code as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we go down the road of banning use of functionality until the entire cluster is on a particular version then we need to do it when the job is created, not when it's opened.
If we can't do this for some reason and need to do a check when opening a job then it needs to be in the allocation decision, so we might as well then use our existing functionality that takes job version into account when allocating to nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with checking when the job is created is we then also need to ban earlier nodes from joining the cluster. It feels too excessive to do that for this. I'll switch to rejecting incompatible nodes from taking on jobs with rules.
public static final ConstructingObjectParser<FilterRef, Void> CONFIG_PARSER = | ||
new ConstructingObjectParser<>(FILTER_REF_FIELD.getPreferredName(), false, | ||
a -> new FilterRef((String) a[0], (FilterType) a[1])); | ||
public static final Map<MlParserType, ConstructingObjectParser<FilterRef, Void>> PARSERS = new EnumMap<>(MlParserType.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this map if the individual parsers are public static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the same pattern we've been using in other classes to ensure unknown fields are ignored while parsing from cluster state but still be strict when parsing from a user request.
return Objects.hash(filterId, filterType); | ||
} | ||
|
||
public String getFilterId() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and public FilterType getFilterType()...
parser.declareField(Builder::setConditionsConnective, p -> { | ||
if (p.currentToken() == XContentParser.Token.VALUE_STRING) { | ||
return Connective.fromString(p.text()); | ||
parser.declareObject(Builder::setScope, (p, c) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be neater to create a parser for RuleScope
and use that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure how to do that given that it has no named fields. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have at least extracted the code into a parser method in RuleScope
. Looks cleaner now.
for (RuleAction action : actions) { | ||
action.writeTo(out); | ||
} | ||
if (out.getVersion().onOrAfter(Version.V_6_4_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lack of symmetry here. The stream input constructor doesn't check the version so either this shouldn't either or both should. Not checking would be reasonable if the protection of old versions is done at a higher level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is preventing a rule to be written out to a node that doesn't know how to read it. When it comes to reading there is no issue as it would only read from nodes that are >= 6.4. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, on the input side we're relying on the rules list in 6.3 and below being an empty list. We discussed that and said it was fine because nobody should have used a rule before 6.4, and if they did it was unsupported and hence acceptable for the transport to break.
On the output side I think a change is necessary though. The way it is at the moment if you have a list of 2 rules in 6.4 then that will be serialised as a list of 2 objects each with no fields. Instead I think it needs to be serialised as an empty list, because 6.3 won't understand the objects with no fields. (Let me know if that's wrong.)
If my thinking is correct then this version check can be removed but instead we need one in Detector.java
around this line:
out.writeList(rules);
If the remote node is below version 6.4 it needs to write an empty list instead of rules
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. Will push a fix for this shortly.
case DIFF_FROM_TYPICAL: | ||
return true; | ||
case TIME: | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should default
throw an exception rather than return false? If a future enum value is added we don't know what would be most appropriate to return here.
@@ -700,6 +704,18 @@ void setMaxMachineMemoryPercent(int maxMachineMemoryPercent) { | |||
} | |||
} | |||
|
|||
// Visible for testing | |||
static void checkJobWithRulesRequiresMinVersionOnAllNodes(String jobId, ClusterState clusterState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we go down the road of banning use of functionality until the entire cluster is on a particular version then we need to do it when the job is created, not when it's opened.
If we can't do this for some reason and need to do a check when opening a job then it needs to be in the allocation decision, so we might as well then use our existing functionality that takes job version into account when allocating to nodes.
case TYPICAL: | ||
case DIFF_FROM_TYPICAL: | ||
return true; | ||
case TIME: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now default
is throwing an exception, does TIME
need a separate return false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Also realised I'm missing tests here. I'll add them in together with the fix.
0f0710d
to
76d1696
Compare
Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 is not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by field. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. Conditions can be zero or more. A condition requires `applies_to` and `condition` to be specified. The `applied_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the `condition` applies. The `condition` has an `operator` (`lt`, `lte`, `gt`, `gte`) and a numerical `value`. Conditions are combined with `and` and allow to specify numerical conditions for then a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.
retest this please |
76d1696
to
4d3ff56
Compare
* master: Remove RestGetAllAliasesAction (#31308) Temporary fix for broken build Reenable Checkstyle's unused import rule (#31270) Remove remaining unused imports before merging #31270 Fix non-REST doc snippet [DOC] Extend SQL docs Immediately flush channel after writing to buffer (#31301) [DOCS] Shortens ML API intros Use quotes in the call invocation (#31249) move security ingest processors to a sub ingest directory (#31306) Add 5.6.11 version constant. Fix version detection. SQL: Whitelist SQL utility class for better scripting (#30681) [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299) CCS: don't proxy requests for already connected node (#31273) Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap [test] opensuse packaging turn up debug logging Add unreleased version 6.3.1 Removes experimental tag from scripted_metric aggregation (#31298) [Rollup] Metric config parser must use builder so validation runs (#31159) [ML] Check licence when datafeeds use cross cluster search (#31247) Add notion of internal index settings (#31286) Test: Remove broken yml test feature (#31255) REST hl client: cluster health to default to cluster level (#31268) [ML] Update test thresholds to account for changes to memory control (#31289) Log warnings when cluster state publication failed to some nodes (#31233) Fix AntFixture waiting condition (#31272) Ignore numeric shard count if waiting for ALL (#31265) [ML] Implement new rules design (#31110) index_prefixes back-compat should test 6.3 (#30951) Core: Remove plain execute method on TransportAction (#30998) Update checkstyle to 8.10.1 (#31269) Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202) Modify pipelining handlers to require full requests (#31280) Revert upgrade to Netty 4.1.25.Final (#31282) Use armored input stream for reading public key (#31229) Fix Netty 4 Server Transport tests. Again. REST hl client: adjust wait_for_active_shards param in cluster health (#31266) REST high-level Client: remove deprecated API methods (#31200) [DOCS] Mark SQL feature as experimental [DOCS] Updates machine learning custom URL screenshots (#31222) Fix naming conventions check for XPackTestCase Fix security Netty 4 transport tests Fix race in clear scroll (#31259) [DOCS] Clarify audit index settings when remote indexing (#30923) Delete typos in SAML docs (#31199) REST high-level client: add Cluster Health API (#29331) [ML][TEST] Mute tests using rules (#31204) Support RequestedAuthnContext (#31238) SyncedFlushResponse to implement ToXContentObject (#31155) Add Get Aliases API to the high-level REST client (#28799) Remove some line length supressions (#31209) Validate xContentType in PutWatchRequest. (#31088) [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024) Suppress extras FS on caching directory tests Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)" Revert "Fix snippets in upgrade docs" Fix snippets in upgrade docs [DOCS] Added 6.3 info & updated the upgrade table. (#30940) LLClient: Support host selection (#30523) Upgrade to Netty 4.1.25.Final (#31232) Enable custom credentials for core REST tests (#31235) Move ESIndexLevelReplicationTestCase to test framework (#31243) Encapsulate Translog in Engine (#31220) HLRest: Add get index templates API (#31161) Remove all unused imports and fix CRLF (#31207) [Tests] Fix self-referencing tests [TEST] Fix testRecoveryAfterPrimaryPromotion [Docs] Remove mention pattern files in Grok processor (#31170) Use stronger write-once semantics for Azure repository (#30437) Don't swallow exceptions on replication (#31179) Limit the number of concurrent requests per node (#31206) Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044) Move java version checker back to its own jar (#30708) [test] add fix for rare virtualbox error (#31212)
Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 is not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by field. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. There may be zero or more conditions. A condition requires `applies_to`, `operator` and `value` to be specified. The `applies_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the condition applies. The `operator` (`lt`, `lte`, `gt`, `gte`) and `value` complete the definition. Conditions are combined with `and` and allow to specify numerical conditions for when a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.
Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled. For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either. Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 is not interesting. This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`. A rule has 3 fields: `actions`, `scope` and `conditions`. Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action. Scope is optional and allows for applying filters on any of the partition/over/by field. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter. There may be zero or more conditions. A condition requires `applies_to`, `operator` and `value` to be specified. The `applies_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the condition applies. The `operator` (`lt`, `lte`, `gt`, `gte`) and `value` complete the definition. Conditions are combined with `and` and allow to specify numerical conditions for when a rule applies. A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.
* 6.x: Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360) [ML] Implement new rules design (#31110) (#31294) Remove RestGetAllAliasesAction (#31308) CCS: don't proxy requests for already connected node (#31273) Rankeval: Fold template test project into main module (#31203) [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359) More detailed tracing when writing metadata (#31319) Add details section for dcg ranking metric (#31177)
Rules allow users to supply a detector with domain
knowledge that can improve the quality of the results.
The model detects statistically anomalous results but it
has no knowledge of the meaning of the values being modelled.
For example, a detector that performs a population analysis
over IP addresses could benefit from a list of IP addresses
that the user knows to be safe. Then anomalous results for
those IP addresses will not be created and will not affect
the quantiles either.
Another example would be a detector looking for anomalies
in the median value of CPU utilization. A user might want
to inform the detector that any results where the actual
value is less than 5 is not interesting.
This commit introduces a
custom_rules
field to theDetector
.A detector may have multiple rules which are combined with
or
.A rule has 3 fields:
actions
,scope
andconditions
.Actions is a list of what should happen when the rule applies.
The current options include
skip_result
andskip_model_update
.The default value for
actions
is theskip_result
action.Scope is optional and allows for applying filters on any of the
partition/over/by field. When not defined the rule applies to
all series. The
filter_id
needs to be specified to match the idof the filter to be used. Optionally, the
filter_type
can be specifiedas either
include
(default) orexclude
. When set toinclude
the rule applies to entities that are in the filter. When set to
exclude
the rule only applies to entities not in the filter.There may be zero or more conditions. A condition requires
applies_to
,operator
andvalue
to be specified. Theapplies_to
value can beeither
actual
,typical
ordiff_from_typical
and it specifiesthe numerical value to which the condition applies. The
operator
(
lt
,lte
,gt
,gte
) andvalue
complete the definition.Conditions are combined with
and
and allow to specify numericalconditions for when a rule applies.
A rule must either have a scope or one or more conditions. Finally,
a rule with scope and conditions applies when all of them apply.