-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Add audit warning for 1000 categories found early in job #51146
[ML] Add audit warning for 1000 categories found early in job #51146
Conversation
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Left a few minor suggestions.
@@ -135,6 +135,8 @@ | |||
"Adjust the analysis_limits.model_memory_limit setting to ensure all data is analyzed"; | |||
public static final String JOB_AUDIT_MEMORY_STATUS_HARD_LIMIT_PRE_7_2 = "Job memory status changed to hard_limit at {0}; adjust the " + | |||
"analysis_limits.model_memory_limit setting to ensure all data is analyzed"; | |||
public static final String JOB_AUDIT_EXCESSIVE_EARLY_CATEGORIES = "{0} categories observed in the first [{1}] buckets." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have square brackets about the second number in this message but not the first. Should we make it consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't put the first one in square brackets is that it won't vary (at least for a particular version of the product). The first number will always be 1000 unless somebody changes the code, whereas the second number can vary between different occurrences of the audit message.
@@ -87,7 +90,9 @@ | |||
private final FlushListener flushListener; | |||
private volatile boolean processKilled; | |||
private volatile boolean failed; | |||
private int bucketCount; // only used from the process() thread, so doesn't need to be volatile | |||
private long priorRunsBucketCount; | |||
private int currentRunBucketCount; // only used from the process() thread, so doesn't need to be volatile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we take this chance and make this a long
as well? It seems like something that could hit overflow problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a 1 second bucket span it will take 68 years to overflow, so pretty unlikely. But maybe I can avoid some casting by making it long
, which will make the code cleaner. If so I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't much casting but I changed it anyway just so both variables have the same type.
@@ -122,6 +127,7 @@ public AutodetectResultProcessor(Client client, | |||
this.bulkResultsPersister = persister.bulkPersisterBuilder(jobId, this::isAlive); | |||
this.timingStatsReporter = new TimingStatsReporter(timingStats, bulkResultsPersister); | |||
this.deleteInterimRequired = true; | |||
this.priorRunsBucketCount = timingStats.getBucketCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering where we'd get this from but it's cool we already pass it in!
@@ -225,6 +231,18 @@ void processResult(AutodetectResult result) { | |||
CategoryDefinition categoryDefinition = result.getCategoryDefinition(); | |||
if (categoryDefinition != null) { | |||
persister.persistCategoryDefinition(categoryDefinition, this::isAlive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that all this has become more complex than a single call to the persister, I'd be tempted to extract this in a handleCategoryDefinition
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
…c#51146) If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
In elastic#51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by elastic#51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes elastic#50749
…2195) In #51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by #51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes #50749
…2195) In #51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by #51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes #50749
If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created. (This will cause a yellow warning triangle in the
ML UI's jobs list.)
Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
Relates #50749