Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Return statistics about forecasts as part of the jobsstats and usage API #31647

Merged
merged 11 commits into from
Jul 4, 2018

Conversation

hendrikmuhs
Copy link
Contributor

@hendrikmuhs hendrikmuhs commented Jun 28, 2018

This change adds stats about forecasts, to the jobstats api as well as xpack/_usage. The following information is collected:

_xpack/ml/anomaly_detectors/{jobid|_all}/_stats:

  • total number of forecasts
  • memory statistics (mean/min/max)
  • runtime statistics
  • record statistics
  • counts by status

_xpack/usage

  • collected by job status as well as overall (_all):
    • total number of forecasts
    • number of jobs that have at least 1 forecast
    • memory, runtime, record statistics
    • counts by status

Fixes #31395

Implementation notes:

  • reused existing functionality but had to move classes to core due to dependencies
  • added functionality to use aggregations for the collection of statistics

Todo:

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

/**
* Helper class to collect min, max, avg and total statistics for a quantity
*/
public class StatsAccumulator implements Writeable {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been moved from o.e.x.ml.utils

@hendrikmuhs
Copy link
Contributor Author

Examples:

_xpack/ml/anomaly_detectors/{job|_all}/_stats

screenshot_20180628_140143

_xpack/ml/anomaly_detectors/{job|_all}/_stats no forecast

screenshot_20180628_140354

Note: No statistics, only the total counter showing 0 forecasts for this jobs

_xpack/usage

screenshot_20180628_140533

Note: jobs (better name anyone?) is the unique count of jobs that have at least 1 forecast. E.g. in this example count == 4, jobs == 3, so 3 out of 4 ML jobs have at least 1 forecast
jobs is omitted if it is 1.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, seems like naming is the hardest thing.

this.jobId = Objects.requireNonNull(jobId);
this.dataCounts = Objects.requireNonNull(dataCounts);
this.modelSizeStats = modelSizeStats;
this.forecastStats = forecastStats;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add this to the JobStats(StreamInput), writeTo(StreamOutput) with BWC checks and hashCode & equals

builder.field(Fields.RUNTIME, runtimeStats.asMap());
builder.field(Fields.STATUSES, statusCounts.asMap());
}
if (jobsWithAtleastOneForecast > 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOTAL is the total number of forecasts made and JOBS is the number of jobs that have been forecasted and it's only written if 2 or more jobs have been forecasted? It may not be understood that the absence of this field and the presence of the other fields means only 1 job has been forecasted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this magic is a bit strange. The reason is that I wanted to not show this field for the jobs stats API but only for the usage stats.

@@ -215,24 +219,29 @@ private void addJobsUsage(GetJobsStatsAction.Response response) {
js -> new StatsAccumulator()).add(detectorsCount);
modelSizeStatsByState.computeIfAbsent(jobState,
js -> new StatsAccumulator()).add(modelSize);
forecastStatsByState.computeIfAbsent(jobState,
js -> new ForecastStats()).merge(jobStats.getForecastStats());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not the equivalent to jobStats.getForecastStats()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I get the question, only in case the key does not exist yet it is equivalent to jobStats.getForecastStats(). I think I can simplify this using the Map::merge function in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my mistake I didn't match the brackets properly. I thought the code was new ForecastStats().merge(jobStats.getForecastStats() i.e. merging an empty new object I now see that is not the case

}
total += other.total;
if (other.total > 0) {
++jobsWithAtleastOneForecast;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs = new ForecastStats().merge(a.merge(b.merge(c.merge(someStats))))
After this fs. jobsWithAtleastOneForecast would have a value of 1 regardless of what has been merged into should the value of other. jobsWithAtleastOneForecast be used or have I misunderstood this completely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, it's not used like this right now, but that's a bad excuse for not fixing it. it should simply be jobsWithAtleastOneForecast +=other.jobsWithAtleastOneForecast, the if is superflous.

@davidkyle
Copy link
Member

Note: jobs (better name anyone?) is the unique count of jobs that have at least 1 forecast. E.g. in this example count == 4, jobs == 3, so 3 out of 4 ML jobs have at least 1 forecast
jobs is omitted if it is 1.

I didn't see this before I started the review. How about forecasted_jobs?

@hendrikmuhs
Copy link
Contributor Author

I changed the fieldname as proposed to forecasted_jobs and removed the magic about writing it or not. It's now always written, for stats I still avoid writing them if there isn't any forecast.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

@hendrikmuhs
Copy link
Contributor Author

@lcawl

Would you mind quickly reviewing: f9804dc ?

Copy link
Contributor

@lcawl lcawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few suggestions, but otherwise LGTM

@hendrikmuhs hendrikmuhs merged commit e9f8442 into elastic:master Jul 4, 2018
hendrikmuhs pushed a commit that referenced this pull request Jul 4, 2018
…sage API (#31647)

This change adds stats about forecasts, to the jobstats api as well as xpack/_usage. The following 
information is collected:

_xpack/ml/anomaly_detectors/{jobid|_all}/_stats:

 -  total number of forecasts
 -  memory statistics (mean/min/max)
 -  runtime statistics
 -  record statistics
 -  counts by status

_xpack/usage

 -  collected by job status as well as overall (_all):
     -  total number of forecasts
     -  number of jobs that have at least 1 forecast
     -  memory, runtime, record statistics
     -  counts by status

Fixes #31395
dnhatn added a commit that referenced this pull request Jul 4, 2018
* 6.x:
  Fix not waiting for Netty ThreadDeathWatcher in IT (#31758) (#31789)
  [Docs] Correct default window_size (#31582)
  S3 fixture should report 404 on unknown bucket (#31782)
  [ML] Limit ML filter items to 10K (#31731)
  Fixture for Minio testing (#31688)
  [ML] Return statistics about forecasts as part of the jobsstats and usage API (#31647)
  [DOCS] Add missing get mappings docs to HLRC (#31765)
  [DOCS] Starting Elasticsearch (#31701)
  Fix coerce validation_method in GeoBoundingBoxQueryBuilder (#31747)
  Painless: Complete Removal of Painless Type (#31699)
  Consolidate watcher setting update registration (#31762)
  [DOCS] Adds empty 6.3.1 release notes page
  ingest: Introduction of a bytes processor (#31733)
  [test] don't run bats tests for suse boxes (#31749)
  Add analyze API to high-level rest client (#31577)
  Implemented XContent serialisation for GetIndexResponse (#31675)
  [DOCS] Typos
  DOC: Add examples to the SQL docs (#31633)
  Add support for AWS session tokens (#30414)
  Watcher: Reenable start/stop yaml tests (#31754)
  JDBC: Fix stackoverflow on getObject and timestamp conversion (#31735)
  Support multiple system store types (#31650)
  Add write*Blob option to replace existing blob (#31729)
  Split CircuitBreaker-related tests (#31659)
  Painless: Add Context Docs (#31190)
  Docs: Remove missing reference
  Migrate scripted metric aggregation scripts to ScriptContext design (#30111)
  Watcher: Fix chain input toXcontent serialization (#31721)
  Remove _all example (#31711)
  rest-high-level: added get cluster settings (#31706)
  Docs: Match the examples in the description (#31710)
  [Docs] Correct typos (#31720)
  Extend allowed characters for grok field names (#21745) (#31653) (#31722)
  [DOCS] Check for Windows and *nix file paths (#31648)
  [ML] Validate ML filter_id (#31535)
  Fix gradle4.8 deprecation warnings (#31654)
  Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
dnhatn added a commit that referenced this pull request Jul 4, 2018
* master:
  [ML] Rate limit established model memory updates (#31768)
  [Docs] Correct default window_size (#31582)
  S3 fixture should report 404 on unknown bucket (#31782)
  Detach Transport from TransportService (#31727)
  [ML] Limit ML filter items to 10K (#31731)
  [ML] Return statistics about forecasts as part of the jobsstats and usage API (#31647)
  Fixture for Minio testing (#31688)
  [DOCS] Add missing get mappings docs to HLRC (#31765)
  [DOCS] Starting Elasticsearch (#31701)
  Painless: Complete Removal of Painless Type (#31699)
  Fix not waiting for Netty ThreadDeathWatcher in IT (#31758)
  Consolidate watcher setting update registration (#31762)
  Build: re-enabled bwc (#31769)
  ingest: Introduction of a bytes processor (#31733)
  Fix coerce validation_method in GeoBoundingBoxQueryBuilder (#31747)
  Add analyze API to high-level rest client (#31577)
  [DOCS] Typos
  DOC: Add examples to the SQL docs (#31633)
  Add support for AWS session tokens (#30414)
  Watcher: Reenable start/stop yaml tests (#31754)
  Implemented XContent serialisation for GetIndexResponse (#31675)
  JDBC: Fix stackoverflow on getObject and timestamp conversion (#31735)
  resolveHasher defaults to NOOP (#31723)
  Account for XContent overhead in in-flight breaker
  Split CircuitBreaker-related tests (#31659)
  Add write*Blob option to replace existing blob (#31729)
  Painless: Add Context Docs (#31190)
  Watcher: Fix chain input toXcontent serialization (#31721)
  Docs: Match the examples in the description (#31710)
  rest-high-level: added get cluster settings (#31706)
  [Docs] Correct typos (#31720)
  Clean up double semicolon code typos (#31687)
  [DOCS] Check for Windows and *nix file paths (#31648)
  [ML] Validate ML filter_id (#31535)
  Revert long lines
  Fix TransportChangePasswordActionTests
hendrikmuhs pushed a commit to elastic/kibana that referenced this pull request Jul 17, 2018
Add forecasts (Number of forecasts for an ml jobs) as column to the ML Jobs monitoring page. Related to: elastic/elasticsearch#31647
hendrikmuhs pushed a commit to hendrikmuhs/kibana that referenced this pull request Jul 17, 2018
…ic#20758)

Add forecasts (Number of forecasts for an ml jobs) as column to the ML Jobs monitoring page. Related to: elastic/elasticsearch#31647
hendrikmuhs pushed a commit to elastic/kibana that referenced this pull request Jul 18, 2018
… (#20889)

Add forecasts (Number of forecasts for an ml jobs) as column to the ML Jobs monitoring page. Related to: elastic/elasticsearch#31647
@hendrikmuhs hendrikmuhs deleted the forecaststats branch October 18, 2018 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants