Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLRC: Adding ML Job stats #33183

Merged
merged 19 commits into from
Sep 1, 2018
Merged

Conversation

benwtrent
Copy link
Member

Sorry for the size of this PR. There were a handful of Pojos that needed to be added before the GET jobs stats action could be completed.

I wanted to put them in separate PRs, but with how much refactoring is being done on the HLRC, coupled with my commits getting all out of order, I just opted to open it as is.

I am still putting things in the Protocol package for now. We can move all the Pojos at once when the new client ML package is ready (don't want to cause unnecessary merge conflicts).

This relates to (#29827)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big one! Left some comments.

@@ -285,4 +288,45 @@ public void getBucketsAsync(GetBucketsRequest request, RequestOptions options, A
listener,
Collections.emptySet());
}

/**
* Gets one or more Machine Learning job statistics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase to Gets usage information for one or more Machine Learning jobs?



/**
* Request object to get {@link org.elasticsearch.client.ml.job.stats.JobStats} objects with the matching `jobId`s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the matching jobId's->for the matching jobId' ?

*/
public class GetJobsStatsRequest extends ActionRequest implements ToXContentObject {

public static final ParseField JOB_ID = new ParseField("job_id");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse Job.ID instead

@SuppressWarnings("unchecked")
public static final ConstructingObjectParser<GetJobsStatsRequest, Void> PARSER = new ConstructingObjectParser<>(
"get_jobs_stats_request",
true, a -> new GetJobsStatsRequest((List<String>) a[0]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't support unknown fields in the request objects, only the response. You can use the constructor that skips this argument all together.

* `_all` explicitly gets all the jobs' statistics in the cluster
* An empty request (no `jobId`s) implicitly gets all the jobs' statistics in the cluster
*/
public class GetJobsStatsRequest extends ActionRequest implements ToXContentObject {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this and the GetJobRequest needs to be consistent. Not sure if plural or singular is best. Potentially should also be consistent with the results (where I've been using plural). @droberts195 Any preference?


@Override
protected boolean supportsUnknownFields() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it shouldn't, because the internal map is just a map of strings and objects. Consequently, when random fields are added to the serialized object, and it is de-serialized, they are no longer equal as the individual internal attribute fields are not well defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is the case. The unknown fields will only be at the first level, not under the attributes object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown fields are injected recursively into internal objects. Example from a failing test for when this is true

*****OBJECT PRE SHUFFLE + RANDOM FIELDS ********
{
  "id" : "IeAYBTguoP",
  "name" : "LOmZFvZJJN",
  "ephemeral_id" : "pNZZYyuHQC",
  "transport_address" : "oxceWLCGyt",
  "attributes" : {
    "key-1" : "EukbVGneBx",
    "key-0" : "CswxpsJUWa",
    "key-5" : "GEaErpPvvc",
    "key-4" : "aQotGStjzI",
    "key-3" : "juQFiTIlSy",
    "key-2" : "ynlRGrbNtc"
  }
}
****ATRIBUTES POST SHUFFLE + RANDOM FIELDS*****
{
 "key-1" : "EukbVGneBx",
 "key-0" : "CswxpsJUWa",
 "key-5" : "GEaErpPvvc",
 "key-4" : "aQotGStjzI",
 "key-3" : "juQFiTIlSy",
 "NrIvvAzJin" : 5240957203306958078,
 "key-2" : "ynlRGrbNtc"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they?! Sorry Ben, I was really not expecting they would. Not a big fan of this. OK, you can ignore my comments about unknown fields in test. However, I wonder if we should add a manual test adding a top-level unknown field to safe-guard this as it is critical for the client forward compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitris-athanasiou I was thinking for a separate PR adding a "shallow" check that does not recursively inject unknown fields. That way we can reuse that code for other other objects that have this same behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitris-athanasiou figured out how to only create random fields for the toplevel part of the object.

    @Override
    protected Predicate<String> getRandomFieldsExcludeFilter() {
        return field -> !field.isEmpty();
    }

This predicate means that for anything whose field path is not "" we should not add random fields. "" is the same as the root path.

XContentTestUtils.insertRandomFields calls XContentTestUtils.getInsertPaths and then utilized that supplied predicate to filter them.


@Override
protected boolean supportsUnknownFields() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be as statusCounts is a map of strings and longs. Since the internal object is not well defined, I am not 100% sure if we should make the entries well defined and ignore stuff on the server side or not.

}

private static SimpleStats createSimpleStats() {
return new SimpleStatsTests().createTestInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather factoring out the creation here and then call that from createTestInstance`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand your comment, but regardless, does not seem like that big of an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that it doesn't make sense to instantiate the tests class here. We could extract the body of createTestInstance into a method createRandom and then call the latter from the former.


@Override
protected boolean supportsUnknownFields() {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar reason to all the other reasons about why unknown fields is false here.

include-tagged::{doc-tests}/MlClientDocumentationIT.java[x-pack-ml-get-job-stats-listener]
--------------------------------------------------
<1> `onResponse` is called back when the action is completed successfully
<2> `onFailure` is called back when some unexpected error occurs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're missing a description of the response

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of description? in the execution section there is an explanation of what a JobsStatsResponse object looks like and contains.

Do you want a description of the JobStats object itself? That is a pretty big object that is probably best served through JavaDocs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked in some other asciidocs and they usually have a section with the response object where they have the description that you've added in the execution (not the JobStats object). See for example get-buckets.asciidoc where I copied that.

private long total;
private long forecastedJobs;
private final long total;
private final long forecastedJobs;
private SimpleStats memoryStats;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest could be final as well, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitris-athanasiou No, because they are optionally set in the constructor if the total is > 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just means we need to explicitly set them to null if the condition is not true. Just informational, it's no big deal whether we do it or not.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent
Copy link
Member Author

Jenkins retest this please

@benwtrent benwtrent merged commit 19b14fa into elastic:master Sep 1, 2018
@benwtrent benwtrent deleted the feature/ml-hlrc-job-stats branch September 1, 2018 18:32
benwtrent added a commit that referenced this pull request Sep 1, 2018
* HLRC: Adding pojos for get job stats

HLRC: Adding pojos for job stats request

* HLRC: Adding job stats pojos

* HLRC: ML job stats

* Minor syntax changes and adding license headers

* minor comment change

* Moving to client package, minor changes

* Addressing PR comments

* removing bad sleep

* addressing minor comment around test methods

* adding toplevel random fields for tests

* addressing minor review comments
dnhatn added a commit that referenced this pull request Sep 2, 2018
* master:
  HLRC: ML Flush job (#33187)
  HLRC: Adding ML Job stats (#33183)
  LLREST: Drop deprecated methods (#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (#31118)
dnhatn added a commit that referenced this pull request Sep 2, 2018
* 6.x:
  HLRC: ML Flush job (#33187)
  Switch more LLREST usage to new style Requests (#33171)
  HLRC: Adding ML Job stats (#33183)
  HLREST: add reindex API (#32679)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (#31118)
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Sep 2, 2018
* master: (64 commits)
  HLREST: add update by query API (elastic#32760)
  TEST: Increase timeout testFollowIndexAndCloseNode (elastic#33333)
  HLRC: ML Flush job (elastic#33187)
  HLRC: Adding ML Job stats (elastic#33183)
  LLREST: Drop deprecated methods (elastic#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (elastic#31118)
  Mute test watcher usage stats output
  [Rollup] Fix FullClusterRestart test
  Adjust soft-deletes version after backport into 6.5
  completely drop `index.shard.check_on_startup: fix` for 7.0 (elastic#33194)
  Fix AwaitsFix issue number
  Mute SmokeTestWatcherWithSecurityIT testsi
  drop `index.shard.check_on_startup: fix` (elastic#32279)
  tracked at
  [DOCS] Moves ml folder from x-pack/docs to docs (elastic#33248)
  [DOCS] Move rollup APIs to docs (elastic#31450)
  [DOCS] Rename X-Pack Commands section (elastic#33005)
  TEST: Disable soft-deletes in ParentChildTestCase
  Fixes SecurityIntegTestCase so it always adds at least one alias (elastic#33296)
  ...
jasontedor added a commit to debadair/elasticsearch that referenced this pull request Sep 2, 2018
* master: (283 commits)
  HLREST: add update by query API (elastic#32760)
  TEST: Increase timeout testFollowIndexAndCloseNode (elastic#33333)
  HLRC: ML Flush job (elastic#33187)
  HLRC: Adding ML Job stats (elastic#33183)
  LLREST: Drop deprecated methods (elastic#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (elastic#31118)
  Mute test watcher usage stats output
  [Rollup] Fix FullClusterRestart test
  Adjust soft-deletes version after backport into 6.5
  completely drop `index.shard.check_on_startup: fix` for 7.0 (elastic#33194)
  Fix AwaitsFix issue number
  Mute SmokeTestWatcherWithSecurityIT testsi
  drop `index.shard.check_on_startup: fix` (elastic#32279)
  tracked at
  [DOCS] Moves ml folder from x-pack/docs to docs (elastic#33248)
  [DOCS] Move rollup APIs to docs (elastic#31450)
  [DOCS] Rename X-Pack Commands section (elastic#33005)
  TEST: Disable soft-deletes in ParentChildTestCase
  Fixes SecurityIntegTestCase so it always adds at least one alias (elastic#33296)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 3, 2018
* master: (197 commits)
  Prevent NPE parsing the stop datafeed request. (elastic#33347)
  HLRC: Add ML get overall buckets API (elastic#33297)
  Core: Fix epoch millis java time formatter (elastic#33302)
  [Docs] Improve tuning for speed advice (elastic#33315)
  [Rollup] Fix Caps Comparator to handle calendar/fixed time (elastic#33336)
  [CI] Mute  IndexShardTests#testIndexCheckOnStartup fails elastic#33345
  [CI] Mute LuceneChangesSnapshotTests#testUpdateAndReadChangesConcurrently
  Security for _field_names field should not override field statistics (elastic#33261)
  Add early termination support to BucketCollector (elastic#33279)
  Fix extractjar task  ci  (elastic#33272)
  Mute testFollowIndexAndCloseNode
  Logging: Drop Settings from some logging ctors (elastic#33332)
  HLREST: add update by query API (elastic#32760)
  TEST: Increase timeout testFollowIndexAndCloseNode (elastic#33333)
  HLRC: ML Flush job (elastic#33187)
  HLRC: Adding ML Job stats (elastic#33183)
  LLREST: Drop deprecated methods (elastic#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (elastic#31118)
  Mute test watcher usage stats output
  ...
@benwtrent benwtrent removed the :ml Machine learning label Oct 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants