Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Removing old per-partition normalization code #32816

Merged
merged 7 commits into from
Aug 15, 2018

Conversation

edsavage
Copy link
Contributor

Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).

This PR removes the now redundant code.

A PR containing the corresponding changes to the ml-cpp code will follow.

Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good but we need to address the BWC issues I raised. Also, to avoid breaking the build we need to follow the process of: 1. adding the version checks against 7 on master to get a green CI, 2. backport and change version to 6.5 but also disable bwc tests 3. once we have successful builds, we can change version check to 6.5 on master and re-enable the bwc tests.

@@ -164,8 +160,6 @@ public AnalysisConfig(StreamInput in) throws IOException {
}
}
}

usePerPartitionNormalization = in.readBoolean();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we need to check that if we are reading from an older node we consume the boolean (although we do nothing with it).

@@ -194,8 +188,6 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().before(Version.V_6_5_0)) {
out.writeBoolean(false);
}

out.writeBoolean(usePerPartitionNormalization);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here we need to check that if we are writing to an older node we write a false.

@@ -143,7 +137,6 @@ public Bucket(StreamInput in) throws IOException {
if (in.getVersion().before(Version.V_5_5_0)) {
in.readGenericValue();
}
partitionScores = in.readList(PartitionScore::new);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can get away without doing anything for BWC for the buckets because they are not being transferred between nodes. But I would like @droberts195 to confirm as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need to consider BWC for these lists. If you look at the implementation of readList() and writeList() they start by reading/writing the list length. So we need to write an empty list to versions before 6.5, and read a list of something. We can replace PartitionScore::new with a function in Bucket that reads the same stuff that PartitionScore::new read but just discards it.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true for when there is a transport client which I didn't think of at the first place. So, yes, we'll need to do the trick of reading the scores. There is another place where I'm doing this: https://github.com/elastic/elasticsearch/blob/6.x/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Detector.java#L253. You can take a look and follow a similar approach. Note we only need that code in the 6.x branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dimitris-athanasiou ! That all makes sense.

To maintain communication compatibility with nodes prior to 6.5 it is
necessary to maintain/cope with the old wire format
@@ -167,6 +184,10 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().before(Version.V_5_5_0)) {
out.writeGenericValue(Collections.emptyMap());
}
// bwc for perPartitionNormalization
if (out.getVersion().before(Version.V_6_5_0)) {
out.writeGenericValue(Collections.emptyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this to be out.writeList(Collections.emptyList());. Did you try that out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make the change now - you're right, writeList is the better option here

@dimitris-athanasiou
Copy link
Contributor

Also, just realised we should remove partition score from the Bucket class in the high level rest client.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edsavage edsavage merged commit 8ce1ab3 into elastic:master Aug 15, 2018
edsavage added a commit that referenced this pull request Aug 15, 2018
BWC tests disabled while backporting #32816
jasontedor added a commit that referenced this pull request Aug 15, 2018
* elastic/master:
  Revert "cluster formation DSL - Gradle integration -  part 2 (#32028)" (#32876)
  cluster formation DSL - Gradle integration -  part 2 (#32028)
  Introduce global checkpoint listeners (#32696)
  Move connection profile into connection manager (#32858)
  [ML] Temporarily disabling rolling-upgrade tests
  Use generic AcknowledgedResponse instead of extended classes (#32859)
  [ML] Removing old per-partition normalization code (#32816)
  Use JDK 10 for 6.4 BWC builds (#32866)
  Removed flaky test. Looks like randomisation makes these assertions unreliable.
  [test] mute IndexShardTests.testDocStats
  Introduce the dissect library (#32297)
  Security: remove password hash bootstrap check (#32440)
  Move validation to server for put user requests (#32471)
  [ML] Add high level REST client docs for ML put job endpoint (#32843)
  Test: Fix forbidden uses in test framework (#32824)
  Painless: Change fqn_only to no_import (#32817)
  [test] mute testSearchWithSignificantTermsAgg
  Watcher: Remove unused hipchat render method (#32211)
  Watcher: Remove extraneous auth classes (#32300)
  Watcher: migrate PagerDuty v1 events API to v2 API (#32285)
edsavage added a commit that referenced this pull request Aug 16, 2018
[ML] Removing old per-partition normalization code

Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring.

To maintain communication compatibility with nodes prior to 6.5 it is
necessary to maintain/cope with the old wire format
edsavage added a commit that referenced this pull request Aug 16, 2018
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Aug 16, 2018
Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).

This PR removes the now redundant code.

Relates elastic/elasticsearch#32816
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Aug 16, 2018
Re-enable BWC tests for ML now that elastic#32816 has been backported to 6.x
edsavage added a commit that referenced this pull request Aug 16, 2018
[ML] Re-enabling BWC tests

Re-enable BWC tests for ML now that #32816 has been backported to 6.x
jimczi pushed a commit that referenced this pull request Aug 17, 2018
jimczi pushed a commit that referenced this pull request Aug 17, 2018
[ML] Re-enabling BWC tests

Re-enable BWC tests for ML now that #32816 has been backported to 6.x
edsavage added a commit to elastic/ml-cpp that referenced this pull request Aug 20, 2018
Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).

This PR removes the now redundant code.

Relates elastic/elasticsearch#32816
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Aug 20, 2018
Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).

This PR removes the now redundant code.

Relates elastic/elasticsearch#32816
edsavage added a commit to elastic/ml-cpp that referenced this pull request Aug 20, 2018
Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).

This PR removes the now redundant code.

Relates elastic/elasticsearch#32816
@edsavage edsavage deleted the remove_per_partition_normalization branch August 20, 2018 12:13
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants