[ML] Removing old per-partition normalization code #32816
Per-partition normalization is an old, undocumented feature that was never used by clients. It has been superseded by per-partition maximum scoring.
Pinging @elastic/ml-core
LGTM
This looks good but we need to address the BWC issues I raised. Also, to avoid breaking the build we need to follow this process:
1. Add the version checks against 7 on master to get a green CI.
2. Backport and change the version to 6.5, but also disable the BWC tests.
3. Once we have successful builds, change the version check to 6.5 on master and re-enable the BWC tests.
@@ -164,8 +160,6 @@ public AnalysisConfig(StreamInput in) throws IOException {
}
}
}

usePerPartitionNormalization = in.readBoolean();
Here we need to check that if we are reading from an older node we consume the boolean (although we do nothing with it).
@@ -194,8 +188,6 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().before(Version.V_6_5_0)) {
    out.writeBoolean(false);
}

out.writeBoolean(usePerPartitionNormalization);
And here we need to check that if we are writing to an older node we write a false.
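The two comments above describe a symmetric rule: write a placeholder for the removed field when talking to an older node, and consume it when reading from one. A minimal, self-contained sketch of that rule follows. It uses plain java.io streams rather than Elasticsearch's StreamInput/StreamOutput, and the class ConfigWireSketch, its fields, and the integer version constants are all hypothetical names invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a config class that dropped a boolean from its wire format in 6.5.
final class ConfigWireSketch {
    // Versions as plain ints for this sketch; the real code compares Version constants.
    static final int V_6_4_0 = 60400;
    static final int V_6_5_0 = 60500;

    final String name;
    final long bucketSpan;

    ConfigWireSketch(String name, long bucketSpan) {
        this.name = name;
        this.bucketSpan = bucketSpan;
    }

    // Serialize for a peer running peerVersion.
    byte[] writeTo(int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            if (peerVersion < V_6_5_0) {
                // Older peers still expect the removed flag, so send a constant false.
                out.writeBoolean(false);
            }
            out.writeLong(bucketSpan);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize bytes that were produced by a peer running peerVersion.
    static ConfigWireSketch readFrom(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            if (peerVersion < V_6_5_0) {
                // Consume the removed flag and ignore it, so the next field lines up.
                in.readBoolean();
            }
            long bucketSpan = in.readLong();
            return new ConfigWireSketch(name, bucketSpan);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trailing bucketSpan field is there to show why the boolean has to be consumed: if the read side skipped that step, every later field would be decoded from the wrong offset.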
@@ -143,7 +137,6 @@ public Bucket(StreamInput in) throws IOException {
if (in.getVersion().before(Version.V_5_5_0)) {
    in.readGenericValue();
}
partitionScores = in.readList(PartitionScore::new); |
I believe we can get away without doing anything for BWC for the buckets because they are not being transferred between nodes. But I would like @droberts195 to confirm as well.
I think we do need to consider BWC for these lists. If you look at the implementation of readList() and writeList() they start by reading/writing the list length. So we need to write an empty list to versions before 6.5, and read a list of something. We can replace PartitionScore::new with a function in Bucket that reads the same stuff that PartitionScore::new read but just discards it.
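The read-and-discard idea for a length-prefixed list can be sketched as below. This is only an illustration under stated assumptions: it uses java.io streams with a fixed-width int as the list length, the entry layout (one string plus one double) is invented and is not the real PartitionScore format, and BucketWireSketch with all of its methods is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a result class that dropped a list from its wire format in 6.5.
final class BucketWireSketch {
    // Versions as plain ints for this sketch; the real code compares Version constants.
    static final int V_6_4_0 = 60400;
    static final int V_6_5_0 = 60500;

    // Simulates a pre-6.5 node: a length-prefixed list of entries, then eventCount.
    static byte[] writeAsOldNode(String[] names, double[] scores, long eventCount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(names.length);
            for (int i = 0; i < names.length; i++) {
                out.writeUTF(names[i]);      // assumed entry layout, for illustration only
                out.writeDouble(scores[i]);
            }
            out.writeLong(eventCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // New node writing: older peers get an empty list (a length of zero), newer peers get nothing.
    static byte[] write(long eventCount, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (peerVersion < V_6_5_0) {
                out.writeInt(0);
            }
            out.writeLong(eventCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // New node reading: entries sent by an older peer are read field by field and discarded.
    static long read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (peerVersion < V_6_5_0) {
                int size = in.readInt();
                for (int i = 0; i < size; i++) {
                    in.readUTF();
                    in.readDouble();
                }
            }
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The discard loop must mirror the old entry format exactly, field by field, which is why the comment suggests a function that reads the same things PartitionScore::new used to read.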
That is true when there is a transport client, which I didn't think of in the first place. So, yes, we'll need to do the trick of reading the scores. There is another place where I'm doing this: https://github.com/elastic/elasticsearch/blob/6.x/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/job/config/Detector.java#L253. You can take a look and follow a similar approach. Note we only need that code in the 6.x branch.
Thanks @dimitris-athanasiou ! That all makes sense.
To maintain communication compatibility with nodes prior to 6.5, it is necessary to cope with the old wire format.
@@ -167,6 +184,10 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().before(Version.V_5_5_0)) {
    out.writeGenericValue(Collections.emptyMap());
}
// bwc for perPartitionNormalization
if (out.getVersion().before(Version.V_6_5_0)) {
    out.writeGenericValue(Collections.emptyList());
I was expecting this to be out.writeList(Collections.emptyList());. Did you try that out?
I'll make the change now - you're right, writeList is the better option here
Also, just realised we should remove partition score from the |
LGTM
BWC tests disabled while backporting #32816
* elastic/master:
  Revert "cluster formation DSL - Gradle integration - part 2 (#32028)" (#32876)
  cluster formation DSL - Gradle integration - part 2 (#32028)
  Introduce global checkpoint listeners (#32696)
  Move connection profile into connection manager (#32858)
  [ML] Temporarily disabling rolling-upgrade tests
  Use generic AcknowledgedResponse instead of extended classes (#32859)
  [ML] Removing old per-partition normalization code (#32816)
  Use JDK 10 for 6.4 BWC builds (#32866)
  Removed flaky test. Looks like randomisation makes these assertions unreliable.
  [test] mute IndexShardTests.testDocStats
  Introduce the dissect library (#32297)
  Security: remove password hash bootstrap check (#32440)
  Move validation to server for put user requests (#32471)
  [ML] Add high level REST client docs for ML put job endpoint (#32843)
  Test: Fix forbidden uses in test framework (#32824)
  Painless: Change fqn_only to no_import (#32817)
  [test] mute testSearchWithSignificantTermsAgg
  Watcher: Remove unused hipchat render method (#32211)
  Watcher: Remove extraneous auth classes (#32300)
  Watcher: migrate PagerDuty v1 events API to v2 API (#32285)
Per-partition normalization is an old, undocumented feature that was never used by clients. It has been superseded by per-partition maximum scoring (see #32748). This PR removes the now redundant code. Relates elastic/elasticsearch#32816
Re-enable BWC tests for ML now that elastic#32816 has been backported to 6.x
[ML] Re-enabling BWC tests Re-enable BWC tests for ML now that #32816 has been backported to 6.x
Per-partition normalization is an old, undocumented feature that was
never used by clients. It has been superseded by per-partition maximum
scoring (see #32748).
This PR removes the now redundant code.
A PR containing the corresponding changes to the ml-cpp code will follow.